[sf-perl] DB design question

David Fetter david at fetter.org
Fri Jul 28 09:07:40 PDT 2006


On Fri, Jul 28, 2006 at 08:31:59AM -0700, Rich Morin wrote:
> This is one of those questions where there is no shortage of
> answers, but some are likely to be a lot better (for various
> reasons) than the others.  So, I'm hoping for suggestions,
> rationales, etc.

Just FYI, the "entity-attribute-value" storage system is a
thoroughly documented mistake.

http://joecelkothesqlapprentice.blogspot.com/2006/06/give-me-your-thoughts-storing-format.html

There are much better ways to go about this, but we'd need more info
on just exactly what's going on in order to construct a better data
model.

Cheers,
D
> 
> Background
> 
>   *  There are billions of entities (5-10, at present).
> 
>   *  Each entity has a unique name, which could be 100+
>      characters in length.
> 
>   *  Each entity has a collection of attributes, drawn from
>      a much larger set (could also be in the billions).
> 
>   *  The "signature" of an attribute might be 50 characters.
> 
>   *  I'd like to keep the total storage constrained to (say)
>      one KB per entity.
> 
>   *  It's OK (but not necessary) to presume that entity and
>      attribute ids can only be used once.
> 
> Problem
> 
> For each new entity that I encounter, I need to determine and
> record its "unusual" attributes and save this in a way that
> will allow me to (later) find other entities which have similar
> sets of unusual attributes.
> 
> 
> Discussion
> 
> Without presuming any particular database for this project,
> I'll use MySQL syntax to sketch out a "straw man" design:
> 
>   CREATE TABLE Entities (
>     id         INT NOT NULL,
>     name       CHAR(200)
>   )
> 
>   CREATE TABLE Attrs (
>     id         INT NOT NULL,
>     sig        CHAR(50),
>     count      INT NOT NULL
>   )
> 
>   CREATE TABLE Links (
>     id_entity  INT NOT NULL,
>     id_attr    INT NOT NULL
>   )
> 
> Using this setup, I could start with an entity, pick one of
> its attributes that has a low count, and look for other
> entities that have the same attribute.  Repeating this for
> several attributes, I could build up a "cluster" of entities
> that have similar, unusual attributes.
> 
> Another setup would involve the use of a blob in the Entities
> table, containing (say) 250 id_attr values.  This limits my
> ability to use SQL, however, and it also keeps me from doing
> cute lookups in the Links table.
> 
> Anyway, I'm open to suggestions...
> 
> -r
> -- 
> http://www.cfcl.com/rdm            Rich Morin
> http://www.cfcl.com/rdm/resume     rdm at cfcl.com
> http://www.cfcl.com/rdm/weblog     +1 650-873-7841
> 
> Technical editing and writing, programming, and web development
> _______________________________________________
> SanFrancisco-pm mailing list
> SanFrancisco-pm at pm.org
> http://mail.pm.org/mailman/listinfo/sanfrancisco-pm

-- 
David Fetter <david at fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!


More information about the SanFrancisco-pm mailing list