[sf-perl] DB design question
david at fetter.org
Fri Jul 28 09:07:40 PDT 2006
On Fri, Jul 28, 2006 at 08:31:59AM -0700, Rich Morin wrote:
> This is one of those questions where there is no shortage of
> answers, but some are likely to be a lot better (for various
> reasons) than the others. So, I'm hoping for suggestions,
> rationales, etc.
Just FYI, the "entity-attribute-value" storage system is a
thoroughly documented mistake.
There are much better ways to go about this, but we'd need more info
on just exactly what's going on in order to construct a better data
> * There are billions of entities (5-10, at present).
> * Each entity has a unique name, which could be 100+
> characters in length.
> * Each entity has a collection of attributes, drawn from
> a much larger set (could also be in the billions).
> * The "signature" of an attribute might be 50 characters.
> * I'd like to keep the total storage constrained to (say)
> one KB per entity.
> * It's OK (but not necessary) to presume that entity and
> attribute ids can only be used once.
> For each new entity that I encounter, I need to determine and
> record its "unusual" attributes and save this in a way that
> will allow me to (later) find other entities which have similar
> sets of unusual attributes.
> Without presuming any particular database for this project,
> I'll use MySQL syntax to sketch out a "straw man" design:
> CREATE TABLE Entities (
> id INT NOT NULL,
> name CHAR(200)
> CREATE TABLE Attrs (
> id INT NOT NULL,
> sig CHAR(50),
> count INT NOT NULL
> CREATE TABLE Links (
> id_entity INT NOT NULL,
> id_attr INT NOT NULL
> Using this setup, I could start with an entity, pick one of
> its attributes that has a low count, and look for other
> entities that have the same attribute. Repeating this for
> several attributes, I could build up a "cluster" of entities
> that have similar, unusual attributes.
> Another setup would involve the use of a blob in the Entities
> table, containing (say) 250 id_attr values. This limits my
> ability to use SQL, however, and it also keeps me from doing
> cute lookups in the Links table.
> Anyway, I'm open to suggestions...
> http://www.cfcl.com/rdm Rich Morin
> http://www.cfcl.com/rdm/resume rdm at cfcl.com
> http://www.cfcl.com/rdm/weblog +1 650-873-7841
> Technical editing and writing, programming, and web development
> SanFrancisco-pm mailing list
> SanFrancisco-pm at pm.org
David Fetter <david at fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Remember to vote!
More information about the SanFrancisco-pm