[sf-perl] DB design question

David Fetter david at fetter.org
Tue Aug 1 15:31:06 PDT 2006


On Tue, Aug 01, 2006 at 03:14:14PM -0700, Josh Berkus wrote:
> Rich,
> 
> > This is one of those questions where there is no shortage of
> > answers, but some are likely to be a lot better (for various
> > reasons) than the others.  So, I'm hoping for suggestions,
> > rationales, etc.
> 
> Per previous discussions, you're up against your desire to use a completely 
> anonymous and flexible database structure (the EAV setup) and the need for 
> performance.   Billions of entities is pushing the envelope for relational 
> db performance, so you are at the point where your design may center 
> around performance and not around the logical, or normalized, design.
> 
> In fact, given how little use you'll be making of relational features, I 
> would investigate how well object databases would hold up to your needs.  
> They probably won't -- last I checked, most ODBs use hash indexes 
> exclusively -- but it's worth a check.  Or possibly there's another 
> specialized database type.
> 
> If I were doing your project in Postgres, it would require some hacking.  
> Here's what I'd do:
> 
> 1) extend the intarray module to support the new Generic Inverted Indexes 
> available in PostgreSQL 8.2 (alpha);
> 
> 2) improve GIN to support compression per Teodor's notes;
> 
> 3) Create a denormalized table to hold entites and attributes:
> 
> entity (
> 	name text not null primary key
> 	attributes intarray
> )
> 
> then you can do queries like:
> 
> select name from entities where attribues @@ ARRAY[57,201,399,18] >= 3;
> 
> .... for "get me all entities which have at least three of the four 
> attributes 57, 201, 399 and 18".
> 
> However, make no mistake: the project you're lanuching will be
> technically challenging and may require you to learn to hack RDBMS
> code to get acceptable performance (or to hire someone who can).

So we're talking about multiple years of effort just on this piece,
possibly multiple hundreds of thousand of dollars, as people people
competent to mess with RDBMS code don't come cheap, and no guarantee
even the most modest level of success.

Rich, how about something a little less risky like designing your
schema and worrying about extending it later if need be?

Cheers,
D
-- 
David Fetter <david at fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!


More information about the SanFrancisco-pm mailing list