[sf-perl] DB design question

Josh Berkus josh at agliodbs.com
Tue Aug 1 15:14:14 PDT 2006


Rich,

> This is one of those questions where there is no shortage of
> answers, but some are likely to be a lot better (for various
> reasons) than the others.  So, I'm hoping for suggestions,
> rationales, etc.

Per previous discussions, you're up against your desire to use a completely 
anonymous and flexible database structure (the EAV setup) and the need for 
performance.   Billions of entities is pushing the envelope for relational 
db performance, so you are at the point where your design may center 
around performance and not around the logical, or normalized, design.

In fact, given how little use you'll be making of relational features, I 
would investigate how well object databases would hold up to your needs.  
They probably won't -- last I checked, most ODBs use hash indexes 
exclusively -- but it's worth a check.  Or possibly there's another 
specialized database type.

If I were doing your project in Postgres, it would require some hacking.  
Here's what I'd do:

1) extend the intarray module to support the new Generic Inverted Indexes 
available in PostgreSQL 8.2 (alpha);

2) improve GIN to support compression per Teodor's notes;

3) Create a denormalized table to hold entites and attributes:

entity (
	name text not null primary key
	attributes intarray
)

then you can do queries like:

select name from entities where attribues @@ ARRAY[57,201,399,18] >= 3;

.... for "get me all entities which have at least three of the four 
attributes 57, 201, 399 and 18".

However, make no mistake: the project you're lanuching will be technically 
challenging and may require you to learn to hack RDBMS code to get 
acceptable performance (or to hire someone who can).

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


More information about the SanFrancisco-pm mailing list