[Pdx-pm] (OT) Database design guru

Thu Jan 23 22:39:49 CST 2003

On Thursday January 23, 2003 18:45, Michael G Schwern wrote:
> On Thu, Jan 23, 2003 at 04:47:24PM -0800, Curtis Poe wrote:
> > I'm going nuts with a problem at work regarding a database
> > normalization problem.  I've been searching high and low for an
> > answer, but not luck so far.  On the off chance that anyone wants a
> > conundrum to chew on:
> >
> > http://www.perlmonks.org/index.pl?node_id=108949&user=Ovid
>
> Seperate out the state and country fields into a more abstract "address"
> table.  Technically this is denormalized, but its a nice logical
> breakdown and handily solves your problem.
>
> CREATE TABLE addresses (
>     id		INTEGER		NOT NULL PRIMARY KEY,
>     state	INTEGER		REFERENCES states,
>     country     INTEGER		REFERENCES countries
> );
>
> CREATE TABLE customers (
>     customer_id serial       NOT NULL PRIMARY KEY,
>     first_name  VARCHAR(50)  NOT NULL,
>     last_name   VARCHAR(100) NOT NULL,
>     address	INTEGER      REFERENCES addresses
> );

You get the same kinds of problems dealing with zip-codes.  New Zealand
doesn't have them if I remember correctly.  Makes it interesting to check 
input off of a web site.

Normalization is all very well and good, but I've found that aiming for 
simple, correct and fast SQL and other code often makes me denomalize 
things in odd ways.  What is it that normalization buys you here?  Are you 
trying to make it so that a simple JOIN will get all the data you want?

How you access the tables should drive the design.  What kind of actions do 
you need to do and in what ratio do you need to do them?

As to the second problem (France ending up with California as a state), I'm 
not sure how this could happen.  Are people entering this stuff without 
checks being done?   Michael's idea of a more abstract address should take 
care of this shouldn't it?  Or, you could make a joint primary key on the 
states table of the state_id and the country_id and then reference that as 
the foreign key. If you tried to insert (France,California), it would note 
that there wasn't any such state.

When you have a data relationship that is 1:0..N instead of 1:1..N, then 
things break down and you have to get tricky.  I would say that this is 
the core of your problem.  You do not have a guarantee that there is going 
to be at least one element in the states table for a country.  You can 
either fake it as your coworkers suggested, or you can try Michael's idea.

If it was up to me, I'd probably do this:

- add a flag field into the country table that indicates whether the 
country has subdivisions or not (states, provinces, territories etc.). 

- I'd have that blank entry in the states table.  One for each country that 
does not have subdivisions.

- I'd make the <state_id,country_id> tuple into the primary key of the 
states table and make sure that references to the states table used it as 
such.

The extra flag goes in the country table since it is a property of a 
country.  It gives the code using the results something to work with 
without having to resort to kludges like if(length($state_name)==0)...  
But, you still need some smarts in the code. 

The expanded primary key (this is possible in PostGres, right??) should 
eliminate the France/California problems.

Without knowing what you'll be doing to the table, it is hard to say what 
the best solutions are.  I'm definitely not religious about normalization 
at all.  I'll duplicate date all over the place if it speeds up the most 
common operations and can be maintained without too much effort.

Best,
Kyle