[Pdx-pm] (OT) Database design guru
Kyle Hayes
kyle_hayes at speakeasy.net
Thu Jan 23 22:39:49 CST 2003
On Thursday January 23, 2003 18:45, Michael G Schwern wrote:
> On Thu, Jan 23, 2003 at 04:47:24PM -0800, Curtis Poe wrote:
> > I'm going nuts with a problem at work regarding a database
> > normalization problem. I've been searching high and low for an
> > answer, but not luck so far. On the off chance that anyone wants a
> > conundrum to chew on:
> >
> > http://www.perlmonks.org/index.pl?node_id=108949&user=Ovid
>
> Seperate out the state and country fields into a more abstract "address"
> table. Technically this is denormalized, but its a nice logical
> breakdown and handily solves your problem.
>
> CREATE TABLE addresses (
> id INTEGER NOT NULL PRIMARY KEY,
> state INTEGER REFERENCES states,
> country INTEGER REFERENCES countries
> );
>
> CREATE TABLE customers (
> customer_id serial NOT NULL PRIMARY KEY,
> first_name VARCHAR(50) NOT NULL,
> last_name VARCHAR(100) NOT NULL,
> address INTEGER REFERENCES addresses
> );
You get the same kinds of problems dealing with zip-codes. New Zealand
doesn't have them if I remember correctly. Makes it interesting to check
input off of a web site.
Normalization is all very well and good, but I've found that aiming for
simple, correct and fast SQL and other code often makes me denomalize
things in odd ways. What is it that normalization buys you here? Are you
trying to make it so that a simple JOIN will get all the data you want?
How you access the tables should drive the design. What kind of actions do
you need to do and in what ratio do you need to do them?
As to the second problem (France ending up with California as a state), I'm
not sure how this could happen. Are people entering this stuff without
checks being done? Michael's idea of a more abstract address should take
care of this shouldn't it? Or, you could make a joint primary key on the
states table of the state_id and the country_id and then reference that as
the foreign key. If you tried to insert (France,California), it would note
that there wasn't any such state.
When you have a data relationship that is 1:0..N instead of 1:1..N, then
things break down and you have to get tricky. I would say that this is
the core of your problem. You do not have a guarantee that there is going
to be at least one element in the states table for a country. You can
either fake it as your coworkers suggested, or you can try Michael's idea.
If it was up to me, I'd probably do this:
- add a flag field into the country table that indicates whether the
country has subdivisions or not (states, provinces, territories etc.).
- I'd have that blank entry in the states table. One for each country that
does not have subdivisions.
- I'd make the <state_id,country_id> tuple into the primary key of the
states table and make sure that references to the states table used it as
such.
The extra flag goes in the country table since it is a property of a
country. It gives the code using the results something to work with
without having to resort to kludges like if(length($state_name)==0)...
But, you still need some smarts in the code.
The expanded primary key (this is possible in PostGres, right??) should
eliminate the France/California problems.
Without knowing what you'll be doing to the table, it is hard to say what
the best solutions are. I'm definitely not religious about normalization
at all. I'll duplicate date all over the place if it speeds up the most
common operations and can be maintained without too much effort.
Best,
Kyle
More information about the Pdx-pm-list
mailing list