[Wellington-pm] Perl database stuff....

Jacinta Richardson jarich at perltraining.com.au
Wed Dec 27 04:28:37 PST 2006


Sam Vilain wrote:
> michael at diaspora.gen.nz wrote:
> 
>>     A: select 1 from table where business_key = ...; # off to database
>>     if select returned nothing
>>     begin
>> 	B: select sequence.nextval into id; # off to database
>> 	C: insert into table values (id, business_key); # off to database
>>     end;
>>
>> Between A and B, unless you wrap a lock around the operation, there's a
>> race condition; you could find either C failing (if there's a uniqueness
>> constraint on "business_key"), or duplicate values in "table" (if
>> there isn't).
> 
> Surely just wrapping the entire operation in a transaction is enough?
> Or is that one of those cases that needs the TRANSACTION ISOLATION LEVEL
> set to max?

Transactions don't solve the problem, regardless of their isolation level 
(unless your levels mean something other than what I think they do).  The race 
condition exists because the underlying operating system handles multiple 
processes all at the same time.  As far as the operating system is concerned, 
the database, scripts accessing the database, and anything else on the machine 
are just processes.  Each process deserves time slices to do their things in a 
timely fashion.  These are handled due to various algorithms but essentially 
each process goes into a queue.  The OS takes one out from the front of the 
queue, sets up it's memory, runs it's instructions for a while and then swaps 
its used memory out to disk (if necessary) and moves that to the back of the 
queue.  It then takes the next one out of the queue and processes that.  This 
means that sometimes (in rare but possible cases) two or more processes can 
collide and do the unexpected.  For example consider the following:

	Process A selects stuff for business key X - it's not there
	Operating system swaps Process A to memory/disk
	Process B selects stuff for business key X - it's not there
	Process B selects the sequence next val
	Process B inserts stuff into the database
	Operating system swaps Process B to memory/disk
	Process A selects the sequence next val
	Process A inserts it's stuff into the database for key X

Now you have either an insertion failure (if you have unique indices) or a 
duplicate entry.  If Process A is using transactions and you do have unique 
indices (separate from the id) then the transaction fails because the insert 
fails.  If you don't have a separate key then the transaction succeeds because 
failure isn't detected.

How much of a concern this problem is really depends on the system.  If you can 
guarantee (due to file locking for example) that only one version of your 
program is ever going to run; and you know that you won't have other programs 
accessing the same database due to your system knowledge, then the race 
condition isn't an issue.  On the other hand, if this is a site where hundreds 
or thousands of different people will be adding content all day at the same time 
and you have a requirement that that information be unique; then you're very 
likely to hit the race condition occasionally and it will be really hard to 
track down.

Personally in most of the systems I've ever done this kind of thing, I've taken 
the lazy approach.  I select to see if it's there, and insert otherwise. 
However I make sure that I have unique indices based on the data entered (I try 
to never use an id field) and I take care to handle the case of the insert 
failing.  If I am using an id field, I set it to "auto-increment" if the 
database allows.  Most of the time these days though, I just use Class::DBI 
(probably soon to change to DBIx::Class).  I don't know what its policy is in 
this regard, but I'm happy enough to trust it.

Of course I've mostly been lucky in that in all the systems I've worked on, the 
number of co-current data editors has always been less than say 20,000.  In fact 
in most of them it's probably less than 3.  This kind of a thing is a nightmare 
when creating items for an online store (or CMS) -- where a number of 
administrators might try to add the exact same item -- but rarely a problem when 
handling a shopping cart, because most people *expect* duplicate items to appear 
if they buy the same item in the same session from two different pages.  Thus in 
my situations the chances of hitting this kind of race condition is sufficiently 
low that I was told not to spend the time worrying about it.

It's a trade off: you can write some code correctly, or quickly.  ;)

All the best,

	J


More information about the Wellington-pm mailing list