[Wellington-pm] Perl database stuff....

Andrew Boag Andrew.Boag at catalyst.net.nz
Wed Dec 27 12:57:03 PST 2006


Jacinta Richardson wrote:
> Sam Vilain wrote:
>   
>> michael at diaspora.gen.nz wrote:
>>
>>     
>>>     A: select 1 from table where business_key = ...; # off to database
>>>     if select returned nothing
>>>     begin
>>> 	B: select sequence.nextval into id; # off to database
>>> 	C: insert into table values (id, business_key); # off to database
>>>     end;
>>>
>>> Between A and B, unless you wrap a lock around the operation, there's a
>>> race condition; you could find either C failing (if there's a uniqueness
>>> constraint on "business_key"), or duplicate values in "table" (if
>>> there isn't).
>>>       
>> Surely just wrapping the entire operation in a transaction is enough?
>> Or is that one of those cases that needs the TRANSACTION ISOLATION LEVEL
>> set to max?
>>     
>
> Transactions don't solve the problem, regardless of their isolation level 
> (unless your levels mean something other than what I think they do).  The race 
> condition exists because the underlying operating system handles multiple 
> processes all at the same time.  As far as the operating system is concerned, 
> the database, scripts accessing the database, and anything else on the machine 
> are just processes.  Each process deserves time slices to do its work in a 
> timely fashion.  Scheduling is handled by various algorithms, but essentially 
> each process goes into a queue.  The OS takes one out from the front of the 
> queue, sets up its memory, runs its instructions for a while and then swaps 
> its used memory out to disk (if necessary) and moves that to the back of the 
> queue.  It then takes the next one out of the queue and processes that.  This 
> means that sometimes (in rare but possible cases) two or more processes can 
> collide and do the unexpected.  For example consider the following:
>
> 	Process A selects stuff for business key X - it's not there
> 	Operating system swaps Process A to memory/disk
> 	Process B selects stuff for business key X - it's not there
> 	Process B selects the sequence next val
> 	Process B inserts stuff into the database
> 	Operating system swaps Process B to memory/disk
> 	Process A selects the sequence next val
> 	Process A inserts its stuff into the database for key X
>
> Now you have either an insertion failure (if you have unique indices) or a 
> duplicate entry.  If Process A is using transactions and you do have unique 
> indices (separate from the id) then the transaction fails because the insert 
> fails.  If you don't have a separate key then the transaction succeeds because 
> failure isn't detected.
>
> How much of a concern this problem is really depends on the system.  If you can 
> guarantee (due to file locking for example) that only one version of your 
> program is ever going to run; and you know that you won't have other programs 
> accessing the same database due to your system knowledge, then the race 
> condition isn't an issue.  On the other hand, if this is a site where hundreds 
> or thousands of different people will be adding content all day at the same time 
> and you have a requirement that that information be unique; then you're very 
> likely to hit the race condition occasionally and it will be really hard to 
> track down.
>
> Personally in most of the systems I've ever done this kind of thing, I've taken 
> the lazy approach.  I select to see if it's there, and insert otherwise. 
> However I make sure that I have unique indices based on the data entered (I try 
> to never use an id field) and I take care to handle the case of the insert 
> failing.  If I am using an id field, I set it to "auto-increment" if the 
> database allows.  Most of the time these days though, I just use Class::DBI 
> (probably soon to change to DBIx::Class).  I don't know what its policy is in 
> this regard, but I'm happy enough to trust it.
>
> Of course I've mostly been lucky in that in all the systems I've worked on, the 
> number of concurrent data editors has always been less than, say, 20,000.  In fact 
> in most of them it's probably less than 3.  This kind of a thing is a nightmare 
> when creating items for an online store (or CMS) -- where a number of 
> administrators might try to add the exact same item -- but rarely a problem when 
> handling a shopping cart, because most people *expect* duplicate items to appear 
> if they buy the same item in the same session from two different pages.  Thus in 
> my situations the chances of hitting this kind of race condition are sufficiently 
> low that I was told not to spend the time worrying about it.
>
> It's a trade-off: you can write some code correctly, or quickly.  ;)
>
> All the best,
>
> 	J
After re-reading the original question, I am leaning towards Peter 
Kelly's suggestion that this is probably best solved by adding unique 
indexes to the table in question and just handling failed inserts. 
However, not being savvy to the details of the table in question, I 
can't say whether that is possible.
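
Handled in DBI, that approach might look something like the sketch 
below. This is untested and purely illustrative: the "widgets" table, 
its auto-incrementing "id" column and the connection details are all 
made up, and the exact duplicate-key error you catch varies by database.

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', $user, $pass,
                           { RaiseError => 1, AutoCommit => 1 });

    # Try the insert first, and let the unique index on business_key
    # reject whichever process gets there second.
    my $ok = eval {
        $dbh->do('INSERT INTO widgets (business_key) VALUES (?)',
                 undef, $business_key);
        1;
    };

    unless ($ok) {
        # Almost certainly a duplicate-key violation from a concurrent
        # insert, so the row exists now -- fetch it instead.
        my ($id) = $dbh->selectrow_array(
            'SELECT id FROM widgets WHERE business_key = ?',
            undef, $business_key,
        );
    }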

Transactions will save the day here if we LOCK TABLE at the start of the 
transaction. Yes, this has costs of its own, but it does eliminate the 
race condition. Here is Jacinta's timeline again with a LOCK TABLE added 
(a concrete sketch follows the timeline).

Process A
BEGIN;
LOCK TABLE;

	Process A selects stuff for business key X - it's not there
	Operating system swaps Process A to memory/disk
	Process B tries to select stuff for business key X
-- SELECT HANGS as the table is locked
	Operating system swaps Process B to memory/disk
	Process A selects the sequence next val
	Process A inserts its stuff into the database for key X

-- Process A commits, releasing the lock.

-- Process B's select now returns, confirming the presence of the
-- record, and Process B behaves accordingly.
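
In Perl/DBI that might look something like this. Again untested and 
illustrative: the table, sequence and connection details are invented, 
and the LOCK TABLE syntax is PostgreSQL's, whose default lock mode 
(ACCESS EXCLUSIVE) blocks other sessions' SELECTs as well as writes.

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', $user, $pass,
                           { RaiseError => 1, AutoCommit => 0 });

    eval {
        # Blocks everyone else, readers included, until we commit
        # or roll back.
        $dbh->do('LOCK TABLE widgets');

        my ($id) = $dbh->selectrow_array(
            'SELECT id FROM widgets WHERE business_key = ?',
            undef, $business_key,
        );

        unless (defined $id) {
            ($id) = $dbh->selectrow_array(
                "SELECT nextval('widgets_id_seq')");
            $dbh->do('INSERT INTO widgets (id, business_key)
                      VALUES (?, ?)', undef, $id, $business_key);
        }

        $dbh->commit;    # releases the lock
    };
    if ($@) {
        warn "insert-or-fetch failed: $@";    # log loudly, as below
        $dbh->rollback;                       # also releases the lock
    }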

Yes, yes, I know LOCKing tables can get messy. Every time it's mentioned 
at a Perl Mongers meeting, eyes roll as members remember late-night 
sessions spent hunting for the reason queries were freezing or running 
slowly.

That said, if we double-check that the locking transaction is completing 
(and write thorough log messages when it doesn't), and the transaction 
doesn't do anything too complicated (i.e. time-consuming) and isn't being 
requested 1000 times a second, then this is a workable solution.





