[Wellington-pm] Perl database stuff....

Cliff Pratt enkidu at cliffp.com
Wed Dec 27 15:25:50 PST 2006


Andrew Boag wrote:
> Jacinta Richardson wrote:
>> Sam Vilain wrote:
>> 
>>> michael at diaspora.gen.nz wrote:
>>> 
>>> 
>>>> A: select 1 from table where business_key = ...;   # off to database
>>>> if select returned nothing
>>>> begin
>>>>   B: select sequence.nextval into id;               # off to database
>>>>   C: insert into table values (id, business_key);   # off to database
>>>> end;
>>>> 
>>>> Between A and B, unless you wrap a lock around the operation,
>>>> there's a race condition; you could find either C failing (if
>>>> there's a uniqueness constraint on "business_key"), or
>>>> duplicate values in "table" (if there isn't).
>>>> 
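
(For concreteness, Michael's pseudocode as literal DBI calls -- a sketch
only: $dbh, the table name the_table, and the sequence name the_seq are
placeholders, and nextval() is PostgreSQL syntax.)

    # A: check whether the business key is already there -- off to database
    my ($exists) = $dbh->selectrow_array(
        'SELECT 1 FROM the_table WHERE business_key = ?', undef, $key);
    unless ($exists) {
        # B: grab the next sequence value                -- off to database
        my ($id) = $dbh->selectrow_array(q{SELECT nextval('the_seq')});
        # C: insert -- another process may have run A..C in the gap
        $dbh->do('INSERT INTO the_table VALUES (?, ?)', undef, $id, $key);
    }
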
>>> Surely just wrapping the entire operation in a transaction is
>>> enough? Or is that one of those cases that needs the TRANSACTION
>>> ISOLATION LEVEL set to max?
>>> 
>> Transactions don't solve the problem, regardless of their isolation
>> level (unless your levels mean something other than what I think
>> they do).  The race condition exists because the underlying
>> operating system handles multiple processes all at the same time.
>> As far as the operating system is concerned, the database, scripts
>> accessing the database, and anything else on the machine are just
>> processes.  Each process gets time slices in which to do its work in
>> a timely fashion.  Scheduling is handled by various algorithms, but
>> essentially each process goes into a queue.  The OS takes one out
>> from the front of the queue, sets up its memory, runs its
>> instructions for a while, then swaps its memory out to disk
>> (if necessary) and moves it to the back of the queue.  It then
>> takes the next one out of the queue and processes that.  This means
>> that sometimes (in rare but possible cases) two or more processes
>> can collide and do the unexpected.  For example, consider the
>> following:
>> 
>> Process A selects stuff for business key X - it's not there
>> Operating system swaps Process A to memory/disk
>> Process B selects stuff for business key X - it's not there
>> Process B selects the sequence next val
>> Process B inserts stuff into the database
>> Operating system swaps Process B to memory/disk
>> Process A selects the sequence next val
>> Process A inserts its stuff into the database for key X
>> 
>> Now you have either an insertion failure (if you have unique
>> indices) or a duplicate entry.  If Process A is using transactions
>> and you do have unique indices (separate from the id) then the
>> transaction fails because the insert fails.  If you don't have a
>> separate key then the transaction succeeds because failure isn't
>> detected.
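
(The losing insert is at least detectable from Perl.  A minimal sketch
of the recovery, assuming DBI with RaiseError on and placeholder names:)

    $dbh->begin_work;
    eval {
        $dbh->do('INSERT INTO the_table VALUES (?, ?)', undef, $id, $key);
        $dbh->commit;
    };
    if ($@) {
        $dbh->rollback;   # unique index fired: the other process won
        # fall back to the row that the other process inserted
        ($id) = $dbh->selectrow_array(
            'SELECT id FROM the_table WHERE business_key = ?', undef, $key);
    }
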
>> 
>> How much of a concern this problem is really depends on the system.
>> If you can guarantee (through file locking, for example) that only
>> one instance of your program is ever going to run, and you know from
>> your knowledge of the system that no other programs will be
>> accessing the same database, then the race condition isn't an issue.
>> On the other hand, if this is a site where hundreds or thousands of
>> different people will be adding content all day at the same time,
>> and you have a requirement that that information be unique, then
>> you're very likely to hit the race condition occasionally, and it
>> will be really hard to track down.
>> 
>> Personally in most of the systems I've ever done this kind of
>> thing, I've taken the lazy approach.  I select to see if it's
>> there, and insert if it isn't.  However, I make sure that I have unique
>> indices based on the data entered (I try to never use an id field)
>> and I take care to handle the case of the insert failing.  If I am
>> using an id field, I set it to "auto-increment" if the database
>> allows.  Most of the time these days though, I just use Class::DBI
>>  (probably soon to change to DBIx::Class).  I don't know what its
>> policy is in this regard, but I'm happy enough to trust it.
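
(Class::DBI does ship a helper for this, for what it's worth: a sketch,
with the class name and connection details made up.)

    package My::NamesTable;
    use base 'Class::DBI';
    __PACKAGE__->connection('dbi:Pg:dbname=mydb', 'user', 'password');
    __PACKAGE__->table('names_table');
    __PACKAGE__->columns(Primary => 'name_id');
    __PACKAGE__->columns(Others  => 'name');

    package main;
    # find_or_create is still a select followed by an insert on a miss,
    # so without a unique index it carries the same race.
    my $row = My::NamesTable->find_or_create({ name => 'new_name' });
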
>> 
>> Of course I've mostly been lucky in that in all the systems I've
>> worked on, the number of concurrent data editors has always been
>> less than, say, 20,000.  In fact in most of them it's probably less
>> than 3.  This kind of thing is a nightmare when creating items
>> for an online store (or CMS) -- where a number of administrators
>> might try to add the exact same item -- but rarely a problem when
>> handling a shopping cart, because most people *expect* duplicate
>> items to appear if they buy the same item in the same session from
>> two different pages.  Thus in my situations the chances of hitting
>> this kind of race condition are sufficiently low that I was told not
>> to spend the time worrying about it.
>> 
>> It's a trade-off: you can write some code correctly, or quickly.  ;)
>> 
>> All the best,
>> 
>> J
>> 
>> 
> After re-reading the original question, I am leaning towards Peter 
> Kelly's suggestion that this is probably well solved via the addition
> of unique indexes to the table in question and just handling failed 
> inserts.  However, not being savvy about the details of the table in
> question, that may or may not be possible.
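
(For my names_table below, adding the constraint after the fact would be
a one-liner -- PostgreSQL syntax assumed, and the constraint name is
made up:)

    $dbh->do(q{
        ALTER TABLE names_table
            ADD CONSTRAINT names_table_name_key UNIQUE (name)
    });

With that in place, a duplicate insert fails cleanly instead of silently
creating a second row.
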
> 
> Transactions will save the day here if we LOCK TABLE at the start of
> the transaction.  Yes, this also has implications, but it will
> eliminate the race condition.  I'll redo Jacinta's timeline with a
> LOCK TABLE added.
> 
> Process A: BEGIN; LOCK TABLE;
>
> Process A selects stuff for business key X - it's not there
> Operating system swaps Process A to memory/disk
> Process B selects stuff for business key X -- SELECT HANGS, as the
> table is locked
> Operating system swaps Process B to memory/disk
> Process A selects the sequence next val
> Process A inserts its stuff into the database for key X
>
> -- Process A commits; the lock is released.
>
> -- Process B's select now returns, confirming the presence of the
> record, and B behaves accordingly.
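
(In DBI terms that flow is short.  A sketch against my table, assuming
PostgreSQL, where a bare LOCK TABLE takes ACCESS EXCLUSIVE mode and so
blocks even plain SELECTs from other transactions:)

    $dbh->begin_work;
    $dbh->do('LOCK TABLE names_table');  # ACCESS EXCLUSIVE until commit
    my ($exists) = $dbh->selectrow_array(
        'SELECT 1 FROM names_table WHERE name = ?', undef, $name);
    unless ($exists) {
        $dbh->do('INSERT INTO names_table VALUES (default, ?)',
                 undef, $name);
    }
    $dbh->commit;   # releases the lock; a blocked process resumes here
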
> 
> Yes, yes, I know LOCKing tables can get messy.  Every time it's
> mentioned at a Perl Mongers meeting, eyes roll as members remember
> late-night sessions spent trying to find the reason queries are
> freezing or running slowly.
> 
> That said, if we double-check that the locking transaction is
> completing (and write lots of thorough log messages when it doesn't),
> and it doesn't do anything too complicated (i.e. time-consuming) or
> get run 1000 times a second, then this is a workable solution.
> 
I'm still determining what the table contains, but basically it will be

CREATE TABLE names_table (
     name_id serial NOT NULL,
     name character varying(120) UNIQUE NOT NULL,
     .... other columns .......
     PRIMARY KEY (name_id)
);

I think I'll stick with my original plan:

1) SELECT * FROM names_table WHERE name = 'new_name';
(test success, expect fail).

2) INSERT INTO names_table VALUES (default, 'new_name');
(test success, expect success).

3) SELECT * FROM names_table WHERE name = 'new_name';
(test success, expect success).

Between 1) and 2) there *might* be a successful insert of the same name
by someone else, but that won't matter: the UNIQUE constraint on the
'name' column prevents the *same* name being inserted twice, so my
insert in 2) simply fails, and 3) still picks up the correct name_id.
Between 2) and 3) there may be other successful inserts, but again they
won't matter, because the select in 3) will pick up the correct name_id
for 'new_name'.
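
In DBI the whole plan is about a dozen lines (a sketch, with the error
handling kept deliberately simple):

    # 1) is the name already there?
    my $row = $dbh->selectrow_hashref(
        'SELECT * FROM names_table WHERE name = ?', undef, $name);

    unless ($row) {
        # 2) try to insert it; if another process got in first, the
        #    UNIQUE constraint makes this fail, which is fine
        eval {
            $dbh->do('INSERT INTO names_table VALUES (default, ?)',
                     undef, $name);
        };

        # 3) re-select: we get the right name_id whoever inserted it
        $row = $dbh->selectrow_hashref(
            'SELECT * FROM names_table WHERE name = ?', undef, $name);
    }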

In practice the table will get updated once a day, so race conditions
are unlikely in the extreme!

Cheers,

Cliff



