[Wellington-pm] Perl database stuff....
Jacinta Richardson
jarich at perltraining.com.au
Wed Dec 27 04:28:37 PST 2006
Sam Vilain wrote:
> michael at diaspora.gen.nz wrote:
>
>> A: select 1 from table where business_key = ...; # off to database
>> if select returned nothing
>> begin
>> B: select sequence.nextval into id; # off to database
>> C: insert into table values (id, business_key); # off to database
>> end;
>>
>> Between A and B, unless you wrap a lock around the operation, there's a
>> race condition; you could find either C failing (if there's a uniqueness
>> constraint on "business_key"), or duplicate values in "table" (if
>> there isn't).
>
> Surely just wrapping the entire operation in a transaction is enough?
> Or is that one of those cases that needs the TRANSACTION ISOLATION LEVEL
> set to max?
Transactions don't solve the problem, regardless of their isolation level
(unless your levels mean something other than what I think they do). The race
condition exists because the underlying operating system handles multiple
processes all at the same time. As far as the operating system is concerned,
the database, scripts accessing the database, and anything else on the machine
are just processes. Each process deserves time slices to do their things in a
timely fashion. These are handled due to various algorithms but essentially
each process goes into a queue. The OS takes one out from the front of the
queue, sets up it's memory, runs it's instructions for a while and then swaps
its used memory out to disk (if necessary) and moves that to the back of the
queue. It then takes the next one out of the queue and processes that. This
means that sometimes (in rare but possible cases) two or more processes can
collide and do the unexpected. For example consider the following:
Process A selects stuff for business key X - it's not there
Operating system swaps Process A to memory/disk
Process B selects stuff for business key X - it's not there
Process B selects the sequence next val
Process B inserts stuff into the database
Operating system swaps Process B to memory/disk
Process A selects the sequence next val
Process A inserts it's stuff into the database for key X
Now you have either an insertion failure (if you have unique indices) or a
duplicate entry. If Process A is using transactions and you do have unique
indices (separate from the id) then the transaction fails because the insert
fails. If you don't have a separate key then the transaction succeeds because
failure isn't detected.
How much of a concern this problem is really depends on the system. If you can
guarantee (due to file locking for example) that only one version of your
program is ever going to run; and you know that you won't have other programs
accessing the same database due to your system knowledge, then the race
condition isn't an issue. On the other hand, if this is a site where hundreds
or thousands of different people will be adding content all day at the same time
and you have a requirement that that information be unique; then you're very
likely to hit the race condition occasionally and it will be really hard to
track down.
Personally in most of the systems I've ever done this kind of thing, I've taken
the lazy approach. I select to see if it's there, and insert otherwise.
However I make sure that I have unique indices based on the data entered (I try
to never use an id field) and I take care to handle the case of the insert
failing. If I am using an id field, I set it to "auto-increment" if the
database allows. Most of the time these days though, I just use Class::DBI
(probably soon to change to DBIx::Class). I don't know what its policy is in
this regard, but I'm happy enough to trust it.
Of course I've mostly been lucky in that in all the systems I've worked on, the
number of co-current data editors has always been less than say 20,000. In fact
in most of them it's probably less than 3. This kind of a thing is a nightmare
when creating items for an online store (or CMS) -- where a number of
administrators might try to add the exact same item -- but rarely a problem when
handling a shopping cart, because most people *expect* duplicate items to appear
if they buy the same item in the same session from two different pages. Thus in
my situations the chances of hitting this kind of race condition is sufficiently
low that I was told not to spend the time worrying about it.
It's a trade off: you can write some code correctly, or quickly. ;)
All the best,
J
More information about the Wellington-pm
mailing list