[Wellington-pm] Perl database stuff....

Tue Dec 26 00:57:14 PST 2006

In message <E1Gz5jc-0004lS-00 at israel.diaspora.gen.nz>, michael at diaspora.gen.nz writes:
>    A: select 1 from table where business_key = ...; # off to database
>    if select returned nothing
>    begin
>	B: select sequence.nextval into id; # off to database
>	C: insert into table values (id, business_key); # off to database
>    end;
>
>    [Race between A and B/C without external locking]

Since we're talking PostgreSQL, you can turn B and C into one statement:

INSERT INTO table (id, business_key, ....)
SELECT nextval('sequence'), 'business_key', 'other', 'parameters', ....

(and then use "SELECT currval('sequence');" to get the ID used -- if you
only want it to use in another insert, you can do the same thing as
above on the next INSERT or UPDATE, but with currval() on that sequence
-- I do this quite frequently when preparing interrelated test data just
with basic SQL statements).

But you still have the race between A and the combined B/C statement
without some more locking.

Where it's possible to create a unique index on the business key, that's
probably the most robust solution (then just try the insert, and check
if it failed because of a duplicate business key, rather than selecting
for it first).  It also avoids duplicates arrising in some other way
(eg, initial data import).

Where that's not possible, it's more tricky to avoid the race, and hence
the duplicates.  PostgreSQL supports "SELECT FOR UPDATE":

http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE

but that appears not to lock against inserts (presumably it relies on
having an actual row to lock, rather than just a potential key; the 
locking section of the PostgreSQL documentation seems to confirm that
it's not locked against inserts since "SELECT FOR UPDATE" takes a less
exclusive lock than "INSERT").

Another possibility would be to explicitly acquire the "access
exclusive" table lock on the table in question:

http://www.postgresql.org/docs/8.1/static/explicit-locking.html

prior to doing the "is it there" select.  But that's (a) non-portable,
and (b) potentially blocking the database for more time than is
desirable.  (It also appears to need to be done in an explicit
transaction as the way of releasing it.)

A third possibility seems to be:

INSERT INTO table (id, business_key, ....)
SELECT nextval('sequence'), 'business_key', ....
WHERE NOT EXISTS 
      (SELECT 1 FROM table WHERE business_key='business_key' limit 1);

as suggested here:

http://beta.nntp.perl.org/group/perl.dbi.users/2006/08/msg29954.html

It appears to be valid PostgreSQL, and to work in the simple cases (ie,
record doesn't exist, record exists, only one simultaneous statement),
but I'm not yet convinced it doesn't still have a tiny race inside
the database.  Creating a UNIQUE INDEX on the business key seems safer
(and less involved!).

This discussion (Drupal, PHP) is possibly of interest too:

http://drupal.org/node/1811

They appear to have settled on taking a table lock and dealing with
database variations in doing that.  (Presumably with PHP they're 
already dealing with enough other database access variations that
it doesn't matter.)

Ewen

PS: You could do locking in the application, but then you're effectively
    doing advisory locking rather than mandatory locking, and you will
    race against anything which doesn't play along with your advisory
    locking.