From grant at mclean.net.nz Tue Dec 5 00:30:32 2006 From: grant at mclean.net.nz (Grant McLean) Date: Tue, 05 Dec 2006 21:30:32 +1300 Subject: [Wellington-pm] Perl Mongers Quiz Night Next Tuesday Message-ID: <1165307432.6133.12.camel@localhost.localdomain> Hi Mongers Just a quick reminder - the December meeting of Wellington.PM is next Tuesday and will be a quiz night. I've put up a web page that hopefully answers any questions you might have: http://wellington.pm.org/quiz_night.html See you there. Grant From grant at mclean.net.nz Mon Dec 11 12:03:32 2006 From: grant at mclean.net.nz (Grant McLean) Date: Tue, 12 Dec 2006 09:03:32 +1300 Subject: [Wellington-pm] Perl Mongers Quiz Night Tonight Message-ID: <1165867413.15386.1.camel@putnam.wgtn.cat-it.co.nz> Hi All Just a quick reminder - the December meeting of Wellington.PM is a quiz night tonight! http://wellington.pm.org/quiz_night.html See you there. Grant From grant at mclean.net.nz Wed Dec 13 01:59:52 2006 From: grant at mclean.net.nz (Grant McLean) Date: Wed, 13 Dec 2006 22:59:52 +1300 Subject: [Wellington-pm] Round up of Tuesday's Meeting Message-ID: <1166003992.7266.28.camel@localhost.localdomain> Hi Mongers Thanks to everyone for joining in to make the last meeting of 2006 so much fun. Congratulations to the winning team: Ewen, Donovan, Matt and Srdjan - let the adulation begin :-) Special thanks to Michael for helping prepare questions and handling the scoring on the night; and to Matt for the wonderful title slides and for organising the nibbles at the 11th hour. I'm planning to offer the quiz questions to other PM group coordinators so I won't be putting them up on the wellington.pm web site. The first meeting of 2007 will be a lightning talk meeting in February. That means early in the New Year I'll start bugging *you* to come up with a 5 minute talk. But I won't start that yet. In the meantime, I'd like to wish everyone a Merry Christmas and a Happy New Year. See you in 2007. Regards Grant From hta.lists at gmail.com Wed Dec 13 12:45:30 2006 From: hta.lists at gmail.com (Donovan Jones) Date: Thu, 14 Dec 2006 09:45:30 +1300 Subject: [Wellington-pm] Round up of Tuesday's Meeting In-Reply-To: <1166003992.7266.28.camel@localhost.localdomain> References: <1166003992.7266.28.camel@localhost.localdomain> Message-ID: <81bb842d0612131245q5cd4ec71od28759e2a18215eb@mail.gmail.com> On 12/13/06, Grant McLean wrote: > > Hi Mongers > > Thanks to everyone for joining in to make the last meeting of 2006 so > much fun. > Yes, thankyou to the organisers, it was fun. Also thanks to Grant for all the work put into runing Wellington.pm for another year. I only caught the last 3 meetings but they were all worthwile. Thanks! Donovan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/wellington-pm/attachments/20061214/dcc2d90e/attachment.html From enkidu at cliffp.com Sat Dec 23 17:29:44 2006 From: enkidu at cliffp.com (Cliff Pratt) Date: Sun, 24 Dec 2006 14:29:44 +1300 Subject: [Wellington-pm] Perl database stuff.... Message-ID: <458DD808.6050803@cliffp.com> I want to write a row, using Perl, to a PostgreSQL database, *but only if the row does not already exist*. If I add a row, I need to know the ID of the row. Currently what I have is: 1) Select the row using a WHERE clause to select the row I want to add, to see if it is already there. 2) If the row does exist return an error (details don't matter). 3) If the row doesn't exist insert it, defaulting the ID which is generated by a sequence. 4) Read it again to find the ID which was generated during the insert. Blah! This works but is long-winded, untidy and generally blah! Is there a smarter way to do it, either on the Perl side or on the database side? Currently the code is plastered with '...or die....' but having just read Grant's piece on error handling I think I can tidy THAT up. I intend to wrap it up as a transaction eventually, so that will be tidied up too, and that just leaves the select, insert, select as the remaining nastiness. Any ideas? Cheers, Cliff From perlmonger at pck.co.nz Sat Dec 23 17:49:01 2006 From: perlmonger at pck.co.nz (Peter Kelly) Date: Sun, 24 Dec 2006 09:49:01 +0800 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <458DD808.6050803@cliffp.com> References: <458DD808.6050803@cliffp.com> Message-ID: <458DDC8D.1070506@pck.co.nz> Cliff Pratt wrote: > I want to write a row... Hi Points 1-3: this sounds like a problem that can be solved by appropriate unique indexes on the table, which would then generate an error automatically on insert. On point four, see http://archives.postgresql.org/pgsql-novice/2003-08/msg00163.php currval is your friend. Cheers, Peter From andrew at catalyst.net.nz Sat Dec 23 23:12:25 2006 From: andrew at catalyst.net.nz (Andrew McMillan) Date: Sun, 24 Dec 2006 20:12:25 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <458DD808.6050803@cliffp.com> References: <458DD808.6050803@cliffp.com> Message-ID: <1166944345.5116.34.camel@ubu.mcmillan.net.nz> On Sun, 2006-12-24 at 14:29 +1300, Cliff Pratt wrote: > I want to write a row, using Perl, to a PostgreSQL database, *but only > if the row does not already exist*. If I add a row, I need to know the > ID of the row. > > Currently what I have is: > > 1) Select the row using a WHERE clause to select the row I want to add, > to see if it is already there. > > 2) If the row does exist return an error (details don't matter). > > 3) If the row doesn't exist insert it, defaulting the ID which is > generated by a sequence. > > 4) Read it again to find the ID which was generated during the insert. > > Blah! This works but is long-winded, untidy and generally blah! > > Is there a smarter way to do it, either on the Perl side or on the > database side? Perhaps. Certainly using nextval() or currval() are better approaches for getting hold of the generated sequence. Here are some more alternative approaches: (a) On the database side you could write a function which accepted all of the row parameters, selected the row returning the id (possibly throwing an error). If the row didn't exist it could insert the row, returning the id either using currval('sequence') to get the sequence value after the fact, or by using nextval('sequence') and explicitly assigning that for the ID column. (b) On the database side you could write a fancy insert ... select ... statement such as: INSERT INTO table1 (col1,col2) SELECT 1 AS col1, 2 AS col2 FROM table1 WHERE NOT exists(SELECT 1 FROM table1 WHERE col1=1 AND col2=2) LIMIT 1; That's a pretty sneaky one, and might not make maintenance of the program as straightforward as it might otherwise be. It won't give you an error when the row already exists - it will just silently not insert a row, which can sometimes be what you want. (c) In the perl you can make it "select, select, insert" as: 1. select the record (and fail). 2. select the sequence nextval 3. insert the row using the sequence. Which is actually the way I would tend to do it myself because it seems so much cleaner to know the ID in advance and then write multiple SQL calls using that value. I tend to take the (a) approach when I have some values that I want to 'either update or insert these values', which is slightly different from your question here. Cheers, Andrew. ------------------------------------------------------------------------- Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 Planning an election? Call us! ------------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mail.pm.org/pipermail/wellington-pm/attachments/20061224/0c2f308d/attachment.bin From enkidu at cliffp.com Sun Dec 24 01:29:34 2006 From: enkidu at cliffp.com (Cliff Pratt) Date: Sun, 24 Dec 2006 22:29:34 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <1166944345.5116.34.camel@ubu.mcmillan.net.nz> References: <458DD808.6050803@cliffp.com> <1166944345.5116.34.camel@ubu.mcmillan.net.nz> Message-ID: <458E487E.3030800@cliffp.com> Andrew McMillan wrote: > On Sun, 2006-12-24 at 14:29 +1300, Cliff Pratt wrote: >> I want to write a row, using Perl, to a PostgreSQL database, *but only >> if the row does not already exist*. If I add a row, I need to know the >> ID of the row. >> >> Currently what I have is: >> >> 1) Select the row using a WHERE clause to select the row I want to add, >> to see if it is already there. >> >> 2) If the row does exist return an error (details don't matter). >> >> 3) If the row doesn't exist insert it, defaulting the ID which is >> generated by a sequence. >> >> 4) Read it again to find the ID which was generated during the insert. >> >> Blah! This works but is long-winded, untidy and generally blah! >> >> Is there a smarter way to do it, either on the Perl side or on the >> database side? > > Perhaps. Certainly using nextval() or currval() are better approaches > for getting hold of the generated sequence. > > Here are some more alternative approaches: > > (a) On the database side you could write a function which accepted all > of the row parameters, selected the row returning the id (possibly > throwing an error). If the row didn't exist it could insert the row, > returning the id either using currval('sequence') to get the sequence > value after the fact, or by using nextval('sequence') and explicitly > assigning that for the ID column. > > (b) On the database side you could write a fancy insert ... select ... > statement such as: > > INSERT INTO table1 (col1,col2) > SELECT 1 AS col1, 2 AS col2 FROM table1 > WHERE NOT exists(SELECT 1 FROM table1 WHERE col1=1 AND col2=2) > LIMIT 1; > > That's a pretty sneaky one, and might not make maintenance of the > program as straightforward as it might otherwise be. It won't give you > an error when the row already exists - it will just silently not insert > a row, which can sometimes be what you want. > > > (c) In the perl you can make it "select, select, insert" as: > 1. select the record (and fail). > 2. select the sequence nextval > 3. insert the row using the sequence. > > Which is actually the way I would tend to do it myself because it seems > so much cleaner to know the ID in advance and then write multiple SQL > calls using that value. I tend to take the (a) approach when I have > some values that I want to 'either update or insert these values', which > is slightly different from your question here. > > Cheers, > Andrew. > Thanks for that. I like solution c) which is a bit neater than my solution. Re solution b), I was wondering if there was a DB side answer, but that is too tricky for me! a) is interesting and I might look further into that. Cheers, Cliff From jarich at perltraining.com.au Sun Dec 24 03:55:18 2006 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Sun, 24 Dec 2006 22:55:18 +1100 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <1166944345.5116.34.camel@ubu.mcmillan.net.nz> References: <458DD808.6050803@cliffp.com> <1166944345.5116.34.camel@ubu.mcmillan.net.nz> Message-ID: <458E6AA6.70208@perltraining.com.au> Andrew McMillan wrote: > (c) In the perl you can make it "select, select, insert" as: > 1. select the record (and fail). > 2. select the sequence nextval > 3. insert the row using the sequence. > > Which is actually the way I would tend to do it myself because it seems > so much cleaner to know the ID in advance and then write multiple SQL > calls using that value. I tend to take the (a) approach when I have > some values that I want to 'either update or insert these values', which > is slightly different from your question here. Make sure you're aware of how your database handles sequencing here, and also what requirements you have on your data. For example, if your database does not increment the sequence when you merely look at it, but rather does so when it is used then that could cause the same sequence number to be given out to multiple records. You can test this pretty easily by just doing a number of reads and seeing whether they're all different. If you're using the sequence number to just ensure unique identifiers then that's fine. However, if you need all numbers to exist, then you've got a problem again. How do you handle the case where you select the sequence number successfully, but then fail to insert the row using that number? Assuming that sequencing is handled correctly (incremented upon view) then you'll end up with holes in your numbering. Finally, you need to know how your database handles such holes (which may also be caused by item deletion). Some databases re-use missing id numbers, some don't. Are you relying on id numbers to give you chronological ordering of your data? If so, does it matter that item 56 might actually hit the database sooner than items 54 and 55 due to network congestion? If you don't want to have to worry about this sort of thing, you may find a database abstraction layer to be a better solution. For example Class::DBI allows you to to solve this problem with code similar to the following: my $object = $class->find_or_create(\%data); my $seq = "$object"; # assuming this seq number is the primary key I'm sure there are equally easy solutions in DBIx::Class, Tanagram and the others. All the best, Jacinta From andrew at catalyst.net.nz Mon Dec 25 00:12:48 2006 From: andrew at catalyst.net.nz (Andrew McMillan) Date: Mon, 25 Dec 2006 21:12:48 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <458E6AA6.70208@perltraining.com.au> References: <458DD808.6050803@cliffp.com> <1166944345.5116.34.camel@ubu.mcmillan.net.nz> <458E6AA6.70208@perltraining.com.au> Message-ID: <1167034368.5116.54.camel@ubu.mcmillan.net.nz> On Sun, 2006-12-24 at 22:55 +1100, Jacinta Richardson wrote: > Andrew McMillan wrote: > > > (c) In the perl you can make it "select, select, insert" as: > > 1. select the record (and fail). > > 2. select the sequence nextval > > 3. insert the row using the sequence. > > > > Which is actually the way I would tend to do it myself because it seems > > so much cleaner to know the ID in advance and then write multiple SQL > > calls using that value. I tend to take the (a) approach when I have > > some values that I want to 'either update or insert these values', which > > is slightly different from your question here. > > Make sure you're aware of how your database handles sequencing here, and also > what requirements you have on your data. For example, if your database does not > increment the sequence when you merely look at it, but rather does so when it is > used then that could cause the same sequence number to be given out to multiple > records. You can test this pretty easily by just doing a number of reads and > seeing whether they're all different. Good points in general, Jacinta. In this case Cliff explicitly said PostgreSQL, which does sequences in an efficient and transaction safe manner, and which I know very well, so I gave him a fairly explicit answer. > However, if you need all numbers to exist, then you've got a > problem again. Yeah. Hopefully it's merely a problem with whoever wrote your requirements, and you can solve it by educating or replacing them :-) Otherwise it's just a deadly recipe for race conditions or applications having to block on a table lock. :-) Andrew. ------------------------------------------------------------------------- Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 Today is what happened to yesterday. ------------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mail.pm.org/pipermail/wellington-pm/attachments/20061225/4725f875/attachment.bin From michael at diaspora.gen.nz Mon Dec 25 03:27:38 2006 From: michael at diaspora.gen.nz (michael at diaspora.gen.nz) Date: Tue, 26 Dec 2006 00:27:38 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: Your message of "Sun, 24 Dec 2006 20:12:25 +1300." <1166944345.5116.34.camel@ubu.mcmillan.net.nz> Message-ID: >(c) In the perl you can make it "select, select, insert" as: > 1. select the record (and fail). > 2. select the sequence nextval > 3. insert the row using the sequence. There is of course a race condition here; between the select record and select the sequence nextval, another thread could perform the full process. So be prepared to handle failures on insert if your sequence is not the primary key, or potentially deal with duplicate rows, or wrap a reliable transaction of some sort around the block. (Personally, I'd do an INSERT ... RETURNING to give me back the nextval, in the belief that minimizing database round trips is a good idea.) DB2 has I believe a primitive for "INSERT OR UPDATE" which goes some way to solving this problem; MySQL has INSERT ... IGNORE and INSERT ... ON DUPLICATE KEY UPDATE, which may do the trick also. -- michael. From andrew at catalyst.net.nz Mon Dec 25 18:42:24 2006 From: andrew at catalyst.net.nz (Andrew McMillan) Date: Tue, 26 Dec 2006 15:42:24 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: References: Message-ID: <1167100944.5116.59.camel@ubu.mcmillan.net.nz> On Tue, 2006-12-26 at 00:27 +1300, michael at diaspora.gen.nz wrote: > >(c) In the perl you can make it "select, select, insert" as: > > 1. select the record (and fail). > > 2. select the sequence nextval > > 3. insert the row using the sequence. > > There is of course a race condition here; between the select record > and select the sequence nextval, another thread could perform the full > process. Ah, no. We're talking about PostgreSQL here, so there isn't. Cheers, Andrew. ------------------------------------------------------------------------- Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 Chess tonight. ------------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mail.pm.org/pipermail/wellington-pm/attachments/20061226/af26e9e2/attachment.bin From michael at diaspora.gen.nz Mon Dec 25 22:24:23 2006 From: michael at diaspora.gen.nz (michael at diaspora.gen.nz) Date: Tue, 26 Dec 2006 19:24:23 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: Your message of "Tue, 26 Dec 2006 15:42:24 +1300." <1167100944.5116.59.camel@ubu.mcmillan.net.nz> Message-ID: >>>(c) In the perl you can make it "select, select, insert" as: >>> 1. select the record (and fail). >>> 2. select the sequence nextval >>> 3. insert the row using the sequence. >> There is of course a race condition here; between the select record >> and select the sequence nextval, another thread could perform the full >> process. >Ah, no. We're talking about PostgreSQL here, so there isn't. Sure there is. I even qualified it by saying that the impact of the race would depend on your schema. I understand that the "nextval" operation is atomic (and good on it; dealing with DBs that *don't* have that operation is a pain in the ass), but as I understood things, the model was something like (in pseudo code, but meant to be implemented in Perl): A: select 1 from table where business_key = ...; # off to database if select returned nothing begin B: select sequence.nextval into id; # off to database C: insert into table values (id, business_key); # off to database end; Between A and B, unless you wrap a lock around the operation, there's a race condition; you could find either C failing (if there's a uniqueness constraint on "business_key"), or duplicate values in "table" (if there isn't). The fact that you'd need to do a select first implies that there is a business key, rather than just a surrogate ID generated from a sequence, so you need to worry about this. -- michael. From ewen at naos.co.nz Tue Dec 26 00:57:14 2006 From: ewen at naos.co.nz (Ewen McNeill) Date: Tue, 26 Dec 2006 21:57:14 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: Message from michael@diaspora.gen.nz of "Tue, 26 Dec 2006 19:24:23 +1300." Message-ID: <20061226085714.C1220112DB2@wat.la.naos.co.nz> In message , michael at diaspora.gen.nz writes: > A: select 1 from table where business_key = ...; # off to database > if select returned nothing > begin > B: select sequence.nextval into id; # off to database > C: insert into table values (id, business_key); # off to database > end; > > [Race between A and B/C without external locking] Since we're talking PostgreSQL, you can turn B and C into one statement: INSERT INTO table (id, business_key, ....) SELECT nextval('sequence'), 'business_key', 'other', 'parameters', .... (and then use "SELECT currval('sequence');" to get the ID used -- if you only want it to use in another insert, you can do the same thing as above on the next INSERT or UPDATE, but with currval() on that sequence -- I do this quite frequently when preparing interrelated test data just with basic SQL statements). But you still have the race between A and the combined B/C statement without some more locking. Where it's possible to create a unique index on the business key, that's probably the most robust solution (then just try the insert, and check if it failed because of a duplicate business key, rather than selecting for it first). It also avoids duplicates arrising in some other way (eg, initial data import). Where that's not possible, it's more tricky to avoid the race, and hence the duplicates. PostgreSQL supports "SELECT FOR UPDATE": http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE but that appears not to lock against inserts (presumably it relies on having an actual row to lock, rather than just a potential key; the locking section of the PostgreSQL documentation seems to confirm that it's not locked against inserts since "SELECT FOR UPDATE" takes a less exclusive lock than "INSERT"). Another possibility would be to explicitly acquire the "access exclusive" table lock on the table in question: http://www.postgresql.org/docs/8.1/static/explicit-locking.html prior to doing the "is it there" select. But that's (a) non-portable, and (b) potentially blocking the database for more time than is desirable. (It also appears to need to be done in an explicit transaction as the way of releasing it.) A third possibility seems to be: INSERT INTO table (id, business_key, ....) SELECT nextval('sequence'), 'business_key', .... WHERE NOT EXISTS (SELECT 1 FROM table WHERE business_key='business_key' limit 1); as suggested here: http://beta.nntp.perl.org/group/perl.dbi.users/2006/08/msg29954.html It appears to be valid PostgreSQL, and to work in the simple cases (ie, record doesn't exist, record exists, only one simultaneous statement), but I'm not yet convinced it doesn't still have a tiny race inside the database. Creating a UNIQUE INDEX on the business key seems safer (and less involved!). This discussion (Drupal, PHP) is possibly of interest too: http://drupal.org/node/1811 They appear to have settled on taking a table lock and dealing with database variations in doing that. (Presumably with PHP they're already dealing with enough other database access variations that it doesn't matter.) Ewen PS: You could do locking in the application, but then you're effectively doing advisory locking rather than mandatory locking, and you will race against anything which doesn't play along with your advisory locking. From ewen at naos.co.nz Tue Dec 26 01:16:26 2006 From: ewen at naos.co.nz (Ewen McNeill) Date: Tue, 26 Dec 2006 22:16:26 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: Message from Ewen McNeill of "Tue, 26 Dec 2006 21:57:14 +1300." <20061226085714.C1220112DB2@wat.la.naos.co.nz> Message-ID: <20061226091626.25828112DB2@wat.la.naos.co.nz> In message <20061226085714.C1220112DB2 at wat.la.naos.co.nz>, Ewen McNeill writes: >A third possibility seems to be: > >INSERT INTO table (id, business_key, ....) >SELECT nextval('sequence'), 'business_key', .... >WHERE NOT EXISTS > (SELECT 1 FROM table WHERE business_key='business_key' limit 1); > >[http://beta.nntp.perl.org/group/perl.dbi.users/2006/08/msg29954.html] but it seems I was right to be cautious about this being a complete solution. A more involved approach: http://www.thescripts.com/forum/thread172627.html which involves doing a "no op" UPDATE on some row on the table to force the database to take a lock. Although just looking at that, it appears to primarily be using the update as a sort of advisory locking and relying on the INSERT... WHERE NOT EXISTS... for more robust protection against things not playing along. And of course it relies on an explicit transaction to hold the lock scope open through the INSERT. IME many many applications just ignore this sort of problem and "hope for the best" (or claim there's no solution). It seems particularly common for people who started out with "toy" databases that aren't ACID compliant. Ewen From srdjan at catalyst.net.nz Tue Dec 26 14:06:36 2006 From: srdjan at catalyst.net.nz (Srdjan) Date: Wed, 27 Dec 2006 11:06:36 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <20061226085714.C1220112DB2@wat.la.naos.co.nz> References: <20061226085714.C1220112DB2@wat.la.naos.co.nz> Message-ID: <45919CEC.30801@catalyst.net.nz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Yes, been there. In those cases table invariably refers to some other_table with business_key (if I'm corect) as the primary key. So what I do is: BEGIN; SELECT * FROM other_table WHERE business_key = 'blah'; $id = SELECT id FROM table WHERE whatever; if ($row) { UPDATE table SET whatever WHERE id = $id; } else { $id = SELECT nextval('sequence'); INSERT... } COMMIT; (or ROLLBACK) at liking. Srdjan Ewen McNeill wrote: > In message , michael at diaspora.gen.nz writes: >> A: select 1 from table where business_key = ...; # off to database >> if select returned nothing >> begin >> B: select sequence.nextval into id; # off to database >> C: insert into table values (id, business_key); # off to database >> end; >> >> [Race between A and B/C without external locking] > > Since we're talking PostgreSQL, you can turn B and C into one statement: > > INSERT INTO table (id, business_key, ....) > SELECT nextval('sequence'), 'business_key', 'other', 'parameters', .... > > (and then use "SELECT currval('sequence');" to get the ID used -- if you > only want it to use in another insert, you can do the same thing as > above on the next INSERT or UPDATE, but with currval() on that sequence > -- I do this quite frequently when preparing interrelated test data just > with basic SQL statements). > > But you still have the race between A and the combined B/C statement > without some more locking. > > Where it's possible to create a unique index on the business key, that's > probably the most robust solution (then just try the insert, and check > if it failed because of a duplicate business key, rather than selecting > for it first). It also avoids duplicates arrising in some other way > (eg, initial data import). > > Where that's not possible, it's more tricky to avoid the race, and hence > the duplicates. PostgreSQL supports "SELECT FOR UPDATE": > > http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE > > but that appears not to lock against inserts (presumably it relies on > having an actual row to lock, rather than just a potential key; the > locking section of the PostgreSQL documentation seems to confirm that > it's not locked against inserts since "SELECT FOR UPDATE" takes a less > exclusive lock than "INSERT"). > > Another possibility would be to explicitly acquire the "access > exclusive" table lock on the table in question: > > http://www.postgresql.org/docs/8.1/static/explicit-locking.html > > prior to doing the "is it there" select. But that's (a) non-portable, > and (b) potentially blocking the database for more time than is > desirable. (It also appears to need to be done in an explicit > transaction as the way of releasing it.) > > A third possibility seems to be: > > INSERT INTO table (id, business_key, ....) > SELECT nextval('sequence'), 'business_key', .... > WHERE NOT EXISTS > (SELECT 1 FROM table WHERE business_key='business_key' limit 1); > > as suggested here: > > http://beta.nntp.perl.org/group/perl.dbi.users/2006/08/msg29954.html > > It appears to be valid PostgreSQL, and to work in the simple cases (ie, > record doesn't exist, record exists, only one simultaneous statement), > but I'm not yet convinced it doesn't still have a tiny race inside > the database. Creating a UNIQUE INDEX on the business key seems safer > (and less involved!). > > This discussion (Drupal, PHP) is possibly of interest too: > > http://drupal.org/node/1811 > > They appear to have settled on taking a table lock and dealing with > database variations in doing that. (Presumably with PHP they're > already dealing with enough other database access variations that > it doesn't matter.) > > Ewen > > PS: You could do locking in the application, but then you're effectively > doing advisory locking rather than mandatory locking, and you will > race against anything which doesn't play along with your advisory > locking. > _______________________________________________ > Wellington-pm mailing list > Wellington-pm at pm.org > http://mail.pm.org/mailman/listinfo/wellington-pm > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFkZzsZtcHxCitRpgRAsNxAKCpL1lqFzctKrHMzuzlPLj20hOREwCfcxcG QCn8bDIr+7vQ8AEbaMvT13w= =OTUD -----END PGP SIGNATURE----- From sam at vilain.net Wed Dec 27 03:19:01 2006 From: sam at vilain.net (Sam Vilain) Date: Thu, 28 Dec 2006 00:19:01 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: References: Message-ID: <459256A5.5060007@vilain.net> michael at diaspora.gen.nz wrote: > A: select 1 from table where business_key = ...; # off to database > if select returned nothing > begin > B: select sequence.nextval into id; # off to database > C: insert into table values (id, business_key); # off to database > end; > > Between A and B, unless you wrap a lock around the operation, there's a > race condition; you could find either C failing (if there's a uniqueness > constraint on "business_key"), or duplicate values in "table" (if > there isn't). Surely just wrapping the entire operation in a transaction is enough? Or is that one of those cases that needs the TRANSACTION ISOLATION LEVEL set to max? Sam. From jarich at perltraining.com.au Wed Dec 27 04:28:37 2006 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 27 Dec 2006 23:28:37 +1100 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <459256A5.5060007@vilain.net> References: <459256A5.5060007@vilain.net> Message-ID: <459266F5.3000105@perltraining.com.au> Sam Vilain wrote: > michael at diaspora.gen.nz wrote: > >> A: select 1 from table where business_key = ...; # off to database >> if select returned nothing >> begin >> B: select sequence.nextval into id; # off to database >> C: insert into table values (id, business_key); # off to database >> end; >> >> Between A and B, unless you wrap a lock around the operation, there's a >> race condition; you could find either C failing (if there's a uniqueness >> constraint on "business_key"), or duplicate values in "table" (if >> there isn't). > > Surely just wrapping the entire operation in a transaction is enough? > Or is that one of those cases that needs the TRANSACTION ISOLATION LEVEL > set to max? Transactions don't solve the problem, regardless of their isolation level (unless your levels mean something other than what I think they do). The race condition exists because the underlying operating system handles multiple processes all at the same time. As far as the operating system is concerned, the database, scripts accessing the database, and anything else on the machine are just processes. Each process deserves time slices to do their things in a timely fashion. These are handled due to various algorithms but essentially each process goes into a queue. The OS takes one out from the front of the queue, sets up it's memory, runs it's instructions for a while and then swaps its used memory out to disk (if necessary) and moves that to the back of the queue. It then takes the next one out of the queue and processes that. This means that sometimes (in rare but possible cases) two or more processes can collide and do the unexpected. For example consider the following: Process A selects stuff for business key X - it's not there Operating system swaps Process A to memory/disk Process B selects stuff for business key X - it's not there Process B selects the sequence next val Process B inserts stuff into the database Operating system swaps Process B to memory/disk Process A selects the sequence next val Process A inserts it's stuff into the database for key X Now you have either an insertion failure (if you have unique indices) or a duplicate entry. If Process A is using transactions and you do have unique indices (separate from the id) then the transaction fails because the insert fails. If you don't have a separate key then the transaction succeeds because failure isn't detected. How much of a concern this problem is really depends on the system. If you can guarantee (due to file locking for example) that only one version of your program is ever going to run; and you know that you won't have other programs accessing the same database due to your system knowledge, then the race condition isn't an issue. On the other hand, if this is a site where hundreds or thousands of different people will be adding content all day at the same time and you have a requirement that that information be unique; then you're very likely to hit the race condition occasionally and it will be really hard to track down. Personally in most of the systems I've ever done this kind of thing, I've taken the lazy approach. I select to see if it's there, and insert otherwise. However I make sure that I have unique indices based on the data entered (I try to never use an id field) and I take care to handle the case of the insert failing. If I am using an id field, I set it to "auto-increment" if the database allows. Most of the time these days though, I just use Class::DBI (probably soon to change to DBIx::Class). I don't know what its policy is in this regard, but I'm happy enough to trust it. Of course I've mostly been lucky in that in all the systems I've worked on, the number of co-current data editors has always been less than say 20,000. In fact in most of them it's probably less than 3. This kind of a thing is a nightmare when creating items for an online store (or CMS) -- where a number of administrators might try to add the exact same item -- but rarely a problem when handling a shopping cart, because most people *expect* duplicate items to appear if they buy the same item in the same session from two different pages. Thus in my situations the chances of hitting this kind of race condition is sufficiently low that I was told not to spend the time worrying about it. It's a trade off: you can write some code correctly, or quickly. ;) All the best, J From Andrew.Boag at catalyst.net.nz Wed Dec 27 12:57:03 2006 From: Andrew.Boag at catalyst.net.nz (Andrew Boag) Date: Thu, 28 Dec 2006 09:57:03 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <459266F5.3000105@perltraining.com.au> References: <459256A5.5060007@vilain.net> <459266F5.3000105@perltraining.com.au> Message-ID: <4592DE1F.7010308@catalyst.net.nz> Jacinta Richardson wrote: > Sam Vilain wrote: > >> michael at diaspora.gen.nz wrote: >> >> >>> A: select 1 from table where business_key = ...; # off to database >>> if select returned nothing >>> begin >>> B: select sequence.nextval into id; # off to database >>> C: insert into table values (id, business_key); # off to database >>> end; >>> >>> Between A and B, unless you wrap a lock around the operation, there's a >>> race condition; you could find either C failing (if there's a uniqueness >>> constraint on "business_key"), or duplicate values in "table" (if >>> there isn't). >>> >> Surely just wrapping the entire operation in a transaction is enough? >> Or is that one of those cases that needs the TRANSACTION ISOLATION LEVEL >> set to max? >> > > Transactions don't solve the problem, regardless of their isolation level > (unless your levels mean something other than what I think they do). The race > condition exists because the underlying operating system handles multiple > processes all at the same time. As far as the operating system is concerned, > the database, scripts accessing the database, and anything else on the machine > are just processes. Each process deserves time slices to do their things in a > timely fashion. These are handled due to various algorithms but essentially > each process goes into a queue. The OS takes one out from the front of the > queue, sets up it's memory, runs it's instructions for a while and then swaps > its used memory out to disk (if necessary) and moves that to the back of the > queue. It then takes the next one out of the queue and processes that. This > means that sometimes (in rare but possible cases) two or more processes can > collide and do the unexpected. For example consider the following: > > Process A selects stuff for business key X - it's not there > Operating system swaps Process A to memory/disk > Process B selects stuff for business key X - it's not there > Process B selects the sequence next val > Process B inserts stuff into the database > Operating system swaps Process B to memory/disk > Process A selects the sequence next val > Process A inserts it's stuff into the database for key X > > Now you have either an insertion failure (if you have unique indices) or a > duplicate entry. If Process A is using transactions and you do have unique > indices (separate from the id) then the transaction fails because the insert > fails. If you don't have a separate key then the transaction succeeds because > failure isn't detected. > > How much of a concern this problem is really depends on the system. If you can > guarantee (due to file locking for example) that only one version of your > program is ever going to run; and you know that you won't have other programs > accessing the same database due to your system knowledge, then the race > condition isn't an issue. On the other hand, if this is a site where hundreds > or thousands of different people will be adding content all day at the same time > and you have a requirement that that information be unique; then you're very > likely to hit the race condition occasionally and it will be really hard to > track down. > > Personally in most of the systems I've ever done this kind of thing, I've taken > the lazy approach. I select to see if it's there, and insert otherwise. > However I make sure that I have unique indices based on the data entered (I try > to never use an id field) and I take care to handle the case of the insert > failing. If I am using an id field, I set it to "auto-increment" if the > database allows. Most of the time these days though, I just use Class::DBI > (probably soon to change to DBIx::Class). I don't know what its policy is in > this regard, but I'm happy enough to trust it. > > Of course I've mostly been lucky in that in all the systems I've worked on, the > number of co-current data editors has always been less than say 20,000. In fact > in most of them it's probably less than 3. This kind of a thing is a nightmare > when creating items for an online store (or CMS) -- where a number of > administrators might try to add the exact same item -- but rarely a problem when > handling a shopping cart, because most people *expect* duplicate items to appear > if they buy the same item in the same session from two different pages. Thus in > my situations the chances of hitting this kind of race condition is sufficiently > low that I was told not to spend the time worrying about it. > > It's a trade off: you can write some code correctly, or quickly. ;) > > All the best, > > J > _______________________________________________ > Wellington-pm mailing list > Wellington-pm at pm.org > http://mail.pm.org/mailman/listinfo/wellington-pm > > After re-reading the original question, I am leaning towards Peter Kelly's suggestion that this is probably well solved via the addition of unique indexes to the table in question and just handling failed inserts. However, not being savy to the details of the table in question, that may or may not be possible. Transactions will save the day here if we LOCK TABLE at the start of the transaction. Yes, this also has implications but it will eliminate the race condition. I'll add to Jacinta's timeflow with a LOCK TABLE. Process A BEGIN; LOCK TABLE; Process A selects stuff for business key X - it's not there Operating system swaps Process A to memory/disk Process B selects stuff for business key X - it's not there -- SELECT HANGS as table is locked. Operating system swaps Process B to memory/disk Process A selects the sequence next val Process A inserts it's stuff into the database for key X -- process A commits. -- process B gets a query result with confirming the presence of the record and bahaves accordingly. Yes, yes, I know LOCKing tables can get messy. Every time it's mentioned at a Perl Mongers meeting eyes roll as members remember late night sessions trying to find the reason queries are freezing or running slowly. That said, if we double check that the locking table transaction is completing (and write lots of thorough log messages if it doesn't), and it doesn't do anything too complicated (i.e. time-consuming) or get requested 1000/sec then this is a workable solution. From enkidu at cliffp.com Wed Dec 27 15:25:50 2006 From: enkidu at cliffp.com (Cliff Pratt) Date: Thu, 28 Dec 2006 12:25:50 +1300 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: <4592DE1F.7010308@catalyst.net.nz> References: <459256A5.5060007@vilain.net> <459266F5.3000105@perltraining.com.au> <4592DE1F.7010308@catalyst.net.nz> Message-ID: <459300FE.5090305@cliffp.com> Andrew Boag wrote: > Jacinta Richardson wrote: >> Sam Vilain wrote: >> >>> michael at diaspora.gen.nz wrote: >>> >>> >>>> A: select 1 from table where business_key = ...; # off to >>>> database if select returned nothing begin B: select >>>> sequence.nextval into id; # off to database C: insert into >>>> table values (id, business_key); # off to database end; >>>> >>>> Between A and B, unless you wrap a lock around the operation, >>>> there's a race condition; you could find either C failing (if >>>> there's a uniqueness constraint on "business_key"), or >>>> duplicate values in "table" (if there isn't). >>>> >>> Surely just wrapping the entire operation in a transaction is >>> enough? Or is that one of those cases that needs the TRANSACTION >>> ISOLATION LEVEL set to max? >>> >> Transactions don't solve the problem, regardless of their isolation >> level (unless your levels mean something other than what I think >> they do). The race condition exists because the underlying >> operating system handles multiple processes all at the same time. >> As far as the operating system is concerned, the database, scripts >> accessing the database, and anything else on the machine are just >> processes. Each process deserves time slices to do their things in >> a timely fashion. These are handled due to various algorithms but >> essentially each process goes into a queue. The OS takes one out >> from the front of the queue, sets up it's memory, runs it's >> instructions for a while and then swaps its used memory out to disk >> (if necessary) and moves that to the back of the queue. It then >> takes the next one out of the queue and processes that. This means >> that sometimes (in rare but possible cases) two or more processes >> can collide and do the unexpected. For example consider the >> following: >> >> Process A selects stuff for business key X - it's not there >> Operating system swaps Process A to memory/disk Process B selects >> stuff for business key X - it's not there Process B selects the >> sequence next val Process B inserts stuff into the database >> Operating system swaps Process B to memory/disk Process A selects >> the sequence next val Process A inserts it's stuff into the >> database for key X >> >> Now you have either an insertion failure (if you have unique >> indices) or a duplicate entry. If Process A is using transactions >> and you do have unique indices (separate from the id) then the >> transaction fails because the insert fails. If you don't have a >> separate key then the transaction succeeds because failure isn't >> detected. >> >> How much of a concern this problem is really depends on the system. >> If you can guarantee (due to file locking for example) that only >> one version of your program is ever going to run; and you know that >> you won't have other programs accessing the same database due to >> your system knowledge, then the race condition isn't an issue. On >> the other hand, if this is a site where hundreds or thousands of >> different people will be adding content all day at the same time >> and you have a requirement that that information be unique; then >> you're very likely to hit the race condition occasionally and it >> will be really hard to track down. >> >> Personally in most of the systems I've ever done this kind of >> thing, I've taken the lazy approach. I select to see if it's >> there, and insert otherwise. However I make sure that I have unique >> indices based on the data entered (I try to never use an id field) >> and I take care to handle the case of the insert failing. If I am >> using an id field, I set it to "auto-increment" if the database >> allows. Most of the time these days though, I just use Class::DBI >> (probably soon to change to DBIx::Class). I don't know what its >> policy is in this regard, but I'm happy enough to trust it. >> >> Of course I've mostly been lucky in that in all the systems I've >> worked on, the number of co-current data editors has always been >> less than say 20,000. In fact in most of them it's probably less >> than 3. This kind of a thing is a nightmare when creating items >> for an online store (or CMS) -- where a number of administrators >> might try to add the exact same item -- but rarely a problem when >> handling a shopping cart, because most people *expect* duplicate >> items to appear if they buy the same item in the same session from >> two different pages. Thus in my situations the chances of hitting >> this kind of race condition is sufficiently low that I was told not >> to spend the time worrying about it. >> >> It's a trade off: you can write some code correctly, or quickly. >> ;) >> >> All the best, >> >> J _______________________________________________ Wellington-pm >> mailing list Wellington-pm at pm.org >> http://mail.pm.org/mailman/listinfo/wellington-pm >> >> > After re-reading the original question, I am leaning towards Peter > Kelly's suggestion that this is probably well solved via the addition > of unique indexes to the table in question and just handling failed > inserts. However, not being savy to the details of the table in > question, that may or may not be possible. > > Transactions will save the day here if we LOCK TABLE at the start of > the transaction. Yes, this also has implications but it will > eliminate the race condition. I'll add to Jacinta's timeflow with a > LOCK TABLE. > > Process A BEGIN; LOCK TABLE; > > Process A selects stuff for business key X - it's not there Operating > system swaps Process A to memory/disk Process B selects stuff for > business key X - it's not there -- SELECT HANGS as table is locked. > Operating system swaps Process B to memory/disk Process A selects the > sequence next val Process A inserts it's stuff into the database for > key X > > -- process A commits. > > -- process B gets a query result with confirming the presence of the > record and bahaves accordingly. > > Yes, yes, I know LOCKing tables can get messy. Every time it's > mentioned at a Perl Mongers meeting eyes roll as members remember > late night sessions trying to find the reason queries are freezing or > running slowly. > > That said, if we double check that the locking table transaction is > completing (and write lots of thorough log messages if it doesn't), > and it doesn't do anything too complicated (i.e. time-consuming) or > get requested 1000/sec then this is a workable solution. > I'm still determining what the table contains, but basically it will be CREATE TABLE names_table ( name_id serial NOT NULL, name character varying(120) UNIQUE NOT NULL, .... other columns ....... PRIMARY KEY (name_id) ); I think I'll stick with my original plan: 1) SELECT * FROM names_table WHERE name = 'new_name' ; (test success, expect fail). 2) INSERT INTO name_table VALUES (default, 'new_name') ; (test success, expect success). 3) SELECT * FROM names_table WHERE name = 'new_name') ; (test success, expect success). Between 1) and 2) there *might* be a successful insert, but that won't matter because of the UNIQUE constraint on the 'name' column. It will prevent the *same* row being inserted twice. Between 2) and 3) there may be a successful insert, but again that won't matter because the select will pick up the correct name_id. In practise the table will get updated once a day, so race conditions are unlikely in the extreme! Cheers, Cliff From andrew at catalyst.net.nz Thu Dec 28 00:02:21 2006 From: andrew at catalyst.net.nz (Andrew McMillan) Date: Thu, 28 Dec 2006 08:02:21 +0000 Subject: [Wellington-pm] Perl database stuff.... In-Reply-To: References: Message-ID: <1167292941.5116.108.camel@ubu.mcmillan.net.nz> On Tue, 2006-12-26 at 19:24 +1300, michael at diaspora.gen.nz wrote: > >>>(c) In the perl you can make it "select, select, insert" as: > >>> 1. select the record (and fail). > >>> 2. select the sequence nextval > >>> 3. insert the row using the sequence. > > >> There is of course a race condition here; between the select record > >> and select the sequence nextval, another thread could perform the full > >> process. > > >Ah, no. We're talking about PostgreSQL here, so there isn't. > > Sure there is. I even qualified it by saying that the impact of the > race would depend on your schema. > > I understand that the "nextval" operation is atomic (and good on it; > dealing with DBs that *don't* have that operation is a pain in the ass), > but as I understood things, the model was something like (in pseudo code, > but meant to be implemented in Perl): > > A: select 1 from table where business_key = ...; # off to database > if select returned nothing > begin > B: select sequence.nextval into id; # off to database > C: insert into table values (id, business_key); # off to database > end; > > Between A and B, unless you wrap a lock around the operation, there's a > race condition; you could find either C failing (if there's a uniqueness > constraint on "business_key"), or duplicate values in "table" (if > there isn't). Sorry, yes. I misunderstood exactly which code you were referring to, thinking that you were suggesting nextval itself had a race condition. And yes, Cliff's design very likely does have race conditions, which may or may not occur in the real world. You could work around that by doing that final insert as: INSERT (... columns ...) SELECT ... values ... WHERE NOT EXISTS(SELECT 1 FROM table WHERE business key) but I got the impression from Cliff's original post that actually having an error if a row already existed for that business key was fine, so maybe the race just results in the INSERT failing, an error gets thrown and everyone is happy. Or you could use transaction isolation-level serializable, but using that can produce more problems than it solves, in my experience. Cheers, Andrew. > The fact that you'd need to do a select first implies that there is a > business key, rather than just a surrogate ID generated from a sequence, > so you need to worry about this. > > -- michael. > _______________________________________________ > Wellington-pm mailing list > Wellington-pm at pm.org > http://mail.pm.org/mailman/listinfo/wellington-pm ------------------------------------------------------------------------- Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 The truth is rarely pure, and never simple. -- Oscar Wilde ------------------------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mail.pm.org/pipermail/wellington-pm/attachments/20061228/4248a57d/attachment.bin