[Phoenix-pm] PERL DBI

Bill Nash billn at odyssey.billn.net
Mon Mar 13 13:47:23 PST 2006


You're very likely pressing up against DB execution overhead.

For starters, you're doing one insert per row. The database side mechanics 
of this are:

1. Accept insert.
2. Update indexes.
3. Log query where applicable.
4. Platform specific spooky buffer action and related flushing.

Platform independent, if you're going to chuck a couple million rows like 
this, disable your indexing first.

I don't know how you'd optimize this with Oracle, but mysql supports bulk 
insert syntax, ala:

insert into foo (a, b, c, d, beer) values (1, 2, 3, 4, 'bar'), (3, 4, 5, 
6, 'heineken'), ...;

Even with indexing enabled, this is far more efficient, as indexes are 
updated en masse after the insert is accepted.

In psuedocode, something like:

prepare query
execute query
while iterating results,
 	push result set onto array.
 	increment counter.
 	if counter hits 5000,
 		bulk insert entries stored in array.
 		erase array.
 		reset counter.
 	end if
end while

I've gotten DBI on good hardware to iterate upwards of 20k rows a second. 
Realistic insert rates are a different animal, because of the i/o (your 
mileage may vary depending on media speeds), but 6-8k a second is not 
unrealistic. Other factors include the engine type, the data i/o bus of 
the machine, etc.

- billn

On Mon, 13 Mar 2006, Scott Walters wrote:

> On  0, "David A. Sinck" <phx-pm-list at grueslayer.com> wrote:
>>
>> HAHAHAHAHA.  Guess again.  There's an instance, at least with MySQL
>
> Okay, I don't use MySQL heavy (for reasons of it better garbage), so
> I over generalized.  I conceede your point that rare bugs will make
> this generalization not always true.
>
>> \_ and you don't suffer
>> \_ from SQL command length limitations,
>>
>> I bet I could could craft something that hits a buffer still using
>> placeholders if I wanted.  :-)
>
> I didn't say he didn't suffer from *buffer* length considerations -- that
> would be foolish indeed.  I only said that he doesn't suffer from this
> particular one.  I've seen databases cap the command buffer at 64k.
> Generally speaking (true in most cases, perhaps false in some) the
> length-counted buffers for placeholder data will be significantly larger
> than the command buffer.
>
>>
>> \_ and the database doesn't have to parse potentially
>> \_ megs of data just to parse the SQL command,
>>
>> Depends on the driver.  MySQL dbd, last I checked, actually sub'd the
>> values in before it got passed to the engine.  YMMV.
>
> That must have been a long time ago indeed.  But you caught me --
> I made another false generalization.  I should have said "Generally, "
> in front of it.
>
> By the way, people who argue small points while ignoring the point of
> what was said annoy me.  Did you really think someone on the list
> would be damaged thinking that placeholders always send data out of band,
> even when using Informix?
>
> I wrote a hasty reply out of desire to be helpful to someone who was
> having trouble with something, not to debate minutia, and have been
> punished for it.
>
> -scott
>
>>
>> David
>> _______________________________________________
>> Phoenix-pm mailing list
>> Phoenix-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/phoenix-pm
> _______________________________________________
> Phoenix-pm mailing list
> Phoenix-pm at pm.org
> http://mail.pm.org/mailman/listinfo/phoenix-pm
>


More information about the Phoenix-pm mailing list