[Pdx-pm] DBI middle ground

Tkil tkil at scrye.com
Mon Jan 29 11:15:25 PST 2007


>>>>> "MR" == Michael Rasmussen <mikeraz at patch.com> writes:

MR> I'm querying a large data set.  The current code works something like:
MR>    while( @lease_information = $lease->fetchrow_array) {
MR>          modification of data, plugging in defaults where currently null
MR>          ensuring constraints are met
MR>          # another_db_sth is an update command
MR>          $another_db_sth->execute(@lease_information);
MR>    }

MR> I don't have control over the source database, so the changes need
MR> to be made in my code.

MR> This takes longer to execute than I'd like.  I can't just do a
MR> fetchall or selectall because the dataset is larger than my
MR> available memory.  Nor have I found a middle ground.

MR> Is there one?  If so, that I may study up, what are references to
MR> it?

A few things to check:

1. Are you using autocommit?  That can be really painful in this
   situation.  My usual technique is to turn off autocommit, then
   explicitly commit() every so often (either by time, or by number of
   DML statements, or by number of rows affected by those DML, it
   varies.)

2. It'll almost always be faster to do this work in the DB, if you
   can.  You should be able to use MySQL's IFNULL() or Oracle's NVL()
   to fill in the blanks, and hopefully you can use standard WHERE
   conditions to check constraints.  (Although the lack of proper
   regex support in most DBs is painful.)

3. Profile your code locally, to see if it's CPU bound somehow.  As
   others pointed out, binding values with "fetch" is advertised as
   the fastest way to get data through the DBI; doing some local
   profiling might point fingers.  (There's also the DBI trace
   methods, which can use high-resolution timing (i think?) to give
   you further insight.)

4. Talk to your DBA, if you have one.  It might be that your update
   can be tuned, or it might be much faster to load up a temp table
   and then do the update in one swell foop.

Happy hacking,
t.


More information about the Pdx-pm-list mailing list