[sf-perl] yapop

Fri Mar 1 09:22:34 PST 2013

Yet Another Perl Optimization Puzzle.

Okay, I admit it - I'm weird.  I wrote a program called sqlpj[1] that 
uses Tim Bunce's JDBC module, which is his experimental wrapper to 
expose java.sql classes via Inline::Java.

I like JDBC because it is easy and it is ubiquitous.  Everything is 
in one jar for each database.  Every vendor has one.

I don't like DBI:: as much, because it means a couple days stuck in 
the CPAN mud to get it installed for various databases.  Frankly, 
that has always been a barrier to me, so I never bothered to spend 
much serious time with it.

(JDBC.pm has similar (cpan) problems because there is an impedance 
mismatch somewhere which prevents it from installing cleanly.  Long 
story short: work-around is to use perl 5.8.9 or lower.  However, you 
only have to install it once, for all databases.)

Writing a front-end to SQL in perl is a joy compared to writing it in 
java.  As I derived the perl program from an earlier java program, I 
have some knowledge on the issue.  For example, you can treat JDBC 
meta calls as data, and voila, you have a full jdbc meta-data 
explorer for the price of a cut-and-paste from the javadocs.

All was well until I did a query that returned a million rows.  Then 
things got dull, as I waited for the cursor to return.

As usual, my initial idea about what was wrong was, well wrong. 
What's the saying? 
<http://c2.com/cgi/wiki?ProfileBeforeOptimizing>Measure twice, 
optimize once?

So I did, and not surprisingly[2], all my time was getting spent in 
the loop that fetched from the JDBC Result set 
(<http://docs.oracle.com/javase/1.5.0/docs/api/java/sql/ResultSet.html>java.sql.ResultSet):

     while ($rset->next()) {
         push @allrows, [&getRow($rset, $self->getXmlDisplay(), @colmap)];
     }

Here "rset" is a reference to an instance of java.sql.ResultSet.  As 
you can see, all the work is done in getRow.  Here is the relevant 
loop there:

sub getRow
{
...
     my $m        = $rset->getMetaData();
     my $colcnt   = $m->getColumnCount();
     for (my $ii = 1; $ii <= $colcnt; $ii++) {
         next unless ($colmap[$ii-1]);   #skip if column is not selected

         #note - you have to do the fetch first,
         #which sets wasNull() for the current column.
         my $str = $rset->getString($ii);

         #if we are displaying xml rowsets...
         if ($xmldisplay) {
             #...then set SQL NULL elements to undef:
             push @data, ($rset->wasNull() ? undef : $str);
         } else {
             #otherwise, we will display the string "(NULL)":
             push @data, ($rset->wasNull() ? "(NULL)" : $str);
         }
...
}

Geez, lots of opportunities for optimization here, where should I start?

What would you do to improve the performance here?

I will post what I actually did in a couple of days to give you a 
chance to think about it.

Hint: anything derived from $rset is handled by Inline::Java, which 
means at least one write/read cycle to a socket connected java VM.

cheers,
-Russ

[1] I will be posting the source for sqlpj soon, but email me if you 
are interested.
[2] After the fact, it was not surprising, as little is.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/sanfrancisco-pm/attachments/20130301/e8f3fd77/attachment.html>