[sf-perl] yapop
Russ Tremain
russt at releasetools.org
Fri Mar 1 09:22:34 PST 2013
Yet Another Perl Optimization Puzzle.
Okay, I admit it - I'm weird. I wrote a program called sqlpj[1] that
uses Tim Bunce's JDBC module, which is his experimental wrapper to
expose java.sql classes via Inline::Java.
I like JDBC because it is easy and it is ubiquitous. Everything is
in one jar for each database. Every vendor has one.
I don't like DBI:: as much, because it means a couple days stuck in
the CPAN mud to get it installed for various databases. Frankly,
that has always been a barrier to me, so I never bothered to spend
much serious time with it.
(JDBC.pm has similar (cpan) problems because there is an impedance
mismatch somewhere which prevents it from installing cleanly. Long
story short: work-around is to use perl 5.8.9 or lower. However, you
only have to install it once, for all databases.)
Writing a front-end to SQL in perl is a joy compared to writing it in
java. As I derived the perl program from an earlier java program, I
have some knowledge on the issue. For example, you can treat JDBC
meta calls as data, and voila, you have a full jdbc meta-data
explorer for the price of a cut-and-paste from the javadocs.
All was well until I did a query that returned a million rows. Then
things got dull, as I waited for the cursor to return.
As usual, my initial idea about what was wrong was, well wrong.
What's the saying?
<http://c2.com/cgi/wiki?ProfileBeforeOptimizing>Measure twice,
optimize once?
So I did, and not surprisingly[2], all my time was getting spent in
the loop that fetched from the JDBC Result set
(<http://docs.oracle.com/javase/1.5.0/docs/api/java/sql/ResultSet.html>java.sql.ResultSet):
while ($rset->next()) {
push @allrows, [&getRow($rset, $self->getXmlDisplay(), @colmap)];
}
Here "rset" is a reference to an instance of java.sql.ResultSet. As
you can see, all the work is done in getRow. Here is the relevant
loop there:
sub getRow
{
...
my $m = $rset->getMetaData();
my $colcnt = $m->getColumnCount();
for (my $ii = 1; $ii <= $colcnt; $ii++) {
next unless ($colmap[$ii-1]); #skip if column is not selected
#note - you have to do the fetch first,
#which sets wasNull() for the current column.
my $str = $rset->getString($ii);
#if we are displaying xml rowsets...
if ($xmldisplay) {
#...then set SQL NULL elements to undef:
push @data, ($rset->wasNull() ? undef : $str);
} else {
#otherwise, we will display the string "(NULL)":
push @data, ($rset->wasNull() ? "(NULL)" : $str);
}
...
}
Geez, lots of opportunities for optimization here, where should I start?
What would you do to improve the performance here?
I will post what I actually did in a couple of days to give you a
chance to think about it.
Hint: anything derived from $rset is handled by Inline::Java, which
means at least one write/read cycle to a socket connected java VM.
cheers,
-Russ
[1] I will be posting the source for sqlpj soon, but email me if you
are interested.
[2] After the fact, it was not surprising, as little is.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/sanfrancisco-pm/attachments/20130301/e8f3fd77/attachment.html>
More information about the SanFrancisco-pm
mailing list