SPUG: axkit

Aaron W. West tallpeak at hotmail.com
Sat Feb 21 16:45:45 CST 2004

Has anyone used axkit?

And can anyone comment on using Embperl vs. using Mason vs. pure mod_perl?

But actually, I'll probably just use PerlIS.DLL for now, and switch to
another approach if performance becomes poor. As I understand, ActiveState
PerlIS.DLL may have to recompile for every hit, whereas AS's PerlEx and Perl
for .NET, and Apache's mod_perl and EmbPerl and Mason, all cache compiled
code in-memory (but I'm not sure about that.)


Mentions Perl, Embperl, Mason, mod_perl, and axkit

(Read no further unless you're bored)

I wrote a long rambling email earlier, but decided that was the only
question/comment I had of value in it. (Lies, damn lies, and benchmarks...
don't optimize if there's no need, right?)


Whether to optimize first or last is one of those nagging questions I always
have on my mind. Optimize too early, and you waste time and mental energy
on mundane details of implementation, when by the time you are through
you may have taken a totally different approach and thrown out all your
earlier painstaking optimizations. In general, I think, if optimization means
greater code size, save it for later, but if a huge gain can be had cheaply,
do it now. That's the approach I've tended to follow, yet I admittedly end up
wasting time on needless optimization sometimes. Below, comparing unpack,
substr and /regex/, I was just doing it for the education (who cares about 4
microseconds in something that is database-constrained, anyway?).

-----Original Message-----
From: Aaron W. West [mailto:tallpeak at speakeasy.net]
Sent: Saturday, February 21, 2004 10:08 AM
To: spug-list at pm.org
Subject: Comments on line-parsing and other random ramblings...

Perhaps I shouldn't go knocking Tim Maher's suggestion to turn on line
terminator handling for scripts, but...

Is it *really* a good idea to use the -l option in scripts? My issue with it
is that it's a global setting. What if you have various modules in your app,
and each one likes to handle things differently? What if you decide to put
your code into Mason or a templating system, and it breaks because this
option is off?

Is there a perl variable to control "-l"? It just seems unnatural to rely on
#!perl -l to set the option.

I suppose the solution is to use local for $/ or $\ in those
modules/functions that need it, and to design all modules so that they are
not sensitive to the -l setting. I'll admit, it gets mildly tedious trying
to remember to put "\n" at the end of every "print", but what if I want
to print somewhere without a "\n"? I know I can use local in a block:

$ perl -le '{local ($\)=""; print "a"; print 1;} print 2; print 3'
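
As far as perlrun describes it, -l without -n or -p just sets $\ to the
current value of $/, so $\ is the variable to localize. A minimal sketch of
subs that behave the same whether or not the script was started with -l; the
names print_raw and print_line are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: prints its arguments with nothing appended,
# regardless of whether the script was started with -l.
sub print_raw {
    my ($fh, @parts) = @_;
    local $\ = '';        # restored automatically when the sub returns
    print {$fh} @parts;
}

# Hypothetical helper: always ends the line, again independent of -l.
sub print_line {
    my ($fh, @parts) = @_;
    local $\ = "\n";
    print {$fh} @parts;
}

# Write to an in-memory filehandle so the result can be inspected.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
{
    local $\ = "\n";      # simulate what -l does to $\
    print_raw($fh, 'a');
    print_raw($fh, 1);
    print_line($fh, 2);
    print {$fh} 3;        # plain print picks up the simulated -l setting
}
close $fh;
print $out;
```

The point of the design is that each sub states the separator it needs and
never depends on what the caller set.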




Maybe I'm just an ol' C programmer used to doing things the hard way, and
maybe after forgetting chomp or \n a few more times I'll reform and decide
he's right...


Globals such as $/ always have seemed "wrong" to me. The language "should"
tie such attributes to the filehandles, or an object. eg:

open IN,"<myfile";

IN->record_separator="\n"; # or something like that
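
Something close to a per-handle separator can be sketched with a small
wrapper that localizes $/ only while it reads. The package name RecordReader
and its methods are hypothetical, not an existing module:

```perl
use strict;
use warnings;

package RecordReader;   # hypothetical name, for illustration only

sub new {
    my ($class, $fh, $separator) = @_;
    return bless { fh => $fh, sep => $separator }, $class;
}

# Read one record using this object's separator. The global $/ is
# changed only inside this sub and restored on return.
sub next_record {
    my ($self) = @_;
    local $/ = $self->{sep};
    my $fh = $self->{fh};
    my $record = <$fh>;
    chomp $record if defined $record;   # chomp strips the localized $/
    return $record;
}

package main;

my $data = "a;b;c\nline two\n";
open my $in, '<', \$data or die "cannot open in-memory handle: $!";

# Two readers over the same handle, each with its own separator.
my $by_semicolon = RecordReader->new($in, ';');
my @fields = ($by_semicolon->next_record, $by_semicolon->next_record);

my $by_newline = RecordReader->new($in, "\n");
my $rest = $by_newline->next_record;

print "@fields | $rest\n";
```

This keeps the setting attached to an object rather than to the whole
program, at the cost of one extra method call per read.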

How could anyone write a safe multithreaded app in Perl, with global
variables in different states in different parts of the application? Of
course, Perl wasn't really designed for multithreading, initially. But it
supports threading, in recent versions. I imagine perl blocks other threads
in many situations where globals are used (putting a mutex lock around the
block or a portion of it), or creates thread-local versions of those
variables ($1, $2, etc), and saves/restores state when switching threads.

By the way, I noticed Activestate has a Perl for .NET. They charge a pretty
penny, though ($395?) I don't suppose they give Larry any of the proceeds.

I suspect Apache+mod_perl should perform as well (on Windows 2003) as
ActiveState's products. Does anyone know?

Why use Windows 2003, you might ask? Well, as much as I might like Linux, I
wasn't confident that OLAP (decision-support/data-mining/aggregations) could
be done as well in PostgreSQL/MySQL as in MS SQL Server, which includes
Analysis Server for creating ROLAP/MOLAP/HOLAP aggregations.

Of course, since an OLAP application has intense query requirements, and is
dependent mostly upon the database, even pure CGI may be adequate for Perl.

I think I'll be switching from IIS (back) to Apache, for mod_perl & AxKit
(Enhydra+XMLC is also interesting, but I'm not sure I can stand Java), or
for Mason:



Mason sure has a lot of content management systems available for it (three
listed on masonhq).


(Okay, everyone can stop here and ignore my ramblings about performance...)

Perl sure has a lot of ways to do things, eg: (parsing dates):


# low-precedence "or" is needed: with "||" this parses as (match || $error) = 1
$purchase_stamp =~ /(\d{4})[^\d]*(\d\d)[^\d]*(\d\d)/ or $error = 1;

$purchase_int = $1 . $2 . $3;





#$purchase_int = substr($purchase_stamp,0,4) .

# substr($purchase_stamp,5,2) . substr($purchase_stamp,8,2);

I decided on the first, the regex approach (slowest), since it would work on
dates which lack delimiters, should we happen across any.
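
A small sketch of why the regex tolerates missing delimiters: the \D*
between the groups matches zero or more non-digits, so "20041229" and
"2004-12-29" both parse. The sub name purchase_int and the sample stamps are
assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns YYYYMMDD as a string, or undef when no
# date-like run of digits is found in the stamp.
sub purchase_int {
    my ($stamp) = @_;
    return unless defined $stamp;
    if ($stamp =~ /(\d{4})\D*(\d\d)\D*(\d\d)/) {
        return $1 . $2 . $3;
    }
    return;
}

# Sample inputs (made up): delimited, undelimited, and unparsable.
for my $stamp ('2004-12-29 10:08:00', '20041229', '2004/12/29', 'n/a') {
    my $int = purchase_int($stamp);
    print "$stamp => ", (defined $int ? $int : 'no match'), "\n";
}
```

Checking the match before reading $1, $2 and $3 also avoids picking up stale
captures from an earlier successful match.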

Timing shows substr is faster than unpack which is faster than regex.


An example timed command:

$ time perl -le


real 0m1.107s

user 0m1.081s

sys 0m0.030s

And my table of results (user time):

time command

1.081us $a=substr($x,0,3);$b=substr($x,5,2);$c=substr($x,8,2);

2.052us ($a,$b,$c)=unpack("A4xA2xA2",$x);

2.914us $x =~ /[^\d]*(\d{4})[^\d]*(\d\d)[^\d]*(\d\d)/;$a=$1;$b=$2;$c=$3;

2.613us $x =~ /[^\d]*(\d{4}).(\d\d).(\d\d)/; $a=$1;$b=$2;$c=$3;

More interesting results:

1.081us $x =~ /[^\d]*(\d{4}).(\d\d).(\d\d)/;

1.612us $x =~ /[^\d]*(\d{4}).(\d\d).(\d\d)/;($a,$b,$c)=("2004","12","29");

2.613us $x =~ /[^\d]*(\d{4}).(\d\d).(\d\d)/;($a,$b,$c)=($1,$2,$3)

It seems that retrieving captured matches takes some time. Perhaps each
retrieval calls substr, but the extra cost (about 1.5 usec) is actually more
than 3 substr's and assignments take on their own (first command), so maybe
perl is doing something like optimizing out the whole (meaningless) regex
when the captures are never used. Assignment to a variable from any of $1,
$2 or $3 takes 0.66 usec or so, I imagine. Assignment of
($a,$b,$c)=("2004","12","29") takes much less time than
($a,$b,$c)=($1,$2,$3), so the assignment from the capture variables is
relatively time-consuming for some reason.
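
For anyone wanting to repeat the comparison without shell timing, here is a
sketch using the core Benchmark module. The sample value of $x and the
layout of the subs are assumptions, and absolute numbers will not match the
2004 figures above:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $x = '2004-12-29';   # sample value, assumed format YYYY-MM-DD

# Each sub builds a list so the captures/substrings are actually fetched.
my %approach = (
    substr => sub {
        my @p = (substr($x, 0, 4), substr($x, 5, 2), substr($x, 8, 2));
        return @p;
    },
    unpack => sub {
        my @p = unpack('A4xA2xA2', $x);
        return @p;
    },
    regex => sub {
        my @p = $x =~ /(\d{4})\D*(\d\d)\D*(\d\d)/;
        return @p;
    },
);

# Sanity check: all three must agree before the timings mean anything.
for my $name (sort keys %approach) {
    my @parts = $approach{$name}->();
    die "$name gave @parts" unless "@parts" eq '2004 12 29';
}

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, \%approach);
```

cmpthese prints a rate for each entry plus a table of relative differences,
which is easier to read than subtracting user times by hand.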

Well, back to optimizing stored procedures and ColdFusion...
