[Neworleans-pm] Perl optimization article

Simon Dorfman EmailLists at SimonDorfman.com
Mon Oct 25 23:12:58 CDT 2004


http://www-106.ibm.com/developerworks/library/l-optperl.html

Optimize Perl
Squeeze the most from your code

Level: Intermediate

Martin C. Brown (questions at mcslp.com)
Freelance writer and consultant
 19 Oct 2004
Perl is an incredibly flexible language, but its ease of use can lead to
some sloppy and lazy programming habits. We're all guilty of them, but there
are some quick steps you can take to improve the performance of your Perl
applications. In this article, I'll look at the key areas of optimization,
which solutions work and which don't, and how to continue to build and
extend your applications with optimization and speed in mind.

Sloppy programming, sloppy performance
I'll be honest: I love Perl and I use it everywhere. I've written Web sites,
administration scripts, and games using Perl. I frequently save time by
getting Perl to do and check things automatically for me, everything from my
lottery numbers to the stock markets, and I even use it to automatically
file my e-mail. Because Perl makes it so easy to do all of these things,
there's a tendency to forget about optimization. In many cases this isn't
the end of the world. So what if it takes an extra few milliseconds to look
up your stock reports or parse those log files?

However, those same lazy habits that cost milliseconds in a small
application are multiplied when dealing with larger scale development
projects. It's the one area where the Perl mantra of TMTOWTDI (There's More
Than One Way To Do It) starts to look like a bad plan. If you need speed,
there may be only one or two ways to achieve the fastest results, whereas
there are many slower alternatives. Ultimately, sloppy programming -- even
if you achieve the desired result -- is going to result in sloppy
performance. So, in this article I'm going to look at some of the key
techniques you can use to squeeze those extra cycles out of your Perl
application.

Approaching optimization
First of all, it's worth remembering that Perl is a compiled language. The
source code you write is compiled on the fly into the bytecode that is
executed. The bytecode is itself based on a range of instructions, all of
which are written in a highly optimized form of C. However, even within
these instructions, some operations that can achieve similar results are
more highly optimized than others. Overall, this means that it's the
combination of the logic sequence you use and the bytecode that is generated
from this that ultimately affects performance. The differences between
certain similar operations can be drastic. Consider the code in Listings 1
and 2. Both create a concatenated string, one through ordinary concatenation
and the other through generating an array and concatenating it with join.
Listing 1. Concatenating a string, version 1


my $string = 'abcdefghijklmnopqrstuvwxyz';
my $concat = '';

foreach my $count (1..999999)
{
    $concat .= $string;
}



Listing 2. Concatenating a string, version 2


my $string = 'abcdefghijklmnopqrstuvwxyz';
my @concat;

foreach my $count (1..999999)
{
    push @concat,$string;
}
my $concat = join('',@concat);


Running Listing 1, I get a time of 1.765 seconds, whereas Listing 2 requires
5.244 seconds. Both generate the same string, so what's taking up the time?
Conventional wisdom (including that of the Perl team) says that concatenating
a string is a time-expensive process, because Perl has to extend the memory
allocation for the variable and then copy the string and its addition into
the newly allocated space. Conversely, adding a string to an array should be
relatively cheap. On top of that, Listing 2 has to duplicate the
concatenation work anyway with the final join(), which adds roughly an extra
second.

The problem, in this instance, is that push()-ing strings onto an array is
time-intensive. First of all, there's a function call (which means pushing
items onto a stack and taking them off again), and then there's the
additional array-management overhead. In contrast, concatenating a string is
pretty much a single opcode that appends one string variable to another. Even
presizing the array to alleviate the growth overhead (using $#concat =
999999) saves only about another second.

The above is an extreme example, and there are times when using an array
will be much quicker than using strings; a good example here is if you need
to reuse a particular sequence but with an alternate order or different
interstitial character. Arrays are also useful, of course, if you want to
rearrange or reorder the contents. By the way, in this example, an even
quicker way of producing a string that repeats the alphabet 999,999 times
would be to use:

 $concat = 'abcdefghijklmnopqrstuvwxyz' x 999999;

 Individually, many of the techniques covered here won't make a huge
difference, but combined in one application, you could shave a few hundred
milliseconds, or even seconds, off of your Perl applications.
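
If you want to check numbers like these on your own system, the standard
Benchmark module will do the measuring for you. Here's a minimal sketch
comparing the two approaches from Listings 1 and 2 (with the iteration count
reduced so each run stays short):

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = 'abcdefghijklmnopqrstuvwxyz';

# Run each version for at least 3 CPU seconds and report relative speed.
cmpthese(-3, {
    concat => sub {
        my $concat = '';
        $concat .= $string for (1 .. 10_000);
    },
    join_array => sub {
        my @concat;
        push @concat, $string for (1 .. 10_000);
        my $concat = join('', @concat);
    },
});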

Use references
If you work with large arrays or hashes and use them as arguments to
functions, use a reference instead of the variable directly. By using a
reference, you tell the function to point to the information. Without a
reference, you copy the entire array or hash onto the function call stack,
and then copy it again in the function. References also save memory (which
reduces footprint and management overheads) and simplify your programming.
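
As a quick sketch (sum_list and sum_list_ref are hypothetical helpers, not
from the article):

my @data = (1 .. 100_000);

my $total  = sum_list(@data);       # flattens all 100,000 elements onto the stack
my $total2 = sum_list_ref(\@data);  # passes a single reference instead

sub sum_list
{
    my (@list) = @_;                # and copies them all again here
    my $sum = 0;
    $sum += $_ foreach @list;
    return $sum;
}

sub sum_list_ref
{
    my ($data) = @_;                # one scalar, no matter how big the array
    my $sum = 0;
    $sum += $_ foreach @{$data};
    return $sum;
}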

String handling
If you are using static strings in your application a lot -- for example, in
a Web application -- remember to use single quotes rather than doubles.
Double quotes force Perl to look for a potential interpolation of
information, which adds to the overhead of printing out the string:

 print 'A string','another string',"\n";

I've also used commas to separate arguments rather than using a period to
concatenate the string first. This simplifies the process: print simply sends
each argument to the output filehandle in turn, whereas concatenation would
build the full string first and then print it as a single argument.
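
A short sketch of the difference (note that "\n" still needs double quotes,
because escape sequences aren't processed inside single quotes):

my $name = 'World';

print "Hello, $name\n";        # double quotes: scanned for variables and escapes
print 'Hello, ', $name, "\n";  # single quotes, arguments passed as a list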

Loops
As you've already seen, function calls with arguments are expensive, because
for the function call to work, Perl has to put the arguments onto the call
stack, call the function, and then receive the responses back through the
stack again. All of this requires overhead and processing that we could
probably do without. For this reason, excessive function calls in a loop are
generally a bad idea. Again, it comes down to a comparison of numbers.
Looping through 1,000 items and passing information to a function will
trigger the function call 1,000 times. To get around this, I just switch the
sequence around. Instead of using the format in Listing 3, I use the
approach in Listing 4.
Listing 3. Loop calling functions


foreach my $item (keys %{$values})
{
    $values->{$item}->{result} = calculate($values->{$item});
}

sub calculate
{
    my ($item) = @_;
    return ($item->{adda}+$item->{addb});
}


Listing 4. Function using loops


calculate_list($values);

sub calculate_list
{
    my ($list) = @_;
    foreach my $item (keys %{$list})
    {
        $list->{$item}->{result} =
            ($list->{$item}->{adda} + $list->{$item}->{addb});
    }
}

Better still, in a simple calculation like this one or for any
straightforward loop work, use map:

 map { $values->{$_}->{result} =
           $values->{$_}->{adda} + $values->{$_}->{addb} } keys %{$values};

 Remember also that each pass through a loop costs time, so rather than
working through the same data in several separate loops, try to perform all
the actions in a single pass.
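
For example, rather than looping over the same data once to total it and
again to find the largest value, do both jobs in a single pass (a sketch with
made-up data):

my @values = (12, 7, 42, 3);

my ($sum, $max) = (0, undef);
foreach my $value (@values)
{
    $sum += $value;                                    # job one
    $max = $value if !defined($max) || $value > $max;  # job two, same pass
}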

Sorts
Another common operation related to loops is sorting information,
particularly keys in a hash. It's tempting in this instance to embed some
processing of list elements into the sort operation, such as the one shown
here in Listing 5.
Listing 5. Bad sorting


my @marksorted = sort {sprintf('%s%s%s',
      $marked_items->{$b}->{'upddate'},
      $marked_items->{$b}->{'updtime'},
      $marked_items->{$b}->{itemid}) <=>
      sprintf('%s%s%s',
            $marked_items->{$a}->{'upddate'},
            $marked_items->{$a}->{'updtime'},
            $marked_items->{$a}->{itemid}) } keys %{$marked_items};


This is a fairly typical sort of complex data, in this case ordering items by
date, time, and ID number by concatenating those fields into a single number
that can be compared numerically. The problem is that sort moves items up and
down the list according to the result of the comparison block, which makes
the sort itself a type of loop; unlike the loops we've already seen, though,
two sprintf calls have to be made for every single comparison, and the number
of comparisons depends on how ordered the list was to begin with. For a
10,000-item list, you could expect to call sprintf over 240,000 times.

The solution is to create a list that contains the sort information, and
generate the sort field information just once. Taking the sample in Listing
5 as a guide, I'd rewrite that fragment into something like the code in
Listing 6.
Listing 6. Better sorting


map { $marked_items->{$_}->{sort} = sprintf('%s%s%s',
      $marked_items->{$_}->{'upddate'},
      $marked_items->{$_}->{'updtime'},
      $marked_items->{$_}->{itemid}) } keys %{$marked_items};
my @marksorted = sort { $marked_items->{$b}->{sort} <=>
      $marked_items->{$a}->{sort} } keys %{$marked_items};


Instead of calling sprintf on every comparison, we call it just once per item
in the hash to generate a dedicated sort field, and the comparison block then
simply reads that field. On the 10,000-item hash, that cuts the sprintf calls
from 240,000 down to just 10,000. The exact saving depends on what the
original comparison block was doing, but the method shown in Listing 6 can
cut the sort time by as much as half.

If you produce these hashes from the results of a database query -- through
MySQL or similar -- sort within the query itself (with an ORDER BY clause)
and record the order as you build the hash; then you won't need to iterate
over the information again.
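
For example, with DBI you might do something like this (a sketch; the
database, table, and column names are hypothetical):

use DBI;

my $dbh = DBI->connect('DBI:mysql:mydb', 'user', 'password')
    or die $DBI::errstr;

my $sth = $dbh->prepare('SELECT itemid, upddate, updtime FROM items ' .
                        'ORDER BY upddate DESC, updtime DESC, itemid DESC');
$sth->execute();

my (%marked_items, @marksorted);
while (my $row = $sth->fetchrow_hashref())
{
    $marked_items{$row->{itemid}} = $row;    # build the hash
    push @marksorted, $row->{itemid};        # and record the order as we go
}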

Using short circuit logic
Related to the sort operation is how to work through a list of alternative
values. Using many if statements can be incredibly time consuming. For
example, look at the code in Listing 7.
Listing 7. Making a choice


if ($userchoice > 0)
{
    $realchoice = $userchoice;
}
elsif ($systemchoice > 0)
{
    $realchoice = $systemchoice;
}
else
{
    $realchoice = $defaultchoice;
}

Aside from its sheer bulk, there are a couple of problems with this
structure. From a programming perspective, it never checks whether any of the
variables holds a valid value, a fact that would be highlighted if warnings
were switched on. Second, it has to test each option in turn until it finds
the one it wants, which is wasteful, because comparison operations
(particularly on strings) are time consuming. Both problems can be solved by
using short circuit logic.

If you use the logical || operator, Perl will use the first true value it
comes across, in order, from left to right. The moment it finds a valid
value, it doesn't bother processing any of the other values. In addition,
because Perl is looking for a true value, it also ignores undefined values
without complaining about them. So we can rewrite the above into a single
line:

 $realchoice = $userchoice || $systemchoice || $defaultchoice;

 If $userchoice is a true value, Perl doesn't even look at the other
variables. If $userchoice is false (see Table 1), then Perl checks the value
of $systemchoice and so on until it gets to the last value, which is always
used, whether it's true or not.

Table 1. $userchoice values

Value                                               Logical value
Negative number                                     True
Zero                                                False
Positive number                                     True
Empty string                                        False
Non-empty string                                    True
Undefined value                                     False
Empty list (including hashes)                       False
List with at least one element (including hashes)   True
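
A quick demonstration of the fall-through behavior described in Table 1:

my $userchoice;           # undefined -- false, skipped without complaint
my $systemchoice = 0;     # zero -- false, skipped
my $defaultchoice = 5;

my $realchoice = $userchoice || $systemchoice || $defaultchoice;
print $realchoice, "\n";  # prints 5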

Use AutoLoader
One of the most expensive portions of the execution of a Perl script is the
compilation of source code into the bytecode that is actually executed. On a
small script with no external modules, the process takes milliseconds. But
start to include a few of your own external modules and the time increases.
The reason is that Perl does little more with a module than import the text
and run it through the same compilation stage. That can turn your 200-line
script into a 10,000- or 20,000-line script very quickly, lengthening the
initial compilation stage before the script even starts to do any real work.

During the normal execution of your script, you may use only 10 percent, or
even 5 percent, of the functions defined in those modules. So why load them
all when the script starts? The solution is to use AutoLoader, which acts a
bit like a dynamic loader for Perl modules. It relies on files generated by
the AutoSplit system, which divides a module up into its individual
functions. When you load the module through use, all you load is the module's
stub code; only when you call a function does AutoLoader step in to load and
compile the code for that one function. The result is that you convert that
20,000-line script back into a 200-line script, speeding up the initial
loading and compilation stages.

I've saved as much as two seconds just by converting one of my applications
to use the AutoLoader system in place of preloading. Using it is a matter of
changing your modules from the format shown in Listing 8 to that shown in
Listing 9, and then running AutoSplit to create the loading stubs. Note that
you don't need to use Exporter any more; AutoLoader handles the loading of
individual functions automatically without your having to explicitly list
them.
Listing 8. A standard module


package MyModule;

use OtherModule;
require Exporter;
our @ISA    = qw(Exporter);
our @EXPORT = qw(MySub);

sub MySub
{
   ...
}

1;


Listing 9. An autoloading module


package MyModule;

use OtherModule;
use AutoLoader 'AUTOLOAD';

1;

__END__

sub MySub
{
   ...
}

The main difference here is that functions you want to autoload are no
longer defined within the module's package space but in the data section at
the end of the module (after the __END__ token). AutoSplit will place any
functions defined here into the special AutoLoader files. To split up the
module, use the following command line:

 perl -e 'use AutoSplit; autosplit($ARGV[0], $ARGV[1], 0, 1, 1)' MyModule.pm auto

Using bytecode and the compiler back ends
There are three ways to use the compiler: bytecode production, full
compilation, or simply as a debugging/optimizing tool. The first two methods
rely on converting your original Perl source into its compiled bytecode form
and storing this precompiled version for execution. This is best used
through the perlcc command. These two modes follow the same basic model but
produce the final result differently. In bytecode mode, the resulting
compiled bytecode is written out to another Perl script. The script consists
of the ByteLoader preamble, with the compiled code stored as a byte string.
To create this bytecode version, use the -B option to the perlcc command.
For example:

$ perlcc -B script.pl

 This will create a file called a.out. The resulting file can be executed
with any Perl interpreter on any platform (Perl bytecode is platform
independent):

 $ perl a.out

 What this does is save Perl from having to compile the script from its
source code into bytecode each time; instead, it just runs the bytecode that
was generated earlier. This is similar to the process behind Java
compilation, and is in fact the same single step away from being a truly
compiled language. On short scripts, especially those that use a number of
external modules, you probably won't notice a huge speed increase. On larger
scripts that stand alone without a lot of external module use, you should see
a noticeable improvement.

The full compilation mode is almost identical, except that instead of
producing a Perl script with the compiled bytecode embedded in it, perlcc
embeds the bytecode into C source that is then compiled into a full-blown,
standalone executable. This is not cross-platform compatible, but it does
allow you to distribute an executable version of a Perl script without giving
out the source. Note, however, that this doesn't convert the Perl into C; it
just embeds Perl bytecode into a C-based application. Full compilation is
actually the default mode of perlcc, so a simple:

 $ perlcc script.pl

will create and compile a standalone application called a.out.

One of the lesser-known solutions for both debugging and optimizing your
code is to use the Perl compiler with one of the many "back ends."

The back ends are actually what drive the perlcc command, and it's possible
to use a back-end module directly to create a C source file that you can
examine. The Perl compiler works by taking the generated bytecode and then
outputting the results in a variety of different ways. Because you're
looking at the opcodes generated during the compilation stage, you get to
see the code after Perl's own internal optimizations have been applied.
Providing you know the Perl opcodes, you can begin to identify where the
potential bottlenecks might be. From a debugging perspective, go with back
ends such as Terse (which is itself a wrapper on Concise) and Showlex. You
can see in Listing 10 what the original Listing 1 looks like through the
Terse back end.
Listing 10. Using Terse to study bytecode


LISTOP (0x306230) leave [1]
    OP (0x305f60) enter
    COP (0x3062d0) nextstate
    BINOP (0x306210) sassign
        SVOP (0x301ab0) const [7] PV (0x1809f9c) "abcdefghijklmnopqrstuvwxyz"
        OP (0x305c30) padsv [1]
    COP (0x305c70) nextstate
    BINOP (0x305c50) sassign
        SVOP (0x306330) const [8] PV (0x180be60) ""
        OP (0x306310) padsv [2]
    COP (0x305f20) nextstate
    BINOP (0x305f00) leaveloop
        LOOP (0x305d10) enteriter [3]
            OP (0x305cf0) null [3]
            UNOP (0x305cd0) null [141]
                OP (0x305e80) pushmark
                SVOP (0x3065d0) const [9] IV (0x180be30) 1
                SVOP (0x3065f0) const [10] IV (0x1801240) 999999
        UNOP (0x305ee0) null
            LOGOP (0x305ec0) and
                OP (0x305d50) iter
                LISTOP (0x305e60) lineseq
                    COP (0x305e10) nextstate
                    BINOP (0x305df0) concat [6]
                        OP (0x305d70) padsv [2]
                        OP (0x305dd0) padsv [1]
                    OP (0x305ea0) unstack
concat1.pl syntax OK
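
That output was generated by invoking the back end directly through the O
module; the trailing "syntax OK" line is written by the compiler itself:

 $ perl -MO=Terse concat1.pl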

Other tools
What I've covered here looks entirely at the code that makes up your
applications. While that's where most of the problems will be, there are
tools and systems you can use that can help identify and locate problems in
your code that might ultimately help with performance.

Warnings/strict execution
It's a common recommendation, but it really can make a difference. Use the
warnings and strict pragmas to ensure nothing funny is going on with variable
use, typos, and other inconsistencies. Using them in all your scripts will
help you eliminate all sorts of problems, many of which can be the source of
performance bottlenecks. Common faults picked up by these pragmas include
ambiguous references and de-references, use of undefined values, and typos
that produce calls to unused or undefined functions.

All of this help, though, comes at a slight performance cost. I keep warnings
and strict on while programming and debugging, and switch them off once the
script is ready for use in the real world. It won't save much, but every
millisecond counts.
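
Enabling both takes just two lines at the top of the script, which are then
easy to comment out for the production copy:

use strict;      # require declared variables, catch symbolic references
use warnings;    # report undefined values, ambiguous constructs, and typos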

Profiling
Profiling is a useful tool for optimizing code, but all it does is identify
the potential location of a problem; it doesn't actually point out what the
issue is or how to resolve it. Also, because profiling relies on counting the
executions of different parts of your application, it can occasionally give
misleading advice about where a problem lies and the best approach for
resolving it.

However, profiling is still a useful, and often vital, part of the
optimization process. Just don't rely on it to tell you everything you need
to know.
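
With the Devel::DProf profiler (standard with Perl at the time of writing), a
typical session looks something like this:

 $ perl -d:DProf script.pl    # run the script, collecting timings in tmon.out
 $ dprofpp                    # summarize the most time-consuming subroutines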

Debugging
To me, a badly optimized program is as good as a program with a bug, and the
reverse is also true: bugs often lead to performance problems. Classic
examples include badly de-referenced variables and code that reads or filters
the wrong information. It doesn't matter whether your debugging technique
involves print statements or the full-blown debugger provided by Perl: the
sooner you eliminate the bugs, the sooner you can start optimizing your
application.
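
The interactive debugger needs no changes to the script itself:

 $ perl -d script.pl    # step through, set breakpoints, inspect variables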

Putting it all together
Now that you know the techniques, here is how to put them together to produce
optimized applications. I generally follow this sequence when optimizing:
    1.  Write the program in as optimized a form as possible, using the
techniques above. Once you start to use them regularly, they become the only
way you program.
    2.  Once the program is finished, or at least in a releasable state, go
through the code and double-check by hand that you are using the most
efficient solution. You'll spot a number of issues just by re-reading, and
you might pick up a few potential bugs, too.
    3.  Debug your program. Bugs can cause performance problems, so you
should always eliminate them before doing more intensive optimization.
    4.  Run the profiler. I always do this once on any serious application,
just to see if there's something -- often obvious -- that I might have
missed.
    5.  Go back to step 1 and repeat. I've lost track of the number of times
I've completely missed a potential optimization the first time around. Either
I'll repeat the process two or three times in one session, or I'll leave, do
another project, and return a few days, weeks, or months later; by then, I'll
often have found an alternative way of doing something that saves time.

At the end of the day, there is no magic wand that will optimize your
software for you. Even with the debugger and profiler, all you get is
information about what might be causing a performance problem, not
necessarily any helpful advice on what you should do to fix it. Be aware as
well that there is a limit to what you can optimize. Some operations will
simply take a lot of time to complete. If you have to work through a
10,000-item hash, there's no way of simplifying that process. But as you've
seen, there might be ways of reducing the overhead in each case.

Resources
    •   Read about techniques for debugging Perl in Cultured Perl: Debugging
Perl with ease (developerWorks, November 2000).

    •   You'll find more information on Perl programming in Ted Zlatanov's
Cultured Perl column on developerWorks.

    •   Visit CPAN for all the Perl modules you could ever want.

    •   Check out the O'Reilly Network's Perl.com for Perl information and
related resources.


About the author
Martin C. Brown is a former IT Director with experience in cross-platform
integration. A keen developer, he has produced dynamic sites for blue-chip
customers including HP and Oracle and is the Technical Director of
Foodware.net. Now a freelance writer and consultant, MC, as he is better
known, works closely with Microsoft as an SME, is the LAMP Technologies
Editor for LinuxWorld magazine, a core member of the AnswerSquad.com team,
and has written a number of books on topics as diverse as Microsoft
Certification, iMacs, and open source programming. Despite his best
attempts, he remains a regular and voracious programmer on many platforms
and numerous environments. MC can be contacted at questions at mcslp.com, or
through his Web site.




