Conserving memory - was SPUG: Fw: greetings

Sun Oct 21 13:27:08 CDT 2001

It's a rainy Sunday morning so I did some benchmarking. The while method
is definitely the memory efficient way to go, but what about speed...

The attached files are the Benchmark output from my box
(print_comparison.txt) and the actual benchmark program (comparison.txt;
I've seen .pl attachments get nerfed by Outlook / Exchange, so I renamed
it).

All the tests read a 3M text file (common last names and their
distributions from the census) and print the file line by line to
/dev/null to avoid including disk write times.

Here's what I learned:

1) You don't need to check for defined inside a while loop condition if
it is of the form <FILE> or $var = <FILE>. Perl does this for you.

You can use the B::Deparse module to turn the compiled opcodes back into
perl and see for yourself if you don't believe me. Try running
perl -MO=Deparse comparison.txt and compare four_a and four_b before and
after. Or, if you are source code oriented, look at Perl_newWHILEOP in
perl's op.c. Some people will say the explicit defined is more readable
or clearer; they're lying. To whatever extent perl keeps you from having
to worry about "0" and "", take advantage of it.
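
Here is a small sketch of point 1. It is not part of the attached
benchmark: the three-line input is made up, and it assumes a perl that
can open a filehandle on a scalar (5.8 or later). The last line is "0"
with no trailing newline, which is exactly the value a plain truth test
would mistake for end of file.

```perl
use strict;
use warnings;

# In-memory "file" whose last line is "0" with no trailing newline.
my $data = "smith\njohnson\n0";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # no explicit defined() needed here
    chomp $line;
    push @seen, $line;
}
close $fh;

# prints: 3 lines read, last one is '0'
print scalar(@seen), " lines read, last one is '$seen[-1]'\n";
```

If perl were testing the line for truth rather than definedness, the
loop would stop after two lines.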

2) Assigning to the default variable or assigning to a plain package
variable doesn't make a measurable difference for the examples I tried
(although using the default variable will pass use strict).

You can do 'while (<FILE>)', 'while ($line = <FILE>)', 'my $line; while
($line = <FILE>)', or 'while (my $line = <FILE>)'. I generally stick
with the first or fourth method and occasionally the third. I generally
go for readability / maintainability; for efficiency you should avoid
the fourth method.
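
For reference, the four spellings side by side, as a sketch rather than
anything from the attached benchmark. The input string and the reopen
helper are made up for illustration, lexical filehandles stand in for
FILE, and the package variable is declared with our so the script still
passes use strict.

```perl
use strict;
use warnings;

my $data = "smith\njohnson\nwilliams\n";   # stand-in for the real input file
my %count;

# Hypothetical helper: open a fresh read handle on the in-memory data.
sub reopen {
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    return $fh;
}

my $fh = reopen();
while (<$fh>) { $count{default_var}++ }             # 1: default variable $_

our $line;                                          # declared so strict accepts it
$fh = reopen();
while ($line = <$fh>) { $count{package_var}++ }     # 2: package variable

my $outer;
$fh = reopen();
while ($outer = <$fh>) { $count{outer_my}++ }       # 3: lexical declared outside

$fh = reopen();
while (my $inner = <$fh>) { $count{inner_my}++ }    # 4: lexical declared in the condition
close $fh;

print "$_: $count{$_}\n" for sort keys %count;
```

All four loops visit every line; they differ only in where the line
ends up and how long that variable lives.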

3) "Reversed" notation is measurably faster. So 'while (<FILE>) {
new-scope }' is slower than 'no-new-scope while <FILE>'.

The reversed notation does not create a new lexical scope for each time
through the loop. This shouldn't even enter into your mind unless you
are writing a very tight loop in an otherwise blazingly fast program.
Beware of writing all your loops backwards for efficiency; that way lies
madness (or something like that).
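
If you want to try point 3 on your own box, something like the sketch
below should do. It is a cut-down variation on the attached program, not
the program itself: the input is a made-up in-memory string instead of
the census file (so it needs perl 5.8 or later), the sub names are
invented, and each case only runs for about one CPU second, so expect
noisier numbers than the 30 second runs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up input; the original benchmark read a real file from disk.
my $data = join '', map { "name$_\n" } 1 .. 5_000;

open(my $out, '>', '/dev/null') or die "cannot open /dev/null: $!";

# Block form: the loop body is a block with its own scope.
sub block_form {
    open(my $in, '<', \$data) or die "cannot open in-memory file: $!";
    while (<$in>) {
        print {$out} $_;
    }
    close $in;
}

# Statement-modifier ("reversed") form: no block around the body.
sub modifier_form {
    open(my $in, '<', \$data) or die "cannot open in-memory file: $!";
    print {$out} $_ while <$in>;
    close $in;
}

# A negative count means "run each case for at least that many CPU seconds".
cmpthese(-1, { block => \&block_form, modifier => \&modifier_form });

close $out;
```

Both subs write the same bytes; only the loop syntax differs, so any gap
in the table comes from the loop form.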

In conclusion, the most efficient (memory & speed) way I found in perl
to output a file is:
print while <FILE>;

Of course I usually go for maintainable over efficient but if all you
are doing is outputting a file I think this is the best of both worlds.
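
Wrapped up for reuse, the same one-liner might look like the sketch
below. The copy_lines name and its arguments are my own invention, and
the error check and local $_ are additions that the benchmark did not
have.

```perl
use strict;
use warnings;

# copy_lines is a hypothetical helper name, not something from the benchmark.
# It copies the file at $path to the already-open handle $out, line by line.
sub copy_lines {
    my ($path, $out) = @_;
    open(my $in, '<', $path) or die "cannot open $path: $!";
    local $_;                       # don't clobber the caller's $_
    print {$out} $_ while <$in>;    # only one line is held in memory at a time
    close $in;
    return;
}

# Example call: copy the file named on the command line to STDOUT.
copy_lines($ARGV[0], \*STDOUT) if @ARGV;
```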

Jeremy (did someone use --verbose or what ;-) )

PS: as with all builtin perl functions, perldoc -f print will tell you
what context print uses
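
You can also ask perl directly. In this sketch the context sub is a
made-up probe built on wantarray, and print's output is captured in a
scalar (which needs perl 5.8 or later) so the two results can be shown
side by side.

```perl
use strict;
use warnings;

# Hypothetical probe: reports the context it was called in.
sub context {
    return wantarray ? "list" : defined(wantarray) ? "scalar" : "void";
}

# Capture what print writes, so we can see what its argument evaluated to.
my $buf = '';
open(my $out, '>', \$buf) or die "cannot open in-memory file: $!";
print {$out} context();         # called as an argument to print
close $out;

my $scalar = context();         # called in a scalar assignment

# prints: print's arguments: list
#         scalar assignment: scalar
print "print's arguments: $buf\nscalar assignment: $scalar\n";
```

That list context is why 'print <FH>' reads the whole file before it
prints anything.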

-----Original Message-----
From: owner-spug-list at pm.org [mailto:owner-spug-list at pm.org] On Behalf
Of Tim Maher/CONSULTIX
Sent: Saturday, October 20, 2001 2:31 PM
To: Ken Clarke
Cc: spug-list at pm.org
Subject: Re: Conserving memory - was SPUG: Fw: greetings

On Sat, Oct 20, 2001 at 02:18:05PM -0700, Ken Clarke wrote:
> Hi Folks,
> 
>     I love picking up efficiency tips like the one at the bottom.  
> Which context does print use?

print, like all functions (and subroutines), provides the LIST context
to its arguments.

> 
> # Start returning results to client browser
> $| = 1;
> open(FH, "<templates/RealTop.html");
> print "Content-Type: text/html\n\n";
> print <FH>;

That approach stores the entire file in memory (to no avail;
as you show below, an equivalent result can be obtained reading just one
line at a time).

> close(FH);
> 
> or should I be using:
> 
> open(FH, "<templates/RealTop.htm");
> print "Content-Type: text/html\n\n";
> while ($line = <FH>) {
>     print "$line\n";
> }
> >> Ken Clarke
> >> Contract Web Programmer / E-commerce Technologist 
> >> www.perlprogrammer.net

For efficiency, this is definitely the way to go, but
the while loop is better written as:

	 while ( defined ($line = <FH>) ) {

-Tim


-------------- next part --------------
Benchmark: running five, four_a, four_b, one, six, three_a, three_b, two_a, two_b, each for at least 30 CPU seconds...
      five: 32 wallclock secs (29.59 usr +  1.15 sys = 30.74 CPU) @  0.94/s (n=29)
    four_a: 30 wallclock secs (29.28 usr +  1.14 sys = 30.42 CPU) @  1.45/s (n=44)
    four_b: 28 wallclock secs (28.83 usr +  1.30 sys = 30.13 CPU) @  1.46/s (n=44)
       one: 31 wallclock secs (29.42 usr +  1.34 sys = 30.76 CPU) @  1.20/s (n=37)
       six: 31 wallclock secs (29.91 usr +  1.05 sys = 30.96 CPU) @  0.90/s (n=28)
   three_a: 31 wallclock secs (29.09 usr +  1.24 sys = 30.33 CPU) @  1.45/s (n=44)
   three_b: 31 wallclock secs (29.43 usr +  1.24 sys = 30.67 CPU) @  1.43/s (n=44)
     two_a: 31 wallclock secs (29.08 usr +  1.21 sys = 30.29 CPU) @  1.58/s (n=48)
     two_b: 30 wallclock secs (28.57 usr +  1.49 sys = 30.06 CPU) @  1.56/s (n=47)
           Rate    six   five    one three_b four_a three_a four_b  two_b  two_a
six     0.904/s     --    -4%   -25%    -37%   -37%    -38%   -38%   -42%   -43%
five    0.943/s     4%     --   -22%    -34%   -35%    -35%   -35%   -40%   -40%
one      1.20/s    33%    28%     --    -16%   -17%    -17%   -18%   -23%   -24%
three_b  1.43/s    59%    52%    19%      --    -1%     -1%    -2%    -8%    -9%
four_a   1.45/s    60%    53%    20%      1%     --     -0%    -1%    -7%    -9%
three_a  1.45/s    60%    54%    21%      1%     0%      --    -1%    -7%    -8%
four_b   1.46/s    61%    55%    21%      2%     1%      1%     --    -7%    -8%
two_b    1.56/s    73%    66%    30%      9%     8%      8%     7%     --    -1%
two_a    1.58/s    75%    68%    32%     10%    10%      9%     9%     1%     --
-------------- next part --------------
#!/usr/bin/perl
use Benchmark qw(cmpthese);

open OUT, ">/dev/null";

sub one {
    open FILE, "bigfile.txt";
    print OUT <FILE>;
    close FILE;
}

sub two_a {
    open FILE, "bigfile.txt";
    print OUT while <FILE>;
    close FILE;
}

sub two_b {
    open FILE, "bigfile.txt";
    print OUT while defined ($_ = <FILE>);
    close FILE;
}

sub three_a {
    open FILE, "bigfile.txt";
    while (<FILE>) {
        print OUT;
    }
    close FILE;
}

sub three_b {
    open FILE, "bigfile.txt";
    while (defined ($_ = <FILE>)) {
        print OUT;
    }
    close FILE;
}

sub four_a {
    open FILE, "bigfile.txt";
    while ($line = <FILE>) {
        print OUT $line;
    }
    close FILE;
}

sub four_b {
    open FILE, "bigfile.txt";
    while (defined ($line = <FILE>)) {
        print OUT $line;
    }
    close FILE;
}

sub five {
    open FILE, "bigfile.txt";
    print OUT foreach <FILE>;
    close FILE;
}

sub six {
    open FILE, "bigfile.txt";
    foreach (<FILE>) {
        print OUT;
    }
    close FILE;
}

cmpthese(-30, {'one'   => \&one,
               'two_a' => \&two_a,
               'two_b' => \&two_b,
               'three_a' => \&three_a,
               'three_b' => \&three_b,
               'four_a' => \&four_a,
               'four_b' => \&four_b,
               'five' => \&five,
               'six' => \&six});

close OUT;