LPM: File reading performance

llang at baywestpaper.com
Wed Jan 19 08:38:57 CST 2000


Well - not having worked much (read: at all) with benchmarking, I
cut/pasted/messed with this code and kicked it off.  It works great with a
4 MB file, but a 6 MB file gives an "Out Of Memory!" error on the slurp
(the other two methods work fine at that size).

Can somebody shed some light on what the memory limitations are?  I'm
running ActiveState build 521 with 128 MB of RAM.  Is it a DOS limitation?


Loren Lang                    Phone: 606-734-0538 x326
Network Administrator         Fax:   606-734-8210
Bay West Paper Corporation    Email: llang at baywestpaper.com

"There is no greater burden than great potential." - Charles Schultz





                                                                                                                     
From: Mik Firestone <fireston at lexmark.com>
Sent by: owner-lexington-pm-list at pm.org
To: Perl Geeks <lexington-pm-list at happyfunball.pm.org>
Date: 01/19/00 08:39 AM
Reply-To: lexington-pm-list
Subject: Re: LPM: File reading performance



Yes, the speed improvement is primarily due to using the file I/O buffers
better.  However, we can squeeze a little more out of this, if you want.

>          @content = <FILE>;
>          $whole = join ("", @content);

The first thing we can try is to reduce the number of memory copies you are
making.  Not only will this speed up the code, but it will also reduce the
memory footprint.  The array @content serves no further purpose in this
code, so let's get rid of it.  I would first try using the snippet:
    $whole = join( "", <FILE> );
Since <FILE> is being used in list context, perl will read in the entire
file, and we have avoided @content altogether.  According to my
benchmarking on a 6 MB file (230184 lines), the first method took 21
seconds for 10 iterations and the second took 18 seconds for the same 10
iterations.
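
Spelled out as a complete script, that one-copy approach looks something
like this (a minimal sketch; /tmp/foobar is just the same test file the
benchmark code below uses):

    #!/usr/bin/perl -w
    use strict;

    # Slurp every line in list context and join them into one scalar,
    # with no named intermediate array to copy through.
    open FILE, "/tmp/foobar" or die "couldn't open test file: $!\n";
    my $whole = join( "", <FILE> );
    close FILE;

    print "read ", length($whole), " bytes\n";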

Just for grins, I tried my favourite method, which is to undefine the
input record separator.  The code looks something like this:
    local $/ = undef;
    $whole = <FILE>;
$/ is a perl magic variable (which is why I protect it with a local) that
tells perl how to split records when reading a file.  It defaults to \n, so
the statement $line = <FILE> will do what you expect.  By undefining it, I
tell perl there are no separators, so perl will read the entire file into a
single scalar variable.
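
Here is that method as a complete, runnable sketch; the braces confine the
local so $/ reverts to \n as soon as the slurp is done (again assuming the
/tmp/foobar test file):

    #!/usr/bin/perl -w
    use strict;

    open FILE, "/tmp/foobar" or die "couldn't open test file: $!\n";
    my $whole;
    {
        # With $/ undefined there are no record separators, so a
        # single <FILE> read returns the entire file as one scalar.
        local $/ = undef;
        $whole = <FILE>;
    }
    close FILE;

    print "read ", length($whole), " bytes\n";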

My benchmarking (code is attached at the bottom for those interested)
indicated this was at least 3 times faster than the other two methods.
Perl still has to be concerned with splitting the file into records
correctly when using the @array = <FILE> syntax.  My last method relieves
perl of that burden, and we get the speed advantage.

I am not sure this helps, but it is early morning and I haven't had any
caffeine yet.  Check the examples and I will clarify anything I haven't
explained well.

Mik



use Benchmark;

# 17 - 18 seconds for 10 iterations
sub ByArray {
    my (@arr, $str);

    open TMP, "/tmp/foobar" or die "couldn't open test file\n";
    @arr = <TMP>;              # slurp all lines into a named array
    $str = join("", @arr);     # then copy them again into one scalar
    $str =~ s/\n//g;
    close TMP;
}

# 21 - 22 seconds for 10 iterations
sub ByAnonArray {
    my $str;

    open TMP, "/tmp/foobar" or die "couldn't open test file\n";
    $str = join("", <TMP>);    # join the lines with no named array
    $str =~ s/\n//g;
    close TMP;
}

# 6 - 7 seconds for 10 iterations
sub ByInputRecSep {
    my $str;

    local $/ = undef;          # no record separator: one read gets it all
    open TMP, "/tmp/foobar" or die "couldn't open test file\n";
    $str = <TMP>;
    $str =~ s/\n//g;
    close TMP;
}

timethese( 10, { 'ByArray'     => \&ByArray,
                 'ByAnonArray' => \&ByAnonArray,
                 'ByInputRec'  => \&ByInputRecSep
               }
         );
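
The subs above assume a test file already exists at /tmp/foobar.  One
hypothetical way to generate one of roughly the size used here (230184
lines; the exact byte count depends on line length):

    #!/usr/bin/perl -w
    use strict;

    # Hypothetical generator for the /tmp/foobar test file: writes
    # 230184 short lines, which comes out to a few MB of text.
    open OUT, "> /tmp/foobar" or die "couldn't create test file: $!\n";
    print OUT "slurp test line $_\n" for 1 .. 230184;
    close OUT;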


--
Mik Firestone fireston at lexmark.com
When I become an Evil Overlord:
I will not include a self-destruct mechanism unless absolutely necessary.
If it is necessary, it will not be a large red button labelled "Danger:
Do Not Push".
