LPM: File reading performance

Wed Jan 19 07:39:20 CST 2000

Yes, the speed improvement is primarily due to using the file I/O buffers
better.  However, we can squeeze a little more out of this, if you want.

> 	@content = <FILE>;
> 	$whole = join ("", @content);

The first thing we can try is to reduce the number of memory copies you are
making.  Not only will this speed up the code, but it will also reduce the
memory footprint.  The array @content seems to serve no further purpose in
this code, so let's get rid of it.  I would first try using the snippet:
    $whole = join( "", <FILE> );
Since <FILE> is used in list context, perl still reads in the entire file,
but we have avoided the @content copy.  In my benchmark, reading a 6 MB
file (230184 lines), the first method took 21 seconds for 10 iterations
and the second took 18 seconds for the same 10 iterations.

Just for grins, I tried my favourite method which would be to undefine the
input record separator.  The code looks something like this:
    local $/ = undef;
    $whole = <FILE>;
$/ is a perl magic variable, and a global one ( which is why I protect it
with a local ), that tells perl how to split records when reading a file.
This defaults to \n, so the statement $line = <FILE> reads one line, as you
would expect.  By undefining it, I tell perl there are no separators, so perl
will read the entire file into a single scalar variable.
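
To keep the change to $/ from leaking into other code, the slurp can be
wrapped in a small sub.  This is only a sketch: slurp_file is a name I made
up for illustration, and I am using a lexical filehandle with three-argument
open rather than the bareword FILE from the original code.

```perl
use strict;
use warnings;

# slurp_file is a hypothetical helper name, not part of the original code.
sub slurp_file {
    my ($path) = @_;

    open my $fh, '<', $path or die "couldn't open $path: $!\n";
    local $/;             # undef until this sub returns, then restored
    my $whole = <$fh>;    # scalar context, no separator: the whole file
    close $fh;

    return $whole;
}
```

Calling it would look like $whole = slurp_file("/tmp/foobar"); and once the
sub returns, $/ is back to whatever it was, so line-at-a-time reads elsewhere
behave normally.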

My benchmarking ( code is attached at bottom for those interested ) indicated
this was at least 3 times faster than the other two methods.  With the
@array = <FILE> syntax, perl still has to split the file into lines as it
reads.  My last method relieves perl of that burden and we get the speed
advantage.
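
A quick way to see the difference in the work perl does is to count how many
records a list-context read returns under each setting of $/.  This is a
rough sketch: count_records and the sample data are made up for illustration,
and it reads from an in-memory filehandle so it does not need a file on disk.

```perl
use strict;
use warnings;

# Made-up sample data standing in for a real file.
my $data = "one\ntwo\nthree\n";

# count_records is a hypothetical helper: it reads $data in list context
# with the given input record separator and returns the number of records.
sub count_records {
    my ($sep) = @_;

    open my $fh, '<', \$data or die "couldn't open in-memory file: $!\n";
    local $/ = $sep;
    my @records = <$fh>;
    close $fh;

    return scalar @records;
}

print count_records("\n"), "\n";     # prints 3: one record per line
print count_records(undef), "\n";    # prints 1: the whole file at once
```

With the default separator perl has to find every newline and build a
separate scalar for each line; with $/ undefined it hands back one record.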

I am not sure this helps, but it is early morning and I haven't had any
caffeine yet.  Check the examples and I will clarify anything I haven't
explained well.

Mik

use Benchmark;

#
# 17 - 18 seconds for 10 iterations
sub ByArray {
    my (@arr, $str);

    open TMP, "/tmp/foobar" or die "couldn't open test file\n";
    @arr = <TMP>;
    $str = join("", @arr);
    $str =~ s/\n//g;
}

# 21 - 22 seconds for 10 iterations
sub ByAnonArray {
    my $str;    # no array needed here; <TMP> feeds join directly

    open TMP, "/tmp/foobar" or die "couldn't open test file\n";
    $str = join("",<TMP>);
    $str =~ s/\n//g;
}

# 6 - 7 seconds for 10 iterations
sub ByInputRecSep {
    my $str;

    local $/ = undef;
    open TMP, "/tmp/foobar" or die "couldn't open test file\n";
    $str = <TMP>;
    $str =~ s/\n//g;
}

timethese( 10, { 'ByArray'     => \&ByArray,
                 'ByAnonArray' => \&ByAnonArray,
                 'ByInputRec'  => \&ByInputRecSep,
               }
         );
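
Before trusting any timings, it is worth confirming that the three approaches
really build the same string.  The following is a minimal sketch of such a
check, not part of the benchmark itself: the sub names and the small
throwaway file are my own assumptions, and it uses lexical filehandles and
File::Temp instead of /tmp/foobar.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small throwaway file; the contents are made up for this sketch.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "line $_\n" for 1 .. 5;
close $out or die "couldn't close $path: $!\n";

# Read into an array first, then join.
sub by_array {
    open my $fh, '<', $path or die "couldn't open $path: $!\n";
    my @arr = <$fh>;
    return join '', @arr;
}

# Feed the list-context read straight into join.
sub by_join {
    open my $fh, '<', $path or die "couldn't open $path: $!\n";
    return join '', <$fh>;
}

# Undefine the input record separator and read once.
sub by_input_rec_sep {
    open my $fh, '<', $path or die "couldn't open $path: $!\n";
    local $/;
    my $str = <$fh>;
    return $str;
}

my $expected = by_array();
for my $method (\&by_join, \&by_input_rec_sep) {
    die "methods disagree\n" unless $method->() eq $expected;
}
print "all three methods agree\n";
```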

-- 
Mik Firestone fireston at lexmark.com
When I become an Evil Overlord:
I will not include a self-destruct mechanism unless absolutely necessary.  If
it is necessary, it will not be a large red button labelled "Danger: Do Not
Push".