LPM: File reading performance
llang at baywestpaper.com
Wed Jan 19 08:38:57 CST 2000
Well - not having worked much (read: at all) with benchmarking, I
cut/pasted/messed with this code and kicked it off. Works great with a
file of 4Mb, but a 6Mb file gives an "Out Of Memory!" error on the slurp
(the other two work fine with that size).
Can somebody shed some light on what the memory limitations are? I'm
running ActiveState build 521 with 128 MB of RAM. Is it a DOS limitation?
Loren Lang Phone: 606-734-0538 x326
Network Administrator Fax: 606-734-8210
Bay West Paper Corporation email: llang at baywestpaper.com
"There is no greater burden than great potential." - Charles Schultz
Mik Firestone <fireston at lexmark.com>
Sent by: owner-lexington-pm-list at pm.org
To: Perl Geeks <lexington-pm-list at happyfunball.pm.org>
Subject: Re: LPM: File reading performance
01/19/00 08:39 AM
Please respond to lexington-pm-list
Yes, the speed improvement is primarily due to using the file I/O buffers
better. However, we can squeeze a little more out of this, if you want.
> @content = <FILE>;
> $whole = join ("", @content);
The first thing we can try is to reduce the number of memory copies you are
making. Not only will this speed up the code, but it will also reduce the
memory footprint. The array @content seems to serve no further purpose in
this code, so let's get rid of it. I would first try using the snippet:
$whole = join( "", <FILE> );
Since <FILE> is being used in list context, perl will read in the entire
file, and we have avoided @content. According to my benchmarking, reading a
6 Mb file (230184 lines), the first method took 21 seconds for 10
iterations and the second took 18 seconds for the same 10 iterations.
Just for grins, I tried my favourite method which would be to undefine the
input record separator. The code looks something like this:
local $/ = undef;
$whole = <FILE>;
$/ is a perl magic variable ( which is why I protect it with a local ) that
tells perl how to split records when reading a file. This defaults to \n,
so the statement $line = <FILE> will do what you expect. By undefining it,
I tell perl there are no separators, so perl will read the entire file into
a single scalar variable.
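A common way to keep that local $/ from leaking into the rest of the
program is to confine it to a do-block. This is a hedged sketch of that
idiom, not from the original post; the file it creates is a stand-in so
the snippet is self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a small test file so the example stands alone.
my $path = "/tmp/slurp_demo.$$";
open my $out, '>', $path or die "couldn't create $path: $!\n";
print $out "line one\nline two\n";
close $out;

# The do-block limits the lifetime of local $/, so any code after it
# still reads line by line as usual.
my $whole = do {
    local $/;                              # undef: no record separator
    open my $fh, '<', $path or die "couldn't open $path: $!\n";
    <$fh>;                                 # whole file in one scalar
};

print length($whole), "\n";                # 18
unlink $path;
```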
My benchmarking ( code is attached at bottom for those interested )
indicated this was at least 3 times faster than the other two methods.
Perl still has to be concerned with chopping the file correctly when using
the @array = <FILE> syntax. My last method relieves perl of that burden
and we get the speed advantages.
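For what it's worth, a fourth variant ( not in the original benchmark, so
treat this as a sketch ) is to bypass the line-oriented I/O layer entirely
with sysread, using -s to ask for the file's size. The test file here is a
stand-in for /tmp/foobar:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a sample file so the snippet is self-contained.
my $path = "/tmp/sysread_demo.$$";
open my $out, '>', $path or die "couldn't create $path: $!\n";
print $out "aaa\nbbb\n";
close $out;

open my $fh, '<', $path or die "couldn't open $path: $!\n";
binmode $fh;
my $str = '';
sysread $fh, $str, -s $fh;    # one call reads the whole file
close $fh;

$str =~ s/\n//g;              # same cleanup as the other benchmarks
print "$str\n";               # aaabbb
unlink $path;
```

Whether this actually beats the $/ trick would need benchmarking on the
same file; it only makes sense when the whole file fits in memory anyway.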
I am not sure this helps, but it is early morning and I haven't had any
caffeine yet. Check the examples and I will clarify anything I haven't
explained well.
Mik
use Benchmark;
#
# 17 - 18 seconds for 10 iterations
sub ByArray {
my (@arr, $str);
open TMP, "/tmp/foobar" or die "couldn't open test file\n";
@arr = <TMP>;
$str = join("", @arr);
$str =~ s/\n//g;
}
# 21 - 22 seconds for 10 iterations
sub ByAnonArray {
my (@arr, $str);
open TMP, "/tmp/foobar" or die "couldn't open test file\n";
$str = join("",<TMP>);
$str =~ s/\n//g;
}
# 6 - 7 seconds for 10 iterations
sub ByInputRecSep {
my $str;
local $/ = undef;
open TMP, "/tmp/foobar" or die "couldn't open test file\n";
$str = <TMP>;
$str =~ s/\n//g;
}
timethese( 10, { 'ByArray'     => \&ByArray,
                 'ByAnonArray' => \&ByAnonArray,
                 'ByInputRec'  => \&ByInputRecSep
               }
);
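On the original out-of-memory question: if the file only needs to be
processed record by record, the memory-safe alternative is not to slurp at
all. This is a hedged sketch ( the path and the per-line work are
placeholders, not from the thread ):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a sample file; in practice this would be the 6 Mb input.
my $path = "/tmp/lines_demo.$$";
open my $out, '>', $path or die "couldn't create $path: $!\n";
print $out "alpha\nbeta\ngamma\n";
close $out;

# Only one line is held in memory at a time, so the file's size no
# longer matters for memory use.
open my $fh, '<', $path or die "couldn't open $path: $!\n";
my ($lines, $bytes) = (0, 0);
while ( my $line = <$fh> ) {
    $lines++;
    $bytes += length $line;    # stand-in for real per-line work
}
close $fh;

print "$lines lines, $bytes bytes\n";    # 3 lines, 17 bytes
unlink $path;
```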
--
Mik Firestone fireston at lexmark.com
When I become an Evil Overlord:
I will not include a self-destruct mechanism unless absolutely necessary.
If it is necessary, it will not be a large red button labelled
"Danger: Do Not Push".