[Chicago-talk] @ARGV while(<>)
Steven Lembark
lembark at wrkhors.com
Sun Jan 13 16:12:15 PST 2008
> You're doing repeated concatenation onto a string, which gets bigger
> each time through the loop.
> This might mean a lot of realloc's and copying of the string in
> memory. (I may be wrong, I haven't
> studied perl internals).
>
> Try putting the lines into an array and then join'ing them at the very
> end.
>
>     while (<>)
>     {
>         push (@lines, $_) unless (/espf\[/);
>     }
>     my $body = join("", @lines);
Perl strings are C strings with a start pointer and size_t
length. Eating into the start simply updates the starting
pointer; cutting off the end reduces the length.
So, yes, $a .= $b will become expensive.
Catch: It is equally expensive in the array case,
since you end up having to re-allocate the array
when it grows. Not quite as much copying, but the
number of copies is likely proportional to the
number of lines in the file.
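If in doubt, the two approaches are easy to compare with the core
Benchmark module. A minimal sketch (the sample data and iteration
counts are made up; whether append actually loses depends on how
your perl grows strings):

```perl
use strict;
use warnings;
use Benchmark qw( timethese );

# Fake input: 10_000 identical lines stands in for the file.
my @lines = ( "some log line without the marker\n" ) x 10_000;

timethese( 50,
{
    # Repeated concatenation onto a growing string.
    append => sub
    {
        my $body = '';
        $body .= $_ for @lines;
        return $body;
    },

    # Collect-then-join, as suggested in the quoted post.
    join_up => sub
    {
        my $body = join '', @lines;
        return $body;
    },
});
```

Both produce the same string; only the timings differ.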
Ways to get around this include:
- Presize the array and grow it in chunks
  as it progresses. This will reduce the
  number of copy operations to a manageable
  level (e.g., 1_000 line chunks):
    my @bufferz = ();
    $#bufferz = 1_000;          # pre-extend the array

    my $i = 0;

    while( <ARGV> )
    {
        # clean up $_;

        $bufferz[ $i ] = $_;

        if( ++$i > $#bufferz )
        {
            $#bufferz += 1_000; # grow in 1_000-line chunks
        }
    }

    $#bufferz = $i - 1;         # drop the unused trailing slots
- Write the lines out as they are processed and
  read them back with a single array or slurp read:
    open my $tmpfile, '+>', "/var/tmp/$$.tmp"
        or die ...;

    while( <ARGV> )
    {
        ...
        print $tmpfile $_;
    }
    seek $tmpfile, 0, 0;

    my @linz = <$tmpfile>;
    ...

    # or my $linz = do { local $/; <$tmpfile> }
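A variant of the same temp-file trick uses the core File::Temp
module, which picks a unique name and cleans up the file for you;
the /espf\[/ filter below is the one from the quoted code, and the
rest is a sketch, not the poster's exact program:

```perl
use strict;
use warnings;
use File::Temp;

# File::Temp->new returns a handle on an auto-deleted temp file.
my $tmpfile = File::Temp->new;

while( <ARGV> )
{
    print $tmpfile $_ unless /espf\[/;
}

# Rewind and slurp the whole thing back in one read.
seek $tmpfile, 0, 0;

my $body = do { local $/; <$tmpfile> };
```

The file disappears when $tmpfile goes out of scope, so there is
no /var/tmp litter to clean up afterward.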
--
Steven Lembark 85-09 90th St.
Workhorse Computing Woodhaven, NY, 11421
lembark at wrkhors.com +1 888 359 3508