[tpm] Fast(er) way of deleting X lines from start of a file

Shlomi Fish shlomif at iglu.org.il
Thu Oct 1 09:52:31 PDT 2009


On Thursday 01 Oct 2009 17:44:54 Madison Kelly wrote:
> Shlomi Fish wrote:
> > On Wednesday 30 Sep 2009 17:39:14 Madison Kelly wrote:
> >> Hi all,
> >>
> >>    Thus far, I've opened a file, read it in, shifted/ignored the first X
> >> number of lines and then wrote out the resulting file. This works, but
> >> strikes me as horribly inefficient. Doubly so when the file could be
> >> larger than RAM.
> >>
> >>    So then, is there a more efficient/faster way of saying "Delete the
> >> first X number of lines from file foo"?
> >
> > What I would do in pure-Perl is: (untested)
> >
> > <<<<<<<<<<<<<<<
> > my $num_lines_to_del = shift(@ARGV);
> > my $filename = shift(@ARGV);
> > # Maybe use File::Temp here
> > my $temp_fn = $filename.".new";
> >
> > open my $in_fh, "<", $filename
> > 	or die "Could not open '$filename' - $! !";
> >
> >
> > open my $temp_out_fh, ">", $temp_fn
> > 	or die "Could not open temp filename - $!";
> >
> > foreach my $i (1 .. $num_lines_to_del)
> > {
> > 	# Read one line.
> > 	scalar(<$in_fh>);
> > }
> >
> > my $buf_len = 1024 * 16;
> >
> > my $buffer;
> > while (read($in_fh, $buffer, $buf_len))
> > {
> > 	print {$temp_out_fh} $buffer;
> > }
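> > close($temp_out_fh)
> > 	or die "Could not close '$temp_fn' - $!";
> > close($in_fh);
> >
> > rename($temp_fn, $filename)
> > 	or die "Could not rename '$temp_fn' to '$filename' - $!";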
> >
> > Like I said - untested, but I hope you get the idea.
> >
> > Regards,
> >
> > 	Shlomi Fish
> 
> If I read this right, the entire file will be copied, sans initial
> lines, and never use more than 1024*16 bytes of memory. That would
> certainly ensure I never eat up all the RAM, regardless of the file size
> being messed with. Awesome.

That's mostly right. The perl 5 interpreter's own overhead is much more than 
16KB of RAM, and there is also some overhead for every variable. Furthermore, 
if the first $num_lines_to_del lines are very long, then skipping them with 
scalar(<$in_fh>) (see "perldoc -f scalar") may consume much more memory, since 
each line is read as a whole. I'm not sure how smart <$filehandle> is, but 
maybe it does better in void context.
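
One possible way to keep the skipping step within a fixed buffer as well is to 
count newlines in fixed-size chunks instead of reading whole lines. The 
following is only a minimal sketch: skip_lines_bounded is a hypothetical helper 
(it is not part of the code quoted above), and it assumes that lines end in 
"\n" and that the handle is read as plain bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original post): advance
# $in_fh past the first $num_lines "\n"-terminated lines, reading at most
# $buf_len bytes at a time. Returns whatever was read beyond the last skipped
# newline, so that the caller can write it out before the main copy loop.
sub skip_lines_bounded
{
    my ($in_fh, $num_lines, $buf_len) = @_;

    # Nothing to skip - leave the handle untouched.
    return '' if $num_lines <= 0;

    my $buffer;
    while (read($in_fh, $buffer, $buf_len))
    {
        my $pos = 0;
        # Walk over every newline found in the current chunk.
        while ((my $idx = index($buffer, "\n", $pos)) >= 0)
        {
            $pos = $idx + 1;
            if (--$num_lines == 0)
            {
                # The remainder of the chunk belongs to the kept part.
                return substr($buffer, $pos);
            }
        }
    }

    # The file had fewer lines than requested; nothing is left to copy.
    return '';
}
```

With this, the foreach loop could be replaced by something like 
"print {$temp_out_fh} skip_lines_bounded($in_fh, $num_lines_to_del, $buf_len);" 
provided that $buf_len is defined before that point. The read() loop that 
follows can stay as it is.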

But you are right that this is a pure-Perl solution that mostly fulfills these 
constraints. Like I said - it's untested so all warnings apply.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/

Chuck Norris read the entire English Wikipedia in 24 hours. Twice.

