[oak perl] Comparing two files

Michael Paoli mp at rawbw.com
Sun May 29 07:39:14 PDT 2005


A few items to consider.
There are lots of ways to compare files and examine their differences,
well beyond simply determining whether the entire contents are identical
or not.  That's really a topic unto itself.  The source to diff(1),
and/or suitable information on the various differencing algorithms,
might be a useful/interesting place to start looking at that.

If the files are relatively small compared to the virtual memory
available, it may well be most efficient to have perl read each file in
its entirety into an array; one can then handle, compare, etc. that data
as desired, without needing to reread the files.
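
As a rough sketch of the read-it-all-into-memory approach (not the only
way to do it): the helper names read_lines and lines_missing_from below
are made up for illustration, and the hash lookup is my own choice of
technique for the "in short but not in long" test. It assumes whole-line,
exact-match comparison and takes the two file names from the command
line.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of a file into an array,
# with trailing newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Hypothetical helper: return the lines of @$short that do not occur
# anywhere in @$long, in their original order.
sub lines_missing_from {
    my ($short, $long) = @_;
    my %seen = map { $_ => 1 } @$long;     # one pass over the long file's lines
    return grep { !$seen{$_} } @$short;    # one pass over the short file's lines
}

# Usage (file names are placeholders):
#   perl missing.pl shortfile longfile > differences
if (@ARGV == 2) {
    my @short = read_lines($ARGV[0]);
    my @long  = read_lines($ARGV[1]);
    print "$_\n" for lines_missing_from(\@short, \@long);
}
```

Each file is read exactly once, and each lookup is a hash probe rather
than a scan, so nothing needs to be reopened or repositioned.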

As for repositioning in a file, take a look at the perl seek function,
and other related perl functions (tell, for one).  If the files are quite
large relative to the virtual memory available, rereading them via seek
may be preferable to holding everything in memory.  The operating system
may also help significantly with caching, so some/many logical rereads
may not require physical rereading of on-disk data.
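
A minimal sketch of the seek approach, for illustration only: the helper
name handle_contains_line is hypothetical, and it assumes the handle was
opened for reading on a seekable file (not a pipe or terminal). The file
is opened once, and seek puts the 'pointer' back at the start before each
scan.

```perl
use strict;
use warnings;

# Hypothetical helper: scan an already-open filehandle from the top and
# report whether any line is exactly equal to $wanted.
sub handle_contains_line {
    my ($fh, $wanted) = @_;
    # Offset 0 relative to the start of the file (whence 0) rewinds the handle.
    seek($fh, 0, 0) or die "Cannot seek: $!";
    while (my $line = <$fh>) {
        chomp $line;
        return 1 if $line eq $wanted;
    }
    return 0;
}

# Usage (file names are placeholders):
#   perl missing_seek.pl shortfile longfile > differences
if (@ARGV == 2) {
    open(my $short, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    open(my $long,  '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
    while (my $line = <$short>) {
        chomp $line;
        print "$line\n" unless handle_contains_line($long, $line);
    }
    close($long);
    close($short);
}
```

This keeps memory use small, but it rescans the long file once per line
of the short file, so it does far more reading than the in-memory
approach.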

I'd guesstimate the more efficient approaches probably avoid rereading
the files, or portions thereof ... but then there are always the
tradeoffs between machine efficiency, programmer efficiency, and time,
and for sufficiently small tasks, optimization may not be a significant
factor.

Quoting "M. Lewis" <cajun at cajuninc.com>:

> my $shortfile;
> my $longfile;
> my $differences;
> 
> 
> I'm writing a script to compare two text files ($shortfile & $longfile). 
> If a line appears in $shortfile, but that line is not in $longfile, then 
> I want to write that line out to $differences
> 
> I'm relatively certain it is not efficient to open $longfile for each 
> entry in $shortfile. Both files are of the magnitude of 800+ lines.
> 
> For example, a given line in $shortfile is found at line 333 in 
> $longfile. Without closing and reopening $longfile, I don't know how to 
> reset the 'pointer' in $longfile back to line 1.
> 
> Perhaps there is a better way of doing this. I hope I've explained what 
> I'm trying to do clearly.
> 
> Suggestions ?

