[oak perl] Comparing two files

Tue May 31 00:55:00 PDT 2005

Mike, 

In my tests, the most efficient way to tell unique lines 
 from duplicate lines was with a Perl hash. 

Enclosed please find:
 "dups.pl" does the work.
 "dupslib.pm" puts content in your scalars & 
              consumes $differences. 

dups.pl uses one group of lines to fill the hash from $longfile; 
it then uses a second group of lines to search $shortfile for 
lines that are not in the hash.
The intent is to make it clear what is going on. 
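
For readers without the attachment, here is a minimal sketch of that two-step approach. It is not the code from dups.pl; the sub name lines_only_in_short and the sample arrays are made up for illustration, and the arrays stand in for the contents of the two files.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from dups.pl): returns the lines of
# @$short that do not occur anywhere in @$long.
sub lines_only_in_short {
    my ($short, $long) = @_;

    # First group of lines: fill the hash, one key per long-file line.
    my %in_long;
    for my $line (@$long) {
        $in_long{$line} = 1;
    }

    # Second group of lines: keep each short-file line with no key in the hash.
    my @differences;
    for my $line (@$short) {
        push @differences, $line unless exists $in_long{$line};
    }
    return @differences;
}

# Sample data standing in for the contents of $longfile and $shortfile.
my @longfile  = ("alpha\n", "beta\n", "gamma\n");
my @shortfile = ("beta\n", "delta\n");

print lines_only_in_short(\@shortfile, \@longfile);
```

Each file is read once, and every lookup is a single hash access, so nothing has to be reopened or rescanned.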

dups.pl has two lines commented out at the end. 
They do the same thing as the two earlier groups of lines, 
but are more obscure. 
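
As an illustration of what a terser form can look like, here is one common idiom: a hash slice plus grep. This is an assumption about the general style, not a copy of the commented-out lines in dups.pl, and the sub name unique_to_short is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical compact form; the actual commented-out lines live in the
# attachment and may differ from this.
sub unique_to_short {
    my ($short, $long) = @_;
    my %in_long;
    # Hash slice: every long-file line becomes a key (values are undef).
    @in_long{@$long} = ();
    # Keep only the short-file lines that have no matching key.
    return grep { !exists $in_long{$_} } @$short;
}

print unique_to_short(["beta\n", "delta\n"], ["alpha\n", "beta\n", "gamma\n"]);
```

It does the same work in two statements, at the cost of assuming the reader knows hash slices.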

Did you enjoy the deluge of responses?

You wrote:
>my $shortfile;
>my $longfile;
>my $differences;
>
>
>I'm writing a script to compare two text files ($shortfile & $longfile). 
>If a line appears in $shortfile, but that line is not in $longfile, then 
>I want to write that line out to $differences
>
>I'm relatively certain it is not efficient to open $longfile for each 
>entry in $shortfile. Both files are of the magnitude of 800+ lines.
>
>For example, a given line in $shortfile is found at line 333 in 
>$longfile. Without closing and reopening $longfile, I don't know how to 
>reset the 'pointer' in $longfile back to line 1.
>
>Perhaps there is a better way of doing this. I hope I've explained what 
>I'm trying to do clearly.
>
>Suggestions ?
>
>Thanks,
>Mike
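
On the 'pointer' question quoted above: Perl's built-in seek can move an open filehandle back to the start without closing it, although the hash approach makes rereading unnecessary. A small sketch, using an in-memory file so it runs on its own; the sub name first_line_after_rewind is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read a handle to the end, rewind it with seek,
# and return the first line again.
sub first_line_after_rewind {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) { }        # read through to end of file
    seek($fh, 0, 0) or die "seek failed: $!";    # byte offset 0, relative to start
    return scalar <$fh>;
}

# An in-memory file standing in for $longfile.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print first_line_after_rewind($fh);
close $fh;
```

Note that seek does not reset the line counter $. on its own.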

Chris Yager
(510)317-5900
iceman at prado.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dups.pl
Type: application/octet-stream
Size: 1406 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/oakland/attachments/20050531/fc3010d8/dups.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dupslib.pm
Type: application/octet-stream
Size: 710 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/oakland/attachments/20050531/fc3010d8/dupslib.obj