SPUG: "diff" utility within Perl
Matt Tucker
tuck at whistlingfish.net
Tue Nov 6 15:49:20 CST 2001
-- ced at carios2.ca.boeing.com spake thusly:
>> Does anyone know of if a "diff" like module exists within
>> Perl?? I have script that uses "-s" to do simple diff,
>> but I've found that the files are same size BUT different
>> in content (single char in date 5 vs 6). I'd like to
>> use some kina "diff" utlity to verify if indeed the two files
>> are identical of not (don't need to know the difference,
>> just that they exist is good enought) If no module exists
>> does anyone have utility that would work???
>
> I have no experience here but String::DiffLine maybe:
>
> http://search.cpan.org/search?mode=module&query=diff
That same search also returns Algorithm::Diff, which offers quite a bit
more power. For what you're doing, though, String::DiffLine is probably
enough, although I had to patch it to get it to compile:
---- begin DiffLine.xs.patch ----
--- DiffLine.xs.orig Tue Nov 6 13:27:05 2001
+++ DiffLine.xs Tue Nov 6 13:27:18 2001
@@ -46,7 +46,7 @@
 lines++,lpos=i+1;
 }
 if(l1==l2)
- PUSHs(&sv_undef);
+ PUSHs(&PL_sv_undef);
 else
 PUSHs(sv_2mortal(newSViv(l)));
 PUSHs(sv_2mortal(newSViv(lines)));
---- end DiffLine.xs.patch ----
I did a bit of benchmarking, and for this purpose String::DiffLine is
considerably faster (roughly 15x in the run below), although perhaps I
was going about things the wrong way. Splitting the file into lines
before passing it to Algorithm::Diff::diff slows things down
considerably (~10x), so that variant is commented out for the
benchmark.
The main problem with Algorithm::Diff is that it processes the entire
input rather than just looking for the first difference. I believe it's
also implemented entirely in Perl, rather than in C. It is, however,
much more generally useful.
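To illustrate the "stop at the first difference" idea, here is a rough pure-Perl sketch that uses only core features. The name first_diff_line is hypothetical; it is not part of String::DiffLine or Algorithm::Diff, and it makes no attempt to match the speed of the XS code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from either module: returns the 1-based
# line number of the first difference between two strings, or undef (in
# scalar context) if the strings are identical.
sub first_diff_line {
    my ($s1, $s2) = @_;
    return if $s1 eq $s2;    # identical: nothing to report

    my ($pos, $line) = (0, 1);
    while (1) {
        # Everything before $pos is known to match, so line boundaries
        # in $s1 are valid offsets into $s2 as well.
        my $end = index($s1, "\n", $pos);

        # No newline left in $s1: the difference is on this last line.
        return $line if $end < 0;

        my $len = $end - $pos + 1;
        return $line
            if substr($s1, $pos, $len) ne substr($s2, $pos, $len);

        $pos = $end + 1;
        $line++;
    }
}

my $where = first_diff_line("date 5\nfoo\n", "date 6\nfoo\n");
print defined $where ? "differ at line $where\n" : "identical\n";
```

If all you need is a yes/no answer, the initial `eq` test is already enough; the loop only matters when you also want to know where the files diverge.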
Benchmark results and code follow:
[70] src@benzene$ ~/projects/try/benchmark.pl 10000
Benchmark: timing 10000 iterations of algo_diff, diffline...
 algo_diff: 24 wallclock secs (23.07 usr + 0.95 sys = 24.02 CPU) @ 416.32/s (n=10000)
  diffline:  2 wallclock secs ( 1.57 usr + 0.00 sys = 1.57 CPU) @ 6369.43/s (n=10000)
            Rate algo_diff  diffline
algo_diff  416/s        --      -93%
diffline  6369/s     1430%        --
---- begin benchmark.pl ----
#!/usr/bin/perl
use Benchmark qw(cmpthese);
use Data::Dumper;
use File::Slurp;
use warnings;
no warnings qw(uninitialized);
use strict;

main();

sub main {
    my $count = $ARGV[0] || 1000;
    cmpthese($count, benchmarkCode());
}

sub benchmarkCode {
    # Package globals, so that $file2 can be localized in the sanity check.
    use vars qw($file1 $file2);
    $file1 = read_file "/etc/apache/httpd.conf";
    $file2 = read_file "/etc/apache/httpd.conf.0";

    # Each sub below returns true if the two files differ.
    require Algorithm::Diff;
    my $algo_diff_split = sub {
        my @lines1 = split "\n", $file1;
        my @lines2 = split "\n", $file2;
        my @diffs = Algorithm::Diff::diff (\@lines1, \@lines2);
        return scalar @diffs;
    };
    # Treats each whole file as a single element instead of splitting it.
    my $algo_diff_nosplit = sub {
        my @diffs = Algorithm::Diff::diff ([ $file1 ], [ $file2 ]);
        return scalar @diffs;
    };

    require String::DiffLine;
    my $diffline = sub {
        my @result = String::DiffLine::diffline($file1, $file2);
        return defined $result[0];
    };

    my %subs =
        (
#        algo_diff_split   => $algo_diff_split,
         algo_diff_nosplit => $algo_diff_nosplit,
         diffline          => $diffline,
        );

    # Sanity check: each sub must report a difference for the real files
    # and none when both variables hold the same content.
    while (my ($name, $sub) = each %subs) {
        warn "$name: failed to detect differing files" unless $sub->();
        {
            local $file2 = $file1;
            warn "$name: failed to detect matching files" if $sub->();
        }
    }
    return \%subs;
}
---- end benchmark.pl ----