SPUG: "diff" utility within Perl

Tue Nov 6 15:49:20 CST 2001

-- ced at carios2.ca.boeing.com spake thusly:

>> Does anyone know if a "diff"-like module exists within
>> Perl? I have a script that uses "-s" to do a simple diff,
>> but I've found that the files are the same size BUT different
>> in content (single char in date, 5 vs 6). I'd like to
>> use some kind of "diff" utility to verify whether the two files
>> are indeed identical or not (don't need to know the differences,
>> just that they exist is good enough). If no module exists,
>> does anyone have a utility that would work?
> 
> I have no experience here but String::DiffLine maybe:
> 
>  http://search.cpan.org/search?mode=module&query=diff

That same search also returns Algorithm::Diff, which offers quite a bit
more power. For what you're doing, though, String::DiffLine is probably
enough, although I had to patch it to get it to compile:

---- begin DiffLine.xs.patch ----

--- DiffLine.xs.orig    Tue Nov  6 13:27:05 2001
+++ DiffLine.xs Tue Nov  6 13:27:18 2001
@@ -46,7 +46,7 @@
           lines++,lpos=i+1;
     }
     if(l1==l2)
-      PUSHs(&sv_undef);
+      PUSHs(&PL_sv_undef);
     else
       PUSHs(sv_2mortal(newSViv(l)));
     PUSHs(sv_2mortal(newSViv(lines)));
---- end DiffLine.xs.patch ----

I did a bit of benchmarking, and for this purpose String::DiffLine is
considerably faster (roughly 15x in the run below), although perhaps I
was going about things the wrong way. Splitting the file into lines
before passing it to Algorithm::Diff::diff slows things down
considerably (~10x), so that variant is commented out for the
benchmark.

The main problem with Algorithm::Diff is that it processes the entire
input rather than just looking for the first difference. I believe it's
also implemented entirely in Perl, rather than in C. It is, however,
much more generally useful.
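
For the plain yes/no question in the original post, a minimal sketch
using the core File::Compare module might be enough. Note that
File::Compare is not one of the modules benchmarked here, so its speed
relative to them is untested, and files_identical and write_temp are
hypothetical names made up for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Compare qw(compare);
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if both files have identical
# contents, 0 if they differ; dies if either file cannot be read.
sub files_identical {
	my ($path1, $path2) = @_;

	# compare() returns 0 for equal, 1 for different, -1 on error.
	my $rc = compare($path1, $path2);
	die "cannot compare $path1 and $path2: $!\n" if $rc < 0;
	return $rc == 0 ? 1 : 0;
}

# Hypothetical helper for the demo: write a string to a temp file.
sub write_temp {
	my ($content) = @_;
	my ($fh, $name) = tempfile(UNLINK => 1);
	print {$fh} $content;
	close $fh or die "close failed: $!\n";
	return $name;
}

# Same size, one character different, as in the original question.
my $old  = write_temp("Date: Nov 5 2001\n");
my $new  = write_temp("Date: Nov 6 2001\n");
my $copy = write_temp("Date: Nov 5 2001\n");

print "old vs new:  ", (files_identical($old, $new)  ? "same" : "different"), "\n";
print "old vs copy: ", (files_identical($old, $copy) ? "same" : "different"), "\n";
```

This only reports whether the files match, not where they differ, which
is all the original question asked for.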

Benchmark results and code follow:

[70] src at benzene$ ~/projects/try/benchmark.pl 10000
Benchmark: timing 10000 iterations of algo_diff, diffline...
 algo_diff: 24 wallclock secs (23.07 usr +  0.95 sys = 24.02 CPU) @ 416.32/s (n=10000)
  diffline:  2 wallclock secs ( 1.57 usr +  0.00 sys =  1.57 CPU) @ 6369.43/s (n=10000)
            Rate algo_diff  diffline
algo_diff  416/s        --      -93%
diffline  6369/s     1430%        --

---- begin benchmark.pl ----
#!/usr/bin/perl

use Benchmark qw(cmpthese);

use Data::Dumper;
use File::Slurp;

use warnings;
no warnings qw(uninitialized);
use strict;

main();
sub main {
	my $count = $ARGV[0] || 1000;
	cmpthese($count, benchmarkCode());
}

sub benchmarkCode {
	use vars qw($file1 $file2);
	$file1 = read_file "/etc/apache/httpd.conf";
	$file2 = read_file "/etc/apache/httpd.conf.0";

	require Algorithm::Diff;
	my $algo_diff_split = sub {
		my @lines1 = split "\n", $file1;
		my @lines2 = split "\n", $file2;
		my @diffs = Algorithm::Diff::diff (\@lines1, \@lines2);
		return scalar @diffs;
	};
	my $algo_diff_nosplit = sub {
		my @diffs = Algorithm::Diff::diff ([ $file1 ], [ $file2 ]);
		return scalar @diffs;
	};

	require String::DiffLine;
	my $diffline = sub {
		my @result = String::DiffLine::diffline($file1, $file2);
		return defined $result[0];
	};

	my %subs = 
		(
#		 algo_diff_split   => $algo_diff_split,
		 algo_diff_nosplit => $algo_diff_nosplit,
		 diffline          => $diffline,
		);
	while (my ($name, $sub) = each %subs) {
		warn "$name: failed to detect differing files" unless $sub->();
		{
			local $file2 = $file1;
			warn "$name: failed to detect matching files" if $sub->();
		}
	}

	return \%subs;
}
---- end benchmark.pl ----