[Thamesvalley-pm] Fuzzy Search on text

Andy Wardley abw at wardley.org
Wed Jun 4 23:33:37 PDT 2008


Iain Emsley wrote:
> Does any one have any ideas on the best way of getting it?

Unless I'm missing something obvious, you just need to slice the next
$find_len characters from @$search, starting at $i.

> while (my $search = <DATA>) {
>   chomp $search;
>   $search = [split //, $search];
>     for my $i ( 0..@$search-$find_len ) {
>         FIND:
>         for my $find ( @find ) {
>             my $misses = 0;
>             for my $j ( 0..$find_len-1 ) {
>                 $misses++ if $search->[$i+$j] ne $find->[$j];
>                 next FIND if $misses > $fuzzy;
>             }

             print "Line $. Match ($misses) at $i, ",
                 join('', @$search[$i..$i+$find_len-1]), "\n";

>         }
>     }
> } 

Output:
Line 8 Match (1) at 30, Marley
Line 12 Match (3) at 36, larly
Line 19 Match (1) at 0, Marley
Line 23 Match (3) at 30, many y
Line 31 Match (1) at 15, Marley
Line 32 Match (3) at 13, tarted
Line 32 Match (1) at 49, Marley
Line 36 Match (3) at 0, Hamlet
Line 37 Match (3) at 24, markab
Line 37 Match (3) at 27, kable
Line 44 Match (1) at 30, Marley
Line 46 Match (3) at 18, Marlie
Line 47 Match (1) at 12, Marley
Line 48 Match (3) at 9, called
Line 48 Match (2) at 47, Marlee
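
For anyone who wants to try the slice in isolation, here is a minimal,
self-contained sketch of the same idea. The fuzzy_matches() helper name and
the sample string are made up for illustration (they are not from Iain's
script or data), and it only handles a single search word:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns one
# [offset, misses, matched_text] triple for every window of $text that
# differs from $word in at most $fuzzy character positions.
sub fuzzy_matches {
    my ($text, $word, $fuzzy) = @_;
    my @search   = split //, $text;
    my @find     = split //, $word;
    my $find_len = @find;
    my @matches;

    START:
    for my $i ( 0 .. @search - $find_len ) {
        my $misses = 0;
        for my $j ( 0 .. $find_len - 1 ) {
            $misses++ if $search[$i + $j] ne $find[$j];
            next START if $misses > $fuzzy;
        }
        # Array slice: the $find_len characters starting at $i.
        push @matches,
            [ $i, $misses, join('', @search[ $i .. $i + $find_len - 1 ]) ];
    }
    return @matches;
}

# Made-up sample text, allowing at most one mismatched character.
for my $m ( fuzzy_matches('Old Marlee was dead', 'Marley', 1) ) {
    printf "Match (%d) at %d, %s\n", $m->[1], $m->[0], $m->[2];
}
```

With that sample string it should print "Match (1) at 4, Marlee". A line
shorter than the search word gives an empty range in the outer loop, so it
simply produces no matches.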


HTH
A

