[Melbourne-pm] Regular expression fun

Jacinta Richardson jarich at perltraining.com.au
Thu Sep 4 01:32:08 PDT 2014


G'day folk,

It's been a while, and I shall endeavour to arrange some kind of meeting 
before the year is out, I promise.

Anyway, I've recently been thinking of a regular expression problem and 
wondered if anyone had any suggestions on how I should approach it.

Consider this situation.  I have a number of clients and a page of text 
on which they might match.  I want to tell whether any of my clients 
match, but I also want to know which client matched.  I could solve 
it like this:

# IDs are quoted: a bare 031234 would be read as an octal number
my %clients = (
     '031234' => "John Smith",
     '234345' => "Jane Brown",
     '345345' => "Jameel Bayan",
);

foreach my $client_id (keys %clients) {
     my $client = $clients{$client_id};

     # Split "First Last" on whitespace
     my ($first, $last) = split /\s+/, $client;
     my $first_initial_only = substr($first, 0, 1);

     my $match_first_initial_last = qr{$first_initial_only\s+$last};
     my $match_surname_first      = qr{$last\W+$first};

     if (   $page =~ /$client/
         || $page =~ $match_first_initial_last
         || $page =~ $match_surname_first
     ) {
          say "yay I matched $client_id";
     }
}

Now there's an obvious improvement there, where I can make that all one 
regular expression against $page:

       if ( $page =~ m{$client|$match_first_initial_last|$match_surname_first} ) {
       }

and that's great.  But what I really want to do is be able to build a 
regular expression to try to match all of my clients against the page, 
at once, and to still know who matched.  This is mostly because the 
contortions I go through to increase successful matches while limiting 
false positives are a little more complicated than the above.

So ideally I'd like to do something like this:

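my (@full_name, @first_initial_last, @surname_first);

foreach my $client_id (keys %clients) {
     my $client = $clients{$client_id};

     my ($first, $last) = split /\s+/, $client;
     my $first_initial_only = substr($first, 0, 1);

     push @full_name,          qr{$client};
     push @first_initial_last, qr{$first_initial_only\s+$last};
     push @surname_first,      qr{$last\W+$first};
}

# Combine each set of per-client patterns into one alternation
my $match_full_name          = join '|', @full_name;
my $match_first_initial_last = join '|', @first_initial_last;
my $match_surname_first      = join '|', @surname_first;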

if (   $page =~ $match_full_name
    || $page =~ $match_first_initial_last
    || $page =~ $match_surname_first
) {
     # Which client_id did I match?
}

Does anyone have any suggestions for how that might be done?
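
One possible direction, offered as a sketch under assumptions rather than 
a tested solution: with Perl 5.10 or later, each client's alternatives can 
be wrapped in a named capture group, and the name of the group that took 
part in the match can be read back from %+.  The c_ prefix on the group 
names, the matching_client_ids helper, and the sample page text below are 
all made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample data; IDs are quoted so the leading zero survives.
my %clients = (
    '031234' => 'John Smith',
    '234345' => 'Jane Brown',
    '345345' => 'Jameel Bayan',
);

# Build one alternation. Each client's variants live in a named group
# "c_<id>" (capture names may not start with a digit, hence the prefix).
my @alternatives;
foreach my $client_id (sort keys %clients) {
    my ($first, $last) = split /\s+/, $clients{$client_id};
    my $initial = substr($first, 0, 1);

    # Escape the name parts so they are matched literally.
    my ($q_first, $q_last, $q_initial)
        = map { quotemeta($_) } ($first, $last, $initial);

    push @alternatives,
        "(?<c_$client_id>\\b(?:$q_first\\s+$q_last"
      . "|$q_initial\\s+$q_last"
      . "|$q_last\\W+$q_first)\\b)";
}
my $pattern     = join '|', @alternatives;
my $all_clients = qr{$pattern};

# Return the sorted IDs of every client mentioned in the text.
sub matching_client_ids {
    my ($text) = @_;
    my %seen;
    while ($text =~ /$all_clients/g) {
        # %+ lists only the named groups that captured in this match,
        # and only one alternative can succeed per match.
        my ($group) = keys %+;
        (my $id = $group) =~ s/^c_//;
        $seen{$id} = 1;
    }
    my @ids = sort keys %seen;
    return @ids;
}

my $page = "Invoice for Smith, John. Copy sent to J Brown.";
say "yay I matched $_" for matching_client_ids($page);
```

Because the page is scanned once with /g, every client mentioned on it is 
reported, not just the first.  Whether this scales to a long client list 
is untested here; perlre also documents (*MARK:NAME) with $REGMARK, which 
may be worth comparing.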

      J

