[Melbourne-pm] Regular expression fun

Thu Sep 4 01:40:06 PDT 2014

Build a grammar up from the clients, or not; either way, match each
client with its own named capture group rather than relying on an array
of anonymous captures. Then check keys %+ for the client IDs?
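
A rough sketch of what I mean, untested against your real data. The
sample $page text and the "c" prefix on the capture names are my own
assumptions: capture names have to start with a letter or underscore, so
a bare numeric client ID can't be used as a name directly. The three
alternatives per client are the ones from your post.

```perl
use strict;
use warnings;
use feature 'say';

# Client table from the original post; IDs are quoted so that the
# leading zero is kept as part of the string.
my %clients = (
    '031234' => 'John Smith',
    '234345' => 'Jane Brown',
    '345345' => 'Jameel Bayan',
);

# Hypothetical sample text, not from the original post.
my $page = 'Invoice for J Brown, copied to Bayan, Jameel.';

my @alternatives;
for my $client_id (sort keys %clients) {
    my ($first, $last) = split /\s+/, $clients{$client_id};
    my $initial = substr $first, 0, 1;

    # One named group per client, wrapping that client's alternatives.
    # The "c" prefix (an assumption) turns the ID into a legal name.
    push @alternatives, sprintf '(?<c%s>%s)', $client_id, join '|',
        "\Q$first\E\\s+\Q$last\E",      # full name
        "\Q$initial\E\\s+\Q$last\E",    # first initial, then surname
        "\Q$last\E\\W+\Q$first\E";      # surname first
}

my $pattern    = join '|', @alternatives;
my $any_client = qr/$pattern/;

my %matched;
while ($page =~ /$any_client/g) {
    # keys %+ lists only the named groups that actually captured.
    for my $name (keys %+) {
        # substr rather than s///, because a successful s/// here
        # would reset %+ before we read the captured text.
        my $client_id = substr $name, 1;
        $matched{$client_id} = $+{$name};
    }
}

say "matched $_ ($clients{$_}) via '$matched{$_}'" for sort keys %matched;
```

With that sample text I'd expect it to report 234345 (via 'J Brown') and
345345 (via 'Bayan, Jameel'), and nothing for 031234. The page is scanned
once with a single compiled regex, and the capture name tells you which
client hit.
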
On 04/09/2014 6:32 pm, "Jacinta Richardson" <jarich at perltraining.com.au>
wrote:

> G'day folk,
>
> It's been a while, and I shall endeavour to arrange some kind of meeting
> before the year is out, I promise.
>
> Anyway, I've recently been thinking of a regular expression problem and
> wondered if anyone had any suggestions on how I should approach it.
>
> Consider this situation.  I have a number of clients and a page of text on
> which they might match.  I want to tell whether any of my clients match,
> but I also want to know which client matched.  I could solve it this way:
>
> my %clients = (
>     '031234' => "John Smith",    # quoted: a bare 031234 is an octal literal
>     '234345' => "Jane Brown",
>     '345345' => "Jameel Bayan",
> );
>
> foreach my $client_id (keys %clients) {
>     my $client = $clients{$client_id};
>
>     my ($first, $last) = split /\s+/, $client;
>     my $first_initial_only = substr($first, 0, 1);
>
>     my $match_first_initial_last = qr{$first_initial_only\s+$last};
>     my $match_surname_first   = qr{$last\W+$first};
>
>     if (   $page =~ /$client/
>         || $page =~ $match_first_initial_last
>         || $page =~ $match_surname_first
>     ) {
>         say "yay I matched $client_id";
>     }
> }
>
> Now there's an obvious improvement there, where I can make that all one
> regular expression against $page:
>
>       if ($page =~ m{$client|$match_first_initial_last|$match_surname_first}) {
>       }
>
> and that's great.  But what I really want to do is be able to build a
> regular expression to try to match all of my clients against the page, at
> once, and to still know who matched.  Mostly because the contortions I go
> through to increase successful matches while limiting false positives are
> a little more complicated than the above.
>
> So ideally I'd like to do something like this:
>
> my ($match_full_name, $match_first_initial_last, $match_surname_first);
>
> foreach my $client_id (keys %clients) {
>     my $client = $clients{$client_id};
>
>     my ($first, $last) = split /\s+/, $client;
>     my $first_initial_only = substr($first, 0, 1);
>
>     $match_full_name         .= qr{$client};
>     $match_first_initial_last .= qr{$first_initial_only\s+$last};
>     $match_surname_first   .= qr{$last\W+$first};
> }
>
> if(    $page =~ $match_full_name
>        ||  $page =~ $match_first_initial_last
>        || $page  =~ $match_surname_first
> ) {
>        # Which client_id did I match?
> }
>
> Does anyone have any suggestions for how that might be done?
>
>      J
> _______________________________________________
> Melbourne-pm mailing list
> Melbourne-pm at pm.org
> http://mail.pm.org/mailman/listinfo/melbourne-pm
>