[Melbourne-pm] Regular expression fun
Toby Wintermute
tjc at wintrmute.net
Thu Sep 4 01:40:06 PDT 2014
Whether or not you build a full grammar up from the clients, give each
client its own named capture group rather than relying on an array of
anonymous matches. Then check keys %+ to see which client ids matched?
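
A rough sketch of what I mean, untested against your real data. The
find_clients helper, the "c" prefix on the group names and the sample page
text are all made up for illustration, and it assumes every client id
contains only word characters and every name is "First Last":

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns { client_id => matched text } for every
# client found somewhere in $page.
sub find_clients {
    my ($page, $clients) = @_;

    my @alternatives;
    for my $client_id (sort keys %$clients) {
        my ($first, $last) = split /\s+/, $clients->{$client_id};
        my $initial = substr($first, 0, 1);

        # One named group per client. Group names must start with a letter
        # or underscore, hence the "c" prefix on the numeric id.
        push @alternatives,
              "(?<c$client_id>\\b(?:"
            . "\Q$first\E\\s+\Q$last\E"       # John Smith
            . "|\Q$initial\E\\s+\Q$last\E"    # J Smith
            . "|\Q$last\E\\W+\Q$first\E"      # Smith, John
            . ")\\b)";
    }
    my $pattern    = join '|', @alternatives;
    my $any_client = qr/$pattern/;

    my %matched;
    while ($page =~ /$any_client/g) {
        # Copy %+ straight away: it only lists the named groups that
        # captured, and any later successful match would overwrite it.
        my %hit = %+;
        $matched{ substr($_, 1) } = $hit{$_} for keys %hit;
    }
    return \%matched;
}

# Ids are quoted so the leading zero survives.
my %clients = (
    '031234' => 'John Smith',
    '234345' => 'Jane Brown',
    '345345' => 'Jameel Bayan',
);

# Made-up sample text.
my $page  = 'Invoice for Brown, Jane, with a note from J Smith.';
my $found = find_clients($page, \%clients);
say "$_ ($clients{$_}) matched as '$found->{$_}'" for sort keys %$found;
```

Because the alternation is tried left to right at each position, only one
client's group captures per match, so the /g loop is what collects several
clients from the same page. Whether that ordering suits your more
complicated matching rules is something you'd need to check.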
On 04/09/2014 6:32 pm, "Jacinta Richardson" <jarich at perltraining.com.au>
wrote:
> G'day folk,
>
> It's been a while, and I shall endeavour to arrange some kind of meeting
> before the year is out, I promise.
>
> Anyway, I've recently been thinking of a regular expression problem and
> wondered if anyone had any suggestions on how I should approach it.
>
> Consider this situation. I have a number of clients and a page of text on
> which they might match. I want to tell whether any of my clients match,
> but I also want to know which client matched. I could solve this, this way.
>
> my %clients = (
> '031234' => "John Smith",
> '234345' => "Jane Brown",
> '345345' => "Jameel Bayan"
> );
>
> foreach my $client_id (keys %clients) {
> my $client = $clients{$client_id};
>
> my ($first, $last) = split /\s+/, $client;
> my $first_initial_only = substr($first, 0, 1);
>
> my $match_first_initial_last = qr{$first_initial_only\s+$last};
> my $match_surname_first = qr{$last\W+$first};
>
> if( $page =~ /$client/
> || $page =~ $match_first_initial_last
> || $page =~ $match_surname_first
> ) {
> say "yay I matched $client_id";
> }
> }
>
> Now there's an obvious improvement there, where I can make that all one
> regular expression against $page:
>
> if( $page =~ m{$client|$match_first_initial_last|$match_surname_first}
> ) {
> }
>
> and that's great. But what I really want to do is be able to build a
> regular expression to try to match all of my clients against the page, at
> once, and to still know who matched. Mostly because the contortions I go
> through to increase successful matches while limiting false positives are
> a little more complicated than the above.
>
> So ideally I'd like to do something like this:
>
> my ($match_full_name, $match_first_initial_last, $match_surname_first);
>
> foreach my $client_id (keys %clients) {
> my $client = $clients{$client_id};
>
> my ($first, $last) = split /\s+/, $client;
> my $first_initial_only = substr($first, 0, 1);
>
> $match_full_name .= qr{$client};
> $match_first_initial_last .= qr{$first_initial_only\s+$last};
> $match_surname_first .= qr{$last\W+$first};
> }
>
> if( $page =~ $match_full_name
> || $page =~ $match_first_initial_last
> || $page =~ $match_surname_first
> ) {
> # Which client_id did I match?
> }
>
> Does anyone have any suggestions for how that might be done?
>
> J