[Melbourne-pm] Regular expression fun
Jacinta Richardson
jarich at perltraining.com.au
Thu Sep 4 01:32:08 PDT 2014
G'day folk,
It's been a while, and I shall endeavour to arrange some kind of meeting
before the year is out, I promise.
Anyway, I've recently been thinking of a regular expression problem and
wondered if anyone had any suggestions on how I should approach it.
Consider this situation. I have a number of clients and a page of text
on which they might match. I want to tell whether any of my clients
match, but I also want to know which client matched. I could solve
it like this:
# Quote the IDs: a bare 031234 is read as an octal number, not a string.
my %clients = (
    '031234' => "John Smith",
    '234345' => "Jane Brown",
    '345345' => "Jameel Bayan",
);
foreach my $client_id (keys %clients) {
    my $client = $clients{$client_id};
    # Split the name on whitespace into first and last name.
    my ($first, $last) = split /\s+/, $client;
    my $first_initial_only = substr($first, 0, 1);
    my $match_first_initial_last = qr{$first_initial_only\s+$last};
    my $match_surname_first = qr{$last\W+$first};
    if( $page =~ /$client/
        || $page =~ $match_first_initial_last
        || $page =~ $match_surname_first
    ) {
        say "yay I matched $client_id";
    }
}
Now there's an obvious improvement there, where I can make that all one
regular expression against $page:
    if( $page =~ m{$client|$match_first_initial_last|$match_surname_first} ) {
    }
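
For reference, a minimal self-contained sketch of that per-client version. The sample $page text and the client_pattern helper are made up for illustration, and the \Q...\E quoting is an added precaution in case a name ever contains regex metacharacters.

```perl
use strict;
use warnings;
use feature 'say';

my %clients = (
    '031234' => 'John Smith',
    '234345' => 'Jane Brown',
);

# Hypothetical helper: one combined pattern for a single client name.
sub client_pattern {
    my ($client) = @_;
    my ($first, $last) = split /\s+/, $client;
    my $initial = substr($first, 0, 1);
    return qr{\Q$first\E\s+\Q$last\E|\Q$initial\E\s+\Q$last\E|\Q$last\E\W+\Q$first\E};
}

my $page = 'Payment received from J Smith on Tuesday.';   # made-up sample text

# Still one pass over the clients, but only one match attempt per client.
my @matched = grep { $page =~ client_pattern($clients{$_}) } sort keys %clients;
say "yay I matched $_" for @matched;
```

This still loops over every client, which is the part I'd like to get rid of.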
and that's great. But what I really want to do is be able to build a
regular expression to try to match all of my clients against the page,
at once, and still know who matched, mostly because the contortions
I go through to increase successful matches while limiting false
positives are a little more complicated than the above.
So ideally I'd like to do something like this:
my ($match_full_name, $match_first_initial_last, $match_surname_first);
foreach my $client_id (keys %clients) {
    my $client = $clients{$client_id};
    my ($first, $last) = split /\s+/, $client;
    my $first_initial_only = substr($first, 0, 1);
    $match_full_name .= qr{$client};
    $match_first_initial_last .= qr{$first_initial_only\s+$last};
    $match_surname_first .= qr{$last\W+$first};
}
if( $page =~ $match_full_name
    || $page =~ $match_first_initial_last
    || $page =~ $match_surname_first
) {
    # Which client_id did I match?
}
Does anyone have any suggestions for how that might be done?
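
One possible direction, as a minimal sketch rather than a finished answer: give each client its own named capture group inside one big alternation, then read the name of the group that captured out of %+ (Perl 5.10 or later). The "id_" prefix, the first_client_in helper, and the sample strings are assumptions made for illustration; group names have to start with a letter or underscore, so the numeric IDs can't be used bare.

```perl
use strict;
use warnings;
use feature 'say';

my %clients = (
    '031234' => 'John Smith',
    '234345' => 'Jane Brown',
    '345345' => 'Jameel Bayan',
);

# One named group per client, each holding that client's alternatives.
my @branches;
foreach my $client_id (sort keys %clients) {
    my ($first, $last) = split /\s+/, $clients{$client_id};
    my $initial = substr($first, 0, 1);
    my ($f, $l, $i) = map { quotemeta($_) } ($first, $last, $initial);
    push @branches,
        "(?<id_$client_id>\\b(?:$f\\s+$l|$i\\s+$l|$l\\W+$f)\\b)";
}
my $alternation = join '|', @branches;
my $any_client  = qr{$alternation};

# Hypothetical helper: ID of the first client found in the text, or undef.
sub first_client_in {
    my ($text) = @_;
    return unless $text =~ $any_client;
    # Only the group that actually captured shows up as a key in %+.
    my ($group) = keys %+;
    return substr($group, 3);    # strip the "id_" prefix
}

say first_client_in('Invoice for Brown, Jane') // 'no match';
say first_client_in('Seen with J Smith')       // 'no match';
```

Matching with /g in a while loop and checking %+ on each iteration should report every client on the page rather than just the first. Whether one large alternation is actually faster than the per-client loop would need measuring on real data.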
J