From jarich at perltraining.com.au  Thu Sep  4 01:32:08 2014
From: jarich at perltraining.com.au (Jacinta Richardson)
Date: Thu, 04 Sep 2014 18:32:08 +1000
Subject: [Melbourne-pm] Regular expression fun
Message-ID: <54082388.5060207@perltraining.com.au>

G'day folk,

It's been a while, and I shall endeavour to arrange some kind of meeting
before the year is out, I promise.

Anyway, I've recently been thinking of a regular expression problem and
wondered if anyone had any suggestions on how I should approach it.

Consider this situation. I have a number of clients and a page of text on
which they might match. I want to tell whether any of my clients match,
but I also want to know which client matched. I could solve this, this way.

    # IDs are quoted: a bare 031234 would be parsed as an octal literal.
    my %clients = (
        '031234' => "John Smith",
        '234345' => "Jane Brown",
        '345345' => "Jameel Bayan"
    );

    foreach my $client_id (keys %clients) {
        my $client = $clients{$client_id};

        my ($first, $last) = split /\s+/, $client;
        my $first_initial_only = substr($first, 0, 1);

        my $match_first_initial_last = qr{$first_initial_only\s+$last};
        my $match_surname_first      = qr{$last\W+$first};

        if( $page =~ /$client/
            || $page =~ $match_first_initial_last
            || $page =~ $match_surname_first
        ) {
            say "yay I matched $client_id";
        }
    }

Now there's an obvious improvement there, where I can make that all one
regular expression against $page:

    if( $page =~ m{$client|$match_first_initial_last|$match_surname_first} ) {
    }

and that's great. But what I really want to do is be able to build a
regular expression to try to match all of my clients against the page, at
once, and to still know who matched. Mostly because the contortions I go
to increase successful matches while limiting false positive matches are a
little more complicated than the above.

So ideally I'd like to do something like this:

    my ($match_full_name, $match_first_initial_last, $match_surname_first);

    foreach my $client_id (keys %clients) {
        my $client = $clients{$client_id};

        my ($first, $last) = split /\s+/, $client;
        my $first_initial_only = substr($first, 0, 1);

        $match_full_name          .= qr{$client};
        $match_first_initial_last .= qr{$first_initial_only\s+$last};
        $match_surname_first      .= qr{$last\W+$first};
    }

    if( $page =~ $match_full_name
        || $page =~ $match_first_initial_last
        || $page =~ $match_surname_first
    ) {
        # Which client_id did I match?
    }

Does anyone have any suggestions for how that might be done?

    J

From tjc at wintrmute.net  Thu Sep  4 01:40:06 2014
From: tjc at wintrmute.net (Toby Wintermute)
Date: Thu, 4 Sep 2014 18:40:06 +1000
Subject: [Melbourne-pm] Regular expression fun
In-Reply-To: <54082388.5060207@perltraining.com.au>
References: <54082388.5060207@perltraining.com.au>
Message-ID:

Build a grammar up from the clients, or not, but either way match each
client to a named capture point rather than an array of anonymous matches.
Then check on keys %+ for the client ids?

On 04/09/2014 6:32 pm, "Jacinta Richardson" wrote:
> Does anyone have any suggestions for how that might be done?

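
The named-capture suggestion above could look roughly like the following.
This is only a minimal sketch under several assumptions: Perl 5.10 or
later (for named captures and %+), an "id_" prefix on each capture name
(a capture name cannot start with a digit), and a much simpler set of
name variants than a real matcher would need. The helper name
matched_client_ids is made up for illustration.

```perl
use strict;
use warnings;

# IDs are quoted so that a leading zero is not read as an octal literal.
my %clients = (
    '031234' => 'John Smith',
    '234345' => 'Jane Brown',
    '345345' => 'Jameel Bayan',
);

# Hypothetical helper: returns the sorted IDs of all clients found on the page.
sub matched_client_ids {
    my ($page, $clients) = @_;

    # One named group per client, e.g. (?<id_031234>\b(?:...)\b)
    my $pattern = join '|', map {
        my ($first, $last) = split /\s+/, $clients->{$_};
        my $initial  = substr $first, 0, 1;
        my $variants = join '|',
            "\Q$first\E\\s+\Q$last\E",          # John Smith
            "\Q$initial\E\\.?\\s+\Q$last\E",    # J Smith, J. Smith
            "\Q$last\E\\W+\Q$first\E";          # Smith, John
        "(?<id_$_>\\b(?:$variants)\\b)"
    } sort keys %$clients;

    my %seen;
    while ($page =~ /$pattern/g) {
        # Only the group that took part in this match appears in %+.
        my ($name) = keys %+;
        $seen{ substr $name, 3 } = 1;           # strip the "id_" prefix
    }
    return sort keys %seen;
}

print "yay I matched $_\n"
    for matched_client_ids('Smith, John met J. Brown', \%clients);
```

Each successful match defines exactly one named group, so keys %+ gives
the client directly, and the /g loop moves on to the next match instead
of stopping at the first one.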
From damian at conway.org  Thu Sep  4 02:46:58 2014
From: damian at conway.org (Damian Conway)
Date: Thu, 4 Sep 2014 19:46:58 +1000
Subject: [Melbourne-pm] Regular expression fun
In-Reply-To: <54082388.5060207@perltraining.com.au>
References: <54082388.5060207@perltraining.com.au>
Message-ID:

One possible solution would be to use named captures to identify each
match (and to do so for all the possible matches at once).

Like so:

    my $names_pattern
        = join '|',
          map {
              my $name = $clients{$_};
              my ($first, $initial, $last) = $name =~ /((.).*)\s+(.*)/;
              my $names = join '|', reverse sort (
                  "$first\\s+$last",
                  "$initial\\.?\\s+$last",
                  "$last,?\\s*(?:$first|$initial)",
              );
              "(?<_$_>$names)"
          }
          keys %clients;

    if ($page =~ m{\A(?:$names_pattern|.)*\Z}s) {
        use Data::Dumper 'Dumper';
        my @matched_ids = map {substr($_,1)} keys %+;
        for my $client_id (@matched_ids) {
            say "yay I matched $client_id";
        }
    }

Note that this might sometimes be significantly *slower*,
as it always has to examine the entire string, rather than
short-circuiting on each match.

Damian

From ddick at iinet.net.au  Thu Sep  4 03:06:55 2014
From: ddick at iinet.net.au (David Dick)
Date: Thu, 04 Sep 2014 20:06:55 +1000
Subject: [Melbourne-pm] Regular expression fun
In-Reply-To: <54082388.5060207@perltraining.com.au>
References: <54082388.5060207@perltraining.com.au>
Message-ID: <540839BF.7080600@iinet.net.au>

On 09/04/2014 06:32 PM, Jacinta Richardson wrote:
> Does anyone have any suggestions for how that might be done?

Or this?

    my $page = "asdfasd fasdf asdf asdf Smith John";
    print "Page Contents:$page\n";

    # IDs are quoted: a bare 031234 would be parsed as an octal literal.
    my %clients = (
        '031234' => "John Smith",
        '234345' => "Jane Brown",
        '345345' => "Jameel Bayan"
    );

    # Map each pattern (as a string) back to the client it identifies.
    my %matches;
    foreach my $client_id (keys %clients) {
        my $client = $clients{$client_id};
        $matches{$client} = $client_id;

        my ($first, $last) = split /\s+/, $client;
        my $first_initial_only = substr $first, 0, 1;

        my $match_first_initial_last = qr{$first_initial_only\s+$last};
        $matches{$match_first_initial_last} = $client_id;
        my $match_surname_first = qr{$last\W+$first};
        $matches{$match_surname_first} = $client_id;
    }

    # One capture group per pattern; the group that is defined after the
    # match shows which pattern, and so which client, matched.
    my @patterns = keys %matches;
    my $regex    = join q[|], map { '(' . $_ . ')' } @patterns;
    my @captures = $page =~ /$regex/;
    my ($index)  = grep { defined $captures[$_] } 0 .. $#captures;
    print "Found " . $clients{$matches{$patterns[$index]}} . "\n"
        if defined $index;

From kahlil.hodgson at dealmax.com.au  Thu Sep 18 23:25:47 2014
From: kahlil.hodgson at dealmax.com.au (Kahlil Hodgson)
Date: Fri, 19 Sep 2014 16:25:47 +1000
Subject: [Melbourne-pm] Join our small team and be part of something big
Message-ID:

DealMax is an Australian financial services company that's developed a
unique and proprietary consumer-direct-to-bank technology platform. Our
technology powers websites like dealmax.com.au and is also used in-house
by Australian financial institutions.

Our company is currently in an exciting growth phase and we need to grow
our small technology team.

We use a variety of languages, frameworks and technologies, acknowledging
that they all have their strengths and weaknesses; however, we do have a
strong focus on Linux and modern Perl.

We are looking for one or more generalist software developers. If you
enjoy creating innovative solutions to challenging problems, have
experience in a variety of programming languages and development
frameworks and are comfortable with picking up and exploring new
technologies, then we'd like to hear from you.

Testing, devops, database administration and system engineering experience
is a definite bonus.

Please forward a one page resume to careers at dealmax.com.au,
highlighting the diversity of your skills and experience.

Cheers,

Kal

Kahlil (Kal) Hodgson                       GPG: C9A02289
Head of Technology                         (m) +61 (0) 4 2573 0382
DealMax Pty Ltd                            (w) +61 (0) 3 9008 5281

Suite 1415
401 Docklands Drive
Docklands VIC 3008 Australia

"All parts should go together without forcing. You must remember that
the parts you are reassembling were disassembled by you. Therefore,
if you can't get them together again, there must be a reason. By all
means, do not use a hammer." -- IBM maintenance manual, 1925