[Pdx-pm] common elements in two lists

Eric Wilhelm enobacon at gmail.com
Sat Nov 5 02:36:02 PDT 2011


# from Tom Keller
# on Friday 04 November 2011 15:34:

>my @out = map { $_ =~  m/\w+[-_]*(\w*[-_]*\d+[a-z]*).*/ } @names2;

This regexp isn't anchored (no ^ or $), so it can match starting anywhere
in the string and may capture something you didn't expect.
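A small sketch of the kind of surprise involved. The "strict" pattern below is a hypothetical anchored variant, assuming names are letters, an optional '-' or '_', then digits and optional letters; adjust it to the real naming conventions. Note that for 'gene_12a' the quoted pattern captures only '2a', because the greedy leading \w+ keeps as much as it can. The anchored version captures '12a' and rejects the string with leading and trailing junk.

```perl
use strict;
use warnings;

# The pattern quoted above, unchanged: no ^ or $ anchors.
my $loose = qr/\w+[-_]*(\w*[-_]*\d+[a-z]*).*/;

# Hypothetical anchored variant, ASSUMING names look like
# letters, an optional '-' or '_', then digits and optional letters.
my $strict = qr/^[A-Za-z]+[-_]?(\d+[a-z]*)$/;

my (%loose_of, %strict_of);
for my $name ('gene_12a', 'x gene_12a trailing junk') {
    # List assignment of the capture; undef if the pattern doesn't match.
    ($loose_of{$name})  = $name =~ $loose;
    ($strict_of{$name}) = $name =~ $strict;
    printf "%-26s loose=%-4s strict=%s\n",
        $name, $loose_of{$name} // '-', $strict_of{$name} // '-';
}
```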

>( grep { $names1[$c] =~ m/$_/ } @out ) > 0) {

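This match isn't anchored either.  Is that intentional?  Does it even
need to be a regexp match, or would a literal comparison do, i.e. `eq`
(or a hash lookup, as others have suggested)?  Note that m/$_/ also
treats any regexp metacharacters in the @out entries as pattern syntax.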
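To illustrate the difference between the three kinds of comparison, here is a minimal sketch with a hypothetical @out entry that happens to contain a '.':

```perl
use strict;
use warnings;

my $name    = 'abc-1x2';
my $extract = 'c-1.2';   # hypothetical entry from @out containing a '.'

# Interpolated as a pattern: '.' matches any character, and the match
# may start anywhere, so this succeeds on the 'c-1x2' inside $name.
my $as_regex   = ($name =~ m/$extract/)     ? 1 : 0;

# \Q...\E quotes metacharacters: still an (unanchored) substring test.
my $as_quoted  = ($name =~ m/\Q$extract\E/) ? 1 : 0;

# Exact whole-string comparison, which is what a hash lookup gives you.
my $as_literal = ($name eq $extract)        ? 1 : 0;

print "regex=$as_regex quoted=$as_quoted eq=$as_literal\n";
```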

If it is a literal match rather than a substring match, you should get
a nice boost from the hash lookup.  Otherwise you're stuck comparing N
regexps against M strings (N x M matches).  It depends on how the
"slightly different conventions for naming the same thing" differ and
whether you can normalize both lists as Schwern suggested.
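A sketch of the normalize-then-hash approach. The normalize() sub is hypothetical (it lowercases and drops '-' and '_'); the real rules depend on how the two naming conventions actually differ. Building the hash costs one pass over one list, after which each lookup is a single hash access instead of a scan over every pattern.

```perl
use strict;
use warnings;

# Hypothetical normalizer: lowercase and strip '-' / '_' so that
# "ABC_12a" and "abc-12A" compare equal. Adjust to your real conventions.
sub normalize {
    my ($name) = @_;
    (my $n = lc $name) =~ tr/-_//d;   # leading '-' in tr is literal
    return $n;
}

my @names1 = ('ABC_12a', 'xyz-7', 'foo_3b');   # hypothetical data
my @names2 = ('abc-12A', 'bar_9', 'FOO3b');

# Build the lookup once from one list (parens make map see a block) ...
my %in_names2 = map { ( normalize($_) => 1 ) } @names2;

# ... then each element of the other list is a single hash lookup.
my @common = grep { $in_names2{ normalize($_) } } @names1;

print "$_\n" for @common;
```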

Also note that '0 .. $#names1' is somewhat like the 'for(my $i = 0;
$i < ...; $i++)' idiom in that you rarely need to loop through
indices and can typically just loop over values.

  use List::Util qw(first);
  my @common = grep { my $n = $_; defined(first { $n =~ m/$_/ } @out) } @names1;

The exception would be if you really needed the @index list as a result.
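If the indices really are needed, a sketch of that case (hypothetical data) keeps the index loop but still gets the values back with a slice:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @names1 = ('alpha-12a', 'beta_7', 'gamma_3c');   # hypothetical data
my @out    = ('12a', '3c');

# Loop over indices only because we want the positions themselves.
# Inside grep, $_ is an index; inside first's block, $_ is an @out entry.
my @index = grep {
    my $n = $names1[$_];
    defined(first { $n =~ m/$_/ } @out);
} 0 .. $#names1;

my @common = @names1[@index];   # the matching values, via an array slice
print "index=@index common=@common\n";
```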

Also beware the memory limit if you go the hash route (and with your
slurpy get_names() implementation).  On very large lists, you may need
both of them sorted on disk so you can page through them.  Or, if that
does the trick, put the small list in a hash and iterate over the
filehandle of the big list, e.g.
`while(<$fh1>) { chomp; print "$_\n" if $names2{$_}; }`
Using `while` reads one line at a time; `for(<$fh1>)` would pull the
whole file into memory first.
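A self-contained sketch of the small-hash, big-stream approach. An in-memory filehandle over a hypothetical string stands in for the big file so the example runs as-is; with real data you would open a file path instead.

```perl
use strict;
use warnings;

# Small list: fits comfortably in a hash (hypothetical data).
my %names2 = map { ( $_ => 1 ) } qw(beta_7 gamma_3c);

# Big list: an in-memory filehandle stands in for a real file here.
my $big = "alpha-12a\nbeta_7\ngamma_3c\ndelta_9\n";
open my $fh1, '<', \$big or die "open: $!";

my @found;
while (my $line = <$fh1>) {    # one line per iteration, not a slurp
    chomp $line;
    push @found, $line if $names2{$line};
}
close $fh1;

print "$_\n" for @found;
```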

--Eric
-- 
The only thing that could save UNIX at this late date would be a new $30
shareware version that runs on an unexpanded Commodore 64.
--Don Lancaster (1991)
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------
