[Pdx-pm] common elements in two lists
Eric Wilhelm
enobacon at gmail.com
Sat Nov 5 02:36:02 PDT 2011
# from Tom Keller
# on Friday 04 November 2011 15:34:
>my @out = map { $_ =~ m/\w+[-_]*(\w*[-_]*\d+[a-z]*).*/ } @names2;
This regexp isn't anchored, so it might do surprising things sometimes.
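To make the anchoring point concrete, here is a rough sketch comparing the quoted pattern with a variant anchored at the start of the string. The sub names and the sample names are made up for illustration, and the anchored pattern is only one guess at what was intended. With leading junk such as '!!foo-bar12x', the unanchored pattern simply starts matching further into the string and still captures 'bar12x', while the anchored one fails.

```perl
use strict;
use warnings;

# The pattern as quoted above: the engine may start matching anywhere.
sub extract_unanchored {
    my ($name) = @_;
    my ($part) = $name =~ m/\w+[-_]*(\w*[-_]*\d+[a-z]*).*/;
    return $part;    # undef when nothing matched
}

# Hypothetical anchored variant: the match must begin at the start of the name.
sub extract_anchored {
    my ($name) = @_;
    my ($part) = $name =~ m/^\w+[-_]*(\w*[-_]*\d+[a-z]*)/;
    return $part;    # undef when nothing matched
}

# Made-up sample names, one clean and one with leading junk.
for my $name ('foo-bar12x', '!!foo-bar12x') {
    printf "%-14s unanchored=%s anchored=%s\n", $name,
        extract_unanchored($name) // '(no match)',
        extract_anchored($name)   // '(no match)';
}
```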
>( grep { $names1[$c] =~ m/$_/ } @out ) > 0) {
This match isn't anchored either. Is that intentional, or does it even
need to be a regexp match rather than just a literal comparison, i.e.
`eq` (or a hash lookup, as others have suggested)?
If it is a literal rather than a substring match, you should get a nice
boost from the hash lookup. Otherwise you're stuck comparing N regexps
to M strings. It depends on how the "slightly different conventions
for naming the same thing" differ and whether you can normalize both
lists as Schwern suggested.
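A minimal sketch of the hash route, assuming the two conventions differ only in case and in '-' / '_' separators. That normalization rule is purely a guess for illustration (the real differences between the lists aren't shown in the thread), and normalize(), common_normalized(), and the sample data are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical normalization: lowercase and drop '-' and '_'.
# Replace this with whatever actually distinguishes the two conventions.
sub normalize {
    my ($name) = @_;
    $name = lc $name;
    $name =~ tr/-_//d;
    return $name;
}

# Return the elements of the first list whose normalized form also
# appears (normalized) in the second list. One pass over each list.
sub common_normalized {
    my ($list1, $list2) = @_;
    my %seen;
    $seen{ normalize($_) } = 1 for @$list2;
    return grep { $seen{ normalize($_) } } @$list1;
}

# Made-up sample data.
my @names1 = qw(Foo_Bar-12 baz-3 qux);
my @names2 = qw(foobar12 BAZ_3 other);
print "$_\n" for common_normalized(\@names1, \@names2);
```

If no normalization is needed, normalize() can just return its argument and this becomes a plain exact-match lookup.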
Also note that '0 .. $#names1' is somewhat like the 'for(my $i = 0;
$i < ...; $i++)' idiom in that you rarely really need to loop through
indices and can typically just loop over values.
my @common = grep {my $n = $_; first {$n =~ m/$_/} @out } @names1;
(`first` here is the one from List::Util.)
The exception would be if you really needed the @index list as a result.
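For comparison, here is a sketch of both forms side by side: one returning the matching values, one returning indices for the case where the index list really is the desired result. It swaps in `any` from List::Util (available since List::Util 1.33) instead of `first`, because `first` returns the matched element, which is false if that element happens to be "0" or empty. It also wraps each fragment in \Q...\E on the assumption that the fragments are meant as literal substrings, which may not be what was intended. The sub names and sample data are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Values: names that contain at least one of the fragments.
sub common_values {
    my ($names, $fragments) = @_;
    return grep {
        my $n = $_;
        any { $n =~ m/\Q$_\E/ } @$fragments;
    } @$names;
}

# Indices: the same test, but looping over positions because the
# caller wants to know where the matches are.
sub common_indices {
    my ($names, $fragments) = @_;
    return grep {
        my $n = $names->[$_];
        any { $n =~ m/\Q$_\E/ } @$fragments;
    } 0 .. $#{$names};
}

# Made-up data standing in for @names1 and @out.
my @names1 = qw(foo-bar12x baz_7 qux);
my @out    = qw(bar12x 7);
print "values:  @{[ common_values(\@names1, \@out) ]}\n";
print "indices: @{[ common_indices(\@names1, \@out) ]}\n";
```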
Also beware the memory limit if you go the hash route (and with your
slurpy get_names() implementation). On very large lists, you may need
them both sorted on disk to be able to page through. Or, iterate the
filehandle of a big list with a small list in a hash if that does the
trick, e.g. `while(<$fh1>) { chomp; print "$_\n" if $names2{$_}; }`
(`while` rather than `for`, since `for(<$fh1>)` would read the whole
file into a list before looping).
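Spelled out a bit more, the streaming idea might look like the sketch below. It assumes exact matches, one name per line in the big file, and a small list that fits comfortably in a hash. The sub name is hypothetical, and an in-memory filehandle stands in for the real big file so the example runs on its own.

```perl
use strict;
use warnings;

# Read the big list one line at a time and keep only the lines that
# appear in the small list. Only the small list and the hits are held
# in memory.
sub common_from_fh {
    my ($fh, $small) = @_;
    my %lookup;
    $lookup{$_} = 1 for @$small;

    my @hits;
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        chomp $line;
        push @hits, $line if $lookup{$line};
    }
    return @hits;
}

# Made-up data; in real use $fh would be opened on the large file.
my $big = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$big or die "cannot open in-memory file: $!";
print "$_\n" for common_from_fh($fh, [qw(beta gamma delta)]);
close $fh;
```

If even the hits are too many to hold, printing inside the loop instead of pushing onto @hits keeps memory use flat.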
--Eric
--
The only thing that could save UNIX at this late date would be a new $30
shareware version that runs on an unexpanded Commodore 64.
--Don Lancaster (1991)
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------