[Pdx-pm] common elements in two lists

Tom Keller kellert at ohsu.edu
Fri Nov 4 15:34:26 PDT 2011


Greetings,
I have two very long lists of names. They have slightly different conventions for naming the same thing, so I devised a regex to compare the two lists. I need to extract the names common to both. (Acknowledgement: "Effective Perl Programming, 1st ed.")
But it is taking an ungodly amount of time, since
names1 contains 46227 names.
names2 contains 5726 names.

Here's the code:
########
my @names1 = get_names($file1);
my @names2 = get_names($file2);
#say join(", ", @names1);

my @out = map { $_ =~  m/\w+[-_]*(\w*[-_]*\d+[a-z]*).*/ } @names2;
my @index = grep {
my $c = $_;
if ( $c > $#names1  or # always false
( grep { $names1[$c] =~ m/$_/ } @out ) > 0) {
1;  ## save
} else {
0;  ## skip
}
} 0 .. $#names1;

my @common = map { $names1[$_] } @index;
########

Is there a faster/better way to do this?

thanks,
Tom
MMI DNA Services Core Facility<http://www.ohsu.edu/xd/research/research-cores/dna-analysis/>
503-494-2442
kellert at ohsu.edu<http://ohsu.edu>
Office: 6588 RJH (CROET/BasicScience)

OHSU Shared Resources<http://www.ohsu.edu/xd/research/research-cores/index.cfm>






-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/pdx-pm-list/attachments/20111104/ac6aa111/attachment.html>


More information about the Pdx-pm-list mailing list