[Pdx-pm] common elements in two lists

Joshua Keroes joshua at keroes.com
Fri Nov 4 15:41:12 PDT 2011


Well, grep is always going to check every single item in the list, so if
you can avoid that, you can save time.
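
As a rough sketch of what "avoiding that" could look like: the inner grep
in the quoted code keeps testing patterns after one has already matched,
whereas first() from the core List::Util module stops at the first hit.
The sample names below are made up for illustration, and quoting each
fragment with \Q...\E (a literal match) is an assumption; the original
code interpolates the fragments as regexes.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data standing in for the two name lists.
my @names1 = qw(hsa-miR-21 hsa-miR-155 hsa-let-7a);
my @out    = qw(miR-21 let-7a miR-999);

# Compile each pattern once; \Q...\E treats the fragment as literal text.
my @patterns = map { qr/\Q$_\E/ } @out;

my @common = grep {
    my $name = $_;
    # first() returns as soon as one pattern matches, unlike an inner grep,
    # which would test every pattern even after a hit.
    defined first { $name =~ $_ } @patterns;
} @names1;

print join(", ", @common), "\n";
```

The outer grep still visits every name once, which is unavoidable here;
the saving comes from not scanning the rest of the pattern list after a
match.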

Out of curiosity, have you tried out or benchmarked against something like
List::Compare's get_intersection()?
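
For reference, a minimal sketch of that call, assuming List::Compare is
installed from CPAN. It compares items by exact string equality, so this
presumes both lists have already been reduced to a shared spelling; the
keys below are hypothetical.

```perl
use strict;
use warnings;
use List::Compare;   # CPAN module, assumed to be installed

# Hypothetical names, already normalized to the same convention.
my @keys1 = qw(mir-21 mir-155 let-7a);
my @keys2 = qw(let-7a mir-21 mir-999);

# The constructor takes references to the two lists.
my $lc = List::Compare->new(\@keys1, \@keys2);

# get_intersection returns the items present in both lists.
my @common = $lc->get_intersection;

print join(", ", @common), "\n";
```

Whether this beats the regex approach would still need a benchmark on
the real data.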

-Joshua

2011/11/4 Tom Keller <kellert at ohsu.edu>

> Greetings,
> I have two very long lists of names. They have slightly different
> conventions for naming the same thing, so I devised a regex to compare the
> two lists. I need to extract the names common to both. (Acknowledgement:
> "Effective Perl Programming, 1st ed.")
> But it is taking an ungodly amount of time, since
> names1 contains 46227 names.
> names2 contains 5726 names.
>
> Here's the code:
> ########
> my @names1 = get_names($file1);
> my @names2 = get_names($file2);
> #say join(", ", @names1);
>
> my @out = map { $_ =~  m/\w+[-_]*(\w*[-_]*\d+[a-z]*).*/ } @names2;
> my @index = grep {
> my $c = $_;
> if ( $c > $#names1  or # always false
>  ( grep { $names1[$c] =~ m/$_/ } @out ) > 0) {
> 1;  ## save
> } else {
>  0;  ## skip
> }
> } 0 .. $#names1;
>
> my @common = map { $names1[$_] } @index;
> ########
>
> Is there a faster/better way to do this?
>
> thanks,
> Tom
> MMI DNA Services Core Facility<http://www.ohsu.edu/xd/research/research-cores/dna-analysis/>
> 503-494-2442
> kellert at ohsu.edu
> Office: 6588 RJH (CROET/BasicScience)
>
> OHSU Shared Resources<http://www.ohsu.edu/xd/research/research-cores/index.cfm>