<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; ">Hi, <DIV>I was curious about the fastest way to pull out the lines of a file whose first field matches one of the names in a list, and got some really helpful solutions.</DIV><DIV>Background: the input file has a fixed, tab-delimited structure with a header line, and in this case it is only about 4,000 lines long. It could be much larger, hence my interest in speed.</DIV><DIV>I've tested the two most straightforward approaches: a regex match against all the names joined into one alternation, and a hash lookup on the first field.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><PRE>
use strict;
use warnings;
use IO::File;
use Benchmark qw(cmpthese);

# @names, $annot_file and $count were declared in main::

my $fh1 = IO::File->new;
my $fh2 = IO::File->new;
cmpthese( $count, {
    'with_string_cmp' => sub {
        # Quote each name so regex metacharacters match literally, and
        # anchor the match so only the first field can match.
        my $names_join = join '|', map { quotemeta } @names;
        my @goi;
        $fh1->open("< $annot_file")
            or die "Could not open $annot_file: $!";
        my $header = <$fh1>;    # skip the header line, as with_hash does
        my @lines  = <$fh1>;
        foreach (@lines) {
            chomp;
            push @goi, $_ if m/^(?:$names_join)(?:\t|\z)/;
        }
        $fh1->close;
    },

    'with_hash' => sub {
        my %have_name = map { $_ => 1 } @names;
        $fh2->open("< $annot_file")
            or die "Could not open $annot_file: $!";
        my $header = <$fh2>;    # skip the header line
        while ( my $line = <$fh2> ) {
            chomp $line;
            my ( $name, $else ) = split /\t/, $line, 2;
            # Skip lines whose first field is not in @names; otherwise
            # every line of the file would end up in the hash.
            next unless exists $have_name{$name};
            $have_name{$name} = [ split /\t/, $else ];
        }
        $fh2->close;
    },
});
</PRE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I think this is a fair comparison: both subs select the same lines. (The hash is easier to get at down the road, though.)</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>... 
drumroll ..</DIV><PRE>
(warning: too few iterations for a reliable count)
                  Rate with_string_cmp with_hash
with_string_cmp 1.45/s              --      -87%
with_hash       11.4/s            687%        --
</PRE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Hashes rule! Each data line costs a single hash lookup on its first field instead of a match against an alternation of every name.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Thanks Eric, Ben, Rafael and Andy for your helpful suggestions.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><BR><DIV style="font-family: Helvetica; ">Tom</DIV><DIV style="font-family: Helvetica; "><A href="mailto:kellert@ohsu.edu">kellert@ohsu.edu</A></DIV><DIV style="font-family: Helvetica; ">503-494-2442</DIV><BR></BODY></HTML>