[Pdx-pm] hash from array question

Thomas Keller kellert at ohsu.edu
Fri May 25 22:55:40 PDT 2007


Onward and upward.
The problem I am working on is that I have a tab-delimited file of
many thousands of lines, and a second file of names that is a
subset of the names found in the first field of the larger file.
I'm using a riff on p.60 of Intermediate Perl by R. Schwartz (any  
problems are my own).
So the code snippet looks like this:
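	# Keep the index of every line that matches at least one name, then
	# take a slice of @lines with those indices.
	my @index = grep {
		my $c = $_;
		# inner grep in scalar context counts the names matching line $c;
		# a nonzero count selects the line
		grep { $lines[$c] =~ m/$_/ } @names;
	} 0..$#lines;
	my @gene_of_interest = @lines[@index];		# ref Int. Perl, p.60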

This works, but it is really slow. Is there a faster way?

thanks,
Tom K
Here is a short version of the data:

Name: STM_Genome_Annotation2.txt
Url: http://mail.pm.org/pipermail/pdx-pm-list/attachments/20070525/ec0ce7a0/attachment.txt

Name: names01.txt
Url: http://mail.pm.org/pipermail/pdx-pm-list/attachments/20070525/ec0ce7a0/attachment-0001.txt
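On the speed question above: the code tests every line against every name with a regex, so the work grows as lines × names. A common faster approach is to load the names into a hash once and do a single hash lookup per line. This is a minimal sketch that assumes each name must equal the first tab-separated field exactly (the original code instead matches each name as a regex anywhere in the line). The sample data is made up; in practice @lines and @names would be read from the two attached files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two attached files:
# @lines is the big tab-delimited file, @names the list of wanted names.
my @lines = (
    "geneA\tchr1\t100\t200",
    "geneB\tchr1\t300\t400",
    "geneC\tchr2\t500\t600",
);
my @names = qw(geneA geneC);

# Build the lookup hash once: each wanted name becomes a key.
my %wanted = map { $_ => 1 } @names;

# One split and one hash lookup per line, instead of one regex match
# per (line, name) pair. A LIMIT of 2 stops splitting after the first tab.
my @gene_of_interest = grep {
    my ($first) = split /\t/, $_, 2;
    defined $first && exists $wanted{$first};    # skips empty lines safely
} @lines;

# prints the geneA and geneC lines, in their original order
print "$_\n" for @gene_of_interest;
```

When reading the real names file, chomp each line first; a trailing newline would keep the keys from matching. If names really can appear anywhere in a line, one precompiled alternation built from quotemeta'd names is the usual middle ground between this and the nested grep.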



On May 25, 2007, at 4:39 PM, Eric Wilhelm wrote:

> # from Thomas J Keller
> # on Friday 25 May 2007 02:23 pm:
>
>> print "genomes string: $genomes\n";
>> print "split gives: ", split(/,\s*/,$genomes),"\n";
>> my %genomes = split(/,\s*/,$genomes) ;
>> print map { "Key: $_  has value: $genomes{$_}\n" } sort keys %$genomes;
>
> Yep.  What Andy said.
>
> While $genomes and %genomes are distinct, it is usually helpful to
> avoid using the same word with different sigils within a single scope.
> Had there not been a $genomes, the use of %$genomes would have
> triggered the error [Global symbol "$genomes" requires ...] rather
> than the slightly less obvious error [Can't use string ... as a hash
> reference].
>
> e.g.
>
>   my %genomes = split(/,\s*/, $config{"$project.genomes"});
>
> Anyway, now for the comic relief.  I'll admit to not having any clue
> what I'm doing here, but I find the scary avoidance of exception
> handling rather ironic when coupled with the fact that string
> comparison is done so elegantly via "==" operator overloading.  Also
> note the "let's all invent our own string libraries" fun coming from
> both ends.  And of course the 2-line grep/map statement that would be
> required to do this in a 10-line method in Perl.  Whee!
>
> bool wxMozillaBrowser::ScrollToElementByID(wxString id)
> {
>   //fprintf(stderr, ((wxT("id: ")+ id + wxT("\n")).mb_str()));
>   nsCOMPtr<nsIDOMWindow> domWindow;
>   nsresult rv;
>   rv = m_Mozilla->mWebBrowser->GetContentDOMWindow(getter_AddRefs(domWindow));
>   if (!domWindow)
>     return FALSE;
>   if (NS_FAILED(rv))
>     return FALSE;
>
>   nsCOMPtr<nsIDOMDocument>doc;
>   rv = domWindow->GetDocument(getter_AddRefs(doc));
>   if (NS_FAILED(rv))
>     return FALSE;
>
>   nsString element_id = wxString_to_nsString(id, wxConvISO8859_1);
>   nsCOMPtr<nsIDOMElement> domElement;
>   rv = doc->GetElementById(element_id, getter_AddRefs(domElement));
>
>   if(domElement) {
>     // GRR all our doc is a wiki, yay.  So how do I use this domElement?
>     // We must make it be a member of the class that has the method.
>     nsCOMPtr<nsIDOMNSHTMLElement> hElement(do_QueryInterface(domElement));
>     // TODO maybe should check that this succeeds?
>     hElement->ScrollIntoView(TRUE);
>     return TRUE;
>   }
>
>   // ok, by-id got us nothing, so try to find the first named anchor
>   // fprintf(stderr, "seaching for a name=\n");
>   nsCOMPtr<nsIDOMNodeList> a_tags;
>   doc->GetElementsByTagName(
>     NS_LITERAL_STRING("a"), getter_AddRefs(a_tags)
>   );
>
>   if(!a_tags) return FALSE;
>
>   PRUint32 count;
>   a_tags->GetLength(&count);
>
>   if(!count) return FALSE;
>
>   for(PRUint32 i = 0; i < count; i++) {
>     // fprintf(stderr, "check tag %i\n", i);
>
>     nsCOMPtr<nsIDOMNode> node;
>     rv = a_tags->Item(i, getter_AddRefs(node));
>     if (NS_FAILED(rv) || !node) continue;
>
>     nsCOMPtr<nsIDOMHTMLAnchorElement> anc;
>     anc = do_QueryInterface(node);
>     if(!anc) continue;
>
>     // make thing, pass in to get return value, lather rinse, repeat...
>     nsAutoString name;
>     rv = anc->GetName(name);
>     if (NS_FAILED(rv)) continue;
>
>     fprintf(stderr, ((wxT("now check ") +
>       nsString_to_wxString(name, wxConvISO8859_1) + wxT("\n")).mb_str()));
>     if(name == element_id) {
>       nsCOMPtr<nsIDOMNSHTMLElement> hElement(do_QueryInterface(node));
>       hElement->ScrollIntoView(TRUE);
>       return TRUE;
>     }
>   }
>
>   return FALSE;
> }
>
>
> --Eric
> -- 
> "It works better if you plug it in!"
> --Sattinger's Law
> ---------------------------------------------------
>     http://scratchcomputing.com
> ---------------------------------------------------
> _______________________________________________
> Pdx-pm-list mailing list
> Pdx-pm-list at pm.org
> http://mail.pm.org/mailman/listinfo/pdx-pm-list
>
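For the hash-from-string problem in the quoted exchange, here is a minimal sketch, assuming the config string alternates keys and values (as the split-into-hash assignment implies). Following Eric's advice, the string lives in a differently named scalar, so a stray %$genomes would be a compile-time "Global symbol" error under strict. The variable name $genomes_string and the sample value are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical config value: key, value, key, value, comma separated.
my $genomes_string = "human, hg18, mouse, mm8";

# split returns (key, value, key, value, ...); assigning that list to a
# hash pairs the elements up.
my %genomes = split /,\s*/, $genomes_string;

# %genomes is a plain hash, so use it directly; %$genomes would instead
# try to dereference a scalar named $genomes.
for my $key (sort keys %genomes) {
    print "Key: $key  has value: $genomes{$key}\n";
}
# prints:
# Key: human  has value: hg18
# Key: mouse  has value: mm8
```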


