lexical problems (i think)

nkuipers nkuipers at uvic.ca
Mon Aug 26 16:30:17 CDT 2002


Hello all,

I need help with variable persistence.  As preamble, carefully consider an 
input file whose contents look like this:

start of file
>header1
sequence...
sequence...
>header2
.
.
.
>headerN
sequenceN...
end of file

My goal is to parse the file contents into a hash keyed by header.  Here is
the subroutine:


sub parse_fasta_file {
	my ($fh, $href) = @_;
	my $header      = undef;
	my $sequence    = undef;
	#hash: header   => 'sequence'
	while (<$fh>) {
		if    ( /^>(.*)\n$/ && !defined $header ) { #first '>' in file
			$header = $1;
		}
		elsif ( /^>(.*)\n$/ &&  defined $header ) {
			$sequence =~ s/\s//g;
			$href->{$header} = $sequence;
			$sequence = undef;
			$header   = $1; #want persistence here
		}
		elsif ( /^[acgtACGT\n]+$/ ) { $sequence .= $_ }
	}
	#last sequence (no '>' signal followed for dumping into hash)
	$sequence =~ s/\s//g;         #This gets done, last sequence is perfect.
	$href->{$header} = $sequence; #Header undefined for hashing!
}

I get the following error messages:

"Use of uninitialized value in hash element at REPfind line 121, <IN> line 
252702." (get this one twice in a row)
"Use of uninitialized value in concatenation (.) at REPfind line 121, <IN> 
line 252702."

Line 252702 is the very last line in the file, consisting only of letters 
acgt.

Printing $sequence to STDOUT gives what I expect.  It's $header that is 
undefined.  I don't understand why the value of $header is apparently not 
retained after the while loop, while the value of $sequence is.  I've looked 
at what the last header is and it is absolutely equivalent to all the other 
headers as far as format goes, so my regex is not breaking.
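
One possible explanation, offered as a guess rather than a confirmed
diagnosis: in Perl a successful match resets the capture variables, and
s/\s//g has no capture groups, so once it succeeds $1 is undef by the time it
is assigned to $header in the elsif branch.  The sketch below copies $1 into a
lexical before any other regex runs.  The name parse_fasta_sketch, the
variable $next_header and the sample data are hypothetical and only for
illustration.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the subroutine above: copy the header text out
# of $1 immediately, before another regex operation can overwrite it.
sub parse_fasta_sketch {
    my ($fh, $href) = @_;
    my ($header, $sequence);
    while (my $line = <$fh>) {
        if ( $line =~ /^>(.*)$/ ) {
            my $next_header = $1;          # save $1 right away
            if (defined $header) {         # flush the previous record
                $sequence = '' unless defined $sequence;
                $sequence =~ s/\s//g;      # a successful s/// resets $1
                $href->{$header} = $sequence;
            }
            ($header, $sequence) = ($next_header, undef);
        }
        elsif ( $line =~ /^[acgtACGT]+$/ ) {
            $sequence .= $line;
        }
    }
    # flush the last record, which has no following '>' line
    if (defined $header) {
        $sequence = '' unless defined $sequence;
        $sequence =~ s/\s//g;
        $href->{$header} = $sequence;
    }
    return $href;
}

# Usage with an in-memory filehandle (the sample data is made up).
my $data = ">header1\nacgt\nACGT\n>header2\nttga\n>header3\ncc\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my %seqs;
parse_fasta_sketch($in, \%seqs);
close $in;
print "$_ => $seqs{$_}\n" for sort keys %seqs;
```

If this guess is right, it would also explain why the symptom depends on the
number of headers: each time the elsif branch runs, $header ends up undef, so
the next '>' line falls into the first branch again.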

Thanks, hope everything is well for everyone,

Nathanael Kuipers




More information about the Victoria-pm mailing list