lexical problems (i think)
nkuipers
nkuipers at uvic.ca
Mon Aug 26 16:30:17 CDT 2002
Hello all,
I need help with variable persistence. As preamble, carefully consider an
input file whose contents look like this:
start of file
>header1
sequence...
sequence...
>header2
.
.
.
>headerN
sequenceN...
end of file
My goal is to parse the file contents into a hash keyed by header.
Here is the subroutine.
sub parse_fasta_file {
    my ($fh, $href) = @_;
    my $header = undef;
    my $sequence = undef;
    #hash: header => 'sequence'
    while (<$fh>) {
        if ( /^>(.*)\n$/ && !defined $header ) { ##first '>' in file
            $header = $1;
        }
        elsif ( /^>(.*)\n$/ && defined $header ) {
            $sequence =~ s/\s//g;
            $href->{$header} = $sequence;
            $sequence = undef;
            $header = $1; #want persistence here
        }
        elsif ( /^[acgtACGT\n]+$/ ) { $sequence .= $_ }
    }
    #last sequence (no '>' signal followed for dumping into hash)
    $sequence =~ s/\s//g; #This gets done, last sequence is perfect.
    $href->{$header} = $sequence; #Header undefined for hashing!
}
I get the following error messages:
"Use of uninitialized value in hash element at REPfind line 121, <IN> line
252702." (get this one twice in a row)
"Use of uninitialized value in concatenation (.) at REPfind line 121, <IN>
line 252702."
Line 252702 is the very last line in the file, consisting only of letters
acgt.
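For reference, the first warning is the one Perl emits when an undefined value is used as a hash key. The following is only a minimal sketch of that situation, not the code from REPfind; the names %h and $key are made up for illustration, and the warning is captured through a $SIG{__WARN__} handler so it can be inspected.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my %h;
my $key;              # hypothetical variable, deliberately left undefined
$h{$key} = 'acgt';    # undef is used as a hash key here, which triggers the warning

# The undefined key is stored under the empty string.
print "stored under empty key\n" if exists $h{''};
print scalar(@warnings), " warning(s) captured: $warnings[0]";
```

Note that the assignment still happens: the value ends up under the empty-string key, which would explain a hash that silently contains an entry with no header.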
Printing $sequence to STDOUT gives what I expect; it is $header that is
undefined. I don't understand why the value of $header is apparently not
retained after the while loop, while the value of $sequence is. I've checked
the last header, and its format is exactly the same as that of all the other
headers, so my regex is not failing on it.
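One plausible cause, offered as a guess rather than a confirmed diagnosis: in the second branch, $sequence =~ s/\s//g runs before $header = $1. A successful match or substitution resets the capture variables, and that substitution has no capture group, so $1 would be undefined by the time it is assigned to $header. Every second header would then be lost, and whether $header is defined after the loop would depend on how many headers the file has. The sketch below assumes that explanation and copies the capture into a variable first. The name $next_header and the sample data are made up for illustration, the header pattern is slightly simplified to /^>(.*)$/, and the in-memory filehandle needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Sketch of a possible fix: save the capture from $1 before any other
# regex operation runs, since a later successful s/// resets $1.
sub parse_fasta_file {
    my ($fh, $href) = @_;
    my ($header, $sequence);

    while (my $line = <$fh>) {
        if ($line =~ /^>(.*)$/) {
            my $next_header = $1;            # copy $1 immediately
            if (defined $header) {           # flush the previous record
                $sequence = '' unless defined $sequence;
                $sequence =~ s/\s//g;        # this would otherwise clobber $1
                $href->{$header} = $sequence;
            }
            $header   = $next_header;
            $sequence = undef;
        }
        elsif ($line =~ /^[acgtACGT\n]+$/) {
            $sequence .= $line;
        }
    }

    # Last record: no following '>' line triggers the flush above.
    if (defined $header) {
        $sequence = '' unless defined $sequence;
        $sequence =~ s/\s//g;
        $href->{$header} = $sequence;
    }
}

# Usage with made-up sample data read through an in-memory filehandle.
my $fasta = ">header1\nacgt\nACGT\n>header2\nttga\n>header3\ncc\ngg\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my %seqs;
parse_fasta_file($fh, \%seqs);
close $fh;

print "$_ => $seqs{$_}\n" for sort keys %seqs;
```

With the sample data above this prints header1 => acgtACGT, header2 => ttga and header3 => ccgg, one per line. Merging the two header branches into one also removes the need to test !defined $header separately for the first '>' in the file.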
Thanks, hope everything is well for everyone,
Nathanael Kuipers
More information about the Victoria-pm
mailing list