performance question
Tom Keller
kellert at ohsu.edu
Tue Feb 19 16:14:46 CST 2002
Thanks, all, for your help. I ended up combining several ideas into two
subroutines. Besides our own pdx-perl ideas, I found Beginning Perl
for Bioinformatics by James Tisdall very useful.
The main program takes a FASTA-format sequence file and a Glimmer
"gene predictor" output file.
I can then process the putative genes directly, identified by the
"glimmer id#" as the key to my genes_hash, or create a secondary
input file to plug into the pipeline (not shown).
[The pipeline here is the GCG "Wisconsin package" of bioinformatics
programs. It's a suite of utilities and analysis tools that many
universities buy for analyzing biological data. It includes such
things as fragment assembly, the BLAST GenBank database query
tool, and many other programs.]
sub create_segment_list
{
    my ($glim_file, $seq_file, $prefix) = @_;
    print "in sub, glim_file and prefix($prefix) are: $glim_file of $seq_file\n"; ## sanity check
    my ($annotation, $putative_genes, @putative_genes);
    open(GLIMMER_O, "<", $glim_file) or die "Can't open $glim_file: $!\n";
    my $record;
    {
        local $/;    ## slurp mode; $/ is restored when this block ends
        $record = <GLIMMER_O>;
    }
    close GLIMMER_O;
    ## split the report at the "Putative Genes:" heading
    ($annotation, $putative_genes) =
        ($record =~ /^(.*Putative\s+Genes:\s*\n)(.*)/s);
    die "No 'Putative Genes:' section in $glim_file\n"
        unless defined $putative_genes;
    @putative_genes = split "\n", $putative_genes;
    return $annotation, \@putative_genes;
}
sub create_genes_hash
{
    #input: list of putative genes (last section of glimmer output)
    my $array_ref = shift;
    my @input = @{$array_ref};
    my ($id, $start, $stop, $comment, %pairs);
    foreach my $line (@input)
    {
        if ( $line =~ m/^\s+(\d+)\s+(\d+)\s+(\d+)\s+\[(.*)\]/ )
        {
            $id = $1; $start = $2; $stop = $3; $comment = $4;
            ## hash, key=id, value=array_ref to list of
            ## start, stop, and comment
            $pairs{$id} = [ $start, $stop, $comment ];
        }
    }
    return \%pairs;
}
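As a rough sketch of how a hash shaped like the one create_genes_hash
returns might be walked to pull each putative gene out of the sequence:
the sample lines, the sequence string, and the helper segment_for are
all made up for illustration and are not part of the real program. The
line layout is only assumed to match what the regex above expects.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bases between two 1-based, inclusive
# coordinates, whichever order they are given in.
sub segment_for
{
    my ($sequence, $start, $stop) = @_;
    my ($lo, $hi) = $start <= $stop ? ($start, $stop) : ($stop, $start);
    return substr($sequence, $lo - 1, $hi - $lo + 1);
}

# Made-up lines in the layout the regex expects:
# leading whitespace, id, start, stop, then a bracketed comment.
my @lines = (
    "    1      10      30  [+1 L=  21]",
    "    2      50      35  [-2 L=  16]",
);

# Same structure as the genes hash: id => [ start, stop, comment ]
my %genes;
foreach my $line (@lines)
{
    if ( $line =~ m/^\s+(\d+)\s+(\d+)\s+(\d+)\s+\[(.*)\]/ )
    {
        $genes{$1} = [ $2, $3, $4 ];
    }
}

my $sequence = 'ACGT' x 15;    # made-up 60-base sequence

# Walk the genes in numeric id order and report each segment's length.
foreach my $id ( sort { $a <=> $b } keys %genes )
{
    my ($start, $stop, $comment) = @{ $genes{$id} };
    my $segment = segment_for($sequence, $start, $stop);
    print join("\t", $id, $start, $stop, length($segment), "[$comment]"), "\n";
}
```

The sketch only slices the forward strand. If a start greater than the
stop marks a gene on the reverse strand, as I'd assume, the segment
would still need to be reverse-complemented, which is not shown here.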
Thanks for your help.
Tom
--
Thomas J. Keller, Ph.D.
MMI Research Core Facility
Oregon Health & Science University
3181 SW Sam Jackson Park Rd
Portland, Oregon 97201
TIMTOWTDI