thoughts on this script?

nkuipers nkuipers at
Mon Jul 22 19:29:59 CDT 2002

I'd like to hear anything anyone can say about this...what people would do 
differently, what they would think to talk about at our meetings after seeing 
this code, that sort of thing...anything really. I've tried to keep the code 
properly grouped and documented. Thanks in advance.


#Program name: parse_noncoding
#Author: Nathanael Kuipers (nkuipers at
#Date written: July 22, 2002
#Last updated: July 22, 2002
#Purpose: parse a blastx xml file and find noncoding queries, then compare 
# to a list of all queries and extract the corresponding sequences
#Use: >perl path/parse_noncoding xmlfile masterfile


use strict;
use warnings;

my $infile1 = shift;
my $infile2 = shift;
my $iteration_message1= "No hits found";
my $iteration_message2= "BLASTSetUpSearch: Unable to calculate"; #message prefix; the match below is unanchored

my $query_def_prefix = "<BlastOutput_query-def>";
my $query_def_suffix = "</BlastOutput_query-def>";
my $query_def = '';
my @candidates = ();

my $flag = 0;
my $sequence = '';
my %final_list = ();

open IN, $infile1 or die;

#parse the xml file and build an array of noncoding-DNA query-IDs

while (<IN>) {
  if (/${query_def_prefix}(.*)${query_def_suffix}/) { $query_def = $1 }
  elsif (/<Iteration_message>${iteration_message1}/) {
          push @candidates, ">$query_def" }
  elsif (/<Iteration_message>${iteration_message2}/) {
          push @candidates, ">$query_def" }
}

#compare each noncoding query to a master-list, write matching IDs and
#corresponding DNA sequences to candidates file. as in the @candidates array,
#query IDs in the master-file start with >. so the flag is set when 
#"begin sequence" and unset when "begin new query ID". 

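ROOT: for my $i (@candidates) {
  open IN, $infile2 or die;
  while (<IN>) {
    #\Q..\E so that regex metacharacters in a query ID are matched literally
    if (/^\Q$i\E$/) { $flag = 1 }
    elsif ($flag && /^[acgtACGT]/) { $sequence .= $_ }
    elsif ($flag && /^>/) {
      chomp $sequence;
      $final_list{$i} = $sequence;
      $flag = 0;
      $sequence = '';
      next ROOT }
  }
  #end of file reached with the flag still set: the match was the last record
  if ($flag) {
    chomp $sequence;
    $final_list{$i} = $sequence;
    $flag = 0;
    $sequence = '';
  }
}

close IN;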
open OUT, ">>candidates" or die;

#print sorted hash (kind of)

my @sorted = sort keys %final_list;
for (@sorted) {
  print OUT "$_\n$final_list{$_}\n";
}
close OUT;
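
One thing I wonder about myself: the loop above rereads the master-file once for every candidate. Below is a rough sketch of a single-pass alternative, assuming the master-file is FASTA-like (ID lines starting with >, followed by sequence lines). The sub name read_master and the sample data are made up for illustration and are not part of parse_noncoding.

```perl
use strict;
use warnings;

#read_master is a hypothetical helper: it reads a FASTA-like filehandle
#once and returns a hash of ">ID" => sequence (internal newlines kept,
#final newline removed, as in the script above).
sub read_master {
  my ($fh) = @_;
  my (%seq_for, $id);
  while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^>/) { $id = $line; $seq_for{$id} = '' }
    elsif (defined $id && $line =~ /^[acgtACGT]/) { $seq_for{$id} .= "$line\n" }
  }
  chomp $seq_for{$_} for keys %seq_for;
  return %seq_for;
}

#made-up sample data standing in for the master-file and @candidates
my $master = ">q1\nACGT\nTTGA\n>q2\nGGCC\n";
open my $fh, '<', \$master or die "can't open in-memory file: $!";
my %seq_for = read_master($fh);
close $fh;

my @candidates = ('>q2', '>q3');
my %final_list = ();
for my $id (@candidates) {
  #a plain hash lookup replaces the per-candidate scan of the file
  $final_list{$id} = $seq_for{$id} if exists $seq_for{$id};
}

print "$_\n$final_list{$_}\n" for sort keys %final_list;
```

The trade-off is memory: the whole master-file sits in a hash, which is probably fine unless the file is very large.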

"I think for my lunch tomorrow I'll make a tuna and pickle triangle bunwich."
