thoughts on this script?

nkuipers nkuipers at uvic.ca
Mon Jul 22 19:29:59 CDT 2002


I'd like to hear anything anyone can say about this...what people would do 
differently, what they would think to talk about at our meetings after seeing 
this code, that sort of thing...anything really. I've tried to keep the code 
properly grouped and documented. Thanks in advance.

-nathanael

#Program name: parse_noncoding
#Author: Nathanael Kuipers (nkuipers at uvic.ca)
#Date written: July 22, 2002
#Last updated: July 22, 2002
#Purpose: parse a blastx xml file and find noncoding queries, then compare
# them to a list of all queries and extract the corresponding sequences
#Use: >perl path/parse_noncoding xmlfile masterfile

#!/usr/bin/perl

use strict;
use warnings;

my $infile1 = shift;
my $infile2 = shift;
my $iteration_message1= "No hits found";
my $iteration_message2= "BLASTSetUpSearch: Unable to calculate Karlin-Altschul";

my $query_def_prefix = "<BlastOutput_query-def>";
my $query_def_suffix = "</BlastOutput_query-def>";
my $query_def = '';
my @candidates = ();

my $flag = 0;
my $sequence = '';
my %final_list = ();

open IN, $infile1 or die "Cannot open $infile1: $!";

#
#parse the xml file and build an array of noncoding-DNA query-IDs
#

while (<IN>) {
  if (/${query_def_prefix}(.*)${query_def_suffix}/) { $query_def = $1 }
  elsif (/<Iteration_message>${iteration_message1}/) {
          push @candidates, ">$query_def" }
  elsif (/<Iteration_message>${iteration_message2}/) {
          push @candidates, ">$query_def" }
}

#
#compare each noncoding query to a master-list, write matching IDs and
#corresponding DNA sequences to candidates file. as in the @candidates array,
#query IDs in the master-file start with >. so the flag is set when 
#"begin sequence" and unset when "begin new query ID". 
#

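for my $i (@candidates) {
  open IN, $infile2 or die "Cannot open $infile2: $!";
  while (<IN>) {
    #\Q...\E keeps characters such as | or . in the ID from acting as regex
    if (/^\Q$i\E$/) { $flag = 1 }
    elsif ($flag && /^[acgtACGT]/) { $sequence .= $_ }
    elsif ($flag && /^>/) { last }
  }
  #store after the loop so the last record in the master-file is not lost
  if ($flag) {
    chomp $sequence;
    $final_list{$i} = $sequence;
    $flag = 0;
    $sequence = '';
  }
}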

close IN;
open OUT, ">>candidates" or die "Cannot open candidates: $!";

#
#print sorted hash (kind of)
#

my @sorted = sort keys %final_list;
for (@sorted) {
  print OUT "$_\n$final_list{$_}\n";
}
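
One thing that might be worth talking about at a meeting: the comparison step
above re-reads the master-file once for every candidate. Below is a minimal
sketch of a single-pass alternative that looks the IDs up in a hash instead.
The sub name extract_sequences, the in-memory filehandle and the sample data
are all made up for illustration; none of them are part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): read the master-file
# once and return the sequences whose ">ID" header lines are in the list
# of wanted IDs.
sub extract_sequences {
  my ($fh, $wanted) = @_;
  my %want = map { $_ => 1 } @$wanted;   # exact-match lookup, no regex needed
  my (%found, $current);
  while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^>/) {
      # new header: remember it only if it is one of the candidates
      $current = $want{$line} ? $line : undef;
      $found{$current} = '' if defined $current;
    }
    elsif (defined $current && $line =~ /^[acgtACGT]/) {
      $found{$current} .= "$line\n";
    }
  }
  chomp $found{$_} for keys %found;      # drop only the trailing newline
  return %found;
}

# Made-up sample data, read through an in-memory filehandle.
my $master = ">q1\nacgt\nttga\n>q2\ncccc\n>q3\nGGGG\n";
open my $fh, '<', \$master or die "Cannot open in-memory file: $!";
my %final_list = extract_sequences($fh, [ ">q1", ">q3" ]);
close $fh;

print "$_\n$final_list{$_}\n" for sort keys %final_list;
```

Because the record is stored as it is read, the last entry in the master-file
needs no special handling, and the cost is one pass over the file however many
candidates there are.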

"I think for my lunch tomorrow I'll make a tuna and pickle triangle bunwich."



