[Omaha.pm] pos() - WHERE did my regex match?

Jay Hannah jay at jays.net
Tue Sep 11 13:04:10 PDT 2007


pos() is neat. Rarely do I care WHERE a regex hit a string, but in the 
example below I do care, very deeply, WHERE the hits were. Enter pos().

The part of my code that uses pos():

while ($seqstr =~ /$primer_seq/g) {
  printf("   Found '%s'.  Next attempt at character %s\n", $&, pos($seqstr)+1);
}

Yoinked from this website:
  http://www.regular-expressions.info/perl.html
  Finding All Matches In a String

That website is actually more helpful than (perldoc -f pos).

I end up Googling for this about once a year.  :)
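
For a quick look at what pos() reports, here is a minimal standalone sketch. The sub name find_hits and the toy sequence are made up for illustration (they are not from the real primer data), and \Q...\E is added on the assumption that the primer should be matched as a literal string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based [start, stop] of every match
# of a literal $needle in $haystack, using pos() after each /g match.
sub find_hits {
    my ($haystack, $needle) = @_;
    my @hits;
    while ($haystack =~ /\Q$needle\E/g) {
        my $stop  = pos($haystack);               # 0-based offset just past the match
        my $start = $stop - length($needle) + 1;  # 1-based first character of the match
        push @hits, [$start, $stop];
    }
    return @hits;
}

# Made-up toy data, not a real primer or sequence.
my @hits = find_hits("AAGATTACAGGGATTACA", "GATTACA");
print "hit at [$_->[0]..$_->[1]]\n" for @hits;
```

Because pos() is the 0-based offset just past the match, it doubles as the 1-based position of the last matched character, which is why the stop value needs no adjustment.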

Cheers,

j




primer_finder.pl

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

# A hash of all our known primers...
my %primers;
$primers{"18S_F"} = uc("attggagggcaagtctggtg");
$primers{"18S_R"} = uc("ctatgccgactagggatcgg");
$primers{"M1"} = "GGAAGTAAAAGTCGTAACAAGGTT";
$primers{"I1"} = "CCGTAGGTGAACCTGCG";
$primers{"I4"} = "GCATATCAATAAGCGGAGGA";
$primers{"H2R8"} = "CCTCGGATCAGGTAGGGATAC";
$primers{"I2"} = "GCATCGATGAAGAACGCAGC";
$primers{"I3"} = "CGAGTCTTTGAACGCACATTG";

my $io = Bio::SeqIO->new(
   #-file => '/home/dbastola/genbakDownload/161_88107/gbbct24.seq',
   -file => 'fake_data.gbk',
   -format => 'genbank'
);

while (my $seq = $io->next_seq) {
   # $seq is now a Bio::Seq object
   my $acc = $seq->accession;
   my $seqstr = uc($seq->seq);
   print "Searching $acc...\n";
   foreach my $primer_name (keys %primers) {
      my $primer_seq = $primers{$primer_name};
      print "   looking for $primer_name ($primer_seq)...\n";
      while ($seqstr =~ /$primer_seq/g) {
         printf("   Found '%s'.  Next attempt at character %s\n", $&, pos($seqstr)+1);
         my $start = pos($seqstr) - length( $primer_seq ) + 1;
         my $stop = pos($seqstr);
         print "   Hey, I found $primer_name at [$start..$stop]\n";
      }
   }

}
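
The start/stop arithmetic in the script relies on every match being exactly length($primer_seq) characters long. Perl also records match offsets in its built-in @- and @+ arrays, which avoids that assumption. The following is only a sketch of that alternative, not part of the original script. The sub name find_hits_offsets and the toy data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same idea as the pos() loop, but reading the
# match offsets from Perl's built-in @- and @+ arrays.
sub find_hits_offsets {
    my ($haystack, $needle) = @_;
    my @hits;
    while ($haystack =~ /\Q$needle\E/g) {
        # $-[0] is the 0-based offset where the whole match starts;
        # $+[0] is the 0-based offset just past its end (same as pos()).
        push @hits, [ $-[0] + 1, $+[0] ];
    }
    return @hits;
}

# Made-up toy data, not a real primer or sequence.
for my $hit (find_hits_offsets("AAGATTACAGGGATTACA", "GATTACA")) {
    print "hit at [$hit->[0]..$hit->[1]]\n";
}
```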


