[Omaha.pm] Tweak the Perl regex engine: assign to pos()

Jay Hannah jay at jays.net
Sun May 3 14:52:14 PDT 2009


http://headrattle.blogspot.com/search/label/perl


OK, Perl is way too cool.

I was minding my own business, searching for every occurrence of
'CCAGC' in E. coli, when I hit a snag. Several hundred of my known
locations weren't showing up.

Why? Because the Perl regular expression engine, when matching with /g,
starts searching for the next occurrence of something after the end of
the occurrence it just found. This is what most humans want. But you may
notice that in the string 'CCAGCCAGC' the thing I'm searching for
('CCAGC') overlaps itself (the second copy begins on the last C of the
first), so the regex engine doesn't see the second one.
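
Here's a quick way to see the problem on that nine-character string. This is just a minimal sketch; the sub name count_matches is made up for illustration, and \Q...\E is only there so the search string is treated literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count matches of a literal string using a plain /g loop.
# Each new search starts where the previous match ended, so
# overlapping occurrences are skipped.
sub count_matches {
    my ($seq, $find_this) = @_;
    my $count = 0;
    $count++ while $seq =~ /\Q$find_this\E/g;
    return $count;
}

print count_matches('CCAGCCAGC', 'CCAGC'), "\n";   # prints 1, not 2
```

The first match ends at offset 5, and what's left from there ('CAGC') no longer contains a full 'CCAGC'.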

"Crap," I thought.

But this is Perl -- maybe there's a way? 30 seconds in the  
documentation (perldoc -f pos) and it said I could assign to pos().  
Really? Sweet! Problem solved!


#!/usr/bin/perl

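use strict;
use warnings;

# Assumes the whole sequence sits on the first line of E_coli.seq.
open(my $in, '<', 'E_coli.seq') or die "Can't open E_coli.seq: $!";
my $seq = <$in>;
chomp $seq;
close $in;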

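my $find_this = 'CCAGC';
while ($seq =~ /$find_this/g) {
    # pos($seq) is the offset just past the match, which is also the
    # 1-based position of the match's last character.
    my $start = pos($seq) - length( $find_this ) + 1;
    my $stop  = pos($seq);
    # Rewind so the next search begins one character past this match's
    # start; that is what lets overlapping occurrences be found.
    pos($seq) = $start;
    print "   Found $find_this at [$start..$stop]\n";
}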
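
To check the loop without the E. coli file, the same idea can be run on the short 'CCAGCCAGC' string from above. This is a sketch under a couple of assumptions: the sub name find_overlapping is hypothetical, the search string is non-empty, and \Q...\E is added so it is matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [start, stop] pairs (1-based, inclusive) for every occurrence
# of $find_this in $seq, including overlapping ones.
sub find_overlapping {
    my ($seq, $find_this) = @_;
    my @hits;
    while ($seq =~ /\Q$find_this\E/g) {
        my $stop  = pos($seq);                       # 1-based end of this match
        my $start = $stop - length($find_this) + 1;  # 1-based start of this match
        push @hits, [$start, $stop];
        pos($seq) = $start;   # resume one character past where this match began
    }
    return @hits;
}

print "Found at [$_->[0]..$_->[1]]\n" for find_overlapping('CCAGCCAGC', 'CCAGC');
# Found at [1..5]
# Found at [5..9]
```

Both copies show up now, because after the first match the search restarts at offset 1 instead of offset 5.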

