<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
From the manual:<br>
<br>
<blockquote>-f, --fuzzy_MIDs<br>
If a MID sequence in a read contains an error, the read is usually
thrown<br>
away. With this option, these reads will be accepted and assigned
to the<br>
nearest pool. If the MID could be assigned to more than one pool,
a new<br>
pool is created, named after all the possible pools for the
ambiguous MID.<br>
<br>
</blockquote>
"MID", standing for "Multiplex Identifier", is Roche-speak for "bar
code". Strangely, this code processes Illumina reads, and Illumina calls
its bar codes "indexes". Not important, though.<br>
<br>
The code in question (with extra comments):<br>
<tt><br>
</tt>
<blockquote><tt>for my $i ( 1 .. $mid_length ) {</tt><br>
<tt> for my $base (qw{A C G T}) {</tt><br>
<tt> my $fuzzycode = $mid;</tt><br>
<tt> my $prebase_i = $i - 1;</tt><br>
<tt> my $postbase_i = $mid_length - $i;</tt><br>
<tt> $fuzzycode =~ s{</tt><br>
<tt> ^([ACGT]{$prebase_i}) #capture bases, if any,
before current base</tt><br>
<tt> ([ACGT]) #current base</tt><br>
<tt> ([ACGT]{$postbase_i})$} #capture bases, if any,
after current base</tt><br>
<tt> {$1$base$3}xms; #replace current base with
$base</tt><br>
<tt> push @{ $mid_pools{$fuzzycode} }, $pool_name;</tt><br>
<tt> }</tt><br>
<tt>}</tt><br>
</blockquote>
Actually, I don't see any problem with this code. You might get
extra speed using substr() instead of the regex substitution, but the
number of bar codes (probably no more than 100 or so) is drastically
smaller than the number of sequence reads that will be processed
(millions, probably). So looking for speed-ups in this part of the code
is unlikely to yield much.<br>
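<br>
For what it's worth, a substr() version might look roughly like the
sketch below. The build_fuzzy_pools name and the hash of MID to pool
name passed into it are made up for illustration; only the inner loop
mirrors the code quoted above. It also assumes every MID consists only
of A, C, G and T, which the regex version checks and this one does
not.<br>
<pre>
```perl
use strict;
use warnings;

# Hypothetical wrapper: takes a hash ref of MID sequence to pool name and
# returns a hash ref of every one-mismatch variant to the pools it could
# belong to.
sub build_fuzzy_pools {
    my ($pool_for_mid) = @_;
    my %mid_pools;

    for my $mid ( sort keys %{$pool_for_mid} ) {
        my $pool_name  = $pool_for_mid->{$mid};
        my $mid_length = length $mid;

        for my $i ( 0 .. $mid_length - 1 ) {    # substr offsets start at 0
            for my $base (qw{A C G T}) {
                my $fuzzycode = $mid;

                # Four-argument substr replaces one base in place.
                substr( $fuzzycode, $i, 1, $base );

                # As in the quoted code, the unmodified MID is pushed once
                # per position, so pool names can repeat in the list.
                push @{ $mid_pools{$fuzzycode} }, $pool_name;
            }
        }
    }

    return \%mid_pools;
}

my $pools = build_fuzzy_pools( { ACGT => 'pool1', ACGA => 'pool2' } );
print "$_: @{ $pools->{$_} }\n" for sort keys %{$pools};
```
</pre>
A variant that is reachable from two different MIDs ends up listing both
pool names, which is the ambiguous case the manual describes.<br>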
<br>
Phillip<br>
<br>
<br>
</body>
</html>