<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    From the manual:<br>

    <br>

    <blockquote>-f, --fuzzy_MIDs<br>

      If a MID sequence in a read contains an error, the read is usually

      thrown<br>

      away. With this option, these reads will be accepted and assigned

      to the<br>

      nearest pool. If the MID could be assigned to more than one pool,

      a new<br>

      pool is created, named after all the possible pools for the

      ambiguous MID.<br>

      <br>

    </blockquote>
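    To make the manual's description concrete, here is a small sketch of
    how I read it. This is not the program's own code: it compares MIDs
    directly by counting mismatches rather than using a lookup table. The
    helper names (hamming_distance, assign_pool), the sample MIDs, the pool
    names, the one-mismatch limit and the "+" used to name an ambiguous
    pool are all my assumptions.<br>
    <br>
<pre>
```perl
use strict;
use warnings;

# Number of positions at which two sequences differ
# (assumes both sequences have the same length).
sub hamming_distance {
    my ( $seq_a, $seq_b ) = @_;
    my $distance = 0;
    for my $pos ( 0 .. length($seq_a) - 1 ) {
        $distance++ if substr( $seq_a, $pos, 1 ) ne substr( $seq_b, $pos, 1 );
    }
    return $distance;
}

# Hypothetical helper: returns the pool name for a read's MID, a combined
# name if several pools are within one mismatch, or undef if none is.
sub assign_pool {
    my ( $read_mid, $pool_for_mid ) = @_;    # hash ref: MID => pool name
    my @candidates;
    for my $mid ( keys %{$pool_for_mid} ) {
        next if hamming_distance( $read_mid, $mid ) > 1;
        push @candidates, $pool_for_mid->{$mid};
    }
    return @candidates ? join( '+', sort @candidates ) : undef;
}

# Made-up MIDs and pool names, for illustration only.
my %pool_for_mid = ( ACGT => 'pool1', ACCA => 'pool2', TTTT => 'pool3' );

for my $read_mid (qw{ACGT ACGA TTTA GGGG}) {
    my $label = assign_pool( $read_mid, \%pool_for_mid ) // 'unassigned';
    print "$read_mid\t$label\n";
}
```
</pre>
    With these made-up MIDs, ACGT goes to pool1, TTTA (one error away from
    TTTT) goes to pool3, ACGA is one error away from both ACGT and ACCA and
    so lands in "pool1+pool2", and GGGG is not assigned at all.<br>
    <br>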

    "MID", which stands for "Multiplex Identifier", is Roche-speak for
    "bar code". Strangely, this code processes Illumina reads, and Illumina
    calls its bar codes "indexes". Not important, though.<br>

    <br>

    The code in question (with extra comments):<br>

    <br>

    <blockquote><tt>for my $i ( 1 .. $mid_length ) {</tt><br>

      <tt>&nbsp;&nbsp;&nbsp; for my $base (qw{A C G T}) {</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $fuzzycode&nbsp; = $mid;</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $prebase_i&nbsp; = $i - 1;</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $postbase_i = $mid_length - $i;</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $fuzzycode =~ s{</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ^([ACGT]{$prebase_i})&nbsp;&nbsp;&nbsp;&nbsp; #capture bases, if any,

        before current base</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ([ACGT])&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp; #current base</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ([ACGT]{$postbase_i})$}&nbsp; #capture bases, if any,

        after current base</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {$1$base$3}xms;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #replace current base with

        $base</tt><br>

      <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; push @{ $mid_pools{$fuzzycode} }, $pool_name;</tt><br>

      <tt>&nbsp;&nbsp;&nbsp; }</tt><br>

      <tt>}</tt><br>

    </blockquote>

    Actually, I don't see any problem with this code. You might get
    extra speed using substr(), but the number of bar codes (probably no
    more than 100 or so) is drastically smaller than the number of
    sequence reads that will be processed (millions, probably). So
    looking for speed-ups in this part of the code is unlikely to yield
    much.<br>
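    <br>
    For what it's worth, a substr() version of the loop might look roughly
    like the sketch below. I have not benchmarked it, and the values of
    $mid and $pool_name are made up so that it runs on its own.<br>
    <br>
<pre>
```perl
use strict;
use warnings;

my %mid_pools;
my ( $mid, $pool_name ) = ( 'ACGT', 'pool1' );    # made-up example values

for my $pos ( 0 .. length($mid) - 1 ) {
    for my $base (qw{A C G T}) {
        my $fuzzycode = $mid;
        # Four-argument substr: overwrite the single base at $pos with $base.
        substr( $fuzzycode, $pos, 1, $base );
        push @{ $mid_pools{$fuzzycode} }, $pool_name;
    }
}

print scalar( keys %mid_pools ), " distinct fuzzy codes for $mid\n";
```
</pre>
    One difference to keep in mind: the regex only substitutes when the
    whole MID consists of A, C, G and T, whereas substr() overwrites the
    base regardless of what else is in the string. As in the original
    loop, the unmodified MID is regenerated once per position, so its pool
    name is pushed more than once.<br>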

    <br>

    Phillip<br>

    <br>

    <br>

  </body>

</html>