Ideas?

Peter Scott Peter at PSDT.com
Wed Sep 18 15:24:06 CDT 2002


At 11:49 AM 9/18/02 -0700, nkuipers wrote:
>Hello all,
>
>I have a bit of a problem.  To present it, I need to first give a bit of a
>biology primer.
>
>A DNA sequence can be represented as a string of A,G,C,T, which are 1-letter
>representations of different nucleotides.  Think GATTACA :).  Often, a
>sequence is considered in blocks of 3 nucleotides; this block is called a
>codon.  An array of codons occupies a "reading frame", and for a given
>sequence there are 6 reading frames.  For example, for
>ACG|GTC|TTT|CGA|TAA|AAA... the frames are:
>
>1)as written
>2)remove the first nucleotide from 1), giving CGG|TCT|TTC|GAT|AAA|A...

I'm confused.  You have 5 terminal A's in (1) but 4 in (2).
How is it still a reading frame since it has an incomplete codon on the end?

If you remove a nucleotide then surely you no longer have an array of codons?

>3)remove the first nucleotide from 2), giving GGT|CTT|TCG|ATA|AAA...

Ditto.  I guess the ... is what's throwing me off.  When you say "for a given
sequence there are 6 reading frames" and then talk about stripping off 
the first letter it seems to violate your definition.  I could guess at 
what you mean, but I'd prefer an example where you show a complete (if 
artificially small) sequence.

>The other three frames are derived with similar mechanics, but the original
>sequence is first reversed, then "complemented" (essentially, tr/ACGT/TGCA/).
>
>I am interested in finding all instances of 3 specific codons, and have
>created 2 regex objects (forward and reverse complement, for a total of 6
>codons) that do this perfectly.  I am also interested in knowing the
>locations of each matched codon in the string.  Currently I am using the pos
>function, and this is fine for the first frame in either orientation.
>But...my current implementation of creating the next frame involves removing
>the current first nucleotide from the sequence with s/^\w// which compromises
>the "absolute" position of a match with pos.  I need ideas please.  Arrays?
>Tmp vars?  Adding/subtracting appropriate integer to the pos return (easy,
>viable, but sort of messy as I imagine it). A better logical foundation is
>needed? I am quite sure I could come up with an answer to this with more
>thought but wanted to hear other opinions which are likely more elegant than
>mine.  How would you best do a frame-specific search while still being able
>to annotate the match location based on the original, untouched sequence?

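I'd have a better idea once I see an example.  One possibility is to
just change the first character to a non-letter instead of deleting it,
so that the string keeps its length and pos still reports offsets into
the original sequence.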
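
As a rough sketch of that idea, not tested against your real regexes:
mask the leading nucleotides with '-' rather than deleting them, so every
offset still refers to the original string.  The sub name codon_positions,
the sample sequence and the codons TAA and GAT are all made up for
illustration, and the sketch assumes the sequence contains nothing but
A, C, G and T.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start offsets, relative to the
# ORIGINAL sequence, of the wanted codons in one forward reading frame.
# $frame is 0, 1 or 2 (the number of leading nucleotides to skip).
sub codon_positions {
    my ($seq, $frame, @codons) = @_;
    my %want = map { $_ => 1 } @codons;

    # Mask instead of delete: the copy keeps its length, so pos() on the
    # copy is also a valid offset into the untouched original.
    my $masked = $seq;
    substr($masked, 0, $frame) = '-' x $frame;

    my @hits;
    # The first match has to start after the masked prefix; because the
    # rest is assumed to be pure ACGT, later matches follow back to back
    # and therefore stay in frame.
    while ($masked =~ /([ACGT]{3})/g) {
        push @hits, pos($masked) - 3 if $want{$1};
    }
    return @hits;
}

my $seq = 'ACGGTCTTTCGATAAAAA';
for my $frame (0 .. 2) {
    my @hits = codon_positions($seq, $frame, qw(TAA GAT));
    print 'frame ', $frame + 1, ": @hits\n";
}
```

If the real data can contain other characters (an N, say), a run like
that would shift the frame, so anchoring each match with \G or assigning
to pos() directly may be a safer variant.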

--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/
