Ideas?
Peter Scott
Peter at PSDT.com
Wed Sep 18 15:24:06 CDT 2002
At 11:49 AM 9/18/02 -0700, nkuipers wrote:
>Hello all,
>
>I have a bit of a problem. To present it, I need to first give a bit of a
>biology primer.
>
>A DNA sequence can be represented as a string of A,G,C,T, which are 1-letter
>representations of different nucleotides. Think GATTACA :). Often, a
>sequence is considered in blocks of 3 nucleotides; this block is called a
>codon. An array of codons occupies a "reading frame", and for a given
>sequence there are 6 reading frames. For example, for
>ACG|GTC|TTT|CGA|TAA|AAA... the frames are:
>
>1) as written
>2) remove the first nucleotide from 1), giving CGG|TCT|TTC|GAT|AAA|A...
I'm confused. You have 5 terminal A's in (1) but 4 in (2).
How is it still a reading frame since it has an incomplete codon on the end?
If you remove a nucleotide then surely you no longer have an array of codons.
>3) remove the first nucleotide from 2), giving GGT|CTT|TCG|ATA|AAA...
Ditto. I guess the ... is what's throwing me off. When you say "for a given
sequence there are 6 reading frames" and then talk about stripping off
the first letter, it seems to violate your definition. I could guess at
what you mean, but I'd prefer an example where you show a complete (if
artificially small) sequence.
>The other three frames are derived with similar mechanics, but the original
>sequence is first reversed, then "complemented" (essentially, tr/ACGT/TGCA/).
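For reference, here is a minimal Perl sketch of the six frames on a complete, artificially small sequence (the 18 bases of the example above, without the trailing "..."). It assumes a trailing incomplete codon is simply dropped from a frame, and it reads each frame by offset with substr instead of deleting characters, so the original string is never modified. The helper names revcomp and codons_in_frame are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement (reverse, then tr/ACGT/TGCA/).
sub revcomp {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGT/TGCA/;
    return $rc;
}

# Hypothetical helper: the whole codons of one frame, starting at
# $offset (0, 1 or 2). A trailing partial codon is dropped, and the
# input string is only read, never modified.
sub codons_in_frame {
    my ($seq, $offset) = @_;
    my @codons;
    for (my $i = $offset; $i + 3 <= length $seq; $i += 3) {
        push @codons, substr($seq, $i, 3);
    }
    return @codons;
}

my $seq = 'ACGGTCTTTCGATAAAAA';    # the example above, 18 bases
my $rc  = revcomp($seq);
for my $offset (0 .. 2) {
    printf "+%d: %s\n", $offset + 1, join('|', codons_in_frame($seq, $offset));
    printf "-%d: %s\n", $offset + 1, join('|', codons_in_frame($rc,  $offset));
}
```

Frame +2 comes out as CGG|TCT|TTC|GAT|AAA: the partial codon is dropped rather than shown.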
>
>I am interested in finding all instances of 3 specific codons, and have
>created 2 regex objects (forward and reverse complement, for a total of 6
>codons) that do this perfectly. I am also interested in knowing the
>locations of each matched codon in the string. Currently I am using the pos
>function, and this is fine for the first frame in either orientation.
>But... my current implementation of creating the next frame involves
>removing the current first nucleotide from the sequence with s/^\w//, which
>compromises the "absolute" position of a match with pos. I need ideas,
>please. Arrays? Tmp vars? Adding/subtracting the appropriate integer to the
>pos return (easy, viable, but sort of messy as I imagine it)? Is a better
>logical foundation needed? I am quite sure I could come up with an answer
>to this with more thought, but wanted to hear other opinions, which are
>likely more elegant than mine. How would you best do a frame-specific
>search while still being able to annotate the match location based on the
>original, untouched sequence?
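I'd have a better idea once I see an example. One possibility is to
just change the first character to a non-letter instead of deleting it;
the string then keeps its length, so pos still reports positions in the
original sequence.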
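Here is a sketch of the non-letter idea. It assumes the in-frame regex is written to skip leading '-' padding. The '-' character, the frame_hits helper and the TAA/TAG/TGA target codons are all hypothetical choices. Because bases are overwritten with substr rather than deleted, the string keeps its length, and pos() reports offsets into the original sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: in-frame hits after overwriting (not deleting) the
# first $shift bases with '-'. The string keeps its length, so every
# position pos() reports is still an offset into the original $seq.
sub frame_hits {
    my ($seq, $shift) = @_;
    my $frame = $seq;
    substr($frame, 0, $shift) = '-' x $shift;
    my @hits;
    # Assumed regex shape: skip the '-' padding, then step through whole
    # codons until one of the (stand-in) target codons is found.
    while ($frame =~ /\G-*(?:[ACGT]{3})*?(TAA|TAG|TGA)/g) {
        push @hits, pos($frame) - 3;
    }
    return @hits;
}

my $seq = 'ACGGTCTTTCGATAAAAA';
for my $shift (0 .. 2) {
    print "frame +", $shift + 1, ": @{[ frame_hits($seq, $shift) ]}\n";
}
```

With the 18-base example, only frame +1 reports a hit, at offset 12, which is where TAA sits in the untouched sequence.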
--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/