[BNE-PM] A Welcome

Francis Clark fc at maths.uq.edu.au
Thu Mar 29 19:07:34 CST 2001



ok - hello perl mongers - my name is Francis and I am a computational
biologist...

Most of my work involves playing with large text files, for which I find
Perl superb. My two major projects are:


1. spliced alignments of genomic and transcript data

The genes of things like humans (in fact, most things above bacteria)
contain stretches of sequence (called introns) that do not code for
protein. After the gene is copied from the DNA into an RNA copy, a
machine called the spliceosome splices out (removes) the introns. (see
diagram)


DNA   ==============================================================
(genomic seq)

RNA copy    111111111-------------22222222222------------33333333333
of gene       exon 1    intron 1     exon 2     intron 2   exon 3

mRNA               1111111112222222222233333333333
(transcript seq)

This mRNA (messenger RNA) is now used as the recipe to make a protein.
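
As a rough illustration of the splicing step in the diagram, here is a minimal Perl sketch. It is a toy, not the actual pipeline: the sequence is the diagram's own placeholder string (digits for exon bases, dashes for intron bases), and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Toy RNA copy of the gene, built to match the diagram:
# digits are exon bases, dashes are intron bases.
my $pre_mrna = ('1' x 9) . ('-' x 13) . ('2' x 11) . ('-' x 12) . ('3' x 11);

# Collect each run of non-dash characters as one exon,
# recording its 0-based start and end within the RNA copy.
my @exons;
while ($pre_mrna =~ /([^-]+)/g) {
    my $end = pos($pre_mrna);    # offset just past the match
    push @exons, {
        start => $end - length($1),
        end   => $end - 1,
        seq   => $1,
    };
}

# "Splicing" is then just joining the exons in order.
my $mrna = join '', map { $_->{seq} } @exons;

print "exon $_->{start}-$_->{end}: $_->{seq}\n" for @exons;
print "mRNA: $mrna\n";
```

Real sequence has no dashes marking the introns, of course, which is why the exon boundaries have to be inferred from alignments instead.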

There exist large data sets of both gene sequence and transcript sequence
(fragments of mRNAs), although these are not matched up with each other. By
using sequence matching software (BLAST) it is possible to match up
transcript sequences with genomic sequence. By constructing such 'spliced
alignments' it is possible to:

a) find genes
b) determine the intron/exon structure of genes
c) observe that many genes have more than one pattern of splicing, and
thus are able to produce more than one protein from a single gene.
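
To make point (b) concrete, here is a small Perl sketch of how intron positions might be inferred from alignment hits. The hit list is invented for illustration and is not real BLAST output; the sketch also assumes a single transcript on the forward strand with non-overlapping hits, which real data will not always satisfy.

```perl
use strict;
use warnings;

# Hypothetical hits for one transcript against one genomic sequence:
# [ transcript_start, transcript_end, genomic_start, genomic_end ]
# (1-based, inclusive; deliberately out of order).
my @hits = (
    [ 10, 20, 23, 33 ],
    [  1,  9,  1,  9 ],
    [ 21, 31, 46, 56 ],
);

# Order the hits along the transcript; each hit is treated as one exon.
my @exons = sort { $a->[0] <=> $b->[0] } @hits;

# Infer an intron wherever consecutive exons abut on the transcript
# but are separated by a gap on the genomic sequence.
my @introns;
for my $i (1 .. $#exons) {
    my ($prev, $next) = @exons[ $i - 1, $i ];
    next unless $next->[0] == $prev->[1] + 1;    # adjacent on the transcript
    next unless $next->[2] >  $prev->[3] + 1;    # gap on the genomic sequence
    push @introns, [ $prev->[3] + 1, $next->[2] - 1 ];
}

printf "intron %d: genomic %d-%d\n", $_ + 1, @{ $introns[$_] }
    for 0 .. $#introns;
```

Running the same comparison over several transcripts from one gene, and seeing different intron sets, is one way the alternative splicing in point (c) would show up.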

Apart from the sequence alignments, which are generated with BLAST, all of
the data analysis and management is done using perl. One day I'll learn
about databases, but I reckon that big fat text files and perl are doing
just fine.


2. Hunting mobile elements in genomic sequence

The genomes of things like humans are mostly non-gene sequence. What is
all this 'other DNA'? Is it junk? packaging? important, but of unknown
function? We do, however, know how most of it got there. Mobile elements
(including things like retroviruses) are genomic 'parasites'. They get
into genomes, and make copies of themselves everywhere. When these
elements get into the germ line, they become part of the genome. Perhaps
half of the human genome is made up of the corpses (in varying states of
decay) of such mobile elements.  It is thought that most of the mobile
elements in human are dead, although mouse is full of live stuff.

Once again, BLAST makes it possible to identify regions of homologous
sequence within a (partial) genome, and using Perl I am able to sort
through all this data and find patterns of homology indicative of various
classes of mobile element. BLAST, big text files and Perl - superb.


Anyway, I didn't mean to go on... for anyone who makes it to here...

I'd be interested to hear from anyone who does similar things.

Francis


--
Francis Clark,
Department of Mathematics and
Institute for Molecular Bioscience
University of Queensland,
Australia.





On Fri, 30 Mar 2001, Gordon Fletcher wrote:

> 
> Hello to all the new members to the group over the recent weeks.
> 
> I thought it might be appropriate to send a _short_ message as the
> traffic on the list has been extremely quiet of late, although I suspect
> that this is partly a result of the extra security that pm.org has placed
> on the list. I keep getting asked to approve spam from Indian
> micro-electronics companies!
> 
> For those of you who are new perhaps posting a quick bio of yourself and
> your interests would help everyone know who's around. 
> 
> The list is always open to suggestions - actual meetings and other
> activities that might be Perl oriented.
> 
> The forum is open - we're all listening (so to speak).
> 
> Gordon
> 
> 



