The Human Genome: Solving a Million Mysteries
The human genome, the hereditary material we pass on to our progeny, can be seen as a 3-billion-letter string over a four-letter DNA alphabet. We currently understand about 1.5% of it, mostly in the form of genes: DNA substrings that explain how to build proteins, the quintessential constituents of every living cell. The remaining 98.5% of our genome was long deemed "junk".
This picture changed recently when we first obtained the genome sequences of other species. By comparing these genomes to ours, we were able to pinpoint the locations of a staggering one million additional human subsequences that must be important to the human cell but do not encode proteins. The functions of these regions remain largely unknown, and their sheer volume overwhelms any comprehensive experimental approach.
Guided by experimental results for a few of these subsequences, we can use computational approaches to take on the tremendous challenge of understanding this data and deriving key biological observations.
I will describe a graph-theoretic approach to understanding these regions, analyze some of the most perplexing regions within the human genome, and track down a phenomenon that turns genomic junk into gold.
The talk will assume no prior knowledge of molecular biology.