Eukaryotic Genomes

Sequencing eukaryotic genomes presents two huge challenges. The first is sheer size. Compared with the genomes of bacteria and archaea, which range from 580 kb in Mycoplasma genitalium to over 6.3 Mb in Pseudomonas aeruginosa, eukaryotic genomes are large. The haploid genome of Saccharomyces cerevisiae contains 13 million base pairs. The nematode Caenorhabditis elegans has a genome of 97 Mb; the Drosophila genome contains 180 Mb; and the genomes of humans, rats, mice, and cows each contain roughly 3 billion base pairs.
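To make the difference in scale concrete, the sizes quoted above can be compared directly. The following is a minimal Python sketch, not part of the original text: the dictionary and function names are illustrative, and the figures are the rounded values given in this section (with "over 6.3 Mb" and "roughly 3 billion" treated as exact for the arithmetic).

```python
# Genome sizes in base pairs, using the rounded figures quoted in the text.
GENOME_SIZES_BP = {
    "Mycoplasma genitalium": 580_000,
    "Pseudomonas aeruginosa": 6_300_000,
    "Saccharomyces cerevisiae": 13_000_000,
    "Caenorhabditis elegans": 97_000_000,
    "Drosophila": 180_000_000,
    "Homo sapiens": 3_000_000_000,
}


def fold_difference(larger: str, smaller: str) -> float:
    """Return how many times larger one genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


if __name__ == "__main__":
    reference = "Pseudomonas aeruginosa"
    for species, size_bp in GENOME_SIZES_BP.items():
        ratio = fold_difference(species, reference)
        print(f"{species}: {size_bp / 1e6:,.2f} Mb ({ratio:,.1f}x {reference})")
```

With these figures, the human genome comes out several hundred times larger than that of Pseudomonas aeruginosa and several thousand times larger than that of Mycoplasma genitalium.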

The second great challenge in sequencing eukaryotic genomes is coping with noncoding sequences that are repeated many times. Many eukaryotic genomes are dominated by repeated DNA sequences that occur between genes and that do not code for products used by the organism. These repeated sequences pose serious problems in aligning and interpreting sequence data. What are they? If such sequences don't code for a protein, why do they exist?

In many eukaryotic genomes, the exons, introns, and regulatory sequences associated with genes make up a relatively small percentage of the genome. In humans, exons constitute less than 2 percent of the genome, whereas repeated sequences account for well over 50 percent. In contrast, over 90 percent of the DNA sequences in a bacterial or archaeal genome code for a product used by the cell.
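These percentages can be turned into approximate base-pair counts. The sketch below is illustrative only: it assumes the rounded human genome size of 3 billion base pairs given earlier, and it treats "less than 2 percent" and "well over 50 percent" as an upper and a lower bound. The names are hypothetical.

```python
# Rounded human genome size quoted earlier in this section (an approximation).
HUMAN_GENOME_BP = 3_000_000_000

# Bounds taken from the text: exons are LESS than 2 percent of the genome,
# repeated sequences are MORE than 50 percent.
EXON_PERCENT_UPPER_BOUND = 2
REPEAT_PERCENT_LOWER_BOUND = 50


def bp_for_percent(genome_bp: int, percent: int) -> int:
    """Return the number of base pairs making up a whole-number percentage."""
    return genome_bp * percent // 100


exon_bp_max = bp_for_percent(HUMAN_GENOME_BP, EXON_PERCENT_UPPER_BOUND)
repeat_bp_min = bp_for_percent(HUMAN_GENOME_BP, REPEAT_PERCENT_LOWER_BOUND)

if __name__ == "__main__":
    print(f"Exons: fewer than {exon_bp_max:,} bp")
    print(f"Repeated sequences: more than {repeat_bp_min:,} bp")
    print(f"Repeats outnumber exon bases by at least {repeat_bp_min // exon_bp_max}-fold")
```

Under these assumptions, exons account for fewer than 60 million base pairs, while repeated sequences account for more than 1.5 billion, a difference of at least 25-fold.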

When noncoding and repeated sequences were discovered, they were initially considered "junk DNA" that was nonfunctional and probably unimportant and uninteresting. But subsequent work has shown that most of the repeated sequences observed in eukaryotes are actually derived from sequences known as transposable elements. Transposable elements are parasitic segments of DNA that are capable of moving from one location to another in a genome. They are similar to viruses, except that they are not transmitted from one host cell or host individual to another by infection. Instead, transposable elements transmit copies of themselves to additional locations within the host genome and are passed on to offspring along with the rest of the genome. Viruses leave a host cell and find a new cell to infect. But transposable elements never leave their host cell – they simply make copies of themselves and move to new locations in the genome. They are examples of what biologists call selfish genes – DNA sequences that survive and reproduce but that do not increase the fitness of the host genome.