Introduction to Genomics and Bioinformatics

Computer science has had a major impact on the biological sciences, particularly in the area of sequence analysis.  As sequencing technology improved and became highly automated in the 1990s, researchers around the world accumulated a wealth of information that rapidly grew beyond what individual scientists could analyze on their own.  Scientists and government officials with considerable foresight lobbied for centralized institutions that could store this information and make it available to researchers worldwide through the internet.  At the same time, programmers were developing powerful analysis tools that could mine these databases and compare sequences in fine detail.  With the advent of easy, near-universal web access, researchers have come to rely on these institutions to an ever-increasing extent.  Genomics (the study of the structure and function of entire genomes) and bioinformatics (the development and use of computer tools to analyze them) now contribute to virtually all areas of biological science.  It is safe to say that knowing how to use these resources is an essential skill for anyone working in biology or the biomedical sciences.

The available resources are located on thousands of different websites and vary in size and nature.  Some are very narrowly focused, such as databases created for a particular organism or sequencing project.  A site might also be set up simply to provide a particular type of analysis or a new software tool.  Some sites represent the efforts of a single laboratory.  Others, such as the National Center for Biotechnology Information (NCBI), have as their mandate the collection of all publicly available sequence information and the development of software tools for retrieving and analyzing these sequences.  (NCBI is part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH) in Bethesda, Maryland.)

The curators and computer programmers at comprehensive centers such as NCBI compile sequence and related data as they become available from sequencing centers around the world and present them in a form that is easily accessible to researchers.  The content of the sequence database maintained by NCBI, known as GenBank, has grown enormously since it was created in 1982.  NCBI’s efforts are coordinated with similar centers in other countries.  NCBI and its counterparts, such as the European Molecular Biology Laboratory (EMBL) in Germany and the DNA Data Bank of Japan (DDBJ), are the primary centers for the collection and archiving of sequence data.

The software tools used for access and analysis differ among the various websites.  First and foremost, an archival site such as NCBI allows researchers to retrieve the original sequence record for a gene, protein, or genome, along with all subsequent updates and modifications.  These sites also provide sequence alignment tools, which are essential for answering questions about the conservation of genes among organisms.  Such comparisons often allow information from major model organisms such as Saccharomyces cerevisiae, Drosophila, or Arabidopsis to provide clues to the biological function of a human gene, or vice versa.  At another level, the programmers at these centers actively design new software tools for problems as diverse as identifying the coding regions within newly sequenced genomes, linking gene discoveries to human diseases, and examining and comparing the three-dimensional structures of proteins.

This lab will introduce you to the basics of genomic analysis using the NCBI website.