Published online at:  Genome Biol 2001; 2 (7): research0025.1-research0025.18 (July 4, 2001):
http://www.pubmedcentral.nih.gov./articlerender.fcgi?artid=55322

"A Draft Annotation and Overview of the Human Genome."

Fred A. Wright 1, William J. Lemon 1, Wei D. Zhao 1, Russell Sears 1, Degen Zhuo 1, Jian-Ping Wang 1, Hee-Yung Yang 2, Troy Baer 3, Don Stredney 3, 4, Joe Spitzner 2, Al Stutz 3, 4, Ralf Krahe 1, and Bo Yuan*1

1 Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA.
2 LabBook.com, Busch Boulevard, Columbus, OH 43229, USA.
3 Ohio Supercomputer Center (OSC), Kinnear Road, Columbus, OH 43212, USA.
4 Department of Computer and Information Science, The Ohio State University, Neil Avenue, Columbus, OH 43210, USA.

*Correspondence: Bo Yuan.
E-mail: yuan.33@osu.edu



Abstract:

Background
The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena.

Results
We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome.

Conclusions
We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4% of the genome. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence.



Background:

The sequence of the human nuclear genome has been completed in draft form by an international public consortium consisting of 16 sequencing centers and associated computational facilities [1]. A private commercial version of the genome has also been sequenced and assembled using a whole-genome shotgun approach [2]. Many lower organisms have been sequenced to date [3], but the 3.2 billion base pair (bp) human genome is approximately 25 times as large as the largest currently finished genomes - Drosophila melanogaster at 120 megabases (Mb) [4] and Arabidopsis thaliana at 115 Mb [5].

As of late 2000, the public human sequence was primarily based on approximately 24,000 accessioned bacterial artificial chromosome (BAC) clones covering 97% of the euchromatic portion of the genome [6]. The sequence of these clones is approximately 93% complete to at least 4-fold coverage [7]. Thirty percent of the genome is in finished form, including the entire sequence of chromosomes 21 and 22 [7]. These clones represent the most complete sequence information available, with overlapping clones positioned on a framework map using restriction fingerprinting [8]. However, reduction to a single consensus sequence permits placement of genes and other chromosomal structures in their proper positional context. Recently, the consortium has distributed a working draft assembly of the entire genome that removes redundancies, orients sequence fragments and clearly indicates gaps arising from sequencing and assembly. The total assembled length is 3.08 billion bp - about 4% smaller than estimates of genome size based on flow cytometry [9], presumably due to the exclusion of constitutive heterochromatic regions and centromeres. Major gaps (50-200 kilobases (kb)) comprise 16% of the assembly, whereas minor gaps (100 or fewer bp) and low-quality calls comprise 0.5%.
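
The size comparisons quoted in the Background can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures stated in the text (a 3.2 billion bp genome, a 3.08 billion bp assembly, and 120 Mb and 115 Mb for the two finished genomes). The function and variable names are illustrative assumptions and are not part of the original analysis.

```python
# Figures quoted in the Background text, all in base pairs (bp).
HUMAN_GENOME_BP = 3.2e9      # estimated human genome size
DRAFT_ASSEMBLY_BP = 3.08e9   # total assembled length of the working draft
DROSOPHILA_BP = 120e6        # Drosophila melanogaster
ARABIDOPSIS_BP = 115e6       # Arabidopsis thaliana


def fold_difference(larger_bp, smaller_bp):
    """Return how many times larger one genome is than another."""
    return larger_bp / smaller_bp


def percent_shortfall(observed_bp, expected_bp):
    """Return the percentage by which observed_bp falls short of expected_bp."""
    return 100.0 * (expected_bp - observed_bp) / expected_bp


fly_fold = fold_difference(HUMAN_GENOME_BP, DROSOPHILA_BP)
plant_fold = fold_difference(HUMAN_GENOME_BP, ARABIDOPSIS_BP)
shortfall = percent_shortfall(DRAFT_ASSEMBLY_BP, HUMAN_GENOME_BP)

print(f"Human vs. Drosophila:  {fly_fold:.1f}-fold")
print(f"Human vs. Arabidopsis: {plant_fold:.1f}-fold")
print(f"Assembly shortfall:    {shortfall:.2f}%")
```

With these inputs the two ratios come out near 27, somewhat above the "approximately 25 times" given in the text, and the assembly shortfall is 3.75%, which rounds to the quoted "about 4%".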

Large-scale sequencing will continue until at least 2003. The current coverage is, however, sufficient for the Human Genome Project to enter a new phase, in which the entire sequence can be analyzed to identify genes, regulatory regions and other genomic elements and structures. Linkage and genetic association studies can be immediately followed by investigation of candidate regions. The assembly provides simplified descriptions of the genome, as disparate data sources such as GenBank and numerous expressed sequence tag (EST) and protein databases are unified. Similarly, formerly independent maps, based on cytogenetic banding patterns, meiotic crossovers and radiation hybrids, may be placed within the single consensus sequence.



References:
...
67. The International Human Genome Sequencing Consortium, "Initial Sequencing and Analysis of the Human Genome", Nature 2001, vol. 409, pp. 860-921 (February 15, 2001).

68. Venter JC, et al., "The Sequence of the Human Genome", Science 2001, vol. 291, pp. 1304-1351 (February 16, 2001).
...



Additional References:

1. Frenster JH, "Activation of DNA Transcription within Repressed Chromatin", 14th John Innes Symposium, (September 5-8, 2001).


