Published in: Science, vol. 287, no. 5461, pp. 2271-2274 (March 24, 2000):
"A BAC-Based Physical Map of the Major Autosomes of Drosophila melanogaster".
Roger A. Hoskins,1 Catherine R. Nelson,2 Benjamin P. Berman,2 Todd R. Laverty,2 Reed A. George,1 Lisa Ciesiolka,1 Mohammed Naeemuddin,1 Andrew D. Arenson,3 James Durbin,3 Robert G. David,3 Paul E. Tabor,3 Michael R. Bailey,3 Denise R. DeShazo,3 Joseph Catanese,4 Aaron Mammoser,4 Kazutoyo Osoegawa,4 Pieter J. de Jong,4 Susan E. Celniker,1 Richard A. Gibbs,3 Gerald M. Rubin,2 Steven E. Scherer3
1 Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
2 Berkeley Drosophila Genome Project, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA.
3 Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
4 Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA.
We constructed a bacterial artificial chromosome (BAC)-based physical map of chromosomes 2 and 3 of Drosophila melanogaster, which constitute 81% of the genome. Sequence tagged site (STS) content, restriction fingerprinting, and polytene chromosome in situ hybridization approaches were integrated to produce a map spanning the euchromatin. Three of five remaining gaps are in repeat-rich regions near the centromeres. A tiling path of clones spanning this map and STS maps of chromosomes X and 4 was sequenced to low coverage; the maps and tiling path sequence were used to support and verify the whole-genome sequence assembly, and tiling path BACs were used as templates in sequence finishing.
The fruit fly Drosophila melanogaster is a principal model organism in metazoan genetics and molecular biology. Here, we describe a BAC-based physical map of chromosomes 2 and 3 constructed as part of the effort to determine the D. melanogaster genome sequence (1). There are five chromosomes (X, 2, 3, 4, and Y), and the second and third together account for ~97 Mb of the ~120-Mb euchromatic portion of the genome. Several clone-based physical maps have been described previously. Low-resolution yeast artificial chromosome maps of the genome have been produced by polytene chromosome in situ hybridization (2), and cosmid maps of regions of the X chromosome have been made by STS content and fingerprint mapping (3). The most complete previous map is the P1-based map by Kimmerly et al. (4) [also see (5)], constructed by polymerase chain reaction-based STS content mapping and polytene chromosome in situ hybridization. On chromosomes 2 and 3, it comprises 348 sets of contiguously overlapping clones (contigs), each with at least two STS markers.
The contiguity of the P1 map was limited by the shallow genome coverage of the library (about sixfold) and the relatively small insert size of the clones (80 kb). BAC vectors can accommodate larger inserts, so we created a BAC map using the P1 map as a starting point. We constructed a BAC library (RPCI-98) from an isogenic y1; cn1 bw1 sp1 strain (6). High-molecular-weight (HMW) DNA was prepared from adults (7), partially digested with Eco RI and Eco RI methylase, size fractionated, and cloned into the pBACe3.6 vector (8). The library consists of 17,540 recombinant clones with an average insert size of 163 kb and represents ~24-fold coverage of the euchromatic portion of the genome (9).
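The ~24-fold figure follows directly from the library size and average insert size. A minimal sketch of that arithmetic, assuming the ~120-Mb euchromatic genome size given above as the denominator (the function name is illustrative only):

```python
def fold_coverage(n_clones: int, mean_insert_kb: float, target_mb: float) -> float:
    """Library redundancy: total cloned DNA divided by the size of the target region."""
    total_mb = n_clones * mean_insert_kb / 1000.0  # convert kb to Mb
    return total_mb / target_mb

# RPCI-98: 17,540 clones, 163-kb average insert, ~120-Mb euchromatin
rpci98 = fold_coverage(17_540, 163, 120)
print(f"RPCI-98 coverage: {rpci98:.1f}-fold")  # RPCI-98 coverage: 23.8-fold
```

This treats every clone as euchromatic, so it describes the library as a whole rather than the coverage of any particular region.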
We hybridized radioactively labeled oligonucleotide probes made from STS markers selected from the P1 map to colony arrays representing the RPCI-98 library (10); 1226 markers from the P1 map are included in the BAC map, at an average spacing of 80 kb. Because these markers had been previously localized, the data for each of the four chromosome arms (2L, 2R, 3L, and 3R) could be assembled separately, and this reduced the complexity of the assembly process.
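The average spacing quoted here is the length of the mapped region divided by the number of markers. A quick check, assuming the ~97-Mb combined euchromatic length of chromosomes 2 and 3 stated earlier:

```python
def mean_spacing_kb(region_mb: float, n_markers: int) -> float:
    """Average distance between markers if they were spread evenly over the region."""
    return region_mb * 1000.0 / n_markers

print(round(mean_spacing_kb(97, 1226)))  # 79, i.e. ~80 kb for the P1-derived markers
```

Applied to the final set of 1923 markers, the same formula gives ~50 kb. Both values are means; the actual spacing varies along each arm.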
To join the initial contigs together, new markers were added to the map in multiple iterations of STS design, hybridization, and data assembly. The new markers included 690 designed from BAC end sequences (1), 5 designed from genomic sequences, and 2 designed from coding sequences of known genes. Potential markers with substantial sequence similarity to more than one location in the genome were rejected. These were identified by scanning databases of known repeats and scanning for instances of the sequence in multiple, nonoverlapping BAC and P1 clones. In the latter stages of the project, restriction fingerprints (see description below) were used in STS design to identify BACs that extended farthest into the map gaps. The map presented here includes 1923 markers at an average spacing of 50 kb.
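The second filter depends on a simple geometric fact. A single-copy sequence sits at one position, so every clone containing it must overlap every other clone that contains it. Any pair of nonoverlapping clones that share the sequence therefore indicates more than one genomic copy. The sketch below encodes that check. It assumes clone positions are already known as intervals on an arm, and the `Clone` type, coordinates, and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence

@dataclass(frozen=True)
class Clone:
    """A mapped clone; start/end are hypothetical kb positions on one arm."""
    name: str
    arm: str
    start: int
    end: int

def overlaps(a: Clone, b: Clone) -> bool:
    """True if two clones share any interval on the same chromosome arm."""
    return a.arm == b.arm and a.start < b.end and b.start < a.end

def is_single_copy(matches_known_repeat: bool, clones_with_sequence: Sequence[Clone]) -> bool:
    """Accept a candidate marker only if it looks unique in the genome.

    Rejected if it matches a known repeat, or if any two clones containing
    the sequence fail to overlap (implying two or more genomic locations).
    """
    if matches_known_repeat:
        return False
    return all(overlaps(a, b) for a, b in combinations(clones_with_sequence, 2))

a = Clone("BAC_A", "2L", 0, 160)
b = Clone("BAC_B", "2L", 100, 260)
c = Clone("BAC_C", "3R", 5000, 5150)
print(is_single_copy(False, [a, b]))  # True: both clones overlap on 2L
print(is_single_copy(False, [a, c]))  # False: sequence found on two arms
```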
STS content data were assembled by chromosome arm in the program SEGMAP v3.49 (11) and manually edited. Cytological data associated with markers from the P1 map were used to identify false joins in the BAC map. These were due to markers that hybridized to multiple sites in the genome and were resolved by removing the markers from the map. Markers that had been mapped to the wrong chromosome arm in the P1 map were identified by their failure to incorporate into assemblies and were moved. The quality of the hybridization data resulted in a map with a high degree of internal consistency (12). The accuracy of the map has been confirmed by selecting a complete tiling path of clones and sequencing them to low coverage (1).
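Note (12) defines the error rates precisely enough to express in code. The sketch below is one reading of that procedure. It assumes each BAC's hits are recorded as hypothetical (arm, marker rank) pairs in map order and takes "consecutive" to mean adjacent in that order. A production version would probably need to tolerate an occasional false negative inside a run.

```python
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, int]  # (chromosome arm, marker rank along the arm); hypothetical encoding

def runs(hits: Iterable[Hit]) -> List[List[Hit]]:
    """Group STS hits into maximal runs of consecutive markers on the same arm."""
    out: List[List[Hit]] = []
    for arm, idx in sorted(set(hits)):
        if out and out[-1][-1] == (arm, idx - 1):
            out[-1].append((arm, idx))
        else:
            out.append([(arm, idx)])
    return out

def false_positive_rate(bac_hits: Dict[str, Iterable[Hit]]) -> float:
    """Hits outside each BAC's longest run of consecutive hits count as false positives."""
    total = false_pos = 0
    for hits in bac_hits.values():
        groups = runs(hits)
        n = sum(len(g) for g in groups)
        total += n
        if groups:
            false_pos += n - max(len(g) for g in groups)
    return false_pos / total if total else 0.0

def false_negative_rate(end_probes: Dict[str, Tuple[str, Set[str]]]) -> float:
    """Fraction of BAC-end probes that failed to detect their own source BAC."""
    if not end_probes:
        return 0.0
    failed = sum(1 for source, detected in end_probes.values() if source not in detected)
    return failed / len(end_probes)
```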
The STS content map (5, 13) has five gaps outside of the centromeric heterochromatin, which is not represented in large-insert clone libraries. Three gaps are near the centromeres, and we have been unable to identify unique probes to close them. In an attempt to close the gaps at 57B4 and 64C5 (Figs. 1 and 2B), we screened an alternative BAC library (14), but no spanning clones were identified. The apparent absence of BACs covering these two gaps may reflect random fluctuations in the distribution of clones, an absence of appropriate restriction sites, or sequences that cannot be cloned in the BAC vector. None of the five gaps was spanned by clones in the whole-genome shotgun sequence assembly (1).
Fig. 1. BAC-based physical map of D. melanogaster chromosomes 2 and 3. A representation of the euchromatic portion of the four chromosome arms is shown, indicating regions covered by overlapping BAC clones (gray bars). The extent of coverage has been determined by polytene chromosome in situ hybridization of BACs (Fig. 2). The scale indicates cytological map position along the chromosomes (2L 21A1-40F7, 2R 41A1-60F5, 3L 61A1-80F9, and 3R 81F1-100F5), and the lengths of the numbered divisions represent their estimated relative physical lengths (23). Regions not represented by mapped BACs are indicated, as are the positions of the telomeres (TEL) and centromeres (CEN).
Fig. 2. Polytene chromosome in situ hybridization of BACs (31). DNAs for use as probes were prepared with an alkaline lysis procedure (9). The chromosomes are Giemsa-stained (blue), and hybridized BACs are stained with a diaminobenzidine reaction (brown). (A) BACs at contig ends demonstrating coverage of the euchromatin. (B) BACs flanking gaps in map coverage. (C) Overlapping BACs near the 2L telomere demonstrating resolution of the method.
Fingerprint data were assembled by means of the program FPC (fingerprinted contigs) v4.2 (17, 18); assemblies were edited manually to remove false joins, which were readily identified by means of the STS content map. We optimized stringency settings for the FPC assembly algorithm by comparing fingerprint assemblies to known BAC locations, STS order, and Eco RI sites in the finished sequence of the 2.9-Mb Adh region (19). Settings were optimized to yield large contigs, which reduced the number of manually directed merges required to achieve contiguity. We found lower stringency settings that reduced the number of contigs by 60% and resulted in <10% additional false joins relative to high-stringency settings (20).
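The stringency settings in (20) are a band-matching tolerance and a cutoff on the Sulston coincidence score (16, 17). That score is the probability that two unrelated clones share at least the observed number of bands by chance. The sketch below follows the binomial form described by Sulston et al.: the chance that a band in the clone with fewer bands matches some band in the other clone is 1 - (1 - 2t/G)^n_high, where t is the tolerance and G the gel length. It is an illustration rather than FPC's code; the gel length of 4000 migration units is an assumed value, and FPC v4.2 may differ in detail. Pairs scoring below the cutoff are treated as overlapping, so the cutoff of 1e-6 used at lower stringency accepts weaker evidence than 1e-10.

```python
from math import comb

def sulston_score(bands_a: int, bands_b: int, shared: int,
                  tolerance: float, gel_length: float = 4000.0) -> float:
    """Chance that >= `shared` bands of the smaller clone match the other clone at random.

    tolerance and gel_length are in the same migration units; 4000 is an assumed gel length.
    """
    n_low, n_high = sorted((bands_a, bands_b))
    b = 2.0 * tolerance / gel_length   # chance one band falls within tolerance of one other band
    p_miss = (1.0 - b) ** n_high       # chance a band matches none of the n_high bands
    p_hit = 1.0 - p_miss
    # Binomial upper tail: probability of `shared` or more coincidences among n_low bands
    return sum(comb(n_low, m) * p_hit ** m * p_miss ** (n_low - m)
               for m in range(shared, n_low + 1))

score = sulston_score(25, 30, 15, tolerance=6)
print(f"{score:.2e}")
```

Widening the tolerance raises the chance of a random match, so the same number of shared bands yields a larger (less significant) score.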
The STS content map was used to divide the genome into segments, and restriction fingerprints of BACs within the segments were assembled and edited independently of one another. This strategy permitted multiple operators to edit segments in parallel and reduced the complexity of each assembly. First, BACs on chromosome arm 3L were assembled as a single ~24-Mb project; automated assembly in FPC generated 153 contigs, and merges that were suggested by STS content data and confirmed by fingerprint data resulted in eight contigs. Next, chromosome arms 2L, 2R, and 3R were divided into 14 segments averaging 5 Mb in size. Automated assembly of these segments resulted in 225 contigs; fingerprint and STS content data were used to direct merges between contigs. We then merged the 5-Mb assemblies to yield a fingerprint map with 16 gaps relative to the STS content map. We collected directed fingerprints for 56 additional BACs selected from the STS content map, and these data closed four fingerprint gaps. The remaining 12 gaps may be due to sparse BAC coverage, the distribution of Eco RI sites, or low STS marker density in these regions. The fingerprint assemblies (21) corroborate the STS content assemblies, providing confidence in the integrated map.
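The divide-and-merge strategy can be sketched as follows. The sketch assumes each BAC already has an approximate interval on its arm from the STS content map; the coordinates, names, and 5-Mb target are hypothetical inputs. BACs that straddle a segment boundary are placed in both neighboring segments. This is one way to keep shared clones available for merging the independently edited assemblies afterward, not necessarily the bookkeeping actually used.

```python
from typing import List, Sequence, Tuple

Bac = Tuple[str, float, float]  # (name, start_kb, end_kb) on one arm; hypothetical input

def partition_arm(bacs: Sequence[Bac], arm_length_kb: float,
                  target_segment_kb: float = 5000.0) -> List[List[str]]:
    """Split an arm into roughly equal segments and assign each BAC to every segment it touches."""
    n = max(1, round(arm_length_kb / target_segment_kb))
    size = arm_length_kb / n
    segments: List[List[str]] = [[] for _ in range(n)]
    for name, start, end in bacs:
        first = min(n - 1, int(start // size))
        last = min(n - 1, int(end // size))  # end treated as inclusive
        for i in range(first, last + 1):
            segments[i].append(name)
    return segments

print(partition_arm([("a", 0, 160), ("b", 4900, 5060), ("c", 9000, 9150)], 10_000))
# [['a', 'b'], ['b', 'c']]
```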
The polytene chromosomes constitute the unambiguous physical map of D. melanogaster (22, 23). To align the BAC map with the cytological map, we mapped BACs by in situ hybridization to polytene chromosomes. First, random BACs were hybridized to provide anchor points throughout the genome; 173 mapped to specific locations on chromosomes 2 and 3. Next, an additional 547 BACs from the tiling path selected for sequencing were hybridized to provide finer alignment of the BAC map, the cytological map, and the genome sequence. These hybridized BACs represent ~1.2-fold coverage of the euchromatic portion of the two chromosomes (5).
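The ~1.2-fold figure is consistent with the number of hybridized BACs if they are assumed to have the RPCI-98 average insert size of 163 kb. Tiling-path clones are a selected subset, so this is only an approximation:

```python
hybridized_bacs = 173 + 547   # random anchor BACs + tiling-path BACs
mean_insert_mb = 0.163        # assumption: the library-wide average insert size
target_mb = 97                # euchromatin of chromosomes 2 and 3
coverage = hybridized_bacs * mean_insert_mb / target_mb
print(f"{coverage:.2f}-fold")  # 1.21-fold
```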
The in situ data indicate that BAC coverage extends nearly to the telomeres (Fig. 2A). It is more difficult to determine how far the map extends toward the centromeres (Fig. 2A); in pericentric regions, the morphology of hybridized chromosomes is poorly preserved and difficult to interpret. These regions include a substantial amount of repetitive sequence, so BACs representing them often hybridize to multiple locations in the genome. However, each of the three small contigs near the centromeres (Fig. 1) contains at least one BAC that hybridizes to a single cytological location.
The in situ data also permit estimation of the sizes of the euchromatic regions not represented in mapped BACs (Fig. 2B and Table 1). The resolution of in situ hybridization varies across the genome because of differences in the DNA content of each polytene chromosome band (Fig. 2C), and the relative DNA content of each band has been measured by Sorsa (23). We estimate that the map covers >97.9% of the euchromatic portion of the two chromosomes (Table 1).
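An estimate of this kind converts unmapped cytological intervals into DNA amounts using per-band DNA content, rather than treating all bands as equal in size. The sketch below assumes a hypothetical table of relative band DNA content (toy values, not Sorsa's measurements). It counts every band that a gap touches as entirely missing, which makes the result a lower bound on coverage.

```python
from typing import Iterable, Mapping

def coverage_lower_bound(band_content: Mapping[str, float],
                         gap_bands: Iterable[str]) -> float:
    """Fraction of euchromatic DNA in mapped BACs, counting any band touched by a gap as missing."""
    total = sum(band_content.values())
    missing = sum(band_content[b] for b in set(gap_bands))
    return 1.0 - missing / total

# Toy relative DNA contents (arbitrary units), not real measurements
bands = {"57B1": 30.0, "57B4": 10.0, "57C1": 60.0}
print(coverage_lower_bound(bands, ["57B4"]))  # 0.9
```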
The construction of this BAC map and the recently reported BAC maps of Arabidopsis thaliana, which has a genome size similar to that of D. melanogaster, illustrate how hybridization-based STS content mapping and agarose gel-based restriction fingerprint mapping can be productively integrated to produce contiguous clone-based physical maps of large genomic regions. The STS content map of the ~130-Mb A. thaliana genome had 130 contigs (24), and the restriction fingerprint map had 169 contigs (25); integration of these data resulted in a BAC map with 14 gaps, excluding the centromeres (24). The D. melanogaster BAC map presented here has five gaps, excluding the centromeric heterochromatin. We found it efficient to use the STS content map to direct fingerprint assembly and did not attempt to construct an independent fingerprint map. In our experience, STS content mapping with oligonucleotide probes is more effective for achieving contiguous clone coverage, and agarose gel-based restriction fingerprint mapping is more useful for measuring the extent of clone overlaps and confirming that sequence assemblies reflect the structure of the genome. The differing utilities of the two techniques arise because STS content data have higher specificity for detecting clone overlaps, and restriction fingerprint data have higher resolution for measuring them. Our success in combining STS content and restriction fingerprint data to produce an integrated, accurate, and essentially complete map argues for a similar approach for the human and mouse genomes.
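The point about resolution can be made concrete. Each fingerprint band is a sized Eco RI fragment, so the summed length of fragments shared by two BACs gives a rough estimate of how far they overlap; an STS hit only says that they overlap. The sketch below assumes fragment sizes in kb and a hypothetical 2% sizing tolerance. Its greedy matching is a simplification, and it ignores vector-insert junction fragments and comigrating bands.

```python
from typing import Sequence

def shared_fragment_length(a: Sequence[float], b: Sequence[float],
                           rel_tol: float = 0.02) -> float:
    """Approximate overlap (kb) as the summed size of fragments common to both fingerprints."""
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    shared = 0.0
    while i < len(xs) and j < len(ys):
        if abs(xs[i] - ys[j]) <= rel_tol * max(xs[i], ys[j]):
            shared += (xs[i] + ys[j]) / 2  # matched band: use the mean of the two size estimates
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1  # smaller fragment has no partner; advance that list
        else:
            j += 1
    return shared

print(shared_fragment_length([1.0, 2.0, 5.0, 10.0], [2.01, 5.0, 7.5]))
```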
The physical map described here played three key roles in the generation of the D. melanogaster genome sequence described by Adams et al. (1). First, the map provided an independent benchmark for evaluating the accuracy of whole-genome shotgun sequence assemblies (26). Second, a tiling path of overlapping BAC and P1 clones spanning the map of chromosomes 2 and 3 was shotgun sequenced to at least onefold coverage, and these data were assembled with the whole-genome shotgun data to increase total sequence coverage from 12- to 13.5-fold. These data also directly confirm the accuracy of clone overlaps in the BAC map. Third, the BACs composing the tiling path were used as templates for gap closure in sequence finishing. In addition to these roles in sequence assembly and validation, the mapped BACs facilitate the subcloning of any region of the genome.
BAC-based STS content maps of the X chromosome (27) and chromosome 4 (28) have been constructed by others. These maps will be integrated with the restriction fingerprint data to complete a BAC-based physical map of the whole genome. The contiguity and depth of coverage of these maps have ensured that the complete sequence of the euchromatic portion of the D. melanogaster genome could be correctly assembled and finished to high accuracy.
1. M. Adams et al., Science 287, 2185 (2000).
2. D. Garza, J. W. Ajioka, D. T. Burke, D. L. Hartl, Science 246, 641 (1989); J. W. Ajioka, et al., Chromosoma 100, 495 (1991); H. Cai, P. Kiefel, J. Yee, I. Duncan, Genetics 136, 1385 (1994).
3. I. Sidén-Kiamos, et al., Nucleic Acids Res. 18, 6261 (1990); E. Madueño, et al., Genetics 139, 1631 (1995).
4. W. Kimmerly, et al., Genome Res. 6, 414 (1996).
5. See information at www.fruitfly.org.
6. B. J. Brizuela, et al., Genetics 137, 803 (1994).
7. HMW DNA was prepared from a homogenate enriched for nuclei. Adult flies (2.0 g) were starved for 2 hours to reduce the gut contents, frozen in liquid nitrogen, and pulverized with a mortar and pestle. The material was suspended in 30 ml of ice-cold homogenization buffer [100 mM NaCl, 10 mM tris-Cl (pH 8.0), 10 mM EDTA, and 200 mM sucrose], disrupted in a 40-ml Dounce homogenizer (Kontes, Vineland, NJ) with five strokes each of pestles A and B, and filtered through nylon mesh (Nitex 3-46/37). The filtrate was centrifuged in a Sorvall HB-4 rotor at 4°C and 1000 rpm for 10 min, and the supernatant was filtered through a finer mesh (Nitex 3-20/14). The second filtrate was centrifuged at 4°C and 3000 rpm for 20 min. The pellet was resuspended in 30 ml of homogenization buffer and centrifuged again. The second pellet was resuspended in 2 ml of homogenization buffer, warmed to 37°C, and mixed well with an equal volume of 1% Incert agarose (FMC BioProducts, Rockland, ME) in homogenization buffer without sucrose. The mixture was aliquoted into 80-µl blocks, which were cooled on ice until solid. HMW DNA was prepared in the blocks with the dodecyl lithium sulfate (LIDS) procedure [H. Riethman, B. Birren, A. Gnirke, in Genome Analysis, A Laboratory Manual, B. Birren et al., Eds. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1997), vol. 1, pp. 106-108].
8. K. Osoegawa, et al., Genomics 52, 1 (1998); E. Frengen, et al., Genomics 58, 250 (1999).
9. This information is available at www.chori.org/bacpac.
10. M. T. Ross, S. LaBrie, J. McPherson, V. P. Stanton Jr., in Current Protocols in Human Genetics, N. C. Dracopoli et al., Eds. (Wiley, New York, 1999), unit 5.6. The RPCI-98 BAC library was gridded on positively charged nylon filters. Each clone was spotted in duplicate, and the entire library was represented on each 22 cm by 22 cm filter. An anchor clone (Caenorhabditis briggsae clone RPCI-94 1A1) was included at multiple locations in the grid to facilitate alignment. Overlapping oligonucleotide probes (double-stranded 40-nucleotide oligomers) were designed with a Perl script provided by J. McPherson. Probe design was restricted to sequences with an average PHRED quality score of >q10 (29) unless sequence trace files were not available. Each 32P-labeled probe was hybridized along with the anchor clone probe (GTTGCCAAATTCCGAGATCTTGGCGACGAAGCCACATGAT) to a separate filter. Filter images were collected on a PhosphorImager (Storm 860, Molecular Dynamics, Sunnyvale, CA) and analyzed in the ArrayVision module of the software package AIS v5.0 (Imaging Research, St. Catharines, Ontario, Canada) with the anchor signals for alignment. Filters were stripped and reused several times.
11. E. D. Green and P. Green, PCR Methods Appl. 1, 77 (1991). Information on SEGMAP is available at www.genome.washington.edu/uwgc/analysistools/segmap.htm. Perl scripts were written to organize the STS content data (AIS output) by chromosome arm and export it to the individual SEGMAP projects. The scripts allowed an editor to move markers between chromosome arms or remove them entirely. Each data file contained only BACs hybridizing to the current probe because BACs corresponding to the previous hybridization experiment for the same filter were subtracted from it.
12. A false negative rate of 5% was calculated as the fraction of probes designed from BAC end sequences that failed to hybridize to their source BAC. A false positive rate of 8% was estimated as follows: For each BAC with multiple locations in the map, we designated all STS hits except those at the most likely location to be false positives; the most likely map location was deemed the one with the most consecutive STS hits. We then divided the total number of false positive hits by the total number of hits in the complete data set to arrive at the false positive rate. We also calculated that 81% of BACs contained neither a single false negative nor a false positive.
13. Web fig. 1 (30) shows a sample region of the STS content map in the SEGMAP display format. The edited STS content maps were reformatted and displayed on the World Wide Web (5) by means of custom Java tools. This public version excludes BACs with an inferred false positive hit or more than one inferred false negative hit, unless such BACs are part of the sequenced tiling path.
14. The D. melanogaster BAC (Dros BAC) library was made by A. Billaud for the European Drosophila Genome Project from DNA prepared from embryos of the isogenic y1; cn1 bw1 sp1 strain (6), partially digested with either Nde II or Hind III, and cloned in pBeloBAC11. Filters representing the 23,400 BACs in the library were hybridized and analyzed as described. Additional information on the Dros BAC library is available at www.hgmp.mrc.ac.uk/Biology/descriptions/dros_bac.html.
15. M. A. Marra, et al., Genome Res. 7, 1072 (1997). Restriction fingerprints were generated as described with the following modifications: For consistent growth, duplicate 1.2-ml cultures in 96-well format were inoculated with 50 µl of saturated starter culture and grown overnight. The pooled bacterial pellets were resuspended in 200 µl of GET buffer [25 mM tris-Cl (pH 8.0), 10 mM EDTA, and 150 mM glucose] supplemented with ribonuclease A (0.2 mg/ml) and lysozyme (2 mg/ml) before addition of 400 µl of 0.2 M NaOH/1% SDS and 300 µl of 3 M KOAc (pH 5.5). Lysates were vacuum-filtered (Qiagen Qiafilter or Polyfiltronics 0.45-µm polyvinylidene difluoride filter plates), and DNA was precipitated by the addition of 700 µl of isopropanol. Samples were resuspended in 20 µl of 10 mM tris-Cl (pH 8.0)/0.1 mM EDTA and digested (5 µl) in 10-µl Eco RI reactions. After digestion, 2 µl of loading dye supplemented with 0.1% SDS was added, and samples were heated to 65°C for 30 min and cooled on ice before loading on 1% agarose gels (Owl A2-BP gel boxes; custom 43-well combs). Molecular weight standards (1 Kb Extension Ladder, Life Technologies, Rockville, MD) were loaded in every sixth well, and electrophoresis was carried out at 70 V for 18 hours in TAE buffer (40 mM tris-acetate and 1 mM EDTA) recirculated at 12°C.
16. J. Sulston, et al., Comput. Appl. Biosci. 4, 125 (1988). For a description of IMAGE, see www.sanger.ac.uk/Software/Image. Gel images were captured with a Molecular Dynamics FluorImager 595, and Perl scripts were written to organize image files for FPC assembly.
17. C. Soderlund, I. Longden, R. Mott, Comput. Appl. Biosci. 13, 523 (1997).
18. Information on FPC is available at www.sanger.ac.uk/Software/fpc.
19. M. Ashburner, et al., Genetics 153, 179 (1999).
20. FPC assembly of chromosome arm 3L was conducted at high stringency: a fixed tolerance of 6 and a cutoff setting of e-10 with equation 1 in (17). The finished sequence of the 2.9-Mb Adh region (19) was used to assess the accuracy of fingerprint assemblies at various FPC tolerance and cutoff settings, and a lower stringency was chosen for assembly of the 5-Mb projects: a fixed tolerance of 9 and a cutoff setting of e-6.
21. See information at www.hgsc.bcm.tmc.edu/drosophila/mapping. As described by Marra et al. (25), the line drawings representing the fingerprint contigs may not accurately reflect the extent of BAC overlaps. Therefore, users should examine the IMAGE-processed gel lanes to verify BAC overlap; all files necessary for reassembly in FPC and fingerprint analysis are available. Web fig. 1 (30) shows STS content and restriction fingerprint assemblies in a sample region of the BAC map displayed in SEGMAP and FPC formats, respectively.
22. C. B. Bridges, J. Hered. 26, 6 (1935).
23. V. Sorsa, Chromosome Maps of Drosophila, Vol. II (CRC Press, Boca Raton, FL, 1988).
24. T. Mozo, et al., Nature Genet. 22, 271 (1999).
25. M. Marra, et al., Nature Genet. 22, 265 (1999).
26. E. W. Myers, et al., Science 287, 2196 (2000). Comparison of the order of STS markers in this BAC map and the whole-genome shotgun sequence assembly identified 10 discrepancies in STS location (>99.5% concordance), not including differences in local STS order that are due to the limited resolution of STS content mapping. Three of these STS markers appear to hybridize to duplicated sequences, and the other seven cases may result from duplicated sequences or data-tracking errors. We have removed all 10 from the public version of the STS content map to avoid potential confusion. BAC order and overlap were not substantially affected by these edits.
27. A. Peters et al., unpublished material; available at www2.open.ac.uk/biology/molecular-genetics/EDGPMap.html.
28. J. Locke, L. Podemski, N. Aippersbach, H. Kemp, R. Hodgetts, Genetics, in press.
29. B. Ewing and P. Green, Genome Res. 8, 186 (1998).
30. Web fig. 1 is available at www.sciencemag.org/feature/data/1048711.shl.
31. Instructions for preparing polytene chromosome in situ hybridizations are available at www.fruitfly.org/methods/cytogenetics.html.
32. We thank A. Gnirke for advice on preparation of HMW DNA; J. McPherson for "overgo" oligonucleotide probe labeling and hybridization protocols; R. Zhang, T. Wells, and C. Hamerski for technical support; A. Loraine for her work on the ArmView and CytoView Web displays; S. Mullaney for photography and assistance with figures; and B. Kimmel for discussions at the planning stage. We acknowledge Imaging Research for software improvements resulting in the release of AIS v5.0. Finally, we thank all members of the Berkeley Drosophila Genome Project and Human Genome Sequencing Center for their support. This work was supported by NIH grant HG00750 (to G.M.R.) and Howard Hughes Medical Institute (G.M.R. and T.R.L.).
18 January 2000; accepted 1 March 2000