Introduction - The TIGR Trypanosoma brucei Genome Project
Welcome to the TIGR Trypanosoma brucei Genome Project.
T. brucei is the causative agent of African sleeping sickness.
TIGR is funded by the National Institute of Allergy
and Infectious Diseases (NIAID) to implement a large-scale
sequencing project of the African trypanosome genome. Our strategy,
detailed below, consists of two phases.
In the first phase, we generated about 20 Mb of discontinuous
single-pass sequence from the non-minichromosomal genome by
end-sequencing 1,700 P1 clones, 5,000 BAC clones,
and ~14,000 small-insert plasmid clones of randomly sheared DNA. The
purpose of the initial phase was to enhance early gene discovery and
to provide markers that are important for our mapping and sequencing
strategy.
During the second phase (April 1999 to March 2004), chromosomes
II, III, IV, V, VI, VII and VIII will be sequenced using a BAC-by-BAC
approach. This project is being carried out in collaboration with
Drs. John Donelson (University of Iowa), Sara Melville (University of
Cambridge) and Elisabetta Ullu (Yale University), and in close
coordination with the Sanger Center,
where chromosomes I, IX, X and XI are being sequenced. Complete information about the current activities of the African
trypanosome genome network can be obtained from the Trypanosoma brucei genome
project Web page in Cambridge.
Trypanosoma brucei, the causative agent of African sleeping sickness
(photograph courtesy of Dr. John Donelson, University of Iowa).
Sequencing methodology
Overall strategy. A two-part sequencing strategy is being
used, yielding 24.5 Mb of discontinuous single-pass sequence (~1X
sequence coverage of the non-minichromosomal genome) and about 12 Mb
of completed sequence on selected chromosomes (50% of the
non-minichromosomal genome). In the first phase of the project, about
47,000 end sequences of ~500 bp each were determined from BAC, P1 and
whole-genome sheared plasmid DNA libraries. The end
sequences not only enhance early gene discovery, but also serve as
markers for the construction of the high-resolution sequence-ready
map. The second phase of the project involves thorough and highly
accurate sequencing of T. brucei chromosomes II-VIII by
iteratively selecting minimally overlapping BACs for complete
sequencing. For this work, a T. brucei TREU927 GUTat 10.1 whole-genome
sheared DNA library (average insert size 2-3 kb) was
constructed at TIGR, and a large-insert T. brucei BAC
library was constructed in collaboration with Dr. Pieter de Jong at Children's Hospital Oakland Research
Institute.
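The coverage figures above follow from simple arithmetic: fold coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes a non-minichromosomal genome of about 24 Mb, the value implied by the statement that 12 Mb is 50% of it; `fold_coverage` is a hypothetical helper name, not part of any TIGR software.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome base is sampled by the sequence."""
    return total_bases / genome_size

# Assumption: ~24 Mb non-minichromosomal genome (12 Mb is stated to be ~50% of it).
GENOME_SIZE_BP = 24_000_000

READS = 47_000      # end sequences determined in the first phase
READ_LENGTH = 500   # bp per end sequence

single_pass_bp = READS * READ_LENGTH  # 23,500,000 bp
print(f"{single_pass_bp / 1e6:.1f} Mb, {fold_coverage(single_pass_bp, GENOME_SIZE_BP):.2f}X")
print(f"finished fraction: {12_000_000 / GENOME_SIZE_BP:.0%}")
```

Both the 23.5 Mb computed from the read count and the 24.5 Mb quoted above round to roughly 1X coverage under this assumed genome size.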
A- Genome survey sequencing. End-sequences of about 5,000 of
the 18,000 individual BAC clones in the T. brucei TREU927 GUTat
10.1 library (average insert size 140 kb; >90X haploid
non-minichromosomal genome equivalents) have been determined. This
library was generated from both EcoRI and DpnII partial genomic
digests, since the use of two enzymes with differing specificities has
been shown to result in libraries with different genomic
representations. The 5,000 end-sequenced BAC clones represent about 26
non-minichromosomal genome equivalents of cloned DNA and provide, on
average, a marker of 500-600 bp every 2700 bp. Such high marker
density is quite useful for the construction of a high-resolution
sequence-ready map. In addition to BAC end sequencing, the ends of nearly
all of the ~2,000 clones in a T. brucei P1
library (average insert size 67 kb, ~4.4X coverage of the haploid
non-minichromosomal genome) have been sequenced, as well as the ends
of about 18,000 clones from the small-insert whole-genome plasmid
library. Sequencing randomly sheared DNA clones has two benefits: it
eliminates the bias, inherent in restriction-fragment libraries, against
markers that do not depend on a particular restriction pattern, and it
makes it possible to identify telomere-proximal sequences, which can be
obtained only from a sheared library. All the end
sequences have been submitted to the dbGSS division of
GenBank. The TIGR web page offers two ways to search the
T. brucei end sequence data: by clone name and by sequence
similarity. This database is updated daily.
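The library figures in this section can be checked the same way: genome equivalents are the number of clones times the mean insert size divided by the genome size, and because each end-sequenced clone contributes two markers, the mean marker spacing is the genome size divided by twice the number of clones. The sketch assumes a genome size of 27 Mb, back-calculated from the page's own numbers (5,000 × 140 kb ÷ 26) rather than stated in the source; the helper names are hypothetical.

```python
def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * mean_insert_bp / genome_size_bp

def marker_spacing(n_clones: int, genome_size_bp: float, ends_per_clone: int = 2) -> float:
    """Mean distance between end-sequence markers (genome size / number of end sequences)."""
    return genome_size_bp / (n_clones * ends_per_clone)

# Assumption: ~27 Mb non-minichromosomal genome, back-calculated from 5,000 x 140 kb / 26.
GENOME_SIZE_BP = 27_000_000

print(f"end-sequenced BACs: {genome_equivalents(5_000, 140_000, GENOME_SIZE_BP):.1f} equivalents")
print(f"whole BAC library: {genome_equivalents(18_000, 140_000, GENOME_SIZE_BP):.1f} equivalents")
print(f"marker every {marker_spacing(5_000, GENOME_SIZE_BP):.0f} bp")
```

With this assumed genome size the results (about 26 equivalents for the end-sequenced clones, more than 90X for the full library, and one marker every 2,700 bp) agree with the figures quoted above.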
B- Clone by clone strategy for chromosome sequencing. Using
primarily BAC clones as the sequencing substrate, TIGR will sequence
about 12 Mb of T. brucei chromosomes II-VIII between April
1999 and March 2004. As described in detail below and shown in this
Strategy schema, the strategy involves
selection of seed clones for sequencing along the length of the
chromosome, and then extending outwards from these seed clones to
develop contigs of BAC clones. The first effort focused on
sequencing chromosome II (1.25 Mb). At least three 'seed' BAC clones
were identified with 10 unique EST markers previously assigned to this
chromosome by Sara Melville, both by searching the BAC end
database and by screening high-density filters containing the
gridded T. brucei BAC library. BAC clones were checked by
fingerprinting for consistency with other hybridizing BACs. Within a
few months, the three original seed clones of chromosome II had grown into contigs
containing 7 additional clones. In all cases, the selection of the
overlapping clone was made using end sequence data as the primary
method, along with BAC fingerprinting. The overlaps achieved with end
sequences are very small, averaging only 9 kb (about 6% of a 140-kb
BAC insert), and care is taken to confirm that each clone selected for
sequencing is colinear with the genome.
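Fingerprint consistency checks of this kind come down to asking how many restriction fragments two clones share. The sketch below is a simplified illustration, not the scoring TIGR used: fingerprints are assumed to be lists of fragment sizes in bp, two bands count as shared when their sizes agree within a relative tolerance (2% here, an arbitrary choice), and similarity is reported as the fraction of all bands that are shared. The function names and the tolerance are hypothetical.

```python
from typing import Sequence

def shared_bands(a: Sequence[int], b: Sequence[int], tolerance: float = 0.02) -> int:
    """Count band pairs (one from each fingerprint) whose sizes agree within tolerance.

    Walks both sorted size lists with two pointers, so each band is matched at most once.
    """
    xs, ys = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(xs) and j < len(ys):
        x, y = xs[i], ys[j]
        if abs(x - y) <= tolerance * max(x, y):
            shared += 1   # sizes agree: count the pair and move past both bands
            i += 1
            j += 1
        elif x < y:
            i += 1        # this band in a is too small to match any remaining band in b
        else:
            j += 1        # this band in b is too small to match any remaining band in a
    return shared

def band_similarity(a: Sequence[int], b: Sequence[int], tolerance: float = 0.02) -> float:
    """Fraction of all bands in both fingerprints that are shared (0.0 to 1.0)."""
    total = len(a) + len(b)
    return 2 * shared_bands(a, b, tolerance) / total if total else 0.0

clone_a = [1_000, 2_500, 4_000, 7_000]
clone_b = [1_010, 2_500, 5_200, 6_990]
print(shared_bands(clone_a, clone_b), band_similarity(clone_a, clone_b))  # 3 0.75
```

A clone whose fingerprint shares few bands with the other BACs hybridizing to the same marker would be flagged for closer inspection before being chosen as a sequencing substrate. Dedicated contig-building software typically uses a probability-based score rather than a raw shared fraction.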
How are BAC end-sequence markers and fingerprints used to
construct an optimal sequence-ready map? End sequences and
restriction digest fingerprints are invaluable for making highly
efficient use of BAC clones to construct sequence-ready physical maps
and to select clones for sequencing. The BAC library and the data
pertaining to it enable the construction of minimal sequence tiling
paths of BAC clones in the following way. First, one or more 'seed'
BAC clones are identified by hybridization of the gridded BAC library
with chromosome-specific probes and sequenced to contiguity (see Strategy schema). The sequenced BAC clone
immediately identifies an average of 50 overlapping BAC clones by
virtue of their end sequences; with one end-sequence marker every
~2,700 bp, a 140-kb insert is expected to contain about 50 BAC ends. The fingerprints of a selection of the
overlapping BAC clones are compared to identify any BACs containing
artifacts or inconsistencies, so that they can be excluded as
sequencing substrates. A potential difficulty arises when
end sequences fall entirely within genome-wide
repeats or highly similar homology units; here too, fingerprint data
are essential for resolving the inconsistencies. Two BAC clones
minimally overlapping the 5' and 3' ends of the 'seed' BAC clone are
then chosen for the next round of shotgun sequencing. The protocol
is iterative: sequence a BAC clone, use the computer to pick the
clones that extend the minimum tiling path, sequence those clones, and so on. Thus,
the minimum sequencing tiling path is chosen computationally and does
not require additional physical mapping during the sequencing efforts.
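The clone-selection step described above can be sketched as follows. Each candidate is represented by a hypothetical `EndHit` record: the position in the finished seed sequence where one of its end sequences maps, which side of the seed the rest of the clone extends toward, whether that end is unique rather than a repeat, and whether its fingerprint passed the consistency check. Among clones that pass both filters, the function picks the one with the smallest overlap on each side of the seed. The record fields, the `min_overlap` parameter, the clone names and the assumption that each end hit can be placed unambiguously are illustrative simplifications, not TIGR's software.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

@dataclass(frozen=True)
class EndHit:
    """Hypothetical record: one candidate BAC end sequence placed on the seed sequence."""
    clone: str
    position: int          # 0-based bp coordinate of the end sequence within the seed
    extends_right: bool    # True: clone runs past the seed's 3' end; False: past its 5' end
    unique: bool           # False if the end also matches genome-wide repeats
    fingerprint_ok: bool   # False if fingerprinting flagged the clone as inconsistent

def pick_flanking_clones(
    seed_length: int, hits: Iterable[EndHit], min_overlap: int = 1
) -> Tuple[Optional[str], Optional[str]]:
    """Return (5' clone, 3' clone) with the smallest valid overlap, or None for a side."""
    best: Dict[bool, Optional[Tuple[int, str]]] = {False: None, True: None}
    for hit in hits:
        # Discard ends that cannot be placed reliably or clones with suspect fingerprints.
        if not (hit.unique and hit.fingerprint_ok):
            continue
        # Overlap runs from the hit to the seed end that the clone extends past.
        overlap = seed_length - hit.position if hit.extends_right else hit.position
        if overlap < min_overlap or overlap > seed_length:
            continue  # not a valid overlapping placement
        current = best[hit.extends_right]
        if current is None or overlap < current[0]:
            best[hit.extends_right] = (overlap, hit.clone)
    left, right = best[False], best[True]
    return (left[1] if left else None, right[1] if right else None)

hits = [
    EndHit("BAC_A", 150_000, True, True, True),    # 10 kb overlap on the 3' side
    EndHit("BAC_B", 155_000, True, False, True),   # smaller overlap, but end lies in a repeat
    EndHit("BAC_C", 140_000, True, True, True),    # 20 kb overlap on the 3' side
    EndHit("BAC_D", 9_000, False, True, True),     # 9 kb overlap on the 5' side
    EndHit("BAC_E", 3_000, False, True, False),    # fingerprint flagged as inconsistent
    EndHit("BAC_F", 30_000, False, True, True),    # 30 kb overlap on the 5' side
]
print(pick_flanking_clones(160_000, hits))  # ('BAC_D', 'BAC_A')
```

Calling the function again on each newly sequenced clone walks the tiling path outward, mirroring the sequence, pick, sequence loop described above.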
The end-sequence strategy has several advantages. (i) It saves
substantial time and effort in constructing sequence-ready maps,
particularly the process of contig 'walking'. (ii) Smaller
overlaps can be detected than by more traditional fingerprint-only
methods of contig-building. (iii) The large DNA inserts in the
BAC library are valuable to many T. brucei researchers, so it
makes sense to invest in characterization of these libraries.
(iv) The method is 'inclusive' and results in sequence markers
for many of the clones in the library, not just those that happen to
match an existing, mapped marker.
For comments or questions, send mail to tbrucei@tigr.org.