Trypanosoma brucei Genome Project

Introduction - The TIGR Trypanosoma brucei Genome Project

Welcome to the TIGR Trypanosoma brucei Genome Project. T. brucei is the causative agent of African sleeping sickness. TIGR is funded by the National Institute of Allergy and Infectious Diseases (NIAID) to carry out large-scale sequencing of the African trypanosome genome. Our strategy, detailed below, consists of two phases.

In the first phase, we generated about 20 Mb of discontinuous single-pass sequence from the non-minichromosomal genome by end-sequencing 1,700 P1 clones, 5,000 BAC clones, and ~14,000 small-insert plasmid clones of randomly sheared DNA. The purpose of this initial phase was to enhance early gene discovery and to provide markers that are important for our mapping and sequencing strategy.

During the second phase (April 1999 to March 2004), chromosomes II, III, IV, V, VI, VII, and VIII will be sequenced using a BAC-by-BAC approach. This project is being carried out in collaboration with Drs. John Donelson (University of Iowa), Sara Melville (University of Cambridge), and Elisabetta Ullu (Yale University), and in close coordination with the Sanger Center, where chromosomes I, IX, X, and XI are being sequenced. Complete information about the current activities of the African trypanosome genome network can be obtained from the Trypanosoma brucei genome project Web page in Cambridge.

Trypanosoma brucei - The causative agent of African sleeping sickness

(Photograph courtesy of Dr. John Donelson, University of Iowa)

Sequencing methodology

Overall strategy. A two-part sequencing strategy is being used, yielding 24.5 Mb of discontinuous single-pass sequence (~1X sequence coverage of the non-minichromosomal genome) and about 12 Mb of completed sequence on selected chromosomes (50% of the non-minichromosomal genome). In the first phase of the project, about 47,000 sequences of 500 bp each were determined from the ends of clones in the BAC, P1, and whole-genome sheared plasmid DNA libraries. The end sequences not only enhance early gene discovery, but also serve as markers for the construction of a high-resolution sequence-ready map. The second phase of the project involves thorough and highly accurate sequencing of T. brucei chromosomes II-VIII by iteratively selecting minimally overlapping BACs for complete sequencing. For this work, a T. brucei TREU927 GUTat 10.1 whole-genome sheared DNA library (average insert size 2-3 kb) has been constructed at TIGR, and a large-insert T. brucei BAC library has been constructed in collaboration with Dr. Pieter de Jong at Children's Hospital Oakland Research Institute.

A- Genome survey sequencing. End sequences of about 5,000 of the 18,000 individual BAC clones in the T. brucei TREU927 GUTat 10.1 library (average insert size 140 kb; >90X haploid non-minichromosomal genome equivalents) have been determined. This library was generated from both EcoRI and DpnII partial genomic digests, since the use of two enzymes with differing specificities has been shown to result in libraries with different genomic representations. The 5,000 end-sequenced BAC clones represent about 26 non-minichromosomal genome equivalents of cloned DNA and provide, on average, a marker of 500-600 bp every 2,700 bp. Such high marker density is very useful for the construction of a high-resolution sequence-ready map. In addition to BAC end sequencing, nearly all ends of about 2,000 clones from a T. brucei P1 library (average insert size 67 kb, ~4.4X coverage of the haploid non-minichromosomal genome) have been sequenced, as well as the ends of about 18,000 clones from the small-insert whole-genome plasmid library. Sequencing randomly sheared DNA clones has two benefits: it avoids the bias of restriction-fragment libraries, in which every marker is anchored to a particular restriction pattern, and it permits the potential identification of telomere-proximal sequences that can only be obtained from a sheared library. All the end sequences have been submitted to the dbGSS division of GenBank. The TIGR web page provides two types of searches of the T. brucei end sequence data: search by clone name and sequence similarity searching. This database is updated daily.
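The coverage and marker-density figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the genome size of 27 Mb is an assumption back-calculated from the figures in the text (5,000 clones of 140 kb giving about 26 genome equivalents), not a value stated on this page.

```python
# Illustrative arithmetic only. GENOME_BP is an ASSUMPTION back-calculated
# from the figures in the text; it is not a value stated by the project.
GENOME_BP = 27_000_000


def genome_equivalents(n_clones, avg_insert_bp, genome_bp=GENOME_BP):
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * avg_insert_bp / genome_bp


def marker_spacing_bp(n_clones, genome_bp=GENOME_BP, ends_per_clone=2):
    """Average distance between end-sequence markers, assuming both ends
    of every clone are sequenced and markers are spread evenly."""
    return genome_bp / (n_clones * ends_per_clone)


# 5,000 end-sequenced BACs with 140 kb average inserts.
bac_sequenced = genome_equivalents(5_000, 140_000)
# The full 18,000-clone BAC library.
bac_library = genome_equivalents(18_000, 140_000)
# Two end sequences per BAC give 10,000 markers across the genome.
spacing = marker_spacing_bp(5_000)

print(f"End-sequenced BACs: {bac_sequenced:.1f} genome equivalents")
print(f"Whole BAC library:  {bac_library:.1f} genome equivalents")
print(f"Marker spacing:     {spacing:.0f} bp")
```

Under this assumed genome size, the results agree with the text's "about 26" genome equivalents, ">90X" library depth, and one marker every 2,700 bp.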

B- Clone-by-clone strategy for chromosome sequencing. Using primarily BAC clones as the sequencing substrate, TIGR will sequence about 12 Mb of T. brucei chromosomes II-VIII between April 1999 and March 2004. As described in detail below and shown in the Strategy schema, the strategy involves selecting seed clones for sequencing along the length of the chromosome, and then extending outwards from these seed clones to develop contigs of BAC clones. The first effort focused on sequencing chromosome II (1.25 Mb). Using 10 unique EST markers that were previously assigned to this chromosome by Sara Melville, at least three 'seed' BAC clones were identified by searching the BAC end database, as well as by screening high-density filters containing the gridded T. brucei BAC library. BAC clones were checked by fingerprinting for consistency with other hybridizing BACs. Within a few months, the three original seed clones of chromosome II had been extended into contigs with seven additional clones. In all cases, the overlapping clone was selected using end sequence data as the primary method, along with BAC fingerprinting. The overlaps achieved with end sequences are small, averaging 9 kb, and appropriate care is taken to ensure colinearity between the clone selected for sequencing and the genome.

How are BAC end-sequence markers and fingerprints used to construct an optimal sequence-ready map? End sequences and restriction digest fingerprints are invaluable for making highly efficient use of BAC clones to construct sequence-ready physical maps and to select clones for sequencing. The BAC library and the data pertaining to it enable the construction of minimal sequence tiling paths of BAC clones in the following way. First, one or more 'seed' BAC clones are identified by hybridization of the gridded BAC library with chromosome-specific probes, and sequenced to contiguity (see Strategy schema). The sequenced BAC clone immediately identifies an average of 50 overlapping BAC clones by virtue of their end sequences. The fingerprints of a selection of the overlapping BAC clones are compared to identify any BACs containing artifacts or inconsistencies, so that these can be eliminated as sequencing substrates. One difficulty that could arise is that some end sequences fall entirely within genome-wide repeats or very similar homology units. Here again, fingerprint data are essential in sorting out the inconsistencies. Two BAC clones minimally overlapping the 5' and 3' ends of the 'seed' BAC clone are chosen for the next round of shotgun sequencing. The protocol consists of sequencing a BAC clone, using the computer to pick the next clones on the minimum tiling path, sequencing those BAC clones, and so on. Thus, the minimum sequencing tiling path is chosen by the computer and does not require additional physical mapping during the sequencing effort.
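The clone-selection step described above can be sketched in code. This is a minimal illustration under stated assumptions, not TIGR's actual software: the `EndHit` record, the `pick_minimal_overlap` function, the coordinate convention, and the 1,000 bp minimum-overlap threshold are all hypothetical. Each candidate BAC is represented by the position of its end-sequence match inside the sequenced seed clone, the direction in which it extends beyond the seed, and whether it passed the fingerprint check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EndHit:
    """Hypothetical record: one candidate BAC whose end sequence matches the seed."""
    clone: str
    position: int                # coordinate of the candidate's insert end within the seed (bp)
    extends: str                 # "left" or "right": side on which the candidate leaves the seed
    fingerprint_ok: bool = True  # False if fingerprinting flagged an inconsistency


def overlap_with_seed(hit: EndHit, seed_length: int) -> int:
    """Number of seed bases also covered by the candidate clone."""
    if hit.extends == "left":
        return hit.position
    return seed_length - hit.position


def pick_minimal_overlap(hits: List[EndHit], seed_length: int, direction: str,
                         min_overlap: int = 1_000) -> Optional[EndHit]:
    """Pick the fingerprint-consistent candidate with the smallest overlap.

    min_overlap is an assumed safety margin so that the chosen overlap is
    long enough to confirm; it is not a figure taken from the text.
    """
    usable = [
        h for h in hits
        if h.extends == direction
        and h.fingerprint_ok
        and overlap_with_seed(h, seed_length) >= min_overlap
    ]
    return min(usable, key=lambda h: overlap_with_seed(h, seed_length), default=None)


# Made-up candidates around a 140 kb seed clone.
SEED_LENGTH = 140_000
hits = [
    EndHit("A", 131_000, "right"),                        # 9 kb overlap
    EndHit("B", 100_000, "right"),                        # 40 kb overlap
    EndHit("C", 139_800, "right"),                        # 200 bp overlap, too short
    EndHit("D", 135_000, "right", fingerprint_ok=False),  # rejected by fingerprint
    EndHit("E", 12_000, "left"),                          # 12 kb overlap
    EndHit("F", 6_000, "left"),                           # 6 kb overlap
]

next_right = pick_minimal_overlap(hits, SEED_LENGTH, "right")
next_left = pick_minimal_overlap(hits, SEED_LENGTH, "left")
print(next_right.clone, next_left.clone)
```

Repeating this selection on each newly sequenced clone extends the contig in both directions, which is the iterative walk described in the paragraph above. The sketch does not model end sequences that fall within repeats, which the text handles through fingerprint comparison.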

The end-sequence strategy has several advantages. (i) It saves substantial time and effort in constructing sequence-ready maps, particularly the process of contig 'walking'. (ii) Smaller overlaps can be detected than by more traditional fingerprint-only methods of contig-building. (iii) The large DNA inserts in the BAC library are valuable to many T. brucei researchers, so it makes sense to invest in characterization of these libraries. (iv) The method is 'inclusive' and results in sequence markers for several of the clones in the library, not just ones that happen to match an existing, mapped marker.


For comments or questions, send mail to tbrucei@tigr.org.