
Introduction - The Plasmodium falciparum Genome Database (PFDB)



The P. falciparum genome

The P. falciparum nuclear genome is approximately 25 Mb in length and consists of 14 chromosomes ranging from ~0.6 to 3.4 Mb [1,2]. The chromosomes can be resolved by pulsed field gel electrophoresis. Chromosomes in different field isolates frequently vary in size due to recombination events involving the subtelomeric regions of the chromosomes. The genome is very A+T-rich (80%) and large fragments of P. falciparum DNA are unstable in E. coli. Malaria parasites and related parasites such as Toxoplasma also possess a mitochondrial genome of approximately 6 kb, and a 35 kb circular DNA that has been localized to a novel organelle called the apicoplast [3].

Sequencing methodology

The P. falciparum genome is being sequenced using a whole chromosome shotgun strategy. TIGR and NMRC were assigned chromosomes 2, 10, 11, and 14, comprising approximately 9 Mb. The procedures used at the 3 sequencing centers vary slightly but involve 5 basic steps: library construction, random sequencing, assembly, closure, and annotation. Chromosome 2 was completed and published in 1998 [4], and chromosomes 10, 11, and 14 are in closure.

At TIGR/NMRC, individual chromosomes were resolved on pulsed field gels and the chromosomal DNA was sheared into 1-2 kb fragments and cloned into plasmid vectors. Plasmid DNAs were prepared from randomly selected colonies and the ends of the inserts were sequenced using M13 forward and reverse primers. As described below, the use of paired forward and reverse sequences from plasmid templates greatly simplifies gap closure. For chromosome 2 both dye-primer and dye-terminator chemistry were used in the random sequencing phase, whereas chromosomes 10, 11, and 14 are being sequenced with dye-terminator chemistry.

A sufficient number of sequences were produced to provide at least 10X coverage of the chromosomes. The sequences were then assembled into contigs using TIGR Assembler in a 2-step process. The first assembly was performed at 99.5% similarity, and the resulting contigs and remaining unassembled sequences were then assembled at 97.5% similarity. The 2-step assembly was helpful in correctly assembling repetitive regions.
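
As a rough illustration of what 10X coverage implies, the sketch below estimates the number of reads required and applies the Lander-Waterman (Poisson) idealization to estimate how many bases would remain unsampled. The 9 Mb target and the 10X figure come from the text; the 500-base average read length is an assumption chosen for illustration only, not a figure from this project.

```python
import math

def reads_needed(target_bp, coverage, read_len):
    """Smallest number of reads with reads * read_len / target_bp >= coverage."""
    return math.ceil(target_bp * coverage / read_len)

def unsampled_fraction(coverage):
    """Poisson estimate of the fraction of bases covered by no read at all."""
    return math.exp(-coverage)

TARGET_BP = 9_000_000   # chromosomes 2, 10, 11, and 14 combined (from the text)
COVERAGE = 10           # from the text
READ_LEN = 500          # ASSUMED average read length, hypothetical

n_reads = reads_needed(TARGET_BP, COVERAGE, READ_LEN)
missed_bp = unsampled_fraction(COVERAGE) * TARGET_BP
print(n_reads, round(missed_bp))
```

The estimate assumes that reads sample the chromosome uniformly at random. Real shotgun libraries do not behave this way, so the number of gaps seen in practice is much larger than this idealized figure suggests.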

Unfortunately, it is never possible to assemble the entire chromosome into a single contig at the end of the random sequencing phase due to the under-representation of some sequences in the genomic libraries, the presence of repetitive regions that confuse the assembler, or sequence ambiguities in hard-to-sequence areas. In this project, hundreds of contigs were obtained after the assembly of each chromosome.

The closure process involves the ordering of the contigs on the chromosome and closing of the gaps between them. This is frequently the most labor-intensive and time-consuming phase of a sequencing program. Linked contigs are identified using the Grouper program. Grouper examines all of the contigs and their underlying sequences and identifies sets of contigs (groups) that are linked by 2 or more plasmid templates which have their forward reads in one contig and their reverse reads in another contig. Gaps between linked contigs in a group are called sequence gaps and can be closed by primer walking on the linking plasmids or several other techniques.
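
The linking rule described above can be sketched as follows. This is not the Grouper program itself, whose implementation is not described here; it is a minimal illustration built on a union-find structure. The input layout (a mapping from template name to the contigs holding its forward and reverse reads) and all names in the example are assumptions.

```python
from collections import Counter

def group_contigs(contigs, template_reads, min_links=2):
    """Group contigs joined by at least `min_links` templates whose
    forward and reverse reads fall in two different contigs."""
    # Count linking templates for each unordered pair of contigs.
    links = Counter()
    for fwd_contig, rev_contig in template_reads.values():
        if fwd_contig != rev_contig:
            links[frozenset((fwd_contig, rev_contig))] += 1

    # Union-find over contigs.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for pair, count in links.items():
        if count >= min_links:
            a, b = pair
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: template -> (contig with forward read, contig with reverse read)
templates = {
    "p001": ("c1", "c2"),
    "p002": ("c2", "c1"),
    "p003": ("c2", "c3"),   # a single link between c2 and c3 is not enough
    "p004": ("c4", "c4"),   # both reads in the same contig: not a link
}
result = group_contigs(["c1", "c2", "c3", "c4"], templates)
print(result)
```

Requiring 2 or more independent templates, as the text specifies, guards against a single chimeric or mislabeled clone joining two unrelated contigs.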

The gaps at the end of groups represent areas where no linking clones between contigs can be identified and are known as physical gaps. Physical gaps are closed by synthesizing primers complementary to the contigs flanking the gaps and using PCR from genomic DNA template to obtain PCR products spanning the gaps. The PCR products are then sequenced to close the physical gaps.

To simplify the process of physical gap closure, groups of contigs are localized on the chromosomes using STS markers derived from the end-sequences of YACs previously mapped to the genome in the Wellcome Trust P. falciparum Mapping Project [5], and microsatellite markers and a linkage map produced in T. Wellems's laboratory [6,7]. Also, groups of contigs can be ordered on the chromosomes by reference to optical restriction maps of the chromosomes prepared in David Schwartz's laboratory. Usually, groups of contigs covering 70-90% of the chromosome can be mapped on the chromosomes by using these resources, but many groups of contigs cannot be placed on the map due to the scarcity of microsatellite markers or informative restriction sites in some regions. To close physical gaps, PCR reactions are first performed between groups adjacent to one another on the map and the products obtained are sequenced. If no product is obtained with primers from adjacent groups, this indicates that one or more of the unmapped groups belong in the gap. To identify the missing group(s), combinatorial or multiplex PCR reactions are done using genomic DNA and primers from the ends of the mapped groups and the ends of all of the unmapped groups.
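
To give a sense of the scale of the combinatorial step, the sketch below enumerates every pairing of an end primer from a mapped group with an end primer from an unmapped group. The group names and the "left"/"right" end labels are hypothetical, and the actual reaction design used in the project may differ.

```python
from itertools import product

def combinatorial_pcr_pairs(mapped_groups, unmapped_groups):
    """Pair every end primer of a mapped group with every end primer
    of an unmapped group."""
    ends = ("left", "right")
    mapped_primers = [(g, e) for g in mapped_groups for e in ends]
    unmapped_primers = [(g, e) for g in unmapped_groups for e in ends]
    return list(product(mapped_primers, unmapped_primers))

# Hypothetical groups: two mapped groups flanking a gap, three unmapped groups.
pairs = combinatorial_pcr_pairs(["G1", "G2"], ["U1", "U2", "U3"])
print(len(pairs))  # 4 mapped ends x 6 unmapped ends = 24 reactions
```

The number of pairwise reactions grows with the product of the two primer sets, which is one reason pooling primers in multiplex reactions can be attractive.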

Due to the high A+T content of the Plasmodium genome, homopolymer tracts of As and Ts are very common throughout the genome. It is very difficult to determine the precise number of bases in these tracts due to DNA polymerase slippage, and the sequence downstream of a homopolymer tract is usually of poor quality. Most of the difficult-to-close sequence gaps in the P. falciparum chromosomes involve clones spanning these repetitive tracts. To improve our capacity to sequence through these repetitive regions, a collaboration has been established with Dr. Andrei Malyck of Fidelity Systems Inc. We are now using a hyper-thermostable topoisomerase (ThermoFidelase™) and chemically modified M13 forward and reverse primers (Fimers™) in our sequencing reactions when sequencing these difficult DNA templates (manuscript in preparation).

Sequencing reactions with 3- to 5-fold signal-to-noise ratios have been obtained using the attached protocol.
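
Homopolymer tracts of the kind described above can be located in a draft or finished sequence with a simple scan, for example to flag regions where read quality is likely to drop. The sketch below is illustrative only; the 10-base minimum tract length is an arbitrary assumption, not a threshold used in the project.

```python
import re

def find_homopolymers(seq, min_len=10):
    """Return (start, length, base) for each run of A or T that is
    at least `min_len` bases long. Coordinates are 0-based."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [
        (m.start(), len(m.group()), m.group()[0])
        for m in pattern.finditer(seq.upper())
    ]

# Hypothetical sequence with one A tract and one T tract.
seq = "GC" + "A" * 12 + "GC" + "T" * 15 + "GATC"
tracts = find_homopolymers(seq)
print(tracts)
```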

Once all of the sequence and physical gaps have been closed, the sequence is edited using TIGR Editor and any ambiguities are corrected by editing, or if necessary, by re-sequencing of the area in question. The sequence is also examined to ensure that each region of the sequence has 2-fold clone coverage, and that each base in the sequence is represented by 2 or more high-quality sequences. In addition, restriction maps predicted from the final chromosome sequence are checked against the optical restriction maps to verify that no major misassembly has occurred.
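
The clone-coverage criterion can be checked mechanically once the position of each clone on the final sequence is known. The sketch below reports regions covered by fewer than 2 clones. The coordinate convention (0-based, half-open intervals that lie within the sequence) and the example data are assumptions, and this is not the procedure implemented in TIGR Editor.

```python
def low_coverage_regions(length, clone_spans, min_depth=2):
    """Return half-open (start, end) intervals in which fewer than
    `min_depth` clones overlap."""
    # Difference array: +1 where a clone starts, -1 just past where it ends.
    delta = [0] * (length + 1)
    for start, end in clone_spans:
        delta[start] += 1
        delta[end] -= 1

    regions, depth, region_start = [], 0, None
    for pos in range(length):
        depth += delta[pos]
        if depth < min_depth and region_start is None:
            region_start = pos                     # a low-coverage region opens
        elif depth >= min_depth and region_start is not None:
            regions.append((region_start, pos))    # the region closes
            region_start = None
    if region_start is not None:
        regions.append((region_start, length))
    return regions

# Hypothetical clone placements on a 100-base sequence.
clones = [(0, 60), (0, 50), (40, 100), (70, 100)]
gaps = low_coverage_regions(100, clones)
print(gaps)
```

The same function could be applied to high-quality reads instead of clones to check the second criterion, that each base is represented by 2 or more high-quality sequences.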

Full annotation of the chromosome sequence begins once all gaps have been closed, the overall structure of the final contig has been verified, and the sequence has been edited. A full annotation is not done on preliminary sequence data because misassembly of contigs or sequence errors that are not detected until late in the sequencing process can cause frameshifts and other errors in annotation. In addition, it is difficult to "forward track" annotation through the many steps of gap closure. Thus many regions of a chromosome would need to be re-annotated once the sequence was completed.

Nonetheless, we recognize the usefulness of the preliminary annotation of chromosomes 10, 11, and 14 for the malaria research community and have made preliminary annotation of these chromosomes available on this web site. The preliminary annotation was performed by running the GlimmerM gene finder [10] on the unedited contigs. The gene models predicted by GlimmerM were then searched against nucleotide and protein databases. The search results were parsed automatically to identify "good" matches. Predicted nucleotide and protein sequences of the GlimmerM models are provided, along with the tentative identification of the genes. Please be aware that the preliminary annotation has not been examined or verified by human annotators. In addition, because of unavoidable cross-contamination in the shotgun libraries prepared from pulsed-field gel purified chromosomal DNA, the presence of sequences in a specific chromosome database should not be taken as proof that those sequences are derived from that chromosome. Therefore, use caution when using this information. Please read the data release policy regarding the use of preliminary annotation.

The list of assemblies for chromosomes 10 and 11 has been divided into two groups. Chromosome 10-specific and chromosome 11-specific assemblies are those that harbor known chromosome-specific microsatellite markers. Some large assemblies contain no known microsatellite marker, so their chromosome of origin has yet to be determined.
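
The marker-based assignment described above amounts to a lookup from the markers found in each assembly to the chromosome on which each marker is known to lie. A minimal sketch follows; the marker names, assembly names, and the "unassigned" and "conflict" labels are all hypothetical.

```python
def assign_assemblies(assembly_markers, marker_chromosome):
    """Assign each assembly to a chromosome using the mapped
    microsatellite markers it contains."""
    assignment = {}
    for assembly, markers in assembly_markers.items():
        chroms = {marker_chromosome[m] for m in markers if m in marker_chromosome}
        if len(chroms) == 1:
            assignment[assembly] = chroms.pop()
        elif not chroms:
            assignment[assembly] = "unassigned"   # no known marker detected
        else:
            assignment[assembly] = "conflict"     # markers from both chromosomes
    return assignment

# Hypothetical inputs.
marker_chromosome = {"msA": "chr10", "msB": "chr10", "msC": "chr11"}
assembly_markers = {
    "asm1": ["msA", "msB"],
    "asm2": ["msC"],
    "asm3": [],
    "asm4": ["msA", "msC"],
}
assigned = assign_assemblies(assembly_markers, marker_chromosome)
print(assigned)
```

An assembly carrying markers from both chromosomes is reported separately here because such a result could reflect either a misassembly or library cross-contamination and would need manual inspection.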

1. Wellems, T.E., Su, X., Ferdig, M. & Fidock, D.A. Genome projects, genetic analysis, and the changing landscape of malaria research. Current Opinion in Microbiology 2, 415-419 (1999).

2. Gardner, M.J. The genome of the malaria parasite. Current Opinion in Genetics and Development 9, 704-708 (1999).

3. Soldati, D. The apicoplast as a potential therapeutic target in Toxoplasma and other apicomplexan parasites. Parasitology Today 15, 5-7 (1999).

4. Gardner, M.J. et al. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 282, 1126-1132 (1998).

5. Foster, J. & Thompson, J. The Plasmodium falciparum genome project: a resource for researchers. Parasitology Today 11, 1-4 (1995).

6. Su, X. et al. A Genetic Map and Recombination Parameters of the Human Malaria Parasite Plasmodium falciparum. Science 286, 1351-1353 (1999).

7. Su, X.Z. & Wellems, T.E. Toward a high-resolution Plasmodium falciparum linkage map: polymorphic markers from hundreds of simple sequence repeats. Genomics 33, 430-444 (1996).

8. Jing, J. et al. Optical mapping of Plasmodium falciparum chromosome 2. Genome Research 9, 175-181 (1999).

9. Lai, Z. et al. A shotgun optical map of the entire Plasmodium falciparum genome. Nature Genetics 23, 309-313 (1999).

10. Salzberg, S.L., Pertea, M., Delcher, A., Gardner, M.J. & Tettelin, H. Interpolated Markov models for eukaryotic gene finding. Genomics 59, 24-31 (1999).


For P. falciparum comments or questions, send mail to pfg@tigr.org.


© 1999-2000 TIGR