Arabidopsis thaliana Genome Project






Annotation of Genomic Sequence from Arabidopsis thaliana

Selected BACs proceed through sequencing and closure, and are then ready for annotation. We believe annotation of individual genes is an essential part of the genome project, because it permits gene discovery in a systematic, comprehensive and consistent manner.

Steps involved in annotation:

The BAC sequence and results of all analyses are stored in our central relational database (Sybase).
  1. Database search

    • Non-redundant protein database, dps/nap
    • TIGR plant Gene Index EST databases (ESTs from Arabidopsis and other plants), dds/gap2

      These two sets of software generate gapped alignments. A combined alignment of the BAC DNA with database matches is also produced.

    • We also search each BAC sequence against an Arabidopsis repeat database to identify known repeats and transposons (DNA transposons, retroelements, MITEs, etc.).
    • Simple repeats are identified and annotated with RepeatMasker2.
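To illustrate what a "simple repeat" looks like in sequence terms, the following is a minimal sketch that finds short tandem repeats with a regular expression. It is not RepeatMasker2's algorithm; the function name `find_simple_repeats` and the parameters `max_unit` and `min_copies` are assumptions chosen for this example only.

```python
import re

def find_simple_repeats(seq, max_unit=6, min_copies=4):
    """Return (start, end, unit) for tandem repeats of a short unit.

    Hypothetical helper for illustration; coordinates are 0-based, half-open.
    """
    # A unit of 1..max_unit bases followed by at least (min_copies - 1)
    # further copies of itself. The lazy quantifier prefers the shortest unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]

# Example: a run of five "AT" units flanked by unique sequence.
hits = find_simple_repeats("GGCATATATATATCC")
```

A dedicated tool also handles imperfect repeats and low-complexity regions, which this sketch deliberately ignores.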

  2. Gene prediction programs

    • GeneMark.hmm (Arabidopsis)
    • Genscan+ (Arabidopsis)
    • GlimmerA
    • GeneSplicer, to predict exon/intron splicing sites
    • tRNAscan-SE, to predict tRNA

  3. Analysis of individual gene models with the Neomorphic Annotation Station tool

    • Information obtained from steps 1 and 2 is downloaded from the database and visualized in the Chromosome Viewer and in the BAC viewer.
    • A "working gene model" is made from a predicted model.
    • Working models are edited/refined in the Gene Editor Window, where:
      • The annotation is checked against all supporting evidence.
      • DNA sequence and translation products of each exon are examined.
      • Exons are edited to fit the evidence from the different analyses.
    • The final annotation is then saved to the database.
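The exon-level checks described above are done interactively in the Gene Editor Window. As a rough illustration of the kind of consistency check involved, the sketch below splices a model's exons and tests the resulting coding sequence. The function names, the coordinate convention (0-based, half-open, forward strand) and the specific checks are assumptions for this example, not a description of the Annotation Station's internals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice_exons(dna, exons):
    """Concatenate exon sequences; exons are sorted (start, end) pairs."""
    return "".join(dna[start:end] for start, end in exons).upper()

def check_coding_sequence(cds):
    """Return a list of problems found in a spliced coding sequence."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("missing ATG start codon")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame internal stop codon")
    return problems

# Toy BAC fragment: flank + exon 1 + intron + exon 2 + flank.
dna = "CC" + "ATGGCT" + "GTAAGTTTCAG" + "GATTAA" + "CC"
good_model = [(2, 8), (19, 25)]   # splices to ATG GCT GAT TAA
bad_model = [(2, 8), (19, 24)]    # second exon ends one base early

good_problems = check_coding_sequence(splice_exons(dna, good_model))
bad_problems = check_coding_sequence(splice_exons(dna, bad_model))
```

Here `good_problems` is empty, while `bad_problems` reports the frame and stop codon errors introduced by the shortened exon, which is the sort of discrepancy an annotator would correct by editing the exon boundary.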

  4. Criteria for the definition of genes

    • If a gene is identical to a previously characterized gene, the original gene name is preserved in our annotation.
    • Gene models with protein matches (dps score >100) are named after the database entries as "putative XXX" or "XXX-like protein" to indicate similarity.
      • When the match is from beginning to end, the name of the database match is also indicated in our definition.
      • When the match is restricted to certain domains, a general name for that class of protein is used.
    • Gene models with only EST matches are named as "unknown proteins".
    • Gene models without any database matches are called "hypothetical proteins".
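The naming criteria above amount to a small decision procedure, which can be sketched as follows. The `Evidence` fields and the function `name_gene_model` are hypothetical names introduced for this example. In particular, mapping full-length matches to "putative XXX" and domain-only matches to "XXX-like protein" (using the general class name) is an assumption; the criteria list both forms without saying which applies when.

```python
from dataclasses import dataclass
from typing import Optional

DPS_SCORE_THRESHOLD = 100  # from the criteria: dps score > 100

@dataclass
class Evidence:
    """Hypothetical summary of the evidence for one gene model."""
    known_gene_name: Optional[str] = None   # set if identical to a characterized gene
    protein_hit_name: Optional[str] = None  # best protein database match
    dps_score: float = 0.0
    full_length_match: bool = False         # match runs from beginning to end
    protein_class: Optional[str] = None     # general class name for domain-only hits
    has_est_match: bool = False

def name_gene_model(ev: Evidence) -> str:
    if ev.known_gene_name:
        return ev.known_gene_name
    if ev.protein_hit_name and ev.dps_score > DPS_SCORE_THRESHOLD:
        if ev.full_length_match:
            # Assumption: full-length matches carry the database entry's name.
            return f"putative {ev.protein_hit_name}"
        # Assumption: domain-restricted matches use the general class name.
        return f"{ev.protein_class or ev.protein_hit_name}-like protein"
    if ev.has_est_match:
        return "unknown protein"
    return "hypothetical protein"

# Example: a model supported only by ESTs.
est_only_name = name_gene_model(Evidence(has_est_match=True))
```

Ordering the tests from strongest to weakest evidence mirrors the order of the criteria, so a model with both a protein match and EST support is named after the protein match.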

  5. Submitting Data to GenBank

    Final annotations for TIGR-generated sequences are submitted to GenBank. The sequences of the annotated genes, along with supporting evidence, can also be found on our web site.

Software Links

  • GeneMark.hmm (Borodovsky and Lukashin, School of Biology, Georgia Institute of Technology)
  • Genscan+ (Chris Burge, Massachusetts Institute of Technology)
  • GlimmerA (Salzberg, Pertea, et al., The Institute for Genomic Research)
  • GeneSplicer (Mihaela Pertea and Steven Salzberg, The Institute for Genomic Research)
  • tRNAscan-SE (Sean Eddy, Dept. of Genetics, Washington U. School of Medicine)
  • dds/gap2, dps/nap (Xiaoqiu Huang, Dept. of Computer Science, Michigan Technological University)
  • RepeatMasker2 (A.F.A. Smit & P. Green, University of Washington)
  • Annotation Station: Neomorphic

