Annotation of Genomic Sequence from Arabidopsis thaliana
Selected BACs proceed through sequencing and
closure, and are then ready for annotation. We believe annotation of individual
genes is an essential part of the genome project. This permits gene discovery
in a systematic, comprehensive and consistent manner.
Steps involved in annotation:
The BAC sequence and the results of all analyses are stored in our central relational database (Sybase).
1. Database search
2. Gene prediction programs
- GeneMark.hmm (Arabidopsis)
- Genscan+ (Arabidopsis)
- GlimmerA
- GeneSplicer, to predict exon/intron splice sites
- tRNAscan-SE, to predict tRNA genes
3. Analysis of individual gene models with the Neomorphic Annotation Station tool
- Information obtained from steps 1 and 2 is downloaded from the database and visualized
in the Chromosome Viewer and in the BAC Viewer.
- A "working gene model" is made from a predicted model.
- Working models are edited/refined in the Gene Editor Window, where:
  - The annotation is checked against all supporting evidence.
  - The DNA sequence and translation products of each exon are examined.
  - Exons are edited to fit the evidence from the different analyses.
- The final annotation is then saved to the database.
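The exon-level check described above (examining the DNA sequence and translation product of a working model) can be illustrated with a minimal Python sketch. This is not the Annotation Station's implementation: the function names, the 0-based half-open forward-strand coordinates, and the specific sanity checks are all assumptions made for illustration. Only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice_exons(genomic, exons):
    """Join exon sequences. `exons` is a list of (start, end) pairs,
    0-based and half-open, on the forward strand (an assumption)."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


def translate(cds):
    """Translate complete codons; unrecognized codons become 'X'."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3].upper(), "X")
                   for i in range(0, usable, 3))


def check_model(genomic, exons):
    """Return the translation and a list of problems a curator might
    want to look at (hypothetical checks, not the tool's own)."""
    cds = splice_exons(genomic, exons)
    protein = translate(cds)
    problems = []
    if len(cds) % 3:
        problems.append("CDS length is not a multiple of 3")
    if not protein.startswith("M"):
        problems.append("no start codon")
    if not protein.endswith("*"):
        problems.append("no stop codon")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    return protein, problems


# Toy sequence: exon, intron (GT...AG), exon.
genomic = "ATGGCC" + "GTAAGTTTCAG" + "GATTAA"
protein, problems = check_model(genomic, [(0, 6), (17, 23)])
```

Shifting an exon boundary by one base, for example using `(16, 23)` for the second exon, breaks the reading frame and is reported in `problems`. That is the kind of discrepancy the manual editing step is meant to resolve.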
Criteria for the definition of genes
- If a gene is identical to a previously characterized gene, the original
gene name is preserved in our annotation.
- Gene models with protein matches (dps score > 100) are named after the database
entries as "putative XXX" or "XXX-like protein" to indicate similarity.
  - When the match extends from beginning to end, the name of the database match
  is also indicated in our definition.
  - When the match is restricted to certain domains, a general name for that
  class of protein is used.
- Gene models with only EST matches are named as "unknown proteins".
- Gene models without any database matches are called "hypothetical proteins".
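The naming criteria above amount to a small decision procedure, sketched here in Python. The `Evidence` record and its field names are hypothetical. The mapping of full-length matches to "putative XXX" and of domain-restricted matches to "XXX-like protein" is an assumption, since the criteria do not say which form applies in which case. Treating a protein match at or below the score threshold as no protein match is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the criteria: protein matches with dps score > 100.
DPS_SCORE_THRESHOLD = 100


@dataclass
class Evidence:
    """Hypothetical summary of the evidence attached to one gene model."""
    characterized_name: Optional[str] = None  # identical to a known gene
    protein_match_name: Optional[str] = None  # best database protein match
    protein_match_score: float = 0.0          # dps score of that match
    full_length_match: bool = False           # match runs beginning to end
    domain_class_name: Optional[str] = None   # general protein class name
    has_est_match: bool = False


def name_gene_model(ev: Evidence) -> str:
    # Identical to a previously characterized gene: keep the original name.
    if ev.characterized_name:
        return ev.characterized_name
    # Protein match above the threshold: name after the database entry.
    if ev.protein_match_name and ev.protein_match_score > DPS_SCORE_THRESHOLD:
        if ev.full_length_match:
            # Assumption: full-length matches use the "putative XXX" form.
            return f"putative {ev.protein_match_name}"
        # Assumption: domain-restricted matches use the general class name.
        general = ev.domain_class_name or ev.protein_match_name
        return f"{general}-like protein"
    # Only EST support.
    if ev.has_est_match:
        return "unknown protein"
    # No database matches at all.
    return "hypothetical protein"
```

For example, `name_gene_model(Evidence(has_est_match=True))` returns `"unknown protein"`, and an `Evidence()` with no matches returns `"hypothetical protein"`.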
Submitting Data to GenBank
Final annotations for TIGR-generated sequences are submitted to GenBank.
The sequences of the annotated genes, along with the
supporting evidence, can also be found on our web site.
Software Links
- GeneMark.hmm (Borodovsky and Lukashin, School of Biology, Georgia Institute of Technology)
- Genscan+ (Chris Burge, Massachusetts Institute of Technology)
- GlimmerA (Salzberg, Pertea, et al., The Institute for Genomic Research)
- GeneSplicer (Mihaela Pertea and Steven Salzberg, The Institute for Genomic Research)
- tRNAscan-SE (Sean Eddy, Dept. of Genetics, Washington U. School of Medicine)
- dds/gap2, dps/nap (Xiaoqiu Huang, Dept. of Computer Science, Michigan Technological University)
- RepeatMasker2 (A.F.A. Smit & P. Green, University of Washington)
- Annotation Station: Neomorphic