
NSF 2010 Project:

Large Scale Analysis of Novel Arabidopsis Genes Predicted by Computer Algorithms and Comparative Genomics

P.I.s: Chris Town and Yongli Xiao

 

Project Summary

The fully sequenced and annotated Arabidopsis genome contains a group of genes, annotated as “hypothetical genes”, whose structures are predicted solely by computer algorithms, with no support from nucleic acid or protein homologs in other species or from expressed sequence matches in Arabidopsis. Of the 26,207 protein-coding genes in ATH1 version 5.0, about 15-20% are hypothetical genes. Beyond the annotated genes, comparison of the Arabidopsis genome sequence with that of its close relative Brassica oleracea reveals many regions of sequence conservation, even in the intergenic, unannotated parts of the Arabidopsis genome, which suggests that Arabidopsis contains additional genes. Our research demonstrates that 1) about 80% of predicted hypothetical genes are expressed in Arabidopsis; and 2) many intergenic regions of the Arabidopsis genome that are conserved in Brassica do encode expressed genes that the annotation process has so far failed to recognize.

The objective of this research is to generate full-length cDNAs for approximately 2,000 of the least well-understood genes in the Arabidopsis genome, including hypothetical genes and additional unannotated genes. We will use 5' and 3' RACE to define the precise structure of each gene and then generate full-length cDNA clones of the open reading frames (ORFs) of protein-coding genes in a recombination vector suitable for functional studies by the research community.

We will reach a rate of approximately 100 cloned and sequenced cDNAs per month within three months of the start of the project, with the goal of producing full-length cDNAs for 2,000 novel and previously uncharacterized genes over the period of the project.

Sequences of the clones will be submitted to GenBank as they are generated and will also be available from the TIGR FTP site. The clones themselves will be made freely available to the research community through the Arabidopsis Biological Resource Center (ABRC).

References:

1. Xiao YL, Malik M, Whitelaw CA, and Town CD. (2002) Cloning and sequencing of cDNAs for hypothetical proteins from chromosome 2 of Arabidopsis thaliana. Plant Physiol. 130:2118-28.

2. Xiao YL, Smith SR, Ishmael N, Redman JC, Kumar N, Monaghan EL, Ayele M, Haas BJ, Wu HC, and Town CD. (2005) Analysis of the cDNAs of hypothetical genes on Arabidopsis chromosome 2 reveals numerous transcript variants. Plant Physiol. 139:1323-37.

3. Underwood BA, Vanderhaeghen R, Whitford R, Town CD, and Hilson P. (2006) Simultaneous high-throughput recombinational cloning of open reading frames in closed and open configurations. Plant Biotechnology Journal (in press).


This material is based upon work supported by the National Science Foundation under Grant No. 0312656. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.