NSF 2010 Project: Large Scale Analysis of Novel Arabidopsis Genes
Predicted by Computer Algorithms and Comparative Genomics
P.I.s: Chris Town and Yongli Xiao
Project Summary
In the fully sequenced and annotated Arabidopsis genome, there is a group
of genes, annotated as “Hypothetical Genes”, whose structures are predicted solely
by computer algorithms, with no support from nucleic acid or protein homologs
in other species or from expressed sequence matches in Arabidopsis. Among the
total of 26,207 protein coding genes (ATH1 version 5.0), about 15-20% are hypothetical
genes. Beyond the annotated genes, comparison of the Arabidopsis genome
sequence with that of its close relative Brassica oleracea reveals many
regions of sequence conservation, even in the intergenic, unannotated parts of
the Arabidopsis genome, which suggests that Arabidopsis contains additional genes.
Our research demonstrates that 1) about 80% of predicted hypothetical genes
are expressed in Arabidopsis; 2) many intergenic regions in the Arabidopsis
genome that are conserved in Brassica do encode genes that have so far been
unrecognized by the annotation process and are expressed.
The objective of this research is to generate full-length cDNAs for approximately
2,000 of these genes that represent the least well-understood genes in the genome,
including hypothetical genes and additional unannotated genes in Arabidopsis.
We will use 5' and 3' RACE to define the precise structure of each gene and
then generate full-length cDNA clones for protein-coding genes (ORFs) in a recombination
vector suitable for functional studies by the research community.
We will generate and sequence clones, reaching a rate of approximately 100 clones
per month within three months of the start of the project, with a goal of producing
full-length cDNAs for 2,000 novel and previously uncharacterized genes over
the period of the project.
Sequences of the clones will be submitted to GenBank as they are generated
and will also be available from the TIGR ftp site. The clones themselves will
be made freely available to the research community through the Arabidopsis Biological
Resource Center (ABRC).
References:
1. Xiao YL, Malik M, Whitelaw CA, and Town CD. (2002) Cloning and
sequencing of cDNAs for hypothetical proteins from chromosome 2 of
Arabidopsis thaliana. Plant Physiol. 130: 2118-28.
2. Xiao YL, Smith SR, Ishmael N, Redman JC, Kumar N, Monaghan EL, Ayele M,
Haas BJ, Wu HC, and Town CD. (2005) Analysis of the cDNAs of hypothetical genes on
Arabidopsis chromosome 2 reveals numerous transcript variants.
Plant Physiol. 139: 1323-37.
3. Underwood BA, Vanderhaeghen R, Whitford R, Town CD, and
Hilson P. (2006) Simultaneous high-throughput recombinational
cloning of open reading frames in closed and open configurations. Plant
Biotechnology Journal (in press).
This material is based upon work supported by the National Science Foundation
under Grant No. 0312656. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation.