We are pleased to announce Release 5 of the TIGR Rice
Pseudomolecules and Genome Annotation. The official release date
for this version is January 24, 2007.
Description of TIGR Rice Genome Pseudomolecules
As part of our National Science Foundation-funded Rice
Genome Annotation Project, we constructed
pseudomolecules (virtual contigs) for each of the 12 rice chromosomes. In Release 5, we retained the Release 4 pseudomolecules because no significant amount of new Oryza sativa (japonica cultivar-group) genomic sequence had been deposited in GenBank/EMBL/DDBJ during the past year.
The pseudomolecules were constructed by resolving
discrepancies between overlapping BAC/PAC clones, trimming the overlapping regions at junction points (preferentially retaining phase 3 BAC/PAC sequence), and linking
the unique sequences to form a contiguous sequence. The ordered list
of BAC/PAC clones for each of the 12 chromosomes
was obtained from the International Rice Genome Sequencing Project (IRGSP).
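The joining step can be illustrated with a minimal Python sketch. It assumes that the overlap length between each pair of consecutive clones has already been computed and that sequence discrepancies inside the overlaps have already been resolved. The `Clone` class and the `build_pseudomolecule` function are hypothetical illustrations and are not part of TIGR's pipeline. Where two clones overlap, the overlap is taken from the clone with the higher phase (phase 3 preferred over phase 2). Where clones do not overlap, a 1,000-N physical gap is inserted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Physical gaps between non-overlapping clones are represented by 1,000 Ns.
GAP = "N" * 1000


@dataclass
class Clone:
    """Hypothetical record for one clone in the ordered BAC/PAC list."""
    name: str
    seq: str
    phase: int  # 2 = unfinished, 3 = finished
    overlap_with_prev: Optional[int]  # bp shared with the previous clone; None = physical gap


def build_pseudomolecule(clones: List[Clone]) -> str:
    """Join ordered clones into one sequence (sketch; overlaps assumed pre-resolved)."""
    parts: List[str] = []
    prev: Optional[Clone] = None
    for clone in clones:
        k = clone.overlap_with_prev
        if prev is None:
            parts.append(clone.seq)  # the first clone is used in full
        elif k is None:
            parts.append(GAP)  # no overlap: insert a physical gap
            parts.append(clone.seq)
        elif clone.phase > prev.phase:
            # Keep the overlap from the higher-phase clone by trimming it off the previous piece.
            parts[-1] = parts[-1][: len(parts[-1]) - k]
            parts.append(clone.seq)
        else:
            # Otherwise keep the overlap from the previous clone and trim it from this one.
            parts.append(clone.seq[k:])
        prev = clone
    return "".join(parts)


clones = [
    Clone("A", "ACGTACGT", 3, None),
    Clone("B", "ACGTTTTT", 3, 4),  # shares "ACGT" with A
    Clone("C", "GGGG", 3, None),   # does not overlap B, so a gap precedes it
]
chrom = build_pseudomolecule(clones)
print(len(chrom))  # 1016
```

Trimming the previous piece rather than the new clone means each base in an overlap is contributed by exactly one clone.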
Although we used all of the IRGSP BAC/PAC sequences available in GenBank/EMBL/DDBJ,
our pseudomolecules are not the official pseudomolecules generated by
the IRGSP.
A total of 3,450 rice BAC/PAC clones were included in the
pseudomolecules. At the time the pseudomolecules were constructed,
3,408 clones (98.8%) were finished and 42 clones (1.2%)
were unfinished (phase 2), as defined by GenBank.
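The percentages follow directly from the clone counts. Here is a quick Python check using only the numbers stated above:

```python
# Clone counts reported for Release 5.
total_clones = 3450
finished = 3408
unfinished = 42  # phase 2

# The two categories should account for every clone.
assert finished + unfinished == total_clones

pct_finished = round(100 * finished / total_clones, 1)
pct_unfinished = round(100 * unfinished / total_clones, 1)
print(pct_finished, pct_unfinished)  # 98.8 1.2
```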
Gaps between clones (i.e., physical gaps) are denoted by runs of
1,000 Ns. Their locations can be seen in the
graphical view of each chromosome. Centromeres were
identified using the CentO centromeric sequence (AY101510;
Cheng et al., 2002).
The centromeres
are adjacent to these clones on each of the 12 rice chromosomes.
Please be aware that unfinished BACs may contain additional gaps,
which may also be denoted by strings of Ns. In
total, there are 38 physical gaps within the 12
pseudomolecules, in addition to gaps at 10 centromeres and
10 telomeres.
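Gap positions in a downloaded pseudomolecule can be located by scanning for runs of N. The following is a minimal Python sketch with hypothetical function names. It assumes that a run of exactly 1,000 Ns is a physical gap between clones and treats any other N run as a gap inside an unfinished BAC. This is only a length heuristic. A gap of exactly 1,000 Ns inside an unfinished BAC would be misclassified, and centromere and telomere gaps are not distinguished.

```python
import re
from typing import List, Tuple

PHYSICAL_GAP_LEN = 1000  # physical gaps between clones are runs of 1,000 Ns


def find_n_runs(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of every run of N or n."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def split_gaps(seq: str) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Separate runs of exactly 1,000 Ns (candidate physical gaps) from all other N runs."""
    runs = find_n_runs(seq)
    physical = [r for r in runs if r[1] - r[0] == PHYSICAL_GAP_LEN]
    other = [r for r in runs if r[1] - r[0] != PHYSICAL_GAP_LEN]
    return physical, other


seq = "ACGT" + "N" * 1000 + "GG" + "NNN" + "T"
physical, other = split_gaps(seq)
print(physical)  # [(4, 1004)]
print(other)     # [(1006, 1009)]
```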
All of the BAC/PAC clones were annotated using our
automated/semi-automated rice annotation pipeline.
In the current release (Osa1 version 5.0), the 12 rice chromosomes
contain 372,077,801 bp of non-overlapping rice genome sequence, in which
56,278 genes (loci) were identified. Of these, 6,498 loci have a total of 10,432
additional alternatively spliced isoforms, giving a total of 66,710
transcripts (gene models) in the rice genome.
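The transcript total is the locus count plus the additional isoforms. The last line is derived arithmetic: the average number of extra isoforms per alternatively spliced locus.

```latex
\begin{aligned}
\text{gene models} &= \text{loci} + \text{additional isoforms}
  = 56{,}278 + 10{,}432 = 66{,}710 \\
\frac{\text{additional isoforms}}{\text{alternatively spliced loci}}
  &= \frac{10{,}432}{6{,}498} \approx 1.61
\end{aligned}
```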
Note that we have excluded 740 small gene
models (<50 amino acids) from our annotated gene set.
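The size filter can be expressed as a one-line predicate. This is a minimal Python sketch that assumes each gene model is represented by its translated protein string. Ignoring a trailing stop symbol (`*`) is an assumption, not something the release notes specify.

```python
MIN_PROTEIN_LENGTH = 50  # models with proteins shorter than 50 aa are excluded


def keep_model(protein: str) -> bool:
    """True if the translated gene model is at least 50 amino acids long."""
    # Assumption: a trailing stop symbol '*' does not count toward the length.
    return len(protein.rstrip("*")) >= MIN_PROTEIN_LENGTH


models = {"m1": "M" * 49 + "*", "m2": "M" * 50, "m3": "MK" * 100}
kept = sorted(name for name, prot in models.items() if keep_model(prot))
print(kept)  # ['m2', 'm3']
```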
Transposable element-related (TE-related) gene models
were identified using two approaches: BLASTN searches against the TIGR Oryza Repeat Database
and identification of gene models containing TE-related Pfam
domains. These 15,232 TE-related genes (15,424 gene models) were annotated based on the
Pfam domain or the nomenclature in the TIGR Oryza Repeat
Database. Pack-MULEs were identified only on chromosomes 1
and 10 and were manually annotated as described in
Jiang et al. 2004. Transduplicate MULEs identified by
Juretic et al. 2005 were aligned to the TIGR v5 pseudomolecules. Note that the Jiang Pack-MULEs and the transduplicate MULEs are shown only in the Genome Browser and are not included in our functional annotation.
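The two lines of evidence combine with a logical OR: a model is flagged if it has a repeat-database hit above the cut-off or carries a TE-related Pfam domain. The Python sketch below illustrates that rule only. The three Pfam accessions (reverse transcriptase, integrase core, retrotransposon gag) are illustrative examples and not TIGR's actual domain list. The e-value cut-off and the `GeneModel` record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional

# Illustrative TE-associated Pfam accessions only; NOT the list TIGR used.
TE_PFAM = frozenset({"PF00078", "PF00665", "PF03732"})
# Hypothetical cut-off; the actual thresholds are not given in these notes.
EVALUE_CUTOFF = 1e-10


@dataclass
class GeneModel:
    model_id: str
    repeat_db_evalue: Optional[float]  # best hit against the repeat database; None if no hit
    pfam_domains: FrozenSet[str] = field(default_factory=frozenset)


def is_te_related(model: GeneModel) -> bool:
    """Flag a model as TE-related if either line of evidence passes."""
    repeat_hit = model.repeat_db_evalue is not None and model.repeat_db_evalue <= EVALUE_CUTOFF
    te_domain = bool(model.pfam_domains & TE_PFAM)
    return repeat_hit or te_domain


models = [
    GeneModel("m1", 1e-30),                          # strong repeat-database hit
    GeneModel("m2", None, frozenset({"PF00078"})),   # TE-related domain only
    GeneModel("m3", 1e-3, frozenset({"PF00069"})),   # weak hit, non-TE (kinase) domain
]
print([m.model_id for m in models if is_te_related(m)])  # ['m1', 'm2']
```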
A total of 33,882 gene models (24,435 genes) were further improved with the
TIGR PASA program, using experimental evidence from EST and full-length cDNA sequences.
A portion of the models that failed PASA validation was manually reviewed and curated.
The structures of 1,648 gene models were manually annotated using EST pairing information and comparative genomics analyses (Zhu and Buell, Genome Research, 2007).
Using the structural annotations from the Community Annotation (CA) project, we modified 43 loci encompassing 9 different CA protein families. In addition, we added 20 new loci from 5 different CA protein families to the TIGR annotation and updated the functional assignments of 378 loci using the Community Annotation.
Please note that these pseudomolecules were constructed from both
finished and unfinished sequence, and the majority of the gene
models have not been manually curated.
Features of the TIGR Rice Annotation Release 5
- Our rice genome browser has been updated and now contains 62 tracks of annotation. These tracks have been updated to include the latest evidence and datasets.
- Gene model structure has been improved for 33,882 gene models (28,706 genes) with the TIGR PASA program, using ~1.2 million ESTs and/or full-length cDNAs as evidence. A portion of the models has been manually curated using expression evidence and comparative genomics studies.
- A rice gene expression Anatomy Viewer/Digital Northern and a Tissue Specific Expression page have been created.
- The rice community annotation has been carried out, and the results have been integrated into the TIGR rice annotation.
- Locus names have been assigned to chloroplast and mitochondrial genes.
- A new protein function category, conserved hypothetical protein, has been introduced for proteins that match only proteins of unknown function in other organisms.
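The rule for the new category can be sketched in Python. The keyword test used to decide that a homolog has "unknown function" is a hypothetical stand-in for whatever criteria TIGR actually applied. Proteins with no homologs, or with any homolog of known function, are left to other naming rules that are not modeled here.

```python
from typing import List, Optional

# Hypothetical keywords marking a homolog of unknown function (illustrative only).
UNKNOWN_FUNCTION_KEYWORDS = ("hypothetical", "unknown", "uncharacterized")


def has_unknown_function(description: str) -> bool:
    """True if a homolog's description suggests no known function (keyword heuristic)."""
    text = description.lower()
    return any(word in text for word in UNKNOWN_FUNCTION_KEYWORDS)


def conserved_hypothetical_label(homolog_descriptions: List[str]) -> Optional[str]:
    """Return the label only if homologs exist and every one lacks a known function."""
    if homolog_descriptions and all(has_unknown_function(d) for d in homolog_descriptions):
        return "conserved hypothetical protein"
    return None  # no homologs, or at least one homolog of known function


print(conserved_hypothetical_label(["hypothetical protein", "Uncharacterized protein"]))
```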
Table of Rice Pseudomolecule, Loci, and Gene Models in Release 5
| Chr | BAC/PAC clones | Sequence length in pseudomolecule (bp) | TE genes/loci (a,b) | Non-TE genes/loci (a,c) | Total genes/loci (a) | TE gene models (a,b) | Non-TE gene models (a,c) | Total gene models (a) |
|---|---|---|---|---|---|---|---|---|
| 1 (d) | 393 | 43,596,771 | 1,307 | 5,313 | 6,620 | 1,334 | 6,766 | 8,100 |
| 2 | 358 | 35,925,388 | 1,096 | 4,319 | 5,415 | 1,112 | 5,608 | 6,720 |
| 3 | 327 | 36,345,490 | 1,038 | 4,559 | 5,597 | 1,058 | 6,027 | 7,085 |
| 4 | 292 | 35,244,269 | 1,759 | 3,613 | 5,372 | 1,773 | 4,464 | 6,237 |
| 5 | 286 | 29,874,162 | 1,344 | 3,298 | 4,642 | 1,363 | 4,216 | 5,579 |
| 6 | 281 | 31,246,789 | 1,321 | 3,420 | 4,741 | 1,341 | 4,158 | 5,499 |
| 7 | 287 | 29,688,601 | 1,237 | 3,270 | 4,507 | 1,264 | 3,988 | 5,252 |
| 8 | 275 | 28,309,179 | 1,299 | 2,905 | 4,204 | 1,302 | 3,585 | 4,887 |
| 9 | 223 | 23,011,239 | 1,033 | 2,399 | 3,432 | 1,046 | 2,926 | 3,972 |
| 10 (d) | 202 | 22,876,596 | 1,071 | 2,404 | 3,475 | 1,078 | 2,942 | 4,020 |
| 11 | 257 | 28,462,103 | 1,288 | 2,936 | 4,224 | 1,295 | 3,439 | 4,734 |
| 12 | 269 | 27,497,214 | 1,439 | 2,610 | 4,049 | 1,458 | 3,167 | 4,625 |
| Total (e) | 3,450 | 372,077,801 | 15,232 | 41,046 | 56,278 | 15,424 | 51,286 | 66,710 |
(a) Excluding small gene models (<50 amino acids).
(b) TE: transposable element-related genes and gene models. The rice proteome was searched against
the TIGR Oryza Repeat Database with TBLASTN and against TE-related Pfam domains with hmmpfam.
Genes and gene models with matches above the cut-offs were annotated as TE-related.
(c) Non-TE: genes and gene models not related to transposable elements.
(d) Pack-MULEs were annotated only using data available from
Jiang et al. 2004 and Juretic et al. 2005.
(e) Note that these pseudomolecules do not represent the official IRGSP pseudomolecules.
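A short Python script can check the internal consistency of the table. The numbers are transcribed from the table above. The script checks only arithmetic: the column sums over the 12 chromosomes should reproduce the Total row, and TE plus non-TE counts should equal each row's total. It does not validate the annotation itself.

```python
# Per-chromosome values transcribed from the Release 5 table:
# (BAC/PAC clones, length bp, TE genes, non-TE genes, total genes,
#  TE models, non-TE models, total models)
ROWS = {
    1: (393, 43596771, 1307, 5313, 6620, 1334, 6766, 8100),
    2: (358, 35925388, 1096, 4319, 5415, 1112, 5608, 6720),
    3: (327, 36345490, 1038, 4559, 5597, 1058, 6027, 7085),
    4: (292, 35244269, 1759, 3613, 5372, 1773, 4464, 6237),
    5: (286, 29874162, 1344, 3298, 4642, 1363, 4216, 5579),
    6: (281, 31246789, 1321, 3420, 4741, 1341, 4158, 5499),
    7: (287, 29688601, 1237, 3270, 4507, 1264, 3988, 5252),
    8: (275, 28309179, 1299, 2905, 4204, 1302, 3585, 4887),
    9: (223, 23011239, 1033, 2399, 3432, 1046, 2926, 3972),
    10: (202, 22876596, 1071, 2404, 3475, 1078, 2942, 4020),
    11: (257, 28462103, 1288, 2936, 4224, 1295, 3439, 4734),
    12: (269, 27497214, 1439, 2610, 4049, 1458, 3167, 4625),
}
TOTAL = (3450, 372077801, 15232, 41046, 56278, 15424, 51286, 66710)

# Column sums over the 12 chromosomes should reproduce the Total row.
column_sums = tuple(sum(col) for col in zip(*ROWS.values()))
assert column_sums == TOTAL

# Within each row, TE + non-TE should equal the total for genes and for gene models.
for chrom, (_, _, te_g, non_g, tot_g, te_m, non_m, tot_m) in ROWS.items():
    assert te_g + non_g == tot_g, chrom
    assert te_m + non_m == tot_m, chrom

print("table is internally consistent")
```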
A subset of the TIGR rice pseudomolecule data can also be retrieved using the
TIGR Rice Genome Data Extractor.
Rice Pseudomolecule Gap Table
|