Arabidopsis whole-genome arrays
The development of Arabidopsis whole-genome arrays was funded by the NSF Arabidopsis 2010 project for the study of transcriptional networks in Arabidopsis. The arrays are available from NimbleGen Systems, Inc.
Characteristics of the two arrays
Oligo design for the promoter array (ATH_P1)
Oligo design for the whole genome array (ATH_WG1)
Representation of the genes on ATH_WG1A, B and C
Specificity of the oligos
Characteristics of the two arrays
| | ATH_P1 | ATH_WG1 |
| Regions represented on the array | 2000 bases upstream of translation start | all |
| Total number of oligos | | |
| Number of chips | 1 | 3 (A, B and C) |
| Version of TIGR annotation | 4 | 5 |
| Repeat-masking? | No | Yes, 15% repeats |
| Average oligo density | 1 per 285 nt | 1 per 90 nt |
| Oligo length | 54-65 nt | 55-70 nt |
| Maximum distance between two oligos | 600 nt | 3392 nt |
| Number of oligos per gene | 7 per promoter | 45 per genic region (average) |
Oligo design for the promoter array (ATH_P1)
Oligonucleotides on ATH_P1 were designed exclusively in the promoter regions of annotated genes, as these are the most probable binding regions of transcription factors. Regions 2 kb upstream of each gene were extracted from the pseudomolecules regardless of the presence of neighboring genes or the existence of annotated UTRs. Overlapping promoters were not consolidated into a single region. The design strategy aimed to place 7 probes within the presumptive promoter region of every gene, using a scoring algorithm that takes into account the melting temperature (Tm), the proximity to other oligos, the frequency of 24-mer sliding windows, and basic oligo composition rules. The targeted Tm of 76°C was achieved by varying the length of the oligo between 54 and 65 nucleotides.
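The extraction of a fixed-length upstream region can be illustrated with a short Python sketch. The function names, the 0-based coordinate convention and the handling of minus-strand genes are assumptions made for illustration; the page does not describe the actual extraction code.

```python
# Hypothetical helpers; names and coordinate conventions are assumptions.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start_pos, strand, length=2000):
    """Return up to `length` bases upstream of a translation start.

    `start_pos` is the 0-based forward-strand index of the first base of
    the start codon as read in the gene's own orientation (so for a
    minus-strand gene it is the rightmost base of the codon).
    Neighboring genes are ignored, as stated in the text; the region is
    simply truncated at the chromosome ends.
    """
    if strand == "+":
        begin = max(0, start_pos - length)
        return chrom_seq[begin:start_pos]
    if strand == "-":
        begin = start_pos + 1
        end = min(len(chrom_seq), begin + length)
        # Report the promoter in the orientation of the gene.
        return reverse_complement(chrom_seq[begin:end])
    raise ValueError("strand must be '+' or '-'")
```

In both orientations the returned sequence reads 5' to 3' and ends immediately before the start codon.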
In spite of the high weight put on position in the algorithm, intervals of more than 600 bases between oligos were observed for 3400 promoters. In these cases, the first-pass oligos were discarded and new oligos were designed within 7 regularly spaced intervals throughout the promoter, so as to ensure a maximum distance of 450 bases between two adjacent oligos.
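The spacing check and the fallback to regularly spaced intervals can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and measuring the distance from the end of one oligo to the start of the next is an assumption, since the page does not say how the distance was measured.

```python
def max_gap(oligos):
    """Largest edge-to-edge gap between adjacent oligos.

    `oligos` is a list of (start, end) half-open intervals within one
    promoter. Edge-to-edge measurement is an assumption.
    """
    ordered = sorted(oligos)
    return max((nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])),
               default=0)


def needs_redesign(oligos, limit=600):
    """True if any two adjacent oligos are more than `limit` bases apart."""
    return max_gap(oligos) > limit


def regular_intervals(promoter_length=2000, n_oligos=7):
    """Split the promoter into `n_oligos` equal windows [start, end)."""
    bounds = [round(i * promoter_length / n_oligos)
              for i in range(n_oligos + 1)]
    return list(zip(bounds[:-1], bounds[1:]))
```

With a 2000-base promoter and 7 windows, each window is about 286 bases wide. If one oligo of about 60 nt is placed in each window, two oligos in adjacent windows can be at most roughly 2 x 286 - 2 x 60 = 452 bases apart edge to edge, which is in line with the 450-base figure quoted above under these assumptions.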
For a subset of 30 genes, 20 oligos per promoter were designed.
For a text file of the oligo sequences and coordinates click here.
Oligo design for the whole genome array (ATH_WG1)
Using RepeatMasker, a total of 15% of the TIGR Release 5 pseudomolecules was masked for Arabidopsis repeat sequences, E. coli sequences retrieved from NCBI, vector sequences present in the TIGR UniVec database, and transposon-related annotation.
The oligo design algorithm proceeded as follows on the unmasked sequences. Starting from position 1, a probe consisting of the next 70 residues was selected and then trimmed from the 3' end until the targeted Tm of 76°C or the lower cut-off length of 55 bases was reached. If no oligo satisfying the Tm and length constraints could be designed, or if part of the oligo was masked or contained an ambiguous base, the oligo was discarded and the design process was repeated one base downstream. The oligo was also discarded if it exceeded the limit on the number of cycles required for its synthesis. When an oligo satisfying all the design constraints was found, it was selected for synthesis. The program then stepped 55 nucleotides to the right and repeated the design process. Masked repeats were skipped during the design process.
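The tiling procedure described above can be sketched in Python. This is a minimal illustration, not the program used for the arrays. The Tm formula is a generic salt-adjusted GC approximation, and the `tolerance` parameter, the default salt concentration, the representation of masked bases as `N`, and the choice to measure the 55-nucleotide step from the start of the accepted oligo are all assumptions. The synthesis-cycle limit is not modeled.

```python
import math


def melting_temp(seq, na_molar=0.05):
    """Approximate Tm in degrees C (generic formula, an assumption).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N
    """
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    return 81.5 + 16.6 * math.log10(na_molar) + 41.0 * gc / n - 600.0 / n


def design_tiling_oligos(seq, target_tm=76.0, tolerance=2.0,
                         max_len=70, min_len=55, step=55,
                         tm_fn=melting_temp):
    """Return a list of (start, oligo) tuples tiled along `seq`.

    `seq` is uppercase; masked or ambiguous bases are anything other
    than A, C, G or T (for example 'N').
    """
    oligos = []
    pos = 0
    while pos + min_len <= len(seq):
        candidate = seq[pos:pos + max_len]
        # Trim from the 3' end until the target Tm or the minimum
        # length is reached.
        while len(candidate) > min_len and tm_fn(candidate) > target_tm:
            candidate = candidate[:-1]
        accepted = (
            set(candidate) <= set("ACGT")
            and abs(tm_fn(candidate) - target_tm) <= tolerance
        )
        if accepted:
            oligos.append((pos, candidate))
            pos += step   # step to the right after a successful design
        else:
            pos += 1      # otherwise retry one base downstream
    return oligos
```

Passing a different `tm_fn` makes it possible to substitute another Tm model without changing the tiling logic.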
For a text file of the oligo sequences and coordinates click here.
Representation of the genes on ATH_WG1A, B and C
The oligos were split across three chips according to their intended mapping location on the genome:
| Chip: | ATH_WG1A | ATH_WG1B | ATH_WG1C |
| Chromosome 1 | all | no | no |
| Chromosome 2 | to At2g25170 | from At2g22930 | no |
| Chromosome 3 | no | all | no |
| Chromosome 4 | no | to At4g13580 | from At4g11385 |
| Chromosome 5 | from At5g65600 | no | all |
Note that:
1. Some oligos are present on more than one chip. The last 9000 oligos of each chip ("Overlapping" oligos) are repeated on the next chip (the last 900 probes of chip C are repeated on chip A). In addition, 6000 randomly chosen probes are present once on each chip.
2. The physical position of the oligos is randomized on each chip.
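The wrap-around overlap described in note 1 can be illustrated with a small Python sketch. The function name, the equal-sized split and the tiny overlap used in the example are assumptions for illustration only: the actual chips were divided by genome location as shown in the table above, and the randomly chosen shared probes are not modeled here.

```python
import math


def split_with_overlap(oligos, n_chips=3, overlap=2):
    """Split genome-ordered oligos into chips that share boundary oligos.

    The last `overlap` oligos of each chip are repeated at the start of
    the next chip, and the tail of the last chip wraps around to the
    first chip.
    """
    size = math.ceil(len(oligos) / n_chips)
    base = [oligos[i * size:(i + 1) * size] for i in range(n_chips)]
    chips = []
    for i in range(n_chips):
        prev = base[i - 1]  # index -1 wraps to the last chip
        tail = prev[max(0, len(prev) - overlap):]
        chips.append(tail + base[i])
    return chips
```

For example, `split_with_overlap(list(range(9)))` places oligos 7 and 8 on both the last and the first chip, which mirrors why the end of chromosome 5 appears on ATH_WG1A in the table.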
Specificity of the oligos
In order to evaluate their specificity, oligos were aligned to the genome in sliding windows of 15 bases. These alignment results are indicative of the risk for cross-hybridization of each oligo. We calculated that 87% of the oligos of ATH_P1 are unique using a 75% identity cut-off over the entire length of the oligo. See alignment of ATH_P1 oligo set to TIGR release version 5 here.
A total of 82% of the oligos of ATH_WG1 are unique using a 75% identity cut-off over the entire length of the oligo. See alignment of ATH_WG1 to the TIGR release version 5 here.
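One way to picture a uniqueness test of this kind is the Python sketch below. It is an interpretation, not the method used for the arrays: it seeds candidate off-target placements with exact 15-base windows, then scores each ungapped, full-length placement against the 75% identity cut-off. All function names are hypothetical, only the forward strand is searched, and gapped alignments are ignored.

```python
from collections import defaultdict


def build_kmer_index(genome, k=15):
    """Map every k-mer in `genome` to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_unique(oligo, origin, genome, index, k=15, cutoff=0.75):
    """True if no off-target placement reaches the identity cut-off.

    `origin` is the intended genome position of the oligo, which is
    excluded from the search.
    """
    checked = set()
    for offset in range(len(oligo) - k + 1):
        for hit in index.get(oligo[offset:offset + k], ()):
            start = hit - offset
            if start == origin or start in checked:
                continue
            checked.add(start)
            if start < 0 or start + len(oligo) > len(genome):
                continue
            target = genome[start:start + len(oligo)]
            if identity(oligo, target) >= cutoff:
                return False
    return True
```

Because a placement with 75% identity does not necessarily share an exact 15-base window with the oligo, this seeded search is a heuristic and can miss some off-target sites.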