|
Introduction
BAC-end survey sequencing involves obtaining a single sequence read from each end of the environmental DNA insert. BAC-end sequence data are collected from a large number of randomly selected clones in a given library and are used to infer something of the genomic potential and diversity of the large insert library (and thereby, of the original sample). BAC-end sequencing was easily automated because all the BACs are sequenced with the same primer set and the entire process is done in a 96-well format. Automation makes this technique both relatively rapid and inexpensive, and has allowed us to sequence the ends of thousands of BACs from each of the three Monterey Bay BAC libraries.
The advantage of using large insert BAC libraries (rather than 1-2 kbp insert plasmids) for survey sequencing is that the BAC insert contains an enormous amount of genomic information. Therefore, if an interesting gene or phylogenetic marker is found in the survey sequence, the BAC can be sequenced to closure. For this reason, we screen the BAC-end data for such genes, and those that are deemed appropriate are placed into the closure queue. For more information on these processes, see Functional gene screening, Phylogenetic marker screening, or Random shotgun sequencing and closure. However, survey sequencing is only one methodology in environmental genomics. For all its advantages, the method has a serious limitation: the survey sequence data rarely contain a phylogenetic marker that anchors the BAC to a specific taxon. Therefore, these data let one consider the genomic potential of the large insert library, but survey sequencing cannot be used to infer the metabolic activities of any single bacterial taxon.
Methods
BAC clones from frozen stocks are inoculated into 1.25 ml of LB broth plus antibiotic in each of three deep-well culture blocks. These blocks are sealed with air-permeable tape and incubated for 18 hours at 37 °C and 350 rpm.
The cells are then pelleted by centrifugation, and the cell pellets from each of the three plates are combined (this ensures enough template). Currently, Big-Dye terminator reactions are used for sequencing on ABI3700 capillary sequencers. Sequence reads are automatically trimmed of vector and uploaded into the Monterey Bay Microbial Observatory database.
Data Access
The current numbers of BAC-end sequences completed for each library, access to the sequence and auto-annotation data, and some search tools for the survey sequence data are located on the BAC-end information page.
Automated Annotation and Data Release of Survey Sequence Data
Introduction
BAC-end survey sequencing of the Monterey Bay BAC libraries has generated an enormous amount of sequence data, which contain information on the microbially mediated processes that can be accomplished in this environment. However, analyzing these sequence data and disseminating the results in a usable format present specific difficulties, and the design of tools for data analysis is not straightforward. For example, a single sequence read may correspond to any of several things: a partial gene, two partial genes, an RNA, a non-coding region, or a complete open reading frame. Therefore, the survey sequence data have been treated very differently from a complete gene and handled differently from a finished genome.
Because large-scale sequence data are only useful if large numbers of researchers are able to access and use them to further their own research, we have tried to make the data as easy to use as possible. The BAC-end information page is an interface for users to download sequence data, access the BAC-end annotation, and run BLASTX or BLASTN searches of their own sequence data against the database. The sequence data are split into two categories: potential phylogenetic markers and putative genes.
The phylogenetic markers include rRNAs (identified by BLASTN), recA, radA, and elongation factors. By following the links in this section, users can see where, and in which direction, a BAC-end sequence runs along one of these phylogenetic markers (to determine whether additional sequence information would be useful). The sequence at the other end of the insert is also available from these pages. Putative gene assignments are obtained by parsing the BLASTX output. The page gives a putative identification and links to GenBank. Because users may be interested in the current "best hit" and databases expand rapidly, we include an option to refresh the BLAST search against both the GenBank and CMR databases.
Methods
BAC-end sequences are searched against the internal TIGR non-redundant protein database using BLASTX. BLASTX hits with a P value of 0.001 or better (that is, lower) are considered. Open reading frames surrounding the hit regions are deduced and translated. The annotation associated with the top BLASTX match is transferred to the identified reading frame. This information is not curated and thus should not be considered a definitive identification.
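The annotation steps described above can be illustrated with a minimal Python sketch. It is not the pipeline actually used for these libraries. The `Hit` record, the `annotate` and `orf_around` functions, and the example read are all hypothetical. The sketch also makes three assumptions: that "the open reading frame surrounding the hit" means the hit region extended codon by codon until a stop codon or the end of the read is reached, that the "top" match is the one with the lowest P value, and that hits lie on the forward strand (a minus-strand hit would first need the read reverse-complemented).

```python
from dataclasses import dataclass
from itertools import product

P_CUTOFF = 0.001  # threshold stated in the Methods text

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


@dataclass
class Hit:
    """Hypothetical, already-parsed BLASTX hit on the forward strand."""
    annotation: str
    p_value: float
    start: int  # 0-based, in-frame start of the hit on the read
    end: int    # exclusive end; (end - start) is a multiple of 3


def translate(dna):
    """Translate in frame 0; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def orf_around(seq, start, end):
    """Extend the hit in frame until a stop codon or the read edge (assumed rule)."""
    while start - 3 >= 0 and CODON_TABLE.get(seq[start - 3:start]) != "*":
        start -= 3
    while end + 3 <= len(seq) and CODON_TABLE.get(seq[end:end + 3]) != "*":
        end += 3
    return start, end


def annotate(seq, hits):
    """Keep hits passing the cutoff and transfer the top hit's annotation."""
    seq = seq.upper()
    kept = [h for h in hits if h.p_value <= P_CUTOFF]
    if not kept:
        return None  # no acceptable hit, so no putative assignment
    top = min(kept, key=lambda h: h.p_value)
    s, e = orf_around(seq, top.start, top.end)
    return {"annotation": top.annotation,
            "orf": (s, e),
            "protein": translate(seq[s:e])}


# Made-up read: GG TAA ATG GCT GCT AAA GGT TGA CC (hit covers GCT GCT)
read = "GGTAAATGGCTGCTAAAGGTTGACC"
hits = [Hit("hypothetical recA-like protein", 1e-20, 8, 14),
        Hit("weak match", 0.05, 8, 14)]
result = annotate(read, hits)
print(result)
```

As in the text, the transferred annotation is only as reliable as the top match and remains uncurated. The sketch returns `None` when no hit passes the cutoff, so that a read is never given a putative identification it has not earned.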
|