Neosartorya fischeri Genome Project

The TIGR Aspergillus fumigatus Help

Welcome to the TIGR Aspergillus fumigatus Database. These pages are designed to give access to data from all of the Aspergillus fumigatus genome sequences completed to date. The Aspergillus fumigatus database contains the sequence and annotation of each completed chromosome, along with associated information about the organism, such as the structure and composition of its DNA molecules (for example, GC content) and many attributes of the protein sequences predicted from the DNA sequence (for example, evidence for gene prediction, pI, and molecular weight).

If you experience difficulties with this site, please send a message explaining the problem to afum@tigr.org and we will get back to you as soon as possible.

Explanation of General Terms

COGs: COGs, or Clusters of Orthologous Groups of proteins, are phylogenetic classifications of proteins encoded in complete genomes. COGs were delineated by comparing protein sequences encoded in 21 complete genomes, representing 17 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. For more information, visit the COG Home Page at NCBI. Reference: Tatusov RL, et al.

Enzyme Commission Number: Enzyme Commission Numbers or EC#s are numbers assigned to enzymes that reflect their function. For more information and a complete list of all EC#s, visit the Enzyme Nomenclature Page.
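As an illustration of how an EC# encodes function, the sketch below splits an EC number into its four dot-separated fields (class, subclass, sub-subclass, serial number) and looks up the name of the top-level class. The function names `parse_ec` and `ec_class_name` are hypothetical and are not part of this database. Treating "-" as an unassigned field is an assumption based on the common way partial EC numbers are written, and provisional serial numbers that are not plain integers are not handled.

```python
# Top-level enzyme classes, keyed by the first field of the EC number.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec):
    """Split an EC number such as '1.1.1.1' into a tuple of four fields.

    Fields written as '-' (unassigned) are returned as None.
    An optional leading 'EC', 'EC ' or 'EC:' prefix is ignored.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(": ")
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("expected four dot-separated fields: %r" % ec)
    parsed = tuple(None if f == "-" else int(f) for f in fields)
    if parsed[0] not in EC_CLASSES:
        raise ValueError("unknown top-level enzyme class: %r" % ec)
    return parsed


def ec_class_name(ec):
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES[parse_ec(ec)[0]]


if __name__ == "__main__":
    print(parse_ec("EC 3.4.-.-"))
    print(ec_class_name("1.1.1.1"))
```

Returning a tuple keeps the hierarchy explicit, so two enzymes can be compared field by field to see how closely their assigned functions agree.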

MUMmer: MUMmer, the Whole Genome Alignment Tool, is a system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system can rapidly align sequences containing millions of nucleotides. It is fully described in: A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27:11 (1999), 2369-2376.

Paralogous Gene Families: Paralogous gene families are groups of genes that have been duplicated within a particular organism during evolution. Not all genomes in the Omniome database have paralogous gene families assigned.

Pfam: Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. For more information on Pfam, visit the Sanger Centre Pfam Home Page.

PROSITE: PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to. For more information on PROSITE, visit the PROSITE Home Page.

Terms associated with TIGRFAMs

TIGRFAMs: TIGRFAMs are a collection of protein families featuring curated multiple sequence alignments, Hidden Markov Models (HMMs) and associated information designed to support the automated functional identification of proteins by sequence homology. Classification by equivalog family (see below), where achievable, complements classification by orthologs, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large scale genome sequencing projects. To download or get more information on TIGRFAMs, go to the TIGRFAMs Home Page.

HMM: A Hidden Markov Model, or HMM, is a statistical model for any system that can be represented as a succession of transitions between discrete states. In this case, the discrete states correspond to the successive columns of a protein multiple sequence alignment. In principle, HMMs can be developed from unaligned sequences by successive rounds of optimization, but in practice, protein profile HMMs are simply built from curated multiple sequence alignments. HMM searches resemble later-round PSI-BLAST searches (although based on curated alignments), with position-specific scoring for each amino acid, insertion, and deletion over the length of the sequence. Scores are reported both in bits of information and as an E-value.

Equivalog: Equivalogs describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types.

Orthologs: Proteins related to each other by descent from a common ancestral sequence by speciation. Orthologs may differ in function.

Superfamily: The complete set of proteins having sequence homology over essentially their full length.

Domain: A region of sequence homology among sets of proteins that are not all full-length homologs. Homology domains often, but not always, correspond to recognizable protein folding domains.

Motif: Generally, a small region of sequence similarity (not necessarily homology) characterized by distinct patterns of amino acids at specific positions. An example of a motif is the N-glycosylation site motif N{P}[ST] (Asn, anything but Pro, choice of Ser or Thr).
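The N-glycosylation motif above can be read as a small pattern language: a plain letter is a required residue, {..} lists residues that are excluded, and [..] lists the allowed choices. The sketch below translates that notation into a Python regular expression and scans a protein sequence for every match, including overlapping ones. The function names `motif_to_regex` and `find_motif` and the example sequence are hypothetical. Treating a lowercase x as "any residue" and ignoring "-" separators are assumptions borrowed from PROSITE-style notation. Repeat counts such as x(2) are not handled.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'N{P}[ST]' into a regex string.

    Supports literal residues, {..} exclusions, [..] choices,
    lowercase x wildcards, and optional '-' separators.
    """
    out = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "{":
            end = motif.index("}", i)
            out.append("[^" + motif[i + 1:end] + "]")  # anything but these
            i = end + 1
        elif ch == "[":
            end = motif.index("]", i)
            out.append("[" + motif[i + 1:end] + "]")  # one of these
            i = end + 1
        elif ch == "-":
            i += 1  # separator only, carries no meaning
        elif ch == "x":
            out.append(".")  # any residue
            i += 1
        else:
            out.append(re.escape(ch))  # literal residue
            i += 1
    return "".join(out)


def find_motif(sequence, motif):
    """Return the 0-based start position of every match, overlaps included."""
    # A lookahead matches without consuming characters, so overlapping
    # occurrences are all reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    print(motif_to_regex("N{P}[ST]"))
    print(find_motif("MNGTANPSNNSS", "N{P}[ST]"))
```

In the made-up sequence MNGTANPSNNSS, the candidate at position 5 (NPS) is rejected because of the proline, while NGT, NNS and NSS at positions 1, 8 and 9 match. A match only marks a potential site; as noted above, motif similarity does not by itself imply homology.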

EGAD: A database used to store gene, protein and TIGRFAM/HMM information.

Noise Cutoff: The HMM score below which hits to the HMM are considered uninteresting.

Trusted Cutoff: The HMM score above which there should be no false positive hits.
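The two cutoffs divide HMM scores into three ranges, which the sketch below makes explicit. The function name `classify_hit`, the three labels, and the example scores are hypothetical and are not taken from any TIGRFAMs model. How a score exactly equal to a cutoff is treated is also an assumption: here a score at the trusted cutoff counts as trusted, and a score at the noise cutoff is not yet noise.

```python
def classify_hit(score, noise_cutoff, trusted_cutoff):
    """Place an HMM bit score relative to a model's two cutoffs.

    Returns 'trusted' for scores at or above the trusted cutoff,
    'noise' for scores below the noise cutoff, and 'uncertain'
    for anything in between.
    """
    if trusted_cutoff < noise_cutoff:
        raise ValueError("trusted cutoff must not be below noise cutoff")
    if score >= trusted_cutoff:
        return "trusted"
    if score < noise_cutoff:
        return "noise"
    return "uncertain"


if __name__ == "__main__":
    # Illustrative numbers only.
    for s in (120.0, 75.0, 10.0):
        print(s, classify_hit(s, noise_cutoff=50.0, trusted_cutoff=100.0))
```

Hits in the middle range are the ones that usually call for manual review, since they are neither safely above the false-positive threshold nor clearly uninteresting.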


For comments or questions, send mail to afum@tigr.org.