AGeS: A Software System for Microbial Genome Sequence Annotation
Annotation of microbial Genome Sequences (AGeS)

AGeS is a fully integrated, high-performance software system that annotates DNA sequences and assigns function(s) to the predicted protein-coding regions of completed and draft bacterial genomes. It predicts genomic features using several bioinformatics methods and provides visualization through the familiar genome browser GBrowse.

The cataloguing and analysis of microbial genomes sequenced using next-generation technologies opens new avenues for screening unknown microbes and analyzing their genetic diversity. For such applications, the analysis of sequenced genomes needs to be rapid, high-throughput, fully automated, integrated, and readily accessible to intended users. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store the results locally for further analysis. To address this need, we have developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS), which incorporates publicly available and in-house-developed bioinformatics programs and databases, many of which are parallelized for high-throughput performance.
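Parallelizing annotation pays off when each unit of work, such as one protein, can be processed independently of the others. The following minimal Python sketch illustrates that pattern under that assumption; it is not AGeS or PIPA code. The names `annotate_protein` and `annotate_all` are hypothetical, and the per-protein step is a placeholder that only computes trivial properties. A thread pool is used on the assumption that a real worker would mostly wait on an external program; CPU-bound work done in Python itself would call for a process pool or a cluster scheduler instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def annotate_protein(record: Tuple[str, str]) -> Tuple[str, Dict[str, object]]:
    """Hypothetical per-protein annotation step (placeholder).

    A real deployment would launch an external tool on one sequence here;
    this stand-in only computes simple properties so the sketch is self-contained.
    """
    protein_id, sequence = record
    return protein_id, {
        "length": len(sequence),
        "starts_with_met": sequence.startswith("M"),
    }


def annotate_all(proteins: Dict[str, str], workers: int = 4) -> Dict[str, Dict[str, object]]:
    """Annotate every protein concurrently.

    Each protein is handled independently, so the work is embarrassingly parallel.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the output order is deterministic.
        return dict(pool.map(annotate_protein, proteins.items()))


if __name__ == "__main__":
    proteins = {"p1": "MKVLA", "p2": "AKT", "p3": "MSSW"}
    print(annotate_all(proteins, workers=2)["p1"])
```

Because results come back in input order, the pool size can change without changing the output, which keeps runs reproducible.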

The AGeS system supports three main capabilities. The first is the storage of input contigs and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline. The pipeline first analyzes contigs from high-throughput sequencing with Do-It-Yourself Annotation (DIYA) [1], which locates genomic regions that code for proteins, RNA, and other genomic elements. The identified protein-coding regions are then functionally annotated using the Pipeline for Protein Annotation (PIPA) [2, 3], a high-throughput pipeline developed in-house. The full annotation results are stored in the customized database for further analysis and can be downloaded as a GenBank-format file. The third capability is the visualization of annotated sequences using GBrowse.

To date, we have implemented these capabilities for bacterial genomes. Gene and protein annotation of a typical bacterial genome takes less than 6 h using 64 cores on a high-performance computing system. We validated AGeS by comparing its annotations of completed and draft genome sequences with those produced by three other methods. The results indicate that the tools integrated into AGeS produce annotations in general agreement with those of the compared methods, as shown by more than 94% overlap in the identified genes, a substantial number of identical annotated features, and more than 90% agreement in enzyme function predictions.
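The gene-overlap and enzyme-agreement figures compare two sets of annotations. The exact matching criteria behind those numbers are not stated here, so the following Python sketch assumes two common conventions and should not be read as the AGeS validation code. It treats two gene calls as the same gene when they share contig, strand, and stop-codon position, and it measures enzyme agreement only over genes that both methods assigned an EC number. The `Gene` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Gene:
    contig: str
    start: int   # 1-based, inclusive; assumed start <= end
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"

    def stop_key(self) -> Tuple[str, str, int]:
        # The stop codon sits at the high coordinate on "+" and the low one on "-".
        stop = self.end if self.strand == "+" else self.start
        return (self.contig, self.strand, stop)


def gene_overlap(predicted: Iterable[Gene], reference: Iterable[Gene]) -> float:
    """Fraction of reference genes matched by a predicted gene with the same stop."""
    predicted_stops = {g.stop_key() for g in predicted}
    ref = list(reference)
    if not ref:
        return 0.0
    matched = sum(1 for g in ref if g.stop_key() in predicted_stops)
    return matched / len(ref)


def ec_agreement(pred_ec: Dict[str, str], ref_ec: Dict[str, str]) -> float:
    """Among genes given an EC number by both methods, fraction with identical EC numbers."""
    shared = pred_ec.keys() & ref_ec.keys()
    if not shared:
        return 0.0
    same = sum(1 for gene_id in shared if pred_ec[gene_id] == ref_ec[gene_id])
    return same / len(shared)


if __name__ == "__main__":
    ages = [Gene("c1", 100, 400, "+"), Gene("c1", 900, 1500, "-"), Gene("c1", 2000, 2300, "+")]
    other = [Gene("c1", 130, 400, "+"), Gene("c1", 900, 1500, "-"), Gene("c1", 3000, 3300, "+")]
    print(gene_overlap(ages, other))  # 2 of 3 reference genes share a stop position
    print(ec_agreement({"g1": "1.1.1.1", "g2": "2.7.1.1"},
                       {"g1": "1.1.1.1", "g2": "2.7.1.2"}))  # 0.5
```

Matching on the stop codon rather than the full coordinates tolerates the start-codon disagreements that are common between gene finders, as with the first gene pair in the usage example.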

[1] Stewart, A. C., B. Osborne, and T. D. Read. DIYA: A Bacterial Annotation Pipeline for Any Genomics Lab. Bioinformatics. 2009; 25:962-963.
[2] Yu, C., N. Zavaljevski, V. Desai, S. Johnson, F. J. Stevens, and J. Reifman. The Development of PIPA: An Integrated and Automated Pipeline for Genome-wide Protein Function Annotation. BMC Bioinformatics. 2008 January 29; 9:52.
[3] Yu, C., V. Desai, N. Zavaljevski, and J. Reifman. PIPA: A High-throughput Pipeline for Protein Function Annotation. Proceedings of the HPCMP Users Group Conference, Seattle, WA. 2008 July 14-17; 241-246.
Planned Upgrades
  • Comparative Genome Analysis
Resources
Publications
Kumar, K., V. Desai, L. Cheng, M. Khitrov, D. Grover, R. Vijaya Satya, C. Yu, N. Zavaljevski, and J. Reifman. AGeS: A Software System for Microbial Genome Sequence Annotation. PLoS ONE. 2011; 6(3).