Technical Highlight - June 2015
Short description: A bioinformatics strategy takes advantage of the proximal organization of genes encoding proteins involved in metabolic pathways to predict protein function.
As sequencing data accumulate, effective approaches are needed to decipher the functions of the enzymes encoded in newly sequenced genomes. In eubacteria and archaea, genes encoding enzymes and other proteins that participate in the same metabolic pathway often cluster together in operons. Exploiting this co-localization in gene clusters, or genome neighborhoods, the groups of Jacobson, Gerlt and Almo (PSI NYSGRC) developed a new bioinformatics approach to predict both the in vitro activities of the encoded proteins and their metabolic functions in cells.
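A minimal Python sketch of the neighborhood idea, assuming a hypothetical representation in which each contig is an ordered list of (gene ID, protein family) pairs: it collects the genes lying within a fixed number of positions on either side of a query gene. The function name gene_neighborhood, the family labels, and the default window of 10 genes are illustrative assumptions, not the authors' pipeline.

```python
from typing import List, Tuple

# Hypothetical representation: one contig as an ordered list of (gene_id, protein_family) pairs.
Gene = Tuple[str, str]


def gene_neighborhood(contig: List[Gene], query_id: str, window: int = 10) -> List[Gene]:
    """Return the genes within `window` positions on either side of `query_id`, excluding the query.

    Assumption: neighborhood is defined purely by gene order; strand, intergenic
    distance and circular replicons are ignored in this sketch.
    """
    ids = [gene_id for gene_id, _ in contig]
    idx = ids.index(query_id)  # raises ValueError if the query gene is not on this contig
    start = max(0, idx - window)
    end = min(len(contig), idx + window + 1)
    # Keep every gene in the window except the query itself.
    return [gene for i, gene in enumerate(contig[start:end], start) if i != idx]
```

For example, with contig = [("g1", "famA"), ("g2", "famQ"), ("g3", "famB"), ("g4", "famC")], gene_neighborhood(contig, "g2", window=1) returns [("g1", "famA"), ("g3", "famB")]. A real analysis would likely also need to account for strand, intergenic distances and contig boundaries.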
Using this strategy, termed genome neighborhood networks (GNNs), they analyzed 2,333 unique sequences encoding proteins in the proline racemase superfamily. The authors constructed a sequence similarity network (SSN) in which the threshold for connecting sequences can be varied to correspond to distinct levels of sequence identity; in this study, 35% and 60% cutoffs were used. Querying the genome neighborhoods of all sequences simultaneously amplifies the signal from genes for functionally related proteins; importantly, if genes for unrelated proteins occur in these neighborhoods in only some species, their signals are filtered out as noise. For this reason, the authors suggest that this large-scale, aggregate approach identifies proteins involved in metabolic pathways more efficiently than single-genome analyses. The GNN approach predicted functions for >85% of the proteins, and the authors verified these predictions by measuring in vitro enzyme activities, assaying phenotypes, and using transcriptomics and X-ray crystallography.
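The network construction and aggregate filtering described above can be sketched in Python under simplifying assumptions: pairwise identities are taken as precomputed fractions (real SSN tools typically draw edges from BLAST-derived scores), SSN clusters are the connected components left after discarding edges below the cutoff, and a neighbor family counts as conserved when it appears near at least a chosen fraction of a cluster's members. The function names ssn_clusters and conserved_neighbors and the min_fraction parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def ssn_clusters(seqs: List[str], identity: Dict[Tuple[str, str], float], cutoff: float) -> List[Set[str]]:
    """Connected components of a similarity network keeping only edges with identity >= cutoff.

    Assumption: identity values are precomputed fractions in [0, 1] (e.g. 0.35 or 0.60).
    """
    parent = {s: s for s in seqs}  # union-find forest, one root per component

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), pid in identity.items():
        if pid >= cutoff:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, Set[str]] = {}
    for s in seqs:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


def conserved_neighbors(cluster: Set[str], neighborhoods: Dict[str, Set[str]],
                        min_fraction: float) -> Dict[str, float]:
    """Neighbor families seen near at least `min_fraction` of the cluster's members.

    Families that occur next to only a few members fall below the threshold and
    are dropped as noise; families shared across the cluster are retained.
    """
    counts: Counter = Counter()
    for seq in cluster:
        counts.update(neighborhoods.get(seq, set()))
    size = len(cluster)
    return {fam: n / size for fam, n in counts.items() if n / size >= min_fraction}
```

With identity = {("s1", "s2"): 0.72, ("s2", "s3"): 0.65, ("s3", "s4"): 0.40}, a 0.60 cutoff yields the clusters {s1, s2, s3} and {s4}, while a 0.35 cutoff merges all four. If s1 and s3 are each next to famA and famB and s2 is next to famA and famC, then conserved_neighbors with min_fraction=0.5 keeps famA (1.0) and famB (about 0.67) and drops famC, which appears near only one member.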
For more complex superfamilies, information from multiple sources will need to be integrated. For example, when bacterial genes are located in polycistronic transcriptional units, this genomic context can be combined with other data to identify pathways and predict enzyme function.
S. Zhao et al. Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks.
eLife 3 (2014). doi:10.7554/eLife.03275