Technical Highlight - November 2013
Short description: Identifying the structural features that can predict catalytic amino acids will enhance the functional assignment of unknown proteins in structural databases.
The rapidly expanding number of structures emerging from structural genomics projects is far outpacing the rate of functional analysis. While the activities associated with new structures are sometimes evident from prior functional characterization of related proteins, 30% of the structures deposited in databases have no functional annotation, reinforcing the need for computational approaches to predict biological roles.
Sequence conservation is among the most powerful indicators of functional relevance, but its predictive power is limited by the extent of conservation and the number of related proteins identified. Because 75% of homologous proteins share less than 30% sequence identity, structural information is additionally required for reliable functional assignment. This type of “hybrid” approach would be strengthened by complementary methods that can identify functional residues within conserved regions. Toward that end, Fajardo and Fiser directly assessed the correlation of features used to distinguish functional residues from their nonfunctional counterparts in order to determine those that most reliably predict catalytic residues from sequence and structural data. They did so by analyzing a training dataset of 439 structures and determining correlations between pairs of attributes ascribed to potential functional amino acids. The features analyzed, which included distance to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), sequence conservation, and closeness and other graph centrality measures, were then used to train neural networks to identify catalytic residues.
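Two of the structural features named above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: each residue is represented by a single coordinate (for example its C-alpha atom), the GCM is approximated by the unweighted centroid of those coordinates, and two residues are treated as graph neighbors when they lie within a hypothetical 7 Å cutoff. The function names are illustrative only.

```python
from collections import deque

import numpy as np


def distance_to_centroid(coords):
    """Distance of each residue to the unweighted centroid of all residues.

    Assumption: the centroid stands in for the GCM described in the text.
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    return np.linalg.norm(coords - centroid, axis=1)


def closeness_centrality(coords, cutoff=7.0):
    """Closeness centrality of each residue in a distance-cutoff contact graph.

    Assumption: residues closer than `cutoff` (hypothetical value) are linked.
    Closeness is (reachable nodes) / (sum of hop counts to those nodes).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Contact graph without self-loops.
    adjacency = (dist <= cutoff) & ~np.eye(n, dtype=bool)
    neighbors = [np.flatnonzero(adjacency[i]) for i in range(n)]

    closeness = np.zeros(n)
    for source in range(n):
        # Breadth-first search gives shortest path lengths in hops.
        hops = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                v = int(v)
                if v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        total = sum(hops.values())
        if total > 0:
            closeness[source] = (len(hops) - 1) / total
    return closeness
```

For three residues placed 5 Å apart along a line, the middle residue has the smallest distance to the centroid and the highest closeness, which matches the intuition that both features favor centrally located positions.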
In agreement with previous reports, the authors found that sequence conservation displays the highest correlation with function, but additional parameters can be reliable guides to catalytic propensity. Both the distance of residues to the GCM and closeness could distinguish functional from nonfunctional residues; in contrast, RSA shows essentially no correlation with function. The best predictive performance was obtained from networks using distance to the GCM and amino acid type as inputs, and was optimal when residues were preselected based on sequence conservation. This approach outperformed structure-only prediction methods, and also compared favorably with currently employed sequence-based methods. The authors note that the rapidly changing composition of sequence databases requires that sequence conservation be regularly recalculated to ensure the usefulness of sequence profile-based methods. The expanded ability to annotate protein structures for which there are presently no known functions would appear to be worth this effort.
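The input preparation described above (preselect residues by conservation, then describe each one by its distance to the GCM and its amino acid type) might be sketched as follows. This is not the authors' pipeline: the conservation threshold of 0.8, the one-hot encoding of amino acid type, and the function name `build_inputs` are all assumptions made for illustration, and the neural network itself is left out.

```python
import numpy as np

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_inputs(residues, distances, conservation, min_conservation=0.8):
    """Preselect conserved residues and build per-residue input vectors.

    residues:      one-letter amino acid codes, one per position
    distances:     distance of each residue to the GCM
    conservation:  conservation score per residue (assumed scaled to 0..1)
    min_conservation: hypothetical preselection threshold

    Returns the kept residue indices and a feature matrix whose first
    column is the distance and whose remaining 20 columns one-hot encode
    the amino acid type.
    """
    keep = [i for i, score in enumerate(conservation) if score >= min_conservation]
    features = np.zeros((len(keep), 1 + len(AMINO_ACIDS)))
    for row, i in enumerate(keep):
        features[row, 0] = distances[i]
        features[row, 1 + AMINO_ACIDS.index(residues[i])] = 1.0
    return keep, features


# Usage: only the two well-conserved residues (H and D) are kept.
kept, X = build_inputs("HAD", [2.0, 9.0, 3.5], [0.95, 0.40, 0.90])
```

The resulting matrix could then be passed to any classifier; the highlight specifies neural networks but not their architecture, so that step is not reproduced here.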
J.E. Fajardo and A. Fiser. Protein structure based prediction of catalytic residues.
BMC Bioinformatics. 14, 63 (2013). doi:10.1186/1471-2105-14-63