Inferring function from structure
Structural and functional annotations can be transferred between similar sequences because structure and function are conserved over long periods of evolutionary time. Above 40% sequence identity, homologous proteins tend to have the same function.
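A minimal sketch of the identity check. It assumes the two sequences are already aligned (gaps written as "-") and that percent identity is computed over gap-free aligned columns. Other definitions divide by the alignment length or by the length of the shorter sequence, which gives different numbers. The function names are illustrative, and the 40.0 default simply encodes the rule of thumb above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def likely_same_function(aligned_a, aligned_b, threshold=40.0):
    """Rule of thumb: above ~40% identity, homologues tend to share function."""
    return percent_identity(aligned_a, aligned_b) > threshold
```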
Function can be described at three levels: biochemical (the chemical interactions occurring in a protein), biological (the protein's role within the cell) and phenotypic (the role the protein plays in the organism as a whole).
EC (Enzyme Commission) numbers provide a widely used classification scheme for enzyme function. Several databases contain functional information: SWISS-PROT, GenProtEC, etc. Some proteins are also multifunctional. The Gene Ontology (GO) uses a controlled vocabulary to describe the roles of genes and gene products in any organism, organized into three domains: biological process, molecular function and cellular component.
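EC numbers have four dot-separated levels (class, subclass, sub-subclass, serial number), so the number of leading levels two enzymes share gives a rough measure of how similar their catalysed reactions are. The helpers below are hypothetical, not part of any EC tool, and treat "-" as an unassigned level. For instance, trypsin (EC 3.4.21.4) and chymotrypsin (EC 3.4.21.1) share three levels.

```python
# Top-level EC classes.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec):
    """Split an EC number such as "3.4.21.4" into its four levels."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four EC levels, got {ec!r}")
    return parts


def shared_ec_levels(ec_a, ec_b):
    """Count the leading levels two EC numbers share ("-" is never shared)."""
    shared = 0
    for a, b in zip(parse_ec(ec_a), parse_ec(ec_b)):
        if a == "-" or a != b:
            break
        shared += 1
    return shared


def ec_class_name(ec):
    """Name of the top-level class of an EC number."""
    return EC_CLASSES.get(int(parse_ec(ec)[0]), "unknown")
```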
Functional information which can be obtained from 3D protein structures
- Basic structure: in the form of a PDB file.
- Protein-ligand complexes: can provide the biochemical function of the protein.
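To make both items concrete, here is a minimal sketch that reads ATOM/HETATM records from a PDB file using the format's fixed column positions and lists the protein residues within a distance cutoff of a named ligand. The 4.0 Å cutoff and the helper names are assumptions for illustration. It ignores alternate locations, insertion codes and multiple models, which a full parser such as Biopython's Bio.PDB handles.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" (polymer) or "HETATM" (ligands, water, ...)
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(lines):
    """Parse ATOM/HETATM lines using the PDB fixed-column layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def binding_site_residues(atoms, ligand_name, cutoff=4.0):
    """Protein residues with any atom within `cutoff` of the named ligand."""
    ligand = [a for a in atoms if a.record == "HETATM" and a.res_name == ligand_name]
    protein = [a for a in atoms if a.record == "ATOM"]
    site = set()
    for p in protein:
        for lig in ligand:
            if math.dist((p.x, p.y, p.z), (lig.x, lig.y, lig.z)) <= cutoff:
                site.add((p.chain, p.res_seq, p.res_name))
                break
    return sorted(site)
```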
Protein structural classification is of limited help on its own, since some structures are under-represented in the databases. Furthermore, because the number of folds in nature is limited, similar structures can have entirely different functions. Most folds have a homologous family associated with them, and family members are expected to have related functions. There are, however, examples of functional divergence within families.
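Analogues: some functions are achieved by different structural solutions (examples of convergent evolution). For example, chymotrypsin and subtilisin use the same Ser-His-Asp catalytic triad on unrelated folds.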
Assigning function from structure
- Ab initio prediction: a protein-ligand binding site (active site) is often found to be the largest cleft in the protein.
- Structural comparisons: comparing the structure against classification databases such as CATH or SCOP. This is the most powerful method, but structural similarity can sometimes be the result of convergent evolution rather than common ancestry.
- Structural motifs: detailed knowledge of the active site is required. Six methods:
  - SITE and SITE-Match: correlate an alignment with PDB and SWISS-PROT files.
  - TESS: 3D Template Search and Superposition.
  - Fuzzy Functional Forms (FFFs): derives FFFs from 3D structural information.
  - SPASM, RIGOR: tools for studying constellations of a small number of residues.
  - Molecular Recognition: searches for similar spatial arrangements of atoms around a particular chemical moiety in proteins by superposing them.
  - Protein Side Chain Patterns: detects active sites in proteins via recurring amino acid side-chain patterns.
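The ab initio "largest cleft" heuristic can be sketched on a grid. The version below loosely follows the LIGSITE idea of scanning for protein-solvent-protein events, simplified to the three coordinate axes. An empty grid point counts as buried along an axis if protein lies on both sides of it; points buried along at least min_axes axes are kept, and the largest connected group of such points is returned as the largest cleft. Grid spacing, atom radius and thresholds are illustrative assumptions, and the all-pairs distance step is only practical for small inputs.

```python
from collections import deque

import numpy as np


def largest_cleft(coords, spacing=1.0, atom_radius=1.6, min_axes=2, padding=3.0):
    """Return grid-point coordinates of the largest buried empty region."""
    coords = np.asarray(coords, dtype=float)
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    shape = tuple(int(n) + 1 for n in np.ceil((hi - lo) / spacing))
    # Coordinates of every grid point, in the same (C) order as reshape(shape).
    idx = np.indices(shape).reshape(3, -1).T
    points = lo + idx * spacing
    # A grid point is occupied if it lies within atom_radius of any atom.
    d2 = ((points[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    occupied = (d2.min(axis=1) <= atom_radius ** 2).reshape(shape)

    # For each axis, count empty points with protein somewhere on both sides.
    buried = np.zeros(shape, dtype=int)
    for axis in range(3):
        before = np.logical_or.accumulate(occupied, axis=axis)
        after = np.flip(
            np.logical_or.accumulate(np.flip(occupied, axis=axis), axis=axis), axis=axis
        )
        buried += before & after & ~occupied
    pocket = buried >= min_axes

    # Group pocket points into 6-connected regions and keep the largest.
    seen = np.zeros(shape, dtype=bool)
    best = []
    for start in zip(*np.nonzero(pocket)):
        if seen[start]:
            continue
        seen[start] = True
        region, queue = [], deque([start])
        while queue:
            cell = queue.popleft()
            region.append(cell)
            for axis in range(3):
                for step in (-1, 1):
                    nb = list(cell)
                    nb[axis] += step
                    nb = tuple(nb)
                    if 0 <= nb[axis] < shape[axis] and pocket[nb] and not seen[nb]:
                        seen[nb] = True
                        queue.append(nb)
        if len(region) > len(best):
            best = region
    return lo + np.array(best, dtype=float).reshape(-1, 3) * spacing
```

Real pocket finders also scan diagonal directions, use atom-specific radii and rank pockets by more than size. This sketch keeps only the core idea that a binding cleft is solvent space enclosed by protein.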
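Structural comparison needs a similarity score between two structures. The alignment methods used in practice (for example DALI or SSAP) are considerably more elaborate, but a common basic ingredient is the RMSD after optimal superposition, which the Kabsch algorithm computes from a singular value decomposition. This sketch assumes the residue correspondence is already known (e.g. Cα coordinates of aligned residues, listed in the same order).

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a reflection so the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The reflection correction matters: without it, a structure and its mirror image could superpose perfectly, which is not physically meaningful for proteins.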
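Structural motif searches look for a small set of atoms whose geometry matches a known active site. As a hedged illustration only, not the algorithm of any of the tools listed above, the sketch below does a brute-force backtracking search: each template atom is matched to target atoms with the same residue and atom name, keeping only combinations whose pairwise distances agree with the template within a tolerance. Template coordinates, names and the 1.0 Å tolerance are hypothetical.

```python
import math


def find_template_matches(template, target, tolerance=1.0):
    """Find target atoms matching a small 3D template.

    template, target: lists of (res_name, atom_name, (x, y, z)).
    Returns tuples of target indices, one per template atom, such that every
    pairwise distance agrees with the template's to within `tolerance`.
    """
    n = len(template)
    # Pairwise distances inside the template.
    t_dist = [[math.dist(template[i][2], template[j][2]) for j in range(n)] for i in range(n)]
    # Candidate target atoms for each template atom, matched by residue and atom name.
    candidates = [
        [k for k, (res, atom, _) in enumerate(target) if res == t_res and atom == t_atom]
        for t_res, t_atom, _ in template
    ]
    matches = []

    def extend(partial):
        i = len(partial)
        if i == n:
            matches.append(tuple(partial))
            return
        for k in candidates[i]:
            if k in partial:
                continue
            # Keep k only if its distances to the atoms chosen so far fit the template.
            if all(
                abs(math.dist(target[k][2], target[partial[j]][2]) - t_dist[i][j]) <= tolerance
                for j in range(i)
            ):
                extend(partial + [k])

    extend([])
    return matches
```

Distances alone cannot distinguish a motif of four or more atoms from its mirror image, so a match would normally be checked further, for example by superposing it on the template and computing an RMSD.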