## Saturday, July 29, 2006

### Structure prediction in 1D

Limited computing resources and experimental inaccuracies prevent prediction of protein structure from first principles. Therefore, the only successful structure prediction tools are knowledge-based, using a combination of statistical theory and empirical rules.

Secondary Structure Prediction Methods
• Basic concept: segments of consecutive residues have preferences for certain secondary structure states (helix, strand, or coil/loop), which makes prediction a pattern recognition problem. Approaches include physicochemical principles, rule-based devices, expert systems, graph theory, linear and multilinear statistics, nearest-neighbor algorithms, molecular dynamics, and neural networks. The main limitation is the use of only local information, which is estimated to account for roughly 65% of secondary structure formation. The key to improving predictions is the use of evolutionary information.
• Programs: PHD, JPred2 (JNet, NSSP, PREDATOR, PHD), PSIPRED, SSPro2, HMMSTR/I, etc.
• Specialized methods: coiled-coil prediction. A coiled coil is a bundle of several helices with a characteristic side-chain packing geometry ("knobs-into-holes"). Program: COILS.
Solvent Accessibility Prediction Methods
• Basic concept: try different arrangements and assess them by predicting the extent to which a residue embedded in a protein structure is accessible to the solvent. Programs: PHD, PROFphd, JPred2 server.
Transmembrane Helix Prediction Methods
• Transmembrane proteins still represent a challenge. They are hard to crystallize and are hardly tractable by NMR spectroscopy. Prediction is simplified by the fact that the lipid bilayer of the membrane reduces the degrees of freedom, making prediction almost a 2D problem.
• Basic concept: TM helices are predominantly apolar and between 12 and 35 residues long; globular regions between membrane helices are typically shorter than 60 residues; and most TMH proteins show a specific distribution of the positively charged amino acids arginine and lysine, which are enriched on the cytoplasmic side (the "positive-inside rule").
• Programs: TopPred2, MEMSAT, TMAP, PHD, TMHMM, HMMTOP
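The basic concept above (long apolar stretches) is what the earliest hydropathy-plot methods exploit. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale with its commonly used 19-residue window and mean-hydropathy threshold of 1.6; it is not the algorithm of any program listed (TMHMM and HMMTOP, for instance, use hidden Markov models), and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy exceeds the threshold.

    Returns half-open (start, end) residue ranges (0-based).
    """
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))
    return segments
```

Because each window averages 19 residues, the reported segment edges smear a few residues into the flanking loops; the positive-inside rule would then be applied by counting K and R in the loops on either side of each predicted helix.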
Public Servers
PHDsec, PROFsec : neural-network based prediction of secondary structure, accessibility and TMH.

JPred, JPred2: neural networks and evolutionary information. Version 2 combines the results of 4 different methods (JNet, NSSP, Predator, PHD).

PROF: multiple alignments and other characteristics from databases.

PSIPRED: based on profiles created by PSI-BLAST and neural networks.

SAM-T99 : neural network and HMM.

SCRATCH: uses SSPro (recursive bidirectional neural networks).

Personal working notes extracted from B. Rost, "Prediction in 1D: Secondary Structure, Membrane Helices, and Accessibility" in Structural Bioinformatics


### Electrostatic interactions software: DelPhi

DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.

DelPhi takes as input a coordinate file of a molecule, or equivalent data for geometrical objects and/or charge distributions, and calculates the electrostatic potential in and around the system using a finite difference solution to the Poisson-Boltzmann equation. DelPhi is a versatile electrostatics simulation program that can be used to investigate electrostatic fields in a variety of molecular systems.

New features of DelPhi include solutions to the nonlinear form of the Poisson-Boltzmann equation, which are more accurate for highly charged systems; solutions for mixtures of salts of different valence; assignment of different dielectric constants to different regions of space; higher precision in the finite difference scheme through the derivation of the expression for electrostatic free energy; and estimation of the best relaxation parameter at run time. All of these features enhance the speed and versatility of DelPhi in handling more complicated systems and finite difference lattices of extremely high dimension.


### Protein docking Software: ZDOCK

In protein docking, the structure of a complex between two proteins is predicted based on the independently crystallized structures of the components.

ZDOCK uses a fast Fourier transform to search all possible binding modes for the proteins, evaluating them on the basis of shape complementarity, desolvation energy, and electrostatics. The top 2000 predictions from ZDOCK are then passed to RDOCK, where they are minimized with CHARMM to improve the energies and eliminate clashes; RDOCK then recomputes the electrostatic and desolvation energies (in a more detailed fashion than the calculations performed by ZDOCK).
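The reason an FFT helps is the correlation theorem: the overlap score for every translation of one grid relative to another can be computed at once as IFFT(FFT(receptor) · conj(FFT(ligand))), instead of shifting the ligand position by position. A minimal sketch of that trick only, assuming both molecules have already been mapped onto equally sized periodic grids; ZDOCK's actual grid encodings (pairwise shape complementarity, desolvation, electrostatics) and its loop over ligand rotations are not shown, and the function names are hypothetical.

```python
import numpy as np


def fft_translation_scores(receptor_grid, ligand_grid):
    """Score every cyclic translation t of the ligand grid at once.

    score[t] = sum_x receptor[x] * ligand[x - t], via the correlation theorem.
    """
    r = np.fft.fftn(receptor_grid)
    l = np.fft.fftn(ligand_grid)
    return np.real(np.fft.ifftn(r * np.conj(l)))


def brute_force_scores(receptor_grid, ligand_grid):
    """Same scores by explicitly shifting the ligand (slow; for checking only)."""
    axes = tuple(range(ligand_grid.ndim))
    scores = np.zeros(receptor_grid.shape)
    for t in np.ndindex(receptor_grid.shape):
        shifted = np.roll(ligand_grid, shift=t, axis=axes)  # shifted[x] = ligand[x - t]
        scores[t] = np.sum(receptor_grid * shifted)
    return scores
```

For an N-point grid the FFT route costs O(N log N) per rotation versus O(N²) for the explicit scan, which is what makes an exhaustive search over binding modes affordable.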

For basic information on running ZDOCK, see this site.


### Visualization software: Discovery Studio Visualizer

With DS Visualizer, you can visualize and share molecular information in a clear and consistent way, and in a wide variety of industry-standard formats. You can also create high quality graphics. DS Visualizer runs on both Windows 2000 and XP and the Red Hat Enterprise Linux operating system, versions 3.0 and 4.0.


### Structure modeling software: Modeller

Modeller is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints. Modeller is most frequently used for homology or comparative protein structure modeling: the user provides an alignment of a sequence to be modeled with known related structures, and Modeller automatically calculates a model with all non-hydrogen atoms. More generally, the input to the program is a set of restraints on the spatial structure of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure that satisfies these restraints as well as possible.

Restraints can in principle be derived from a number of different sources. These include related protein structures (comparative modeling), NMR experiments (NMR refinement), rules of secondary structure packing (combinatorial modeling), cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis, intuition, residue-residue and atom-atom potentials of mean force, etc. The restraints can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo atoms. Presently, Modeller automatically derives the restraints only from the known related structures and their alignment with the target sequence.

A 3D model is obtained by optimization of a molecular probability density function (pdf). The molecular pdf for comparative modeling is optimized with the variable target function procedure in Cartesian space that employs methods of conjugate gradients and molecular dynamics with simulated annealing. Modeller can also perform multiple comparison of protein sequences and/or structures, clustering of proteins, and searching of sequence databases. The program is used with a scripting language and does not include any graphics. It is written in standard Fortran 90 and will run on Unix, Windows, or Mac computers.

http://www.salilab.org/modeller/

A. Sali and T. L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.


### Molecular modeling software: TINKER

The TINKER molecular modeling software is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers. TINKER can use any of several common parameter sets, such as Amber (ff94, ff96, ff98 and ff99), CHARMM (19 and 27), Allinger MM (MM2-1991 and MM3-2000), OPLS (OPLS-UA, OPLS-AA and OPLS-AA/L), Liem Dang's polarizable potentials, and the AMOEBA polarizable atomic multipole force field developed by the TINKER authors. Parameter sets for other standard force fields such as GROMOS, UFF, ENCAD and MM4 are under consideration for future releases.

The TINKER package includes a variety of novel algorithms such as a new distance geometry metrization method that has greater speed and better sampling than standard methods, Elber's reaction path methods, several Potential Smoothing and Search (PSS) methods for global optimization developed by the TINKER authors, an efficient potential surface scanning procedure, a flexible implementation of atomic multipole-based electrostatics with explicit dipole polarizability, a selection of continuum solvation treatments including several variants of the generalized Born (GB/SA) model, an efficient truncated Newton (TNCG) local optimizer, surface areas and volumes with derivatives, a simple free energy perturbation facility, normal mode analysis, minimization in Cartesian, torsional or rigid body space, velocity Verlet stochastic dynamics, an improved spherical energy cutoff method, Particle Mesh Ewald summation for partial charges and regular Ewald for polarizable multipoles, a novel reaction field treatment of long range electrostatics, and more.
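Distance geometry ends with an embedding step: turning a matrix of (metrized) interatomic distances into Cartesian coordinates. A minimal sketch of that step using classical (Torgerson) multidimensional scaling; it is not TINKER's metrization method, which concerns how distances are chosen between lower and upper bounds before this point, and the function names are hypothetical.

```python
import numpy as np


def distance_matrix(coords):
    """All pairwise Euclidean distances for an (n, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def embed_distances(dist, dim=3):
    """Coordinates whose pairwise distances reproduce `dist`.

    Exact (up to rotation, translation and reflection) when `dist` is
    Euclidean in `dim` dimensions; otherwise only an approximation.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # metric (Gram) matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]              # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop tiny negative noise
    return eigvecs[:, top] * scale
```

Since distances are invariant to reflection, the embedding may return the mirror image; real distance geometry codes check chirality restraints afterwards.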

Force Field Explorer (FFE): A complete Java GUI for TINKER.

http://dasher.wustl.edu/tinker/


### Visualization software: MOLMOL

MOLMOL is a molecular graphics program for displaying, analyzing, and manipulating the three-dimensional structure of biological macromolecules, with special emphasis on the study of protein or DNA structures determined by NMR. The program runs on UNIX and Windows NT/95/98/2000 and is freely available.

http://hugin.ethz.ch/wuthrich/software/molmol/

Koradi, R., Billeter, M., and Wüthrich, K. (1996) J Mol Graphics 14, 51-55.
MOLMOL: a program for display and analysis of macromolecular structures.


## Thursday, July 27, 2006

### Ab initio

Seeks to predict the native conformation of a protein from the amino acid sequence alone.

Simplifications: implicit solvent models; united-atom representations; side chains restricted to a limited set of conformations found to be prevalent in the PDB; and restriction of the conformations available to the polypeptide backbone.

Rosetta: based on a model of folding in which short segments of the protein chain flicker between different local structures, consistent with their local sequences, and folding to the native state occurs when these local segments are oriented such that low free energy interactions are made throughout the protein.

Two categories of potential functions can be employed in evaluating the free energy of the peptide chain and the surrounding solvent: molecular mechanics force fields, and protein-structure-derived potentials, i.e. scoring functions derived from experimental structures in the PDB.


Energy-based fold recognition methods, based on the "thermodynamic hypothesis": the native conformation of a protein corresponds to a global free energy minimum of the protein/solvent system.

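In sequence-based comparison, the alignment score is local: it is a sum of independent per-position terms. Energy-based scores are not local, because the contribution of each residue depends on which other residues it contacts. Alignment with nonlocal scoring functions is an NP-complete problem.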

Solutions: an alignment technique that can work with nonlocal scoring functions (computationally expensive); two-level dynamic programming, which optimizes the interacting partners for each possible pair of aligned residues; and other energy calculations that allow the use of dynamic programming.
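Why locality matters can be seen directly in the dynamic programming recurrence. A minimal Needleman-Wunsch sketch, assuming a simple +1/-1 identity score and a linear gap penalty of -1 (illustrative values, not any program's defaults): each cell depends only on its three neighbours, which is exact only because the total score is a sum of independent residue-pair terms. If a pair's score depended on where other residues were placed, as with contact energies in threading, this recurrence would no longer find the optimum.

```python
def global_alignment_score(a, b, score, gap=-1):
    """Needleman-Wunsch optimal global alignment score for a local score(x, y)."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        f[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair
                          f[i - 1][j] + gap,                             # gap in b
                          f[i][j - 1] + gap)                             # gap in a
    return f[n][m]


def identity(x, y):
    """Illustrative local score: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1
```

For example, `global_alignment_score("ACGT", "AGT", identity)` gives 2: three matches and one gap.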


### Homology modeling

• The structure of a protein is determined by its amino acid sequence.
• During evolution, the structure is more stable and changes much more slowly than the associated sequence.
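The twilight zone: whether a sequence alignment reliably indicates structural similarity depends on both the percentage of identical residues and the number of aligned residues; the shorter the alignment, the higher the identity required.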
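The length dependence can be expressed as a threshold curve. A minimal sketch, assuming the form published by Rost (1999), $p_I(L) = n + 480\,L^{-0.32(1 + e^{-L/1000})}$, where $L$ is the number of aligned residues and $n$ shifts the curve; the constants are reproduced here for illustration and should be checked against the original before use, and the function names are hypothetical.

```python
import math


def twilight_threshold(aligned_length, n=0.0):
    """Percentage identity above which an alignment of this length is taken
    to indicate homology (assumed Rost-1999-style curve, offset n)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    exponent = -0.32 * (1.0 + math.exp(-aligned_length / 1000.0))
    return n + 480.0 * aligned_length ** exponent


def in_safe_zone(percent_identity, aligned_length, n=0.0):
    """True if the alignment lies on or above the threshold curve."""
    return percent_identity >= twilight_threshold(aligned_length, n)
```

With these constants, 100 aligned residues need roughly 29% identity, while 50 residues need over 40%, which matches the qualitative statement above.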

Seven steps:
1. Template recognition and initial alignment: in the safe zone, using BLAST or FASTA,
2. Alignment correction: pairwise alignments produce many pathological examples, so a multiple sequence alignment should be used, for instance with CLUSTALW,
3. Backbone generation: copy the coordinates of those template residues that appear in the alignment with the model sequence; it is better to choose a template with the fewest errors in the PDB. Multiple-template modeling is possible (Swiss-Model), although it is more complex,
4. Loop modeling: backbone changes cannot easily happen within regular secondary structure elements (helices and strands), so insertions or deletions due to gaps should be placed in loops and turns. These changes are notoriously difficult to predict. Two approaches: knowledge-based (3D-Jigsaw, Insight, Modeller, WHAT IF) and energy-based (Monte Carlo or MD techniques),
5. Side-chain modeling: conserved residues can be copied entirely from the template to the model only at high sequence identity. In practice, side-chain placement is at least partly knowledge-based, using libraries of common rotamers. The resulting combinatorial explosion can be handled because certain backbone conformations strongly favor certain rotamers. Prediction accuracy is low for residues on the surface,
6. Model optimization: an iterative process: predict the rotamers, then the resulting shifts in the backbone, then the rotamers for the new backbone, and so on, until the procedure converges (rotamer prediction plus energy minimization). Two ways to achieve greater accuracy: quantum force fields and self-parameterizing (adaptive) force fields,
7. Model validation: model errors depend on the sequence identity to the template and on errors in the template itself. Two ways to estimate errors in a structure:
• calculating the model's energy based on a force field, and checking whether bond lengths and bond angles are within normal ranges and whether there are many bumps in the model;
• normality indices that describe how well a given characteristic of the model resembles the same characteristic in real structures: bond lengths and bond and torsion angles; distribution of polar and apolar residues; potentials of mean force; 3D distribution functions (considering the direction of atomic contacts).


### Electrostatic interactions

The high dielectric coefficient of water, together with the tendency of diffusible ions to move toward biopolymer charges of opposite sign, reduces the effective interactions among the biopolymer charges (solvent-screened interactions).

Electrostatic-steering effects can lead to increases in the rate constant of association by two orders of magnitude.

The thermodynamics and kinetics of protein-protein association and larger-scale supramolecular assembly can be analyzed and predicted in many cases with the aid of electrostatic calculations, supplemented in the kinetics area by simulation of the diffusional motion of proteins.

Poisson-Boltzmann Theory

Canonical expression (the Poisson equation):

$- \nabla \cdot \epsilon (\mathbf{x}) \nabla \varphi(\mathbf{x}) = \rho (\mathbf{x})$

where $\epsilon(\mathbf{x})$ is the dielectric coefficient (2-20 inside the solute, changing smoothly to 80 in the solvent), $\varphi(\mathbf{x})$ is the electrostatic potential, and $\rho(\mathbf{x})$ is the charge distribution (a collection of Dirac delta functions that model the $N_f$ atomic partial charges of the solute).

Poisson-Boltzmann equation (PBE): a variant in which mobile counterion charges are added to the charge distribution in a mean-field fashion: $\rho(\mathbf{x}) = \rho_f(\mathbf{x}) + \rho_m(\mathbf{x})$. A linearized form of the PBE also exists.

Energies

The free energy is a function of the electrostatic potential as well as the atomic positions, charges, and radii. The calculated energies (protonation, binding and solvation energies) are combined and converted to a $pK_a$ value.
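The last conversion step is simple arithmetic: a free energy difference becomes a $pK_a$ shift through $\Delta pK_a = \Delta\Delta G / (RT \ln 10)$. A minimal sketch, assuming $\Delta\Delta G$ is the difference between the electrostatic free energy of deprotonation in the protein and in a reference model compound in water (positive when deprotonation is less favorable in the protein, which raises the $pK_a$); the reference $pK_a$ value and function names are hypothetical inputs.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def pka_shift(delta_delta_g_kj_per_mol, temperature=298.15):
    """pKa shift produced by a deprotonation free energy difference (kJ/mol)."""
    return delta_delta_g_kj_per_mol * 1000.0 / (R * temperature * math.log(10))


def protein_pka(model_pka, delta_delta_g_kj_per_mol, temperature=298.15):
    """Reference (model compound) pKa plus the electrostatic shift."""
    return model_pka + pka_shift(delta_delta_g_kj_per_mol, temperature)
```

At 25 °C, $RT \ln 10 \approx 5.71$ kJ/mol (about 1.36 kcal/mol), so each 5.71 kJ/mol of extra destabilization of the deprotonated form shifts the $pK_a$ up by one unit.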

Forces

The dynamic trajectory of a solute is calculated without the numerous explicit solvent molecules required for traditional MD simulations. The solvent effects are modeled by stochastic forces applied to the biomolecule.

Numerical solutions of the PBE

Software: DelPhi, APBS, MEAD, UHBD, MacroDox, AMBER, CHARMM.

Finite Difference Discretization: uses Cartesian meshes, but does not provide a way to locally increase the accuracy of the solution in a specific region without increasing the number of unknowns across the entire grid.
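To make the finite-difference idea concrete, here is a one-dimensional toy version: the linearized PBE for a planar charged surface in salt solution, $d^2\varphi/dx^2 = \kappa^2 \varphi$, discretized on a uniform grid and solved by successive over-relaxation (SOR), the kind of relaxation scheme whose parameter DelPhi is described above as estimating at run time. A minimal sketch, assuming reduced units, a constant dielectric, a fixed potential at the surface and zero potential at the far boundary; it is not the code of DelPhi or any other listed program, and real solvers work on 3D grids with spatially varying $\epsilon$ and point charges. The analytic solution $\varphi(x) = \varphi_0 \sinh(\kappa(L - x))/\sinh(\kappa L)$ makes it easy to check.

```python
def solve_linearized_pb_1d(kappa, length, n_points, phi_surface,
                           omega=1.9, tol=1e-10, max_iter=100_000):
    """SOR solution of phi'' = kappa^2 phi on [0, length].

    Boundary conditions: phi(0) = phi_surface, phi(length) = 0.
    Returns (potential values on the grid, iterations used).
    """
    h = length / (n_points - 1)
    phi = [0.0] * n_points
    phi[0] = phi_surface  # fixed potential at the charged surface
    # From (phi[i-1] - 2 phi[i] + phi[i+1]) / h^2 = kappa^2 phi[i]:
    diag = 2.0 + (kappa * h) ** 2
    for iteration in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(1, n_points - 1):
            gauss_seidel = (phi[i - 1] + phi[i + 1]) / diag
            updated = phi[i] + omega * (gauss_seidel - phi[i])  # over-relax
            largest_change = max(largest_change, abs(updated - phi[i]))
            phi[i] = updated
        if largest_change < tol:
            return phi, iteration
    raise RuntimeError("SOR did not converge")
```

Refining the grid here means adding unknowns everywhere, which is exactly the limitation that motivates the adaptive finite element and multilevel approaches.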

Adaptive Finite Element Discretization: offers the ability to place computational effort in specific regions of the problem domain.

Multilevel solvers: iterate until the desired accuracy is reached, projecting the discretized system onto meshes (or grids) at multiple resolutions.


### Protein-protein interactions from evolutionary information

Protein interactions are key for understanding the functions of genes and proteins.

Multiple sequence alignments are rich sources of evolutionary information: conserved positions in structural cores and active sites; family-dependent conservation (tree determinants); coevolution (correlated mutations).

Sequence-based methods for prediction of interacting regions
• Tree-Determinant Residues and interacting surfaces,
• Correlated mutations: can point to the residues and regions implicated in the interaction between the two proteins,
Hybrid methods use both structure (docking) and sequence information: neural networks.

Computational methods based on genomic information:
• Phylogenetic Profiles,
• Conservation of Gene Neighboring,
• Gene Fusion: based on the presence of fused genes in various genomes,
Computational methods based on sequence information:
• Correlated Mutations (i2h),
• Similarity of Phylogenetic Trees (mirrortree).
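Two of the ideas in the lists above can be sketched in a few lines. Phylogenetic profiles compare the presence/absence patterns of two proteins across the same set of genomes; mirrortree compares the two families' inter-species distance matrices, built from orthologs in the same species, by their linear correlation. A minimal sketch, assuming the profiles and distance matrices are already computed and share the same species order; the function names are hypothetical, and real implementations also handle missing orthologs and the background similarity that all trees share through the species phylogeny.

```python
import math


def profile_distance(profile_a, profile_b):
    """Hamming distance between presence(1)/absence(0) vectors over the same genomes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mirrortree_score(dist_a, dist_b):
    """Correlation of two inter-species distance matrices (same species order)."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A high correlation (close to 1) suggests that the two families evolved in a coordinated way, which is taken as evidence that they may interact.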


### Inferring function from structure

Structure and function can be transferred between similar sequences because they have been conserved over long periods of time. Above 40% sequence identity, homologous proteins tend to have the same function.

Function: Biochemical: the chemical interactions occurring in a protein; Biological: the role of the protein within the cell; Phenotypic: the role played by the protein in the organism as a whole.

EC (Enzyme Commission) numbers provide a widely used protein function classification scheme. Several databases contain functional information: SWISS-PROT, GenProtEC, etc. There are also multifunctional proteins. Gene Ontology (GO) uses a controlled vocabulary to describe the roles of genes and gene products in any organism, organized into three ontologies: biological process, molecular function, and cellular component.

Functional information which can be obtained from 3D protein structures

• Basic structure: in the form of a PDB file.
• Protein-ligand complexes: can provide the biochemical function of the protein.
Relationship between structure and function

Protein structural classification is not of much help, since some structures are under-represented. Furthermore, as the number of folds is limited in nature, similar structures can have totally different functions. Most folds have a homologous family associated with them, and family members are expected to have related functions. There are, however, examples of divergence of function.

Analogues: some functions have different structural solutions (examples of convergent evolution).

Assigning function from structure

• Ab initio prediction: a protein-ligand binding site (active site) is often found to be the largest cleft in the protein.
• Structural comparisons: using structural databases such as CATH or SCOP. It is the most powerful method. Sometimes structural similarity can be the result of convergent evolution.
• Structural motifs: detailed knowledge of the active site is required. Six methods:
1. SITE and SITE-Match: correlates an alignment with PDB and SWISS-PROT files.
2. TESS: 3D Template Search and Superposition.
3. Fuzzy Functional Forms (FFFs): derives FFFs from 3D structural information.
4. SPASM, RIGOR: tools for studying constellations of a small number of residues.
5. Molecular Recognition: searches for similar spatial arrangements of atoms around a particular chemical moiety in proteins by superposing them.
6. Protein Side Chain Patterns: detects active site in proteins via recurring amino acid side-chain patterns.


### Modern Drug Discovery

• Target identification: usually through biological or genetic investigation.
• An assay to look for modulators (either inhibitors, antagonists, or agonists) of the target activity.
• High-throughput screen (HTS).
• Elaboration of the initial small molecule hit through medicinal chemistry: combinatorial chemistry, quantitative structure activity relationships (QSAR), computer-aided drug design (CADD) and structure-based drug design.
• Lead optimization into a candidate drug: multidimensional optimization problem searching within the relatively limited chemical space of analogs of the lead compound.
• Large-scale production methods, preclinical animal safety studies, clinical trials.

Structural Bioinformatics on Drug Discovery

Target Assessment: target druggability: the energetically optimal protein would be spherical, with all its hydrophobic residues pointing inward. A quantitative approach is the rule of five (Lipinski):

A compound is likely to show poor absorption or permeation if:
• It has more than five hydrogen bond donors
• The molecular weight is over 500
• The Clog P (calculated octanol/water partition coefficient) is over five
• The sum of nitrogens and oxygens is over 10
• Weak inhibition (< 100 nM) is observed
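The four physicochemical criteria above can be checked mechanically once the properties have been computed. A minimal sketch with a hypothetical `CompoundProperties` container; computing the properties themselves (for example ClogP) needs a cheminformatics toolkit and is not shown, the weak-inhibition criterion is not encoded, and flagging a compound only at two or more violations follows a common reading of the rule.

```python
from dataclasses import dataclass


@dataclass
class CompoundProperties:
    """Hypothetical container for precomputed compound properties."""
    h_bond_donors: int
    n_plus_o: int            # number of nitrogens plus oxygens
    molecular_weight: float  # daltons
    clogp: float             # calculated octanol/water partition coefficient


def rule_of_five_violations(c):
    """List the rule-of-five criteria that the compound violates."""
    violations = []
    if c.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if c.molecular_weight > 500:
        violations.append("molecular weight over 500")
    if c.clogp > 5:
        violations.append("ClogP over 5")
    if c.n_plus_o > 10:
        violations.append("more than 10 N + O")
    return violations


def likely_poor_absorption(c):
    """Flag compounds violating two or more criteria (assumed cut-off)."""
    return len(rule_of_five_violations(c)) >= 2
```

Usage, with made-up property values: `likely_poor_absorption(CompoundProperties(6, 8, 650.0, 3.0))` returns `True` (too many donors and too heavy).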

Other complementary physicochemical properties: surface area and volume of the pocket, hydrophobic and hydrophilic character, and curvature and shape of the pocket.

Target Triage: computer-aided target selection (CATS), based on the importance of a gene for the organism, the occurrence of the gene in multiple target species, specificity of inhibition judged by reference to sequence similarity, and ease of assay. The group of residues lining a ligand-binding site is of more importance than long-range interactions and conformational changes.

Target Validation: knock-out of the gene of interest or RNA antisense technology to inactivate the gene.

Lead Identification: structural bioinformatics can be used for function and ligand prediction. Using structural similarity to find chemical leads (usually, when the proteins share less than 30% sequence identity the active sites are nonidentical). Virtual screening: to derive a pharmacophore describing the functionally important sites in a ligand-binding site, and docking and scoring. Creating a chemical library.

Lead Optimization: repeated cycles of determining the structure of the target in complex with a number of lead compounds and their analogs. Structural bioinformatics should be used to design suitable constructs at the outset of the project: alignment and secondary structure prediction on multiple alignments, homology modeling, and the use of a surrogate protein (an orthologous protein from another species or a similar member of the same gene family).

ADMET Modeling: additional parameters that can affect the biopharmaceutical and safety properties of the drug: in vivo absorption, distribution, metabolism, excretion, and toxicology. Sequence-structure relationship and protein homology modeling can be used in ADMET modeling.


## Wednesday, July 26, 2006

### Test of methods

• Reassembling Complexes: generation of the experimental structure of the complex as the best scoring geometry.

$\mathrm{RMSD} = \sqrt{\frac{1}{N_{\mathrm{atoms}}} \sum_{i=1}^{N_{\mathrm{atoms}}} d_i^2}$

$d_i$ is the distance between the coordinates of atom $i$ in the two structures when overlaid. An RMSD below 2 Å is generally considered acceptable for small ligands.
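The RMSD formula translates directly into code. A minimal sketch, assuming the two structures are already overlaid and their atoms listed in the same order (no superposition is performed, as is usual when comparing a docked ligand pose with the crystal pose in the receptor frame); the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

A pose whose atoms are all displaced by 1 Å gives an RMSD of 1.0 Å, inside the usual 2 Å acceptance window.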

Several algorithms have been proposed for incorporating flexible receptor information. Rmsd may not be the best metric for systems that exhibit conformational changes. It is important to know how robust the geometry is to changes in ligand or changes (mutations) in the receptor.

• Rank ordering of energies: ranking a set of ligands by their predicted binding energies, which a docking program can do. It is used in both library screening and lead optimization. The challenge is to rank correctly ligands that are very similar to each other. An improvement is to use consensus scoring.
• Virtual Screening: two metrics are used: the hit rate (number of true positives divided by the number of selected compounds, TP+FP) and the enrichment factor (the hit rate divided by the hit rate expected from random selection).
• Common Docking Failures: incomplete searching and inaccurate scoring functions. Algorithm or soft failure: probably due to the rugged, multiminima character of the binding energy landscape. Scoring or hard failure: when the crystal structure has a higher energy than the predicted pose.
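The two virtual screening metrics differ only by the random baseline. A minimal sketch, assuming the number of actives in the whole library is known (as in retrospective benchmarks with seeded actives); the function names and numbers are hypothetical.

```python
def hit_rate(true_positives, false_positives):
    """Fraction of the selected compounds that are actually active."""
    return true_positives / (true_positives + false_positives)


def enrichment_factor(true_positives, false_positives,
                      total_actives, total_compounds):
    """Hit rate of the selection divided by the hit rate of random picking."""
    random_rate = total_actives / total_compounds
    return hit_rate(true_positives, false_positives) / random_rate
```

Example: if 100 of 10,000 library compounds are active and the top 100 ranked compounds contain 20 actives, the hit rate is 0.2 and the enrichment factor is 0.2 / 0.01 = 20.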


### Scoring functions

A practical method for calculating binding free energy: Free Energy Perturbation (FEP).

Methods for high-throughput scoring:
• First Principle Methods: a molecular mechanics force field, which contains intra- and intermolecular forces. No entropic contributions. Usually no interactions with the solvent.
• Semiempirical Methods: the linear interaction energy (LIE) method calculates absolute binding free energies without sampling intermediate states. Based on the linear response approximation for electrostatic forces.
• Empirical Methods: able to score ligands very rapidly. LUDI is the best known. Structural descriptors and regression methods, based both on structural and experimental binding data.
• Knowledge-based Potentials: based on interatomic preferences between atoms.
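The LIE estimate can be written as one formula over ensemble averages of the ligand-surroundings interaction energies, taken from two simulations: ligand bound to the receptor, and ligand free in water. A minimal sketch, assuming the commonly quoted coefficients β = 0.5 (the linear response value for the electrostatic term) and an empirical α of about 0.18 for the van der Waals term; both coefficients, and the constant offset some variants add, are fitted parameters that differ between studies, and the averages in the usage line are made up.

```python
def lie_binding_free_energy(vdw_bound, vdw_free, elec_bound, elec_free,
                            alpha=0.18, beta=0.5):
    """LIE estimate: alpha * d<V_vdW> + beta * d<V_el> (bound minus free).

    Inputs are MD ensemble averages of ligand-surroundings interaction
    energies, all in the same units (e.g. kcal/mol).
    """
    return alpha * (vdw_bound - vdw_free) + beta * (elec_bound - elec_free)
```

For hypothetical averages of -40 / -25 kcal/mol (van der Waals, bound / free) and -60 / -50 kcal/mol (electrostatic), the estimate is 0.18 × (-15) + 0.5 × (-10) = -7.7 kcal/mol.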
Parametrization of molecular mechanics scoring functions:
• Charge Representation: using ab initio techniques, by integration of Coulomb's law over the total charge distribution. Often approximated as point charges, obtained from charge models or from electrostatic potential fitting to reproduce the potential at a number of grid points outside the van der Waals surface of the molecule. Many empirical methods exist.
• Van der Waals Radii: represent the effective size of atoms. Usually, the Lennard-Jones (6-12) potential function is used.
• Solvent Representation: an explicit collection of individual water molecules, where each molecule is treated as a configuration of point charges; the simplest models use 3, 4 or 5 interaction sites and a rigid geometry. Computationally expensive. Another approach: continuum water models. The screening effect is represented by using a distance-dependent dielectric constant, the Generalized Born model, or the Poisson-Boltzmann method. The nonpolar solvation free energy term is often assumed to be proportional to the SASA.
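The Lennard-Jones form in terms of well depth ε and collision diameter σ, $V(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]$, is the same function as the $A/R^{12} - B/R^6$ term of a molecular mechanics force field, with $A = 4\epsilon\sigma^{12}$ and $B = 4\epsilon\sigma^6$. A minimal sketch showing both forms; the ε and σ values in the usage note are arbitrary illustrative numbers, not parameters of any force field.

```python
def lennard_jones(r, epsilon, sigma):
    """6-12 potential: zero at r = sigma, minimum -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_coefficients(epsilon, sigma):
    """A and B for the equivalent A / r**12 - B / r**6 form."""
    return 4.0 * epsilon * sigma ** 12, 4.0 * epsilon * sigma ** 6
```

For example, with ε = 0.24 and σ = 3.4 Å the energy is zero at 3.4 Å and reaches its minimum of -0.24 at about 3.82 Å; closer in, the $r^{-12}$ term makes the potential rise steeply, which is why soft scoring functions are used to tolerate small overlaps.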
Calculation:
• The Sampling Problem: there are many degrees of freedom. At least the 6 degrees of freedom of rigid-body docking (3 translational and 3 rotational) must be sampled, usually together with some limited flexibility.
• Receptor Flexibility: using soft scoring functions, allowing some overlap between the ligand and the receptor, accounting for structural uncertainties. Many other approaches.


### Free energy calculation

Formation of the complex involves changes in enthalpy (van der Waals and Coulomb interactions, and internal energy) and in entropy (configurational: translational and rotational degrees of freedom; conformational; and vibrational): $\Delta G_{\mathrm{bind}} = \Delta H - T \Delta S$

• Enthalpy can be calculated using a molecular mechanics force field such as:

$E_{MM} = \sum_{\mathrm{bonds}} K_r (r - r_{\mathrm{eq}})^2 + \sum_{\mathrm{angles}} K_{\theta} (\theta - \theta_{\mathrm{eq}})^2$
$+ \sum_{\mathrm{dihedrals}} \frac{V_n}{2} [1 + \cos(n\phi - \gamma)]$
$+ \sum_{i < j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}} \right]$
• Entropy can be obtained from the Boltzmann probabilities $P_j$ of the accessible states $j$:
$S = -k \sum_{j} P_{j} \ln P_{j}$
$P_{j} = \frac{e^{-E_{j}/kT}}{\sum_{i} e^{-E_{i}/kT}}$
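The two entropy formulas combine into a short computation over a list of state energies. A minimal sketch, assuming molar energies in J/mol (so $k$ is replaced by the gas constant $R$ and $S$ comes out in J/(mol K)) and a discrete set of states; the function names are hypothetical. Subtracting the lowest energy before exponentiating avoids overflow and cancels in the ratio.

```python
import math

R = 8.314462618  # gas constant, J/(mol K); molar energies, so k -> R


def boltzmann_probabilities(energies, temperature):
    """P_j = exp(-E_j / RT) / sum_i exp(-E_i / RT) for molar energies in J/mol."""
    rt = R * temperature
    e_min = min(energies)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)       # partition function (relative to e_min)
    return [w / z for w in weights]


def gibbs_entropy(energies, temperature):
    """S = -R * sum_j P_j ln P_j, in J/(mol K)."""
    probs = boltzmann_probabilities(energies, temperature)
    return -R * sum(p * math.log(p) for p in probs if p > 0.0)
```

Four equally populated states give $S = R \ln 4 \approx 11.5$ J/(mol K); if one state is far lower in energy than the rest, $S$ drops to almost zero, which is the entropic cost of locking a flexible ligand into a single bound conformation.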

Entropy can be split into four terms:

$\Delta S_{\mathrm{total}} = \Delta S_{\mathrm{trans}} + \Delta S_{\mathrm{rot}} + \Delta S_{\mathrm{conf}} + \Delta S_{\mathrm{vibr}}$

Both entropy and enthalpy are strongly temperature dependent. Other conditions: pH, ionic strength, and water activity.

Solvent influences:
• short-range solute-solvent interactions: nonpolar solvation free energies assumed to be proportional to the solvent accessible surface area (SASA),
• long-range electrostatic interactions: represented by a macroscopic dielectric constant (screening effect).


### Computer-aided molecular design

Traditionally, leads were discovered from natural products, from biochemistry, or by exploring analogs of known substrates or ligands. Nowadays, computational methods are beginning to play a major role.

Only in the last few years has the target structure been available during the drug discovery process. Structures of membrane-bound proteins are especially difficult to obtain.

Computer-aided drug design:
• Analog-based: uses pharmacophores (an explicit geometric hypothesis of the critical features of a ligand) and QSAR (quantitative structure-activity relationships), which relate activity to properties derived solely from the ligands' chemical structure, following the linear free energy principle.
• Structure-based: starting from the 3D structure of the target (by X-ray crystallography, NMR spectroscopy, computer homology methods or ab initio methods), the binding site is located by comparison or homology. Conformational analysis: lowest energy conformations when free in solution and when bound to the receptor.
Two main approaches:
1. Docking: characterization of the ligand, sampling: positioning (configurational) and conformational states [FFT], and scoring: energetic evaluation of each discrete geometry.
2. De novo: construction of molecules that have not been synthesized previously. Three methods: a) fragment placement: focus on a small number of well-placed fragments, b) connection methods: the linker provides a compatible geometry for connecting the critical fragments, c) sequential growth: a step-by-step (starting with a seed) construction of a hypothetical ligand within a binding pocket.
Virtual screening: build a library to prioritize experimental efforts.
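The QSAR idea from the analog-based approach above, in its simplest linear free-energy form (Hansch-style analysis), is a least-squares fit of activity against a few structure-derived descriptors. A minimal sketch, assuming hypothetical descriptors (for example a lipophilicity and a steric parameter) and made-up activities; real QSAR work adds descriptor selection and cross-validation, and the function names are hypothetical.

```python
import numpy as np


def _design_matrix(descriptors):
    """Prepend a column of ones so the fit includes an intercept c0."""
    d = np.asarray(descriptors, dtype=float)
    return np.column_stack([np.ones(len(d)), d])


def fit_qsar(descriptors, activities):
    """Least-squares coefficients of activity = c0 + sum_k c_k * descriptor_k."""
    coeffs, *_ = np.linalg.lstsq(_design_matrix(descriptors),
                                 np.asarray(activities, dtype=float), rcond=None)
    return coeffs


def predict_activity(coeffs, descriptors):
    """Apply fitted coefficients to new descriptor rows."""
    return _design_matrix(descriptors) @ coeffs
```

The fitted coefficients are what make QSAR interpretable: their signs and sizes suggest which property to change in the next round of analogs.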


## Tuesday, July 25, 2006

### Principles of ligand design

Ligand design

Ligand: a molecule of any size that binds or interacts with another molecule through noncovalent forces (usually not involving chemical bond formation).

Target or receptor: usually the larger species.

Interactions:
• chemical/physical forces between ligand and receptor,
• and between each of these molecules and the solvent or environment.
Study of the interactions:
• fundamentally using quantum mechanics (limited by computational resources),
• other empirical computational approaches,
• comparison with experiments.
Properties connected to thermodynamics: free energy of binding, solubility in aqueous and nonaqueous environments, and so forth.

Kinetics: less often considered, but important in enzyme catalysis, signaling cascades, and molecular rearrangements.

Free energy of binding (difference in the free energy of the complex and the free energy of its components, the receptor and the ligand):

$\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{complex}} - (\Delta G_{\mathrm{ligand}} + \Delta G_{\mathrm{receptor}})$

$\Delta G_{\mathrm{bind}}$ is a function of temperature, pressure, ionic strength, pH, solvent, and the concentration of all the chemical species present.

Experimental measurement: $\Delta G_{\mathrm{bind}} = -RT \ln K_{eq} = RT \ln K_{d}$, where $K_{eq}$ is the association constant ($K_{eq} = 1/K_d$).
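The relation $\Delta G_{\mathrm{bind}} = RT \ln K_d$ gives a quick way to translate measured dissociation constants into free energies and back. A minimal sketch, assuming $K_d$ in mol/L relative to a 1 M standard state and energies in kcal/mol; the function names are hypothetical.

```python
import math

R_KCAL = 8.314462618 / 4184.0  # gas constant in kcal/(mol K)


def binding_free_energy_from_kd(kd_molar, temperature=298.15):
    """Delta G_bind = RT ln(Kd / 1 M), in kcal/mol (negative for Kd < 1 M)."""
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_binding_free_energy(delta_g_kcal, temperature=298.15):
    """Inverse relation: Kd = exp(Delta G / RT), in mol/L."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature))
```

At 25 °C a 1 nM binder corresponds to about -12.3 kcal/mol, and each 10-fold improvement in $K_d$ is worth about 1.36 kcal/mol.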

Personal working notes extracted from "Principles and Methods of Docking and Ligand Design" in Structural Bioinformatics
