The theoretical basis of HSMD

Growth versus dynamical simulation procedures

HSMD enables one to calculate S and F from a sample generated by MD, Monte Carlo (MC), or any other exact simulation technique. HSMD is based the fact that the configurations of any system can, in principle, be generated exactly also by a step-by-step (growth) procedure, where particles are added gradually to an initially empty volume using transition probabilities (TPs). A trivial example is an ideal polymer chain of N bonds on a square lattice, i.e. a chain without the excluded volume interaction. An ideal chain can be simulated exactly by the dynamical MC method where the entropy, however, is unknown; alternatively, a chain configuration, i, can be constructed from nothing as a random walk where at each step a bond’s direction (out of four possible directions) is chosen with TP=1/4; here the Boltzmann probability is known, PiB = (¼)N and thus the entropy is known as well. Clearly, two large samples, one obtained by MC and the other generated as random walks are equivalent in the sense that they lead to the same thermodynamic averages and fluctuations.

For more complex systems (argon, water, peptides, etc) an exact growth procedure can be defined – the exact scanning method, which is also based on calculating TPs (see ref. 1 for argon and ref. 2 for a peptide). Again, the exact scanning method is equivalent to any other exact simulation technique (in particular, to MC or MD) in the sense discussed above (in other words, the sample does not carry a memory of the simulation method with which it has been generated).

Therefore, one can assume that a given MC or MD sample has rather been generated by the exact scanning method, which enables one to reconstruct each conformation i by calculating the TP densities that hypothetically were used to create it step-by-step. This is the essence of the “hypothetical scanning MD method” (HSMD) (or HSMC).

Illustration of the reconstruction procedure – stage 1

In the figure above we illustrate the HSMD reconstruction process of conformation (snapshot) i of a peptide consisting of three glycine residues, where for simplicity the oxygens and most of the hydrogens are discarded. First, the heavy atoms are ordered along the chain and are numbered by k’ where k’=1,..,10. The reconstruction is carried out in internal coordinates where the bond lengths are assumed to be constant; therefore, if the positions of atoms 1,…,k’-1 have been determined, the position of atom k’ is defined solely by two angles, a bond and a dihedral angle. These angles are ordered along the chain and denoted αk, where k=2k’ (k=1, 2, …,20). The reconstruction is performed atom by atom, where for each atom a TP is defined with respect to the corresponding angles αk-1 and αk (as described below). After the TP related to an atom has been determined the atom is placed in its position in conformation i and the next atom is treated and the process continues until TPs have been calculated for all the atoms.

The figure is concentrated on step (atom) k’=6. At this stage the TPs related to the “past” atoms k’=1…5 (depicted by full spheres) have already been determined and these atoms are kept fixed at their positions in conformation i. At this step (6) one calculates the TPs of bond angle αk (defined by C’-Cα-N) and dihedral angle αk-1 (defined by C’-Cα-N-C’) which are related to atom C’, where k=2k’ and thus k=12. For that one defines the “future” atoms which are the as yet untreated atoms, k’=6…10 which are depicted by empty spheres connected by dashed lines. Thus, one carries out an MD simulation of the future atoms, while the past atoms k’=1…5 are kept fixed in their positions (if the peptide is immersed in a box of water the water molecules are also considered to be future molecules and they are also moved in the MD simulation). Every, say 10 fs the Cartesian coordinates of the future atoms of the peptide (and the coordinates of the corresponding “past” fixed atoms) are stored in files for a later analysis by the program. Thus, altogether nf conformations are stored for step k’=6 and a total of 10nf conformations are stored for the molecule depicted in Fig. 1. This defines stage 1 of the process.

Notice that the calculations of stage 1 can be carried out with any of the available programs (AMBER, TINKER, NAMD, etc).

Stage 2- calculation of TPs (using the program)

The TPs are calculated in stage 2 by the program provided here which uses the stored data; the calculation of TP(k’=6) is described as follows: ρ(α11 α12| α10,…α1)

Thus, two small segments (bins) δαk-1 and δαk are centered at αk-1(i) and αk(i), respectively, and the number of simultaneous visits, nvisit, of the (stored) future chain to these two bins during the simulation is calculated; one obtains


TP(k’=6) (and the other TPs) is a conditional probability density, ρ(α11α12| α10,…α1) depending on the previous angles, α10,…α1. Clearly, the longer is the (reconstruction) MD simulation (in stage 1), the larger is nf, and the better is the approximation of TP6; the approximation will also be improved for smaller bins’ sizes (provided that the statistics is adequate, i.e., that nf is sufficiently large.), where the TPs become exact for nf → ∞ and δαk-1, δαk → 0. Thus, in practice the TPs will be somewhat approximate due to insufficient future sampling (finite nf) and relatively large bin sizes. Therefore, the probability density of conformation i, ρi will also be approximate,


i.e., it defers from the (correct) Boltzmann probability density, ρiB. The contribution of conformation i to the entropy is and the average contribution of the sample (of n reconstructed snapshots) to the entropy is


SA is a rigorous upper bound to the correct entropy, S, i.e., SAS.

It is convenient (and efficient) to calculate the probabilities ρi in stage 2 because the TPs depend on the parameters nf, αk, and αk-1 and n and the results for the entropy depend on these parameters as well. As mentioned earlier, one would expect the entropy to decrease as nf increases and αk, and αk-1 decrease (if the statistics is adequate). Verifying this behavior would suggest that the simulations carried out in stage 1 are reliable. The program enables one to produce results SA(nfkk,n) for various sets of these parameters, where the dependence on the sample size n is also important.

As discussed before, in most cases one is interested in the difference in the entropy between two microstates (e.g., helical and hairpin microstates of a peptide). In this case the entropy is calculated for each microstate, SA(hel) and SA(hair), where our main interest is in their converged difference, ΔS which is expected to be the exact difference within the statistical error,

converged (4)

Thus, we calculate SA(hel) and SA(hair) for increasing values of nf and decreasing bins, verifying that both entropies decrease monotonically as the approximation improves, i.e., both approach to the correct values from above. Typically, the convergence of ΔSA (to ΔS) is much faster than that of the individual entropies due to cancellation of comparable systematic errors in SA(hel) and SA(hair). Thus, one can obtain ΔS in any desired accuracy within a given statistical error where the required convergence is reached by increasing nf and decreasing the bin sizes (the statistical error also depends on the sample size, n and other simulation parameters).


(1) The present program can read only data that has been generated with the AMBER format.

(2) The software can treat any chain molecule with side chains, not necessarily a peptide. One has to order the atoms along the chain and define the corresponding “dihedral” and bond angles.

(3) In Eq. (1), the bond angle defining the second bin (δαk) appears with a Jacobian (sin θ), which for simplicity was not taken into account in Eq. (1). Therefore, δαk is replaced in the program by (the correct) δcos(αk). Notice, however, that for the models we have studied thus far the contributions of the Jacobians have been cancelled out in entropy differences and the results with δαk and δcos(αk) were found to be actually the same.

(4) As pointed out earlier, it is important to verify that the entropy decreases as the approximation improves. For that one has to apply adequate equilibration before analyzing the data, i.e., at each step (k’) the first m future chains (out of nf total chains created in stage 1) should be discarded from the analysis. Thus, m is an additional parameter of the method.

(5) In the case of ligand-enzyme binding the ligand is simulated in two environments, the solvent (water) and the active site. In both environments the N=3k’ coordinates (including bond lengths) are divided into N-6 internal coordinates and 6 external coordinates. The external coordinates are the 3 Cartesian coordinates of the first atom (k’=1) and the three Euler angles defined by the positions of atoms k’=1, k’=2, and k’=3. In the solvent, due to symmetry, the contribution of the external coordinates to the partition function (hence to the entropy) is calculated analytically (8π2V, where V is the simulation volume) and the program calculates only the internal entropy. In the active site, the movement of the ligand due to the external coordinates is limited and cannot be obtained analytically, and thus is obtained numerically as follows:

First, a small cube with volume v is defined around the position (x) of atom k=1 of the ligand at conformation i; the system is then simulated, and nf (full) ligand conformations are generated and stored; this is carried out in stage 1. In stage 2 our program calculates the corresponding probability density, px(i)=nvisit/(nfv), where nvisit is the number of snapshots for which the cube was visited by the first atom (k’=1) out of the total nf snapshots generated. Then, the external atom is fixed at x (in stage 1) and a similar procedure is carried out for calculating the probability density, protation(i) of the external rotation defined by the three Euler angles; thus, in stage 2 the program calculates Sexrernal=-kBTln[px(i)protation(i)] which is defined as the contribution of the external coordinates to the entropy of the ligand in the active site; this is carried out for each snapshot i. When the contribution of the external coordinates is done the program continues to calculate the contribution of each i to he internal entropy; the final difference ΔS (Eq. [(2)] between the bound and free ligand involves only the internal entropies. Notice, this part has not been tested as yet and therefore has not been implemented in the program; it will be added soon.

(6) As stated earlier, bond stretching has been ignored because its contribution to entropy differences is expected to get cancelled. However, if necessary this effect can be included easily in the framework of the HSMD reconstruction by defining a bond length bin δ(r3/2). Obviously, this will require to increase nf in order to get statistically reasonable values for nvisit, where nvisit, is the number of simultaneous visits of the (stored) future chain to the three bins δαk-1, δcos(αk), and δ(r3/2).


1. White RP, Meirovitch H. Lower and upper bounds for the absolute free energy by the hypothetical scanning Monte Carlo method: Application to liquid argon and water. J. Chem. Phys. 2004:121, 10889-10904.

2. General IJ, Meirovitch H. Relative stability of the open and closed conformations of the active site loop of streptavidin. J. Chem. Phys. 2011:134, 025104-17.

last updated 11/19/10