HSMD enables one to calculate *S* and *F*
from a sample generated by MD, Monte Carlo (MC), or any other exact
simulation technique. HSMD is based the fact that the configurations
of any system can, in principle, be generated __exactly__ also by
a step-by-step (growth) procedure, where particles are added
gradually to an initially empty volume using transition probabilities
(TPs). A trivial example is an __ideal polymer chain__ of *N*
bonds on a square lattice, i.e. a chain without the excluded volume
interaction. An ideal chain can be simulated __exactly__ by the
__dynamical__ MC method where the entropy, however, is unknown;
alternatively, a chain configuration, *i*, can be constructed
from nothing as a random walk where at each step a bond’s direction
(out of four possible directions) is chosen with TP=1/4; here the
Boltzmann probability is known, *P*_{i}^{B }=
(¼)^{N} and thus the entropy is known as well.
Clearly, two large samples, one obtained by MC and the other
generated as random walks are equivalent in the sense that they lead
to the same thermodynamic averages and fluctuations.

For more complex systems (argon, water, peptides,
etc) an * exact* growth procedure can be defined –

Therefore, one can assume that a given MC or MD
sample has rather been generated by the exact scanning method, which
enables one to reconstruct each conformation *i* by calculating
the TP densities that * hypothetically* were used to
create it step-by-step. This is the essence of the “hypothetical
scanning MD method” (HSMD) (or HSMC).

In the figure above we illustrate the HSMD
reconstruction process of conformation (snapshot) *i* of a
peptide consisting of three glycine residues, where for simplicity
the oxygens and most of the hydrogens are discarded. First, the heavy
atoms are ordered along the chain and are numbered by *k’*
where *k*’=1,..,10. The reconstruction is carried out in
internal coordinates where the bond lengths are assumed to be
constant; therefore, if the positions of atoms 1,…,*k*’-1
have been determined, the position of atom *k*’ is defined
solely by two angles, a bond and a dihedral angle. These angles are
ordered along the chain and denoted α_{k}, where
*k=*2*k’ *(*k=*1, 2, …,20). The reconstruction is
performed atom by atom, where for each atom a TP is defined with
respect to the corresponding angles α_{k-}_{1}
and α_{k} (as described below). After the TP related
to an atom has been determined the atom is placed in its position in
conformation *i* and the next atom is treated and the process
continues until TPs have been calculated for all the atoms.

The figure is concentrated on step (atom) *k*’=6.
At this stage the TPs related to the “past” atoms *k*’=1…5
(depicted by full spheres) have already been determined and these
atoms are kept fixed at their positions in conformation *i*. At
this step (6) one calculates the TPs of bond angle α_{k}_{
}(defined by C’-C_{α}-N) and dihedral angle α_{k}_{-1}
(defined by C’-C_{α}-N-C’) which are related to atom C’,
where *k*=2*k*’ and thus *k*=12. For that one
defines the “future” atoms which are the as yet untreated atoms,
*k*’=6…10 which are depicted by empty spheres connected by
dashed lines. Thus, one carries out an MD simulation of the future
atoms, while the past atoms *k*’=1…5 are kept fixed in their
positions (if the peptide is immersed in a box of water the water
molecules are also considered to be future molecules and they are
also moved in the MD simulation). Every, say 10 fs the Cartesian
coordinates of the future atoms of the peptide (and the coordinates
of the corresponding “past” fixed atoms) are stored in files for
a later analysis by the **program. **Thus, altogether *n*_{f}
conformations are stored for step *k*’=6 and a total of 10*n*_{f}
conformations are stored for the molecule depicted in Fig. 1. This
defines stage 1 of the process.

Notice that the calculations of stage 1 can be carried out with any of the available programs (AMBER, TINKER, NAMD, etc).

The TPs are calculated in stage 2 by the **program** provided
here which uses the stored data; the calculation of TP(*k’*=6)
is described as follows: ρ(α_{11} α_{12}| α_{10},…α_{1})

Thus, two small segments (bins) δα_{k-}_{1}
and δα_{k} are centered at α_{k-}_{1}(*i*)
and α_{k}(*i*), respectively, and the number of
__simultaneous____ __visits, *n*_{visit},
of the (stored) future chain to these two bins during the simulation
is calculated; one obtains

(1)

TP(*k*’=6) (and the other TPs) is a
conditional * probability density*, ρ(α

(2)

i.e., it defers from the (correct)
Boltzmann probability density, ρ_{i}^{B}. The
contribution of conformation *i* to the entropy is
and
the average contribution of the sample (of *n* reconstructed
snapshots) to the entropy is

(3)

*S*^{A} is a rigorous upper bound to the correct
entropy, *S*, i.e., *S*^{A} ≥ *S*.

It is convenient (and efficient) to calculate the
probabilities ρ_{i} in stage 2 because the TPs depend
on the parameters *n*_{f}, α_{k}, and
α_{k}_{-1} and *n* and the results for
the entropy depend on these parameters as well. As mentioned earlier,
one would expect the entropy to decrease as *n*_{f }increases
and α_{k}, and α_{k}_{-1}
decrease (if the statistics is adequate). Verifying this behavior
would suggest that the simulations carried out in stage 1 are
reliable. The **program** enables one to produce results
*S*^{A}(*n*_{f},α_{k},α_{k},*n*)
for various sets of these parameters, where the dependence on the
sample size *n* is also important.

As discussed before, in most cases one is interested
in the difference in the entropy between two microstates (e.g.,
helical and hairpin microstates of a peptide). In this case the
entropy is calculated for each microstate, *S*^{A}(hel)
and *S*^{A}(hair), where our main interest is in their
__converged____ __difference, Δ*S* which is
expected to be the *exact* difference within the statistical
error,

converged (4)

Thus, we calculate *S*^{A}(hel) and *S*^{A}(hair)
for increasing values of *n*_{f} and decreasing bins,
verifying that both entropies decrease monotonically as the
approximation improves, i.e., both approach to the correct values
from above. Typically, the convergence of Δ*S*^{A} (to
Δ*S*) is much faster than that of the individual entropies due
to cancellation of comparable systematic errors in *S*^{A}(hel)
and *S*^{A}(hair). Thus, one can obtain Δ*S* in
any desired accuracy within a given statistical error where the
required convergence is reached by increasing *n*_{f}
and decreasing the bin sizes (the statistical error also depends on
the sample size, *n* and other simulation parameters).

**(1) **The present **program**__ __can
read only data that has been generated with the AMBER format.

**(2)** The software can treat any chain molecule
with side chains, not necessarily a peptide. One has to order the
atoms along the chain and define the corresponding “dihedral” and
bond angles.

**(3)** In Eq. (1), the * bond* angle
defining the second bin (δα

**(4) **As pointed out earlier, it is important
to verify that the entropy decreases as the approximation improves.
For that one has to apply adequate equilibration before analyzing the
data, i.e., at each step (*k*’) the first *m *future
chains (out of *n*_{f} total chains created in stage 1)
should be discarded from the analysis. Thus, *m* is an
additional parameter of the method.

**(5)** In the case of ligand-enzyme binding the
ligand is simulated in two environments, the solvent (water) and the
active site. In both environments the *N*=3*k*’
coordinates (including bond lengths) are divided into *N*-6
* internal* coordinates and 6 external coordinates. The
external coordinates are the 3 Cartesian coordinates of the first
atom (

First, a small cube with volume *v* is defined
around the position (**x**) of atom *k*’**=**1 of the
ligand at conformation *i*; the system is then simulated, and *n*_{f}
(full) ligand conformations are generated and stored; this is carried
out in stage 1. In stage 2 our program calculates the corresponding
probability density, *p*_{x}(*i*)=*n*_{visit}/(*n*_{f}*v*),
where *n*_{visit} is the number of snapshots for which
the cube was visited by the first atom (*k*’=1) out of the
total *n*_{f} snapshots generated. Then, the external
atom is fixed at **x **(in stage 1) and a similar procedure is
carried out for calculating the probability density, *p*_{rotation}(*i*)
of the external rotation defined by the three Euler angles; thus, in
stage 2 the program calculates *S*_{exrernal}=-*k*_{B}*T*ln[*p*_{x}(*i*)*p*_{rotation}(*i*)]
which is defined as the contribution of the external coordinates to
the entropy of the ligand in the active site; this is carried out for
each snapshot *i*. When the contribution of the external
coordinates is done the program continues to calculate the
contribution of each *i* to he * internal* entropy;
the final difference Δ

**(6)** As stated earlier, bond stretching has
been ignored because its contribution to * entropy differences*
is expected to get cancelled. However, if necessary this effect can
be included easily in the framework of the HSMD reconstruction by
defining a bond length bin δ(r

1.
White RP, Meirovitch H.
Lower and upper bounds for the absolute free energy by the
hypothetical scanning Monte Carlo method: Application to liquid argon
and water. *J.
Chem. Phys*.
2004:121, 10889-10904.

2.
General IJ, Meirovitch H.
Relative stability of the
open and closed conformations of the active site loop of
streptavidin. *J.
Chem. Phys. *2011:*134*,
025104-17.

last
updated