A.3.1. Structural Dynamics of Macromolecular Complexes and Assemblies

Computational tools for modeling biological processes at the molecular scale using elastic network models: GNM

 
As summarized in Table A.2, classical MD simulations (52-54) are usually confined to processes in the subnanosecond regime. The longest MD simulation performed to date for a protein is the 1 μs folding simulation of a 36-residue polypeptide (55), run on PSC facilities. Alternatively, one can explore large complexes, such as the EcoRI-DNA complex (70,000 atoms with explicit solvent) (56-58), but only over a short (< 1 ns) time. The time range of MD trajectories is limited by the size of the time step, which in turn is imposed by the fastest motions at the atomic scale; hence the need for methods such as Brownian Dynamics (BD), in which the solvent is accounted for implicitly (via white noise). The large number of atoms can also obscure the analysis: Essential Dynamics analysis (41), for example, focuses on the α-carbon trajectories to extract their dominant (low-frequency) motions by a mode decomposition technique. In both cases, the strategy is to reduce the number of variables, or degrees of freedom, in order to capture processes of larger size or longer time scale.
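The mode decomposition underlying an Essential Dynamics-type analysis can be illustrated with a minimal Python/NumPy sketch. It is not the implementation of ref (41): the function name `essential_modes`, the synthetic trajectory, and all numerical values are hypothetical, and the frames are assumed to be already superimposed so that rigid-body motion has been removed.

```python
import numpy as np

def essential_modes(traj):
    """Principal component analysis of a trajectory of shape (frames, sites, 3).

    Returns the variances (descending) and the corresponding modes as
    columns of length 3 * sites.
    """
    frames = traj.shape[0]
    x = traj.reshape(frames, -1)          # flatten each frame to a 3N vector
    x = x - x.mean(axis=0)                # fluctuations about the mean structure
    cov = x.T @ x / (frames - 1)          # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]       # reorder: largest variance first
    return evals[order], evecs[:, order]

# Synthetic (hypothetical) trajectory: one planted collective motion plus noise.
rng = np.random.default_rng(0)
n_frames, n_sites = 200, 10
mean_structure = 5.0 * rng.normal(size=3 * n_sites)
direction = rng.normal(size=3 * n_sites)
direction /= np.linalg.norm(direction)    # unit vector of the planted motion
amplitude = 2.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, n_frames))
flat = (mean_structure
        + amplitude[:, None] * direction
        + 0.05 * rng.normal(size=(n_frames, 3 * n_sites)))
traj = flat.reshape(n_frames, n_sites, 3)

variances, modes = essential_modes(traj)
fraction = variances[0] / variances.sum()   # share of total variance in mode 1
overlap = abs(modes[:, 0] @ direction)      # how well mode 1 recovers the motion
print(f"mode 1: {fraction:.0%} of variance, overlap with planted motion {overlap:.3f}")
```

In this toy case a single mode accounts for most of the variance, which is the sense in which a few dominant modes can stand in for the full set of atomic coordinates.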

The pre-NPEBC participants have already performed studies in this direction (see the last column of Table A.2). At the lower level, a fruitful approach to extending the scale of simulations has been the combination of continuum representations of the solvent with explicit representations of protein and/or nucleic acid (56). This has enabled computationally less demanding studies of RNA-protein complexes (59) and of ligand binding to avidin and streptavidin (60). It is encouraging that these studies yielded reasonable binding free energies even though significant conformational changes were involved. Also worth mentioning in this context are a novel method for optimizing solvation parameters for continuum models and efficient methods for the conformational search of protein loops (61-66). The pre-NPEBC participants have also developed BD and Monte Carlo (MC) algorithms and PCA-based tools for higher-level (but still molecular) simulations and trajectory analysis (Table A.2).

Fig A.1. GNM representation of a biomolecule. The interaction sites in (a) form the nodes of a network in (b).

The Gaussian Network Model (GNM, Table A.2) was introduced by Bahar and coworkers (31;67;68) as an efficient tool for exploring the dynamics of large biomolecular structures or complexes. GNM closely resembles classical normal mode analysis (NMA) (69-72), but it is significantly simpler in that it requires no a priori knowledge of energy parameters, following the original proposition of Tirion (73), and, most importantly, it lends itself to a closed-form mathematical solution. Its roots are well founded in fundamental statistical mechanical theories of polymer networks (74;75). The major assumption in the GNM is the representation of the structure as a network of N interaction sites (Fig A.1). Pairs of sites (or nodes) closer than a cutoff distance r_c, representative of a first coordination shell radius, are connected by identical springs (dashed lines), which form the connectors of the network. The dynamics of this network is fully controlled by the Kirchhoff matrix (Γ) of contacts, which gives a complete description of the connectivity of the network. Thermodynamic characteristics are found using the Hamiltonian H = (γ/2) ΔR Γ ΔRᵀ of the system, where γ is the force constant of the springs and ΔR is the N-dimensional vector of fluctuations of the individual sites.
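The construction of the Kirchhoff matrix and the evaluation of the Hamiltonian can be sketched in a few lines of Python/NumPy. This is a minimal illustration under stated assumptions, not the group's production code: the function names, the five-site linear chain, the 3.8 Å spacing, and the 4.0 Å cutoff are all hypothetical choices made to keep the example small. The fluctuations are stored as an N x 3 array, so the quadratic form is summed over the x, y and z components.

```python
import numpy as np

def kirchhoff(coords, cutoff):
    """Kirchhoff (connectivity) matrix: -1 for each pair of sites within
    the cutoff, coordination number on the diagonal, 0 elsewhere."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = dist <= cutoff
    np.fill_diagonal(contact, False)               # a site is not its own neighbor
    gamma = np.where(contact, -1.0, 0.0)
    np.fill_diagonal(gamma, contact.sum(axis=1))   # number of contacts per site
    return gamma

def gnm_energy(gamma, d_r, spring=1.0):
    """H = (spring / 2) * sum over x, y, z of dR^T Gamma dR, with d_r of shape (N, 3)."""
    return 0.5 * spring * np.trace(d_r.T @ gamma @ d_r)

# Hypothetical toy structure: five sites on a line, 3.8 A apart.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
gamma = kirchhoff(coords, cutoff=4.0)              # only chain neighbors in contact

translation = np.ones((5, 3))                      # rigid shift of all sites
single = np.zeros((5, 3))
single[0, 0] = 1.0                                 # move only the first site along x

print(gamma)
print("energy of rigid translation:", gnm_energy(gamma, translation))
print("energy of moving one end site:", gnm_energy(gamma, single))
```

The rigid translation costs no energy because every row of the Kirchhoff matrix sums to zero; only relative displacements of connected sites stretch the springs.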

 

An important feature of the GNM is the possibility of dissecting the observed motion into a collection of modes, by eigenvalue decomposition of Γ, and of focusing on the slowest modes. These modes usually provide information on the molecular mechanisms relevant to biological function (76-80). Several studies (81-87), including comparisons with H/D exchange (88) and NMR relaxation (89) data, have demonstrated the utility of the GNM for understanding the machinery of proteins and their complexes. New collaborations among team members (Bahar & Gordon; Bahar & Ho) for interpreting NMR relaxation data also indicate the potential utility of combining or comparing results from GNM and other computational approaches with NMR data, both for constructing more accurate models and for understanding the conformational dynamics of proteins (see the support letter of Dr Ho). A close cooperation between computational molecular biologists and NMR experimentalists is thus anticipated within aim 1(i) activities, both for improving molecular models and for suggesting new experiments.
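The mode decomposition can be illustrated with the following Python/NumPy sketch. It relies on the standard GNM relation that the mean-square fluctuation of site i is proportional to the i-th diagonal element of the inverse of the Kirchhoff matrix, computed here as a sum over nonzero modes and reported in units of 3kT/γ. The function names and the eight-site linear chain are hypothetical, and the network is assumed to be connected so that exactly one zero mode is discarded.

```python
import numpy as np

def kirchhoff(coords, cutoff):
    """Kirchhoff matrix: -1 for sites within the cutoff, degree on the diagonal."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = dist <= cutoff
    np.fill_diagonal(contact, False)
    gamma = np.where(contact, -1.0, 0.0)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def gnm_modes(gamma):
    """Eigenvalues and eigenvectors of the nonzero modes, slowest first.
    Assumes a connected network, i.e. a single zero eigenvalue."""
    evals, evecs = np.linalg.eigh(gamma)   # ascending eigenvalues
    return evals[1:], evecs[:, 1:]         # drop the zero (rigid-body) mode

def mean_square_fluctuations(evals, evecs, n_modes=None):
    """Sum over the first n_modes of u_k(i)^2 / lambda_k (all modes if None),
    in units of 3 kT / spring constant."""
    k = slice(0, n_modes)
    return (evecs[:, k] ** 2 / evals[k]).sum(axis=1)

# Hypothetical toy structure: eight sites on a line, chain neighbors in contact.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(8)])
evals, evecs = gnm_modes(kirchhoff(coords, cutoff=4.0))

msf_all = mean_square_fluctuations(evals, evecs)              # every mode
msf_slow = mean_square_fluctuations(evals, evecs, n_modes=1)  # slowest mode only

print("all modes   :", np.round(msf_all, 3))
print("slowest mode:", np.round(msf_slow, 3))
```

Because each mode is weighted by the inverse of its eigenvalue, the slowest mode contributes most to the fluctuations of the mobile chain ends in this toy case, which is why the analysis concentrates on the low end of the spectrum.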

 

Finally, recent studies (90-92) show promise for the extension of GNM-based methods to multimolecular assemblies. The idea presented in ref (90) is to represent the structure at hierarchical levels of detail and to perform the GNM analysis repeatedly, with suitable renormalization of the network parameters at each level. The dominant motions of influenza virus hemagglutinin A could be accurately reproduced by this method, even with a very coarse-grained model of one representative site per 40 residues. The major advantage of this approach is an increase of more than three orders of magnitude in computational speed, which permits us to explore the dynamics of multiprotein assemblies of the order of 10^4 residues within minutes on, for example, an SGI R10000 workstation. Caspase 8 (93), RNAP II (94), or protein 14-3-3ζ (95) could be potential targets for this approach, within the scope of the respective projects DP1, DP2 and DP3 (see § C.1-3).
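The effect of coarse-graining on the slowest mode can be illustrated with a toy Python/NumPy sketch. It does not reproduce the renormalization scheme of ref (90): keeping the first site of every block of 40, the enlarged cutoff, the linear 800-site chain, and the function names are all hypothetical simplifications. The cost estimate assumes that the time for a dense eigenvalue decomposition grows roughly as the cube of the number of nodes.

```python
import numpy as np

def kirchhoff(coords, cutoff):
    """Kirchhoff matrix: -1 for sites within the cutoff, degree on the diagonal."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = dist <= cutoff
    np.fill_diagonal(contact, False)
    gamma = np.where(contact, -1.0, 0.0)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def slowest_mode(gamma):
    """Eigenvector of the smallest nonzero eigenvalue (connected network assumed)."""
    evals, evecs = np.linalg.eigh(gamma)   # ascending; index 0 is the zero mode
    return evecs[:, 1]

# Hypothetical fine-grained model: 800 sites on a line, 3.8 A apart.
n_fine, block = 800, 40
fine = np.zeros((n_fine, 3))
fine[:, 0] = 3.8 * np.arange(n_fine)

# Coarse model: one representative site per block of 40 (assumed selection rule),
# with the cutoff enlarged so that neighboring representatives stay connected.
coarse = fine[::block]
mode_fine = slowest_mode(kirchhoff(fine, cutoff=7.0))
mode_coarse = slowest_mode(kirchhoff(coarse, cutoff=1.05 * 3.8 * block))

# Compare the fine-grained mode, sampled at the representative sites,
# with the mode of the coarse network (the sign of an eigenvector is arbitrary).
sampled = mode_fine[::block]
overlap = abs(sampled @ mode_coarse) / (
    np.linalg.norm(sampled) * np.linalg.norm(mode_coarse))
cost_ratio = (n_fine / len(coarse)) ** 3   # rough cubic-scaling estimate

print(f"{len(coarse)} coarse sites, overlap of slowest modes {overlap:.3f}")
print(f"estimated reduction in diagonalization cost: {cost_ratio:.0f}x")
```

In this idealized chain the shape of the slowest mode survives the reduction almost unchanged, while the cubic estimate for one site per 40 residues is of the same order as the speed-up quoted above. Real assemblies would require the parameter renormalization described in ref (90).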

 

The major strategy in this group of studies will be to determine the minimal level of complexity (in geometry and energetics) that a simulation must adopt in order to capture the biological function of interest. Two significant opportunities will be exploited toward this aim: the expertise already accumulated at Pitt over a broad range of molecular computations (Table A.2), and the possibility of having different groups simulate or predict a given molecular mechanism with a diversity of methods and then compare or combine the results.