.. code:: ipython3 # Execute this first # # * trigger notebook styling # * check if notebook had been modified since its distribution # # Note: executing any cells before this modifies the notebook. # %run src/init_notebooks.py hide_toggle() check_notebook() Calculating PMFs with AWH in GROMACS ==================================== :: Author: Viveca Lindahl (reviewed/updated by Alessandra Villa and Christian Blau) Goal: Learn how to calculate potential of mean forces with accelerated weight histogram (AWH) method in GROMACS Prerequisites: Know how to run an MD simulation, basis knowledge of using a Linux shell, basis knowledge of thermodynamics and awareness of accelerated weight histogram method (see reference below). Time: 80 minutes Software: GROMACS 2022 (2021, 2020, 2019, 2018), python modules: numpy, matplotlib, re, nglviewer (https://nglviewer.org/) Optional Software: visualization software [VMD](https://www.ks.uiuc.edu/Research/vmd), Xmgrace plotting tool Here we learn how to calculate the potential of mean force (PMF) along a reaction coordinate (RC) using the accelerated weight histogram method (AWH) in GROMACS. We will go through both how to set up the input files needed, as well as how to extract and analyze the output after having run the simulation. AWH applies a time-dependent bias potential along the chosen RC, which is tuned during the simulation such that it “flattens” the barriers of the PMF to improve sampling along the RC. With better sampling, the PMF can be calculated more accurately than using unbiased MD. References ---------- For more details about the AWH method and how it can be applied we refer to \* The GROMACS reference manual section about AWH, available on `the GROMACS documentation website `__ \* BioExcel Webinar: `Accelerating sampling in GROMACS with the AWH method `__ \* Lindahl, V., Lidmar, J. and Hess, B., 2014. Accelerated weight histogram method for exploring free energy landscapes. *The Journal of chemical physics*, 141(4), p.044110. `doi:10.1063/1.4890371 `__ \* Lindahl, V., Gourdon, P., Andersson, M. and Hess, B., 2018. Permeability and ammonia selectivity in aquaporin TIP2;1: linking structure to function. *Scientific reports*, 8(1), p.2995. `doi:10.1038/s41598-018-21357-2 `__ \* Lindahl, V., Villa, A. and Hess, B., 2017. Sequence dependency of canonical base pair opening in the DNA double helix. *PLoS computational biology*, 13(4), p.e1005463. `doi:10.1371/journal.pcbi.1005463 `__ The case study: DNA base pair opening ------------------------------------- We will calculate a PMF for opening a DNA base pair. The DNA double helix is a very stable structure. Opening a base pair requires breaking hydrogen bonds between the bases and crossing a high free energy barrier. That’s why we need to enhance the sampling by applying a bias! As our reaction coordinate we use the distance between the two atoms forming the central hydrogen-bond the two bases in a pair. Let’s have a look at the system and the reaction coordinate. We have hidden all the water to better see the DNA itself, and we have put punchy colors on the atoms defining the RC of our target base pair. Now run this and the DNA should pop up: .. code:: ipython3 import nglview as ng view = ng.show_file("visualization/dna-centered.pdb", default=False) view.center(selection="DA or DT") view.add_representation("cartoon") view.add_representation("base", selection="not DA and not DT") view.add_representation("ball+stick",selection="DA") view.add_representation("ball+stick",selection="DT") view.add_representation("ball+stick",selection="DA and .N1", color="green") view.add_representation("ball+stick",selection="DT and .N3", color="green") view # click and drag to rotate, zoom with your mouseweel # for more infor on this viewer have a look at https://github.com/nglviewer/nglview Rotate the structure and look for the two (nitrogen) atoms in green. The distance between these will serve as our reaction coordinate for that base pair. In alternative, you can use VMD to visualize the reaction coordinate on your local machine. The ``-e`` flag below tells VMD to excute the commands that are in the following .tcl script. These commands change how the system is visually represented. For instance, we have hidden all the waters to better see the DNA itself, and we have put punchy colors on the atoms defining the RC of our target base pair. To run VMD from within this notebook, remove the comment character (#) in the following cell and VMD should pop up: .. code:: ipython3 #!vmd visualization/dna-centered.gro -e visualization/representation.tcl Rotate the structure and look for the two (nitrogen) atoms in green. The distance between these will serve as our RC for that base pair. Quit VMD to continue. The MD parameter file --------------------- We’ll now assume we have already built and equilibrated the system, so we are almost ready to go. To use AWH we basically just need to add some extra parameters in the .mdp file. Go to and check out the directory that has all the run files of our first AWH example: .. code:: ipython3 %cd awh-simple !ls -l Find the differences between the mdp file using AWH and an mdp file for a vanilla MD simulation: .. code:: ipython3 !diff no-awh.mdp awh.mdp Here ‘<’ refers to the content of the first argument to ``diff`` (the non-AWH .mdp file) and ‘>’ to the second (the AWH .mdp file). E.g., we increased the number of steps (``nsteps``) for AWH. The more relevant parameters are the ones prefixed ``pull`` and ``awh``. The awh-``.mdp`` options are documented at https://manual.gromacs.org/current/user-guide/mdp-options.html#mdp-awh. The index file -------------- The .mdp file now depends on some definitions of atom groups, ``base_N1orN3`` and ``partner_N1orN3``. We need to make an index (.ndx) file defining these and provide as input when generating a .tpr file with ``gmx grompp``. Here our groups are as simple as they get: each group contains a single nitrogen atom (with names ‘N1’ or ‘N3’) . But don’t get tempted to edit an index file manually! The traditional tool to use is ``gmx make_ndx`` and a more general and powerful tool is ``gmx select``. We focus on AWH here and have provided the index file. Double-check that the pull groups (e.g. ``pull-group1-name``) from the .mdp file are actually defined in the index file: .. code:: ipython3 !grep -A 1 N1orN3 index.ndx # '-A 1' to show also 1 line after the match One atom per group looks right. In a real study, a better check could be to visualize the atom indices (e.g. with VMD). Starting the simulation ----------------------- Here we won’t run the actual simulation; a directory with data has already been provided. But for practice, we will generate a tpr, as usual with ``grompp``. The flag ``-quiet`` tells ``grompp`` to be less verbose. .. code:: ipython3 !gmx grompp -quiet -f awh.mdp -n index.ndx -p topol.top -o topol.tpr Note that the values of the pull coordinate values in the provided .gro file are printed by ``grompp``. Does it look reasonable? Check the log file ------------------ Now let’s look at the data we’ve provided. Some information related to the AWH initial convergence can be found in the ``mdrun`` log file. .. code:: ipython3 !grep 'covering' data/md.log !grep 'out of the' data/md.log Here, a “covering” refers to an end-to-end transition across the sampling interval [``awh1-dim1-start`` , ``awh1-dim1-end``]. Note that the first transitions are faster than the later ones. Why is that? The AWH and initial stage and convergence ----------------------------------------- The overall update size of the free energy estimate, here denoted by :math:`|\Delta F|`, is a free parameter in AWH. It is very important because converging the free energy relies on :math:`|\Delta F|\to 0` for large times :math:`t`. Also, the time-dependent bias function is a direct function of the current free energy estimate. Thus, initially :math:`|\Delta F|` has to be large enough that the bias fluctuations efficiently promote escaping free energy minima. In the AWH *initial stage*, :math:`|\Delta F|`, is kept constant *during* each transition across the sampling interval. After the interval has been covered, the update size is scaled down by a constant factor. This leads to a stepwise decrease of the update size in the initial stage, which approaches an exponential decay :math:`|\Delta F|\sim 1/e^{\alpha t}`. After exiting this stage, the update size instead decreases steadily with time, :math:`|\Delta F| \sim 1/t`. The latter has proven to be the right long-term time evolution for the free energy to converge, but if applied from the start it generally decreases the update size too quickly. The exit time is determined dynamically by AWH. The exit of the initial stage should occur before the initial exponential-like decay of :math:`|\Delta F|` becomes more rapid than :math:`\sim1/t`. The initial stage is a type of “burn-in” procedure designed to improve the robustness of the method. The idea is to first get a quick and rough free energy estimate in the initial stage, followed by refinement in the final stage. The reaction coordinate trajectory ---------------------------------- The trajectory of the pull coordinate is found in ``pullx.xvg``. .. code:: ipython3 %less data/pullx.xvg For plotting and data analysis we provide a Python function to read the ``.xvg`` file format into this notebook. .. code:: ipython3 import numpy as np import matplotlib.pyplot as plt # Function that reads data from an .xvg file def read_xvg(fname): data=[] with open(fname) as f: for line in f: # Lines with metadata or comments start with #, @ if not line.startswith(("@","#")): data.append(np.array([float(s) for s in line.split()])) data = np.vstack(data) return data We can for instance plot the covering times. .. code:: ipython3 # Extract substrings using regular expressions import re # Read pullx.xvg file and plot a selected part of the trajectory rundir='data/' data = read_xvg(rundir + "/pullx.xvg") t, x = data[:,0], data[:,1] fig, ax=plt.subplots(figsize=(16,10)) plt.plot(t, x, lw=1) # Set y-axis limits based on the chosen sampling interval awh_start, awh_end = 0.25, 0.65 plt.ylim([awh_start, awh_end]) ymin, ymax = plt.gca().get_ylim() # Find covering times plot them covering_logs = !grep covering {rundir}/md.log covering_times = [float(re.findall('t = (.+?) ps', log)[0]) for log in covering_logs] for tcover, log in zip(covering_times, covering_logs): plt.plot([tcover]*2,[ymin,ymax], color='k', lw=2, ls='-') # Also mark the exit time if there was one exit_logs = !grep 'out of the' {rundir}/md.log if len(exit_logs) > 0: texit = float(re.findall('t = (.+?)\.', exit_logs[0])[0]) plt.plot([texit],0.5*(ymin + ymax), color='k', marker='x', ms=10) # Tweak more tend=tcover*2 plt.xlim([0,min(tend,max(t))]) plt.xlabel("time (ps)") plt.ylabel("distance (nm)"); plt.title('RC trajectory'); Here, the black vertical lines indicate covering times and the :math:`\times` marks the exit time, extracted from ``md.log``. Verify that these coincide roughly with the transition times seen in the trajectory. What we expect to see here is that the covering times are shorter initially, when the free energy estimate and bias fluctuates more, and then increase and reach a constant average value as the bias stops changing. You should always check this trajectory and make sure that there are at least a few transitions after the initial stage. If not, you need to simulate longer or you have a poorly chosen RC that is not guiding the transitions efficiently. The ``gmx awh`` tool and the ``awh.xvg`` files ---------------------------------------------- Data directly related to AWH is extracted using ``gmx awh``. The data is stored in the energy file, ``ener.edr``. The default output file name is of the format ``awh_t