Enhancing sampling with AWH: the basics
=======================================

::

   authors : Claudia Alleva, Cathrine Bergh
   goal : Learn the basics of AWH, how to set up a simulation and analyse the results
   time : 60 minutes
   software : GROMACS 2024.2, Python libraries: numpy, matplotlib, NGL Viewer
   optional software : VMD (visualisation), Xmgrace (plotting)
   tutorial source : https://zenodo.org/uploads/16793062
   version : 0.2

Target audience
---------------

This tutorial is aimed at experienced GROMACS users who are familiar with the basics of MD simulations and how to set them up using GROMACS.

Prerequisites
-------------

- GROMACS 2024 or later (other versions will likely work, but there might be minor differences)
- Having followed the `introduction to MD tutorial `__ or having some experience setting up standard MD simulations.

Optional
~~~~~~~~

- a molecular viewer, e.g. VMD
- a plotting tool, e.g. Xmgrace

Before we start, we will import some of the Python libraries that will be needed throughout this tutorial.

.. code:: ipython3

    import matplotlib.pyplot as plt
    import numpy as np
    import nglview as nv

Introduction to AWH
-------------------

A common challenge when running MD simulations lies in the discrepancy between the time resolution we need to capture fast motions on the atomic scale and the timescales of the biological or chemical processes we are interested in. This sampling problem is the primary reason our MD simulations become so computationally heavy: we often need to wait for a long time until something interesting happens.

An alternative is to force, or bias, the system to climb the energy barrier separating the states we want to sample, instead of waiting for the crossing to happen stochastically. Over the years, a whole swath of methods, applying different approaches, has been developed to introduce such a bias and speed up state transitions. These are called **enhanced sampling methods**.

In this tutorial, we will introduce the Accelerated Weight Histogram method (or AWH), which is implemented in GROMACS. AWH enhances sampling by applying an adaptively tuned bias potential along a chosen reaction coordinate describing the state transition of interest, so that it "flattens" the barriers separating the states. This allows the simulations to sample the state transition more frequently than they would without any bias.

For more details about the AWH method and how it can be applied we refer to:

* The GROMACS reference manual section about AWH, available on `the GROMACS documentation website `__
* BioExcel Webinar: `Accelerating sampling in GROMACS with the AWH method `__
* The original `AWH tutorial `__
* The `free energy of solvation with AWH tutorial `__
* Lindahl, V., Lidmar, J. and Hess, B., 2014. Accelerated weight histogram method for exploring free energy landscapes. *The Journal of Chemical Physics*, 141(4), p.044110. `doi:10.1063/1.4890371 `__
* Lindahl, V., Gourdon, P., Andersson, M. and Hess, B., 2018. Permeability and ammonia selectivity in aquaporin TIP2;1: linking structure to function. *Scientific Reports*, 8(1), p.2995. `doi:10.1038/s41598-018-21357-2 `__
* Lindahl, V., Villa, A. and Hess, B., 2017. Sequence dependency of canonical base pair opening in the DNA double helix. *PLoS Computational Biology*, 13(4), p.e1005463. `doi:10.1371/journal.pcbi.1005463 `__
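Before moving on, here is a small toy illustration of the flattening idea described above. This is *not* the actual AWH algorithm, just a metadynamics-flavoured sketch on a model double-well landscape (all numbers are made up for illustration): a walker deposits a little repulsive bias wherever it goes, so the wells fill up, the barrier flattens, and the accumulated bias becomes an estimate of minus the free energy.

.. code:: ipython3

    # Toy sketch of adaptive biasing (NOT the AWH update scheme itself)
    x = np.linspace(-1.5, 1.5, 201)
    F = 10.0 * (x**2 - 1.0)**2         # model double-well free energy (kJ/mol)
    bias = np.zeros_like(x)
    kT = 2.49                          # thermal energy at ~300 K (kJ/mol)

    rng = np.random.default_rng(0)
    i = 50                             # start the walker in the left well
    for step in range(100_000):
        # Propose a small move and accept it with the Metropolis criterion,
        # using the biased energy surface F + bias
        j = int(np.clip(i + rng.integers(-2, 3), 0, len(x) - 1))
        dU = (F[j] + bias[j]) - (F[i] + bias[i])
        if dU <= 0 or rng.random() < np.exp(-dU / kT):
            i = j
        # Deposit a small repulsive Gaussian wherever the walker currently is;
        # over time this fills the wells and flattens the barrier
        bias += 0.001 * np.exp(-0.5 * ((x - x[i]) / 0.1)**2)

    # The accumulated bias approaches -F (up to an additive constant)
    plt.plot(x, F, label='model free energy')
    plt.plot(x, bias.max() - bias, '--', label='estimate from the bias')
    plt.xlabel('reaction coordinate')
    plt.ylabel('energy (kJ/mol)')
    plt.legend()

AWH does something considerably smarter than this naive deposition scheme (it keeps weight histograms and tunes the update size adaptively), but the end goal is the same: a bias that mirrors the free energy landscape.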
Our exercise: estimating the binding energy of two pyrimidine molecules
-----------------------------------------------------------------------

In this tutorial, we will sample the aggregation of two identical pyrimidine molecules and estimate the free energy of their interaction. This simple system allows us to explore AWH using limited computational resources, while still capturing much of the essential complexity that characterises real scientific problems. Similarly to the `umbrella sampling GROMACS tutorial `__, we will use the pulling code to bring the two pyrimidine molecules together in an aqueous solution, and then compare our estimate of the free energy of binding with the results from the umbrella sampling tutorial.

.. figure:: images/pyrimidine-stick-diagram.png
   :alt: Pyrimidine diagram

*Figure 1: A single molecule of pyrimidine*

The first thing we need to do is define *how* to bring these molecules together (i.e. select the *reaction coordinate*), and this is one of the crucial choices that needs to be made when using most enhanced sampling techniques. Here, we will choose the vector that connects the centers of mass of the two molecules. Our reaction coordinate is shown in red in the figure below:

.. figure:: images/sampling-locations.png
   :alt: Sampling locations

*Figure 2: An illustration of how the pyrimidine molecules will be separated/brought together, with the reaction coordinate shown in red*

The goal is to get an estimate of the free energy landscape along our chosen reaction coordinate. To do so, we will need to bring the two molecules close enough to capture a free energy minimum expected to represent stable binding, and separate them far enough to ensure that we capture a baseline where the molecules do not affect each other energetically.

System preparation
------------------

Input files for this tutorial have already been prepared in advance following a similar scheme as described in the `introduction to MD tutorial `__. If you have not followed that tutorial yet… what are you doing here? It was a prerequisite! Jokes aside, you can follow this tutorial, but we will skip most explanations regarding system preparation. If you want more information on these parts, see the aforementioned `introduction to MD tutorial `__ and the `umbrella sampling tutorial `__.

Let's first change our working directory to the data folder, where we will keep all the data generated in this tutorial.

.. code:: ipython3

    # Get the full path from which the tutorial was launched
    tutorial_path = %pwd

    # Change to the data folder
    %cd $tutorial_path/data

You can find all the prepared input files in the provided ``input`` folder.

.. code:: ipython3

    %ls input

Now, let's prepare our simulations with the provided files.

Setting the GROMACS parameters for pulling
------------------------------------------

As for any GROMACS simulation, we need to set the parameters in the ``.mdp`` file appropriate for the type of simulation we want to run. When using AWH in GROMACS, we need to pay special attention to parameters with names including ``pull`` or ``awh``. These parameters let the simulation engine know that pulling and AWH will take place, and describe the reaction coordinate(s) and bias(es).

GROMACS typically uses *pull groups* and *pull geometries* to define *pull coordinates* (i.e. reaction coordinates). As shown in *Figure 2*, we will use one *pull group* for each pyrimidine molecule. The distance between the centers of mass of these two groups will define our single *pull coordinate*. As a quick sanity check, you can also measure this distance directly in the starting structure, as sketched below.
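For instance, a hedged one-liner with ``gmx distance`` (the group names below are taken from ``input/index.ndx`` as used later in this tutorial; when a ``.gro`` file is passed to ``-s``, atom masses are guessed from the atom names, which is fine for a rough check):

.. code:: ipython3

    # Measure the COM-COM distance between the two pyrimidines in the starting structure
    !gmx distance -f input/conf.gro -s input/conf.gro -n input/index.ndx -select 'com of group "pyrimidine_1" plus com of group "pyrimidine_2"' -oall dist_check.xvg
    # conf.gro contains a single frame, so this gives one time/distance pair
    print(np.loadtxt('dist_check.xvg', comments=['@', '#']))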
Setting parameters for the pull code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we will go through the settings needed to set up the pulling part of our simulations. First of all, let's turn on the pull functionality:

::

   pull = yes ; Use the pull code.

We then need to specify how many atom groups are needed to define the pull coordinate (two in our case), how many pull coordinates to use (one), and how frequently we want to write the coordinate value to the pullx.xvg file (every 5000th step).

::

   pull-ngroups = 2    ; Number of pull groups.
   pull-ncoords = 1    ; Number of pull coordinates.
   pull-nstxout = 5000 ; Step interval to output the coordinate values to the pullx.xvg.

As with other kinds of groups in GROMACS (e.g. temperature-coupling groups), the pull groups are described by *index groups* in the *index file* also supplied to ``gmx grompp`` (typically named ``index.ndx``).

::

   pull-group1-name = pyrimidine_1 ; Name of pull group 1, corresponding to an entry in an index file.
   pull-group2-name = pyrimidine_2 ; Same, but for group 2.

To use AWH, we set the ``pull-coord-type`` to ``external-potential`` and the potential provider to AWH with ``pull-coord1-potential-provider``.

::

   pull-coord1-groups = 1 2              ; Which groups define coordinate 1? Here, groups 1 and 2.
   pull-coord1-type = external-potential ; Apply the bias using an external module.
   pull-coord1-potential-provider = AWH  ; The external module is called AWH!

Setting parameters for AWH
~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, we will set all the AWH-related parameters. First, we need to activate ``awh``, decide how many biases are applied (one in our case) and set the frequency at which the bias, coordinate distribution, PMF and friction should be written (every 50000th step).

::

   awh = yes          ; Activate the AWH code with 'yes'
   awh-nbias = 1      ; Define how many biases are applied
   awh-nstout = 50000 ; Frequency at which the ener.edr file will be updated by AWH

.. container:: alert alert-block alert-info

   NOTE: The value of awh-nstout only governs how frequently you write your files, NOT how frequently the bias is updated!

Now, let's define the bias-specific options. First, we define which 'shape' the target distribution should have (constant in this case), and we also choose the histogram to be equilibrated before a covering is considered achieved (``awh1-equilibrate-histogram = yes``).

::

   awh1-target = constant           ; Define the type of target distribution
   awh1-equilibrate-histogram = yes ; The histogram needs to be equilibrated before a covering is achieved

We choose a flat target distribution (option ``constant``), since it's a good first approximation and reflects our objective to flatten the energy barriers and sample the entire space. Then, we need to define the dimensionality of the reaction coordinate, which is one in our case. We also set an index for our reaction coordinate.

::

   awh1-ndim = 1             ; Define how many dimensions are used to define the bias
   awh1-dim1-coord-index = 1 ; The index of the pull coordinate used for this dimension

Finally, we need to set our sampling interval of interest (between ``awh1-dim1-start`` and ``awh1-dim1-end``, in nm), the force constant of the harmonic potential (in kJ/(mol*nm^2)), and an estimated value for the diffusion constant (in nm^2/ps).
::

   awh1-dim1-start = 0.35           ; Start of the sampling interval (nm)
   awh1-dim1-end = 2.35             ; End of the sampling interval (nm)
   awh1-dim1-force-constant = 40000 ; Coupling force constant (kJ/(mol*nm^2))
   awh1-dim1-diffusion = 5e-5       ; Diffusion constant along the defined dimension (nm^2/ps)

The sampling interval is defined in terms of the reaction coordinate we chose, so we want to set ``awh1-dim1-start`` so that the molecules are as close as possible without overlapping (keep in mind that the distances are defined between the centers of mass of the molecules). Similarly, we want to set ``awh1-dim1-end`` to a distance where the two molecules no longer affect each other energetically; ~2 nm fulfills this requirement as it is longer than twice the cutoff distance for non-bonded interactions.

The force constant was selected to be high enough to achieve tight coupling between the reference coordinate and our reaction coordinate. Meanwhile, the diffusion constant should be chosen as large as possible to achieve fast crossings, BUT the system needs to remain stable too, so it has to be balanced appropriately. To set the diffusion constant appropriately, it's good to first estimate it from a test run, so we recommend trying different values and picking the most meaningful one for your system.

These values, together with the estimated initial PMF error ``awh1-error-init``, expressed in kJ/mol, will determine the initial number of bins that 'have' to be filled. We will leave the initial error at the default value of ``10 kJ/mol`` here.

::

   awh1-error-init = 10 ; A rough estimate of the initial error

These and many other MDP options for AWH are further described in `the GROMACS documentation `__.

Now, we can look at the full parameter file, ``AWH.mdp``:

.. code:: ipython3

    %less input/AWH.mdp

In ``AWH.mdp``, we can see the specifications for a standard 100 ns production simulation, with the above-mentioned pull-code and AWH options added.

With all the parameters set, we now need a starting configuration for our simulations. We can look at ``conf.gro`` in the input folder. This file has been prepared using a similar approach to what is described in the `introduction to MD tutorial `__.

.. code:: ipython3

    # Get the current path
    path = %pwd

    # Read in the structure file
    view = nv.show_structure_file(path + '/input/conf.gro')

    # Show ions
    view.add_representation(repr_type='ball+stick', selection='NA')
    view.add_representation(repr_type='ball+stick', selection='CL')

    # Show water
    view.add_representation(repr_type='licorice', selection='SOL', opacity=0.9)

    view.center()
    view

We can see that our starting state is an equilibrated system where the two pyrimidine molecules are separated, and that the box is filled with water. We also have sodium (purple spheres) and chloride (green spheres) ions that give the system a neutral net charge.

Running ``gmx grompp``
----------------------

If you have GROMACS installed, the next step is to run ``gmx grompp``. This will produce the ``AWH.tpr`` file needed to run the simulation, and also to run the AWH analysis tools. As a brief refresher: the simulation will start from the structure specified by the ``-c`` flag (in our case ``conf.gro``), the topology is specified through the ``-p`` flag and the index file through the ``-n`` flag. The index file is needed to define the pull groups named in the ``.mdp`` file, but also the coupling groups used for equilibrating the system. If you ever need to build such groups yourself, a sketch is shown below.
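The index file is already provided in ``input``, so this step is not required here, but a hedged sketch of how such groups could be created with ``gmx make_ndx`` follows. The residue numbers, and especially the group numbers assigned by the tool (assumed here to be 10 and 11), depend on the default groups created for your system, so check them in an interactive session first:

.. code:: ipython3

    # Hypothetical sketch: create one index group per pyrimidine molecule.
    # 'ri 1' selects residue 1, and 'name 10 ...' renames the new group;
    # verify the group numbers interactively before relying on this.
    #!printf 'ri 1\nname 10 pyrimidine_1\nri 2\nname 11 pyrimidine_2\nq\n' | gmx make_ndx -f input/conf.gro -o index_check.ndx

With the index file in place, we can now run ``gmx grompp``: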
.. code:: ipython3

    !gmx grompp -f input/AWH.mdp -c input/conf.gro -p input/topol.top -o AWH.tpr -n input/index.ndx

Running ``gmx mdrun``
---------------------

Now, we are finally ready to run some AWH simulations! This is a small system, but running ``gmx mdrun`` will still take a few hours, even if you have a GPU. If you want to run your own simulation, proceed with the first section below. If you don't have time to run the simulations now, you can move on with the analysis using pre-calculated data provided in the reference folder; in that case, proceed to the second section below.

Option 1: run your own simulations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to get your own results and compare them with the pre-calculated ones, uncomment the cells below.

First, we will keep our data organized by creating a separate folder called "AWH" in the data folder.

.. code:: ipython3

    # Create the data folder
    #%mkdir $tutorial_path/data/AWH

    # Change working directory
    #%cd $tutorial_path/data/AWH
    #print("Changed working directory to: ")
    #%pwd

Now, we can run the AWH simulations! Beware that this might take a couple of hours. Once the simulation is done, you can proceed to the section "Analysing AWH simulations".

.. code:: ipython3

    #!gmx mdrun -deffnm AWH

Option 2: use reference data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To proceed with the analysis of the pre-calculated data, copy one of the folders ``AWH1``, ``AWH2``, or ``AWH3`` from the reference data folder. Each of these folders corresponds to one AWH simulation replica generated with the same protocol as presented in this tutorial. Since the simulations are stochastic, you might get slightly different results depending on which replica you choose.

Let's create a copy of the reference data and move into that folder to keep our results organized.

.. code:: ipython3

    # Select which dataset to use (AWH1, AWH2, or AWH3)
    AWH_dataset = "AWH1"

    # Copy the AWH folder from the reference to the data folder
    %cp -r $tutorial_path/data/reference/$AWH_dataset $tutorial_path/data

    # Make this new copy the working directory
    %cd $tutorial_path/data/$AWH_dataset
    print("Changed working directory to: ")
    %pwd

.. code:: ipython3

    !gmx grompp -f ../input/AWH.mdp -c ../input/conf.gro -p ../input/topol.top -o AWH.tpr -n ../input/index.ndx

Now that we have prepared the reference files, we can continue with the analysis.

Analysing AWH simulations
-------------------------

First of all, let's look at all the files that were produced by our AWH simulation:

.. code:: ipython3

    # Check what is contained in the AWH data folder
    !ls -ltr

``AWH.tpr`` and ``mdout.mdp`` are produced by ``gmx grompp``: they are the input file for the simulation run and a recap of all the parameters passed to ``gmx grompp``, respectively. ``AWH.gro``, ``AWH.edr`` and ``AWH_pullx.xvg`` are produced by running ``gmx mdrun``. ``AWH.gro`` is the final frame of the simulation, ``AWH.edr`` is the energy file written during the simulation (it contains the bias and PMF), and ``AWH_pullx.xvg`` contains the value of the reaction coordinate over time.

Now, let's have a look at what happens if we search for all the instances of ``awh1`` in the ``AWH.log`` file. Note that ``awh1`` is part of the name of the mdp options specified in ``AWH.mdp``, as we saw earlier, and should not be confused with the simulation replica in the reference data folder, which was named ``AWH1``.
.. code:: ipython3

    !grep awh1 AWH.log

The first few lines give us a recap of the parameters that were used for AWH - some of which we selected, while others were set to default values. For an in-depth discussion of these additional options and their theoretical background, have a look at the `AWH section of the GROMACS manual `__.

The first interesting line for us in this tutorial is ``awh1: covered but histogram not equilibrated at t = ...``, which indicates that the whole space that was selected for sampling (specified by ``awh1-dim1-start`` and ``awh1-dim1-end``, in our case 0.35 to 2.35 nm) was indeed covered. The next lines to pay attention to are ``awh1: equilibrated histogram at t = ... ps.`` and ``awh1: covering at t = ... Decreased the update size.``, which indicate the first time the histogram was equilibrated and the times at which the update size was scaled down because a covering took place, respectively.

The term "covering" might need some explanation. In the AWH manual, it says: *'a dimension is covered if all points in the [...] sampling [...] interval have been "visited"'*. This means that all points constituting our histogram along the reaction coordinate have to be visited at least once before it is considered a covering. You can find more details on `the covering criterion `__.

After 2 or more coverings, the following line is printed in the log file: ``awh1: out of the initial stage at t = ... ps.``. This tells us we have left the exploratory phase of AWH and are entering the 'equilibrium-like' phase of the method. For more details, you can read the `exit from the initial stage section `__ of the AWH manual.

Assessing sampling and calculating the PMF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Great, we now know (hopefully) that our system has run long enough to leave the initial stage, but what does that mean in terms of the distance values that we observe? Let's have a look at the ``AWH_pullx.xvg`` file to answer this question!

.. code:: ipython3

    # Load the XVG file
    pullx = np.loadtxt('AWH_pullx.xvg', comments=['@','#'])

    # Plot the data
    plt.plot(pullx[:,0], pullx[:,1])

    # Plot lines for the edges of our sampling interval
    plt.plot(pullx[:,0], [2.35] * pullx[:,0].shape[0], color='gray')
    plt.plot(pullx[:,0], [0.35] * pullx[:,0].shape[0], color='gray')

    plt.xlabel('Time (ps)')
    plt.ylabel('Pyrimidines distance (nm)')

Here, we can see our whole trajectory (blue) and how the system moves back and forth across our sampling interval. The edges of the sampling interval are marked in gray. Now, let's take a closer look at the start of the simulation.

.. code:: ipython3

    # Find all the lines in the log file which contain both 'awh1' and 'covered'
    instances = !grep awh1 AWH.log | grep covered
    print("The following line(s) containing the word 'covered' were found for awh1 in the log file:")
    print(instances)

    # Search for the number after "t = ..." in the line(s) we identified previously
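    # (Aside: a regular expression such as re.search(r't = ([\d.]+)', line)
    # would extract the time more robustly, but we keep the explicit
    # word-by-word scan below so that each step stays easy to follow.)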
    list_covered = []
    for i in instances:
        words = i.split(" ")
        # Start at index 2 so we never index before the start of the list
        for j in range(2, len(words)):
            if words[j - 2] == "t" and words[j - 1] == "=":
                list_covered.append(float(words[j]))

    # Round to the closest frame (in ps)
    t_first_cov = np.around(list_covered[0]/10) * 10
    # And find its index
    ind_first_cov = np.where(pullx[:, 0] == t_first_cov)[0][0]

    # Plot the pullx data
    plt.plot(pullx[:ind_first_cov + 1, 0], pullx[:ind_first_cov + 1, 1])

    # Plot lines for the edges of our sampling interval
    plt.plot(pullx[:ind_first_cov + 1, 0], [2.35] * pullx[:ind_first_cov + 1, 0].shape[0], color='gray')
    plt.plot(pullx[:ind_first_cov + 1, 0], [0.35] * pullx[:ind_first_cov + 1, 0].shape[0], color='gray')

    # Add a dash-dotted line at the first 'covered' message
    plt.plot([t_first_cov] * 2, [0.35, 2.35], '-.', color='gray')

    plt.xlabel('Time (ps)')
    plt.ylabel('Pyrimidines distance (nm)')

The solid gray lines indicate the boundaries we imposed on the exploration with ``awh1-dim1-start`` and ``awh1-dim1-end``. So what does a ``covered`` message mean? It means that *almost* all of the sampling space has been explored. Let's look more closely at what happens up to the first ``covered`` message logged in the ``AWH.log`` file. The vertical dash-dotted gray line indicates the time (rounded to the closest frame) at which the first ``covered`` message was obtained. Does it look like ALL the space was sampled? If not, why? If you need help, see the `covering section of the AWH part in the GROMACS manual `__.

Ok, we have covered the first ``covered`` message (pun intended), but what about the first ``covering`` message?

.. code:: ipython3

    # Find all the lines in the log file which contain both 'awh1' and 'covering'
    instances = !grep awh1 AWH.log | grep covering
    print("The following line(s) containing the word 'covering' were found for awh1 in the log file:")
    print(instances)

    # Search for the number after "t = ..." in the line(s) we identified previously
    list_covering = []
    for i in instances:
        words = i.split(" ")
        for j in range(2, len(words)):
            if words[j - 2] == "t" and words[j - 1] == "=":
                list_covering.append(float(words[j]))
    list_covering = np.asarray(list_covering)

    # Round down to the closest frame (in ps)
    t_first_covering = int(list_covering[0]/10)*10
    # And find its index
    ind_first_covering = np.where(pullx[:, 0] == t_first_covering)[0][0]

    # Plot the pullx data
    plt.plot(pullx[:ind_first_covering + 1, 0], pullx[:ind_first_covering + 1, 1])

    # Plot lines for the edges of our sampling interval
    plt.plot(pullx[:ind_first_covering + 1, 0], [2.35] * pullx[:ind_first_covering + 1, 0].shape[0], color='gray')
    plt.plot(pullx[:ind_first_covering + 1, 0], [0.35] * pullx[:ind_first_covering + 1, 0].shape[0], color='gray')

    # Add a dash-dotted line at the first 'covered' message
    plt.plot([t_first_cov] * 2, [0.35, 2.35], '-.', color='gray')
    # And a dashed line at the first 'covering' message
    plt.plot([t_first_covering] * 2, [0.35, 2.35], '--', color='gray')

    plt.xlabel('Time (ps)')
    plt.ylabel('Pyrimidines distance (nm)')

Now, the second vertical gray line indicates the time (rounded down to the closest frame) at which the first ``covering`` message was obtained. This corresponds to the first time the histograms are equilibrated, since we used ``awh1-equilibrate-histogram = yes`` in our ``AWH.mdp`` file. If you want, you can apply the same idea to plot all subsequent coverings, up to ``len(list_covering)``, as in the sketch below.
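A minimal sketch, reusing ``pullx`` and ``list_covering`` from the cells above:

.. code:: ipython3

    # Mark every covering found in the log as a vertical line on the full trajectory
    plt.plot(pullx[:, 0], pullx[:, 1])
    for t in list_covering:
        plt.axvline(t, linestyle='--', color='gray')
    plt.xlabel('Time (ps)')
    plt.ylabel('Pyrimidines distance (nm)')

Next, let's find out when the simulation left the initial stage: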
.. code:: ipython3

    # Find all the lines in the log file which contain both 'awh1' and 'out of the' (initial stage)
    instances = !grep awh1 AWH.log | grep 'out of the'
    print("The following line(s) containing 'out of the' were found for awh1 in the log file:")
    print(instances)

    # Search for the number after "t = ..." in the line we identified previously, and round it down
    for i in instances:
        words = i.split(" ")
        for j in range(2, len(words)):
            if words[j - 2] == "t" and words[j - 1] == "=":
                t_out = int(float(words[j][:-1])/10)*10
    # And find its index
    ind_t_out = np.where(pullx[:, 0] == t_out)[0][0]

    # Plot the pullx data
    plt.plot(pullx[:ind_t_out + 1, 0], pullx[:ind_t_out + 1, 1])

    # Plot lines for the edges of our sampling interval
    plt.plot(pullx[:ind_t_out + 1, 0], [2.35] * pullx[:ind_t_out + 1, 0].shape[0], color='gray')
    plt.plot(pullx[:ind_t_out + 1, 0], [0.35] * pullx[:ind_t_out + 1, 0].shape[0], color='gray')

    # Add a dash-dotted line at the first 'covered' message
    plt.plot([t_first_cov] * 2, [0.35, 2.35], '-.', color='gray')
    # And a dashed line at the first 'covering' message
    plt.plot([t_first_covering] * 2, [0.35, 2.35], '--', color='gray')
    # And an orange line at the exit from the initial stage
    plt.plot([t_out] * 2, [0.35, 2.35], color='darkorange')

    plt.xlabel('Time (ps)')
    plt.ylabel('Pyrimidines distance (nm)')

Now, the orange vertical line represents the end of the exploratory phase of AWH and the entrance into the final phase of the method. If we plot all our data, we get the following graph.

.. code:: ipython3

    # Plot the pullx data in the exploratory phase
    plt.plot(pullx[:ind_t_out + 1, 0], pullx[:ind_t_out + 1, 1], color = 'tab:blue')
    # And in the production phase
    plt.plot(pullx[ind_t_out:, 0], pullx[ind_t_out:, 1], color = 'tab:purple')

    # Plot lines for the edges of our sampling interval
    plt.plot(pullx[:,0], [2.35] * pullx[:, 0].shape[0], color='gray')
    plt.plot(pullx[:,0], [0.35] * pullx[:, 0].shape[0], color='gray')

    # Add a dash-dotted line at the first 'covered' message
    plt.plot([t_first_cov] * 2, [0.35, 2.35], '-.', color='gray')
    # And a dashed line at the first 'covering' message
    plt.plot([t_first_covering] * 2, [0.35, 2.35], '--', color='gray')
    # And an orange line at the exit from the initial stage
    plt.plot([t_out] * 2, [0.35, 2.35], color='darkorange')

    plt.xlabel('Time (ps)')
    plt.ylabel('Pyrimidines distance (nm)')

This is the same plot as at the beginning of this section, but now divided into the different AWH phases. The lines representing the first and second coverings are hardly visible. The blue time series represents the initial, exploratory, phase, while the purple one represents the production phase, which starts after the orange line. We can see that the production phase is much longer and that the transitions between the edge states in pyrimidine distance are well-sampled.

Calculating the PMF with ``gmx awh``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, let's calculate the potential of mean force from our AWH simulations with the ``gmx awh`` tool. We will use the energy (``.edr``) file and the binary ``.tpr`` file to calculate the potential of mean force, which will be written out as a series of ``.xvg`` files.

.. code:: ipython3

    # This will write output every 1 ns
    !gmx awh -f AWH.edr -s AWH.tpr -o awh -skip 10

.. code:: ipython3

    !ls -lrt

Now, a bunch of files named ``awh_t*.xvg`` should have been generated. The number after the 't' represents the time point, in picoseconds, for which the PMF was calculated. To verify the spacing programmatically, see the sketch below.
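A small sketch using Python's standard ``glob`` and ``re`` modules (assuming the ``awh_t<time>.xvg`` naming shown above):

.. code:: ipython3

    import glob
    import re

    # Extract the time stamps from the file names and check their spacing
    times = sorted(float(re.search(r'awh_t([0-9.]+)\.xvg', f).group(1))
                   for f in glob.glob('awh_t*.xvg'))
    print(f"{len(times)} PMF files, spaced by {times[1] - times[0]:.0f} ps")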
We can see that the files are spaced by 1 nanosecond, as specified via the ``-skip 10`` option in the ``gmx awh`` command (AWH output was written to the ``.edr`` file every 100 ps, so every 10th point corresponds to 1 ns). Now, let's take a look at what one of these files looks like, starting at the top of the file:

.. code:: ipython3

    !head -n 16 awh_t0.xvg

- The third line tells you which GROMACS version was used to analyse the files: ``:-) GROMACS - gmx awh, GROMACS VERSION (-:``
- Skipping a few lines, where the ``@`` symbols appear we find the labels for the axes, in this case: ``xaxis label "(nm)"`` and ``yaxis label "(kJ/mol)"``

Let's now plot the **potential of mean force**:

.. code:: ipython3

    fig, ax = plt.subplots()

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Read in all the PMF data from the XVG files
    for i in range(n_start, n_end):
        time = i * 1000
        PMF = np.loadtxt('awh_t%s.xvg' % time, comments = ['@', '#'])
        ax.plot(PMF[:,0], PMF[:,1], c = plt.cm.gist_earth_r(i/n_end))

    ax.set_xlabel('Pyrimidines distance (nm)')
    ax.set_ylabel('Potential of Mean Force (kJ/mol)')

    # Create the colorbar
    fig.colorbar(plt.cm.ScalarMappable(norm=plt.Normalize(n_start, n_end), cmap='gist_earth_r'), ax=ax, label='Time (ns)')

In the plot above, the PMF estimate at different time points in the AWH simulation is shown with colors from light sand to dark blue, and we can clearly see that the PMF converges towards a consistent curve.

Assessing convergence
~~~~~~~~~~~~~~~~~~~~~

Actually, ``gmx awh`` can print out a lot more useful information if we add the ``-more`` option. Let's try it!

.. code:: ipython3

    !gmx awh -f AWH.edr -s AWH.tpr -o awh -skip 10 -more

Note that GROMACS automatically backed up the XVG files we created previously by adding a '#' at the beginning and end of each file name. You can easily delete all backed-up files with the bash command ``rm "#"*`` if your data folder gets too crowded with old files.

Now, let's plot all the output data we got.

.. code:: ipython3

    fig, axs = plt.subplots(nrows = 2, ncols = 3, figsize = (15, 5))

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Plot all the data from the XVG files:
    for i in range(n_start, n_end):
        time = i*1000
        XVG = np.loadtxt('awh_t%s.xvg' %time, comments=['@', '#'])
        count = 0
        for k, l in zip(['PMF', 'Coord bias', 'Coord distr', 'Ref value distr', 'Target ref value distr', 'sqrt(friction metric)'],
                        [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]):
            axs[l[0], l[1]].plot(XVG[:, 0], XVG[:, count + 1], c = plt.cm.gist_earth_r(i/n_end))
            axs[l[0], l[1]].set_title(k)
            count = count + 1

    # Add common x and y labels
    fig.supxlabel('Pyrimidines distance (nm)', fontsize=14)
    fig.supylabel('Energy (kJ/mol)', fontsize=14)

    plt.tight_layout()

The file obtained with ``-more`` contains the **PMF**, the bias applied (**coordinate bias**), the distribution along the reaction coordinate (**coordinate distribution**), the distribution of reference coordinate values coupled through harmonic potentials (**reference value distribution**), the **target reference value distribution**, and the **square root of the friction metric**, which gives some information on how 'difficult' it is to sample along the coordinate of interest. We will not really discuss the friction metric here, but you can read more in the manual under `friction metric `__. There is an interesting discussion to be had about the friction metric and the diffusivity along the reaction coordinate, but we will leave that for the `awh-advanced `__ tutorial.
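If you are ever unsure which column in the ``-more`` output is which, the legends in the XVG header can help. A small sketch, assuming the usual GROMACS ``@ s<n> legend "..."`` header format:

.. code:: ipython3

    # Print the per-column legends from the XVG header
    with open('awh_t0.xvg') as f:
        for line in f:
            if line.startswith('@ s') and 'legend' in line:
                print(line.strip())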
Now, let's compare the reference value distribution and the target reference value distribution to see if our simulation is sampling the space efficiently, by assessing how closely they match. We will also be able to see if the force constant applied is sufficient.

.. code:: ipython3

    # Define the plots
    fig, axs = plt.subplots(nrows = 1, ncols = 2, figsize = (8,4))

    # Set the color scales
    colors = plt.cm.gist_earth_r(np.linspace(0, 1, 101))
    colors2 = plt.cm.RdPu(np.linspace(0, 1, 101))

    # Plot all the data from the XVG files
    for i in range(0, 101):
        time = i*1000
        XVG = np.loadtxt('awh_t%s.xvg' %time, comments = ['@','#'])
        axs[0].plot(XVG[:, 0], XVG[:, 1], c = colors[i])
        axs[0].plot(XVG[:, 0], XVG[:, 2], c = colors2[i])
        axs[1].plot(XVG[:,0], XVG[:, 4], c = colors[i])
        axs[1].plot(XVG[:,0], XVG[:, 5], c = colors2[i])

    # Plot a final time for the legends
    axs[0].plot(XVG[:, 0], XVG[:, 1], c = colors[i], label = 'Final PMF')
    axs[0].plot(XVG[:, 0], XVG[:, 2], c = colors2[i], label = 'Final Bias')
    axs[1].plot(XVG[:, 0], XVG[:, 4], c = colors[i], label = 'Final ref. value distr.')
    axs[1].plot(XVG[:, 0], XVG[:, 5], c = colors2[i], label = 'Final target ref. value distr.')

    # Add legends
    axs[0].legend()
    axs[1].legend()

    # Set all the labels
    axs[0].set_title('PMF and Bias')
    axs[1].set_title('Reference and target reference value distributions')
    axs[0].set_xlabel('Pyrimidines distance (nm)')
    axs[0].set_ylabel('Energy (kJ/mol)')
    axs[1].set_xlabel('Pyrimidines distance (nm)')

    plt.tight_layout()

First, let's compare the converged PMF and bias (the dark and purple lines in the left plot above). We can see that the shapes of the curves are similar enough, so the force constant is indeed sufficient. Due to some noise at the edges of the sampling interval, the curves do not fully overlap, so to get an even better estimate of how similar they are, it might be useful to align the two final curves first. It's also possible that an even higher force constant would make the curves match even better. We will leave this optional exercise to you: change the ``awh1-dim1-force-constant`` value to a larger number and re-run the simulations. What happens?

In the right plot, we can see that the reference value distribution is flattening out across the whole sampling interval, approaching the constant target distribution we requested in the mdp file (shown as the target reference value distribution in the plot). We could try to increase the sampling time and/or the force constant to see if we get an even better fit.

Correcting for entropic contributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When looking at the plots for the PMF and bias above, we notice something strange - the curves are slanted. When the molecules are far apart the energy should be close to zero, which is the case, but we would expect the energy to remain at this level until it suddenly changes as the molecules start interacting. We would not expect the energy to increase seemingly linearly as they approach each other. The reason for this artifact is an extra entropic force: the phase space available to the pair grows with the distance between them, which effectively pulls the two molecules apart. To get a PMF curve that only includes the interaction energies of the two molecules, we need to correct for this effect.

The PMF, :math:`w(r)`, describes the free energy along one degree of freedom (in this case the pyrimidines distance), :math:`r`, and can be obtained as the potential of the average force, :math:`\langle F \rangle`.
This can then be divided into the average interaction force, :math:`\langle F_i\rangle`, plus the entropic force, :math:`F_e`:

:math:`w(r) = - \int_\infty^r \langle F(s) \rangle \text{d}s = - \int_\infty^r ( \langle F_i(s) \rangle + F_e(s))\text{d}s`

AWH determines the sum of these terms, but we want to know the first term only. We therefore need to determine the second contribution, :math:`F_e`, so that we can subtract it from the results given by AWH. From statistical mechanics, we know that a free energy can be calculated from the partition function, :math:`Z(r)`, as

:math:`w(r) = -k_B T \text{ln}(Z(r))`

where :math:`k_B` is Boltzmann's constant and :math:`T` the temperature. The partition function for the entropic contribution counts all the configurations, in terms of the collective variable :math:`r`, that fall on the surface of a sphere. This is also the *radial distribution function*, :math:`g(r)`, for the two molecules, and represents the free phase space available to them:

:math:`Z(r) = g(r) = 4\pi r^2`

We can now calculate the entropic force:

:math:`F_e(r) = -\frac{\text{d}}{\text{d}r} w_e(r) = -\frac{\text{d}}{\text{d}r}(-k_BT\text{ln}(Z(r))) = \frac{\text{d}}{\text{d}r}k_BT\text{ln}(4\pi r^2) = \frac{2k_BT}{r}`

To get the correction term in the first equation, we now integrate the force between :math:`r` and the edge of our sampling interval, :math:`r_m`, since in practice we can't reach :math:`r=\infty` in our simulations. We also change the integration variable to :math:`s` to avoid confusion with our integration limits:

:math:`w_{corr}(r) = - \int_{r_m}^r F_e(s) \text{d}s = - (2k_BT\text{ln}(r) - 2k_BT\text{ln}(r_m)) + C_1 = 2k_BT\text{ln}(\frac{r_m}{r}) + C_1`

We know that when :math:`r = r_m` the correction potential should be zero, since the correction should vanish at the edge of the sampling interval; thus :math:`C_1=0`.

--------------

Alternatively, we can obtain the same answer by only using the equation for the free energy and requiring both the total PMF and the interaction potential to be zero at :math:`r_m`, which determines the integration constant, :math:`C_2`:

:math:`w(r) = w_i(r) + w_e(r) + C_2`

:math:`w(r) = w_i(r) - k_BT\text{ln}(4\pi r^2) + C_2`

:math:`w(r_m) = 0 - k_BT\text{ln}(4\pi r_m^2) + C_2 = 0`

:math:`C_2 = k_BT\text{ln}(4\pi r_m^2)`

which yields

:math:`w_{corr}(r) = w_e(r) + C_2 = - k_BT\text{ln}(4\pi r^2) + k_BT\text{ln}(4\pi r_m^2) = 2k_BT\text{ln}(\frac{r_m}{r})`

--------------

And there we have our **correction term**! Now, let's plot it together with our data to see if it looks right. First, let's define a function that calculates the correction term, which we can re-use later.

.. code:: ipython3

    def calculate_correction(r, rm):
        # Define Boltzmann's constant (in kJ/(mol*K))
        kB = 0.008314463
        # Define the temperature used in the simulations (in K)
        T = 298.15
        # Calculate the correction PMF
        w_corr = 2*kB*T * np.log(rm/r)
        return w_corr
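As a quick sanity check of the magnitude (using the same T = 298.15 K as in the function above), at the global minimum near 0.5 nm we expect :math:`2k_BT\text{ln}(2.35/0.5) \approx 7.7` kJ/mol:

.. code:: ipython3

    # Check the size of the correction at a few distances, with rm = 2.35 nm
    for r in [0.5, 1.0, 2.0, 2.35]:
        print(f"r = {r:4.2f} nm: w_corr = {calculate_correction(r, 2.35):5.2f} kJ/mol")

As required, the correction vanishes at the edge of the sampling interval and grows as the molecules approach each other.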
And now, let's plot!

.. code:: ipython3

    fig, axs = plt.subplots()

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Load and plot the PMF data from all XVG files
    for i in range(n_start, n_end):
        time = i*1000
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        axs.plot(PMF[:,0], PMF[:,1], c = plt.cm.gist_earth_r(i/n_end))

    # Get all distance values from the last XVG file (any of them could be used)
    r = PMF[:,0]
    # Extract the end of our sampling interval as the last distance value
    rm = PMF[:,0][-1]

    # Calculate and plot the correction
    correction = calculate_correction(r, rm)
    axs.plot(PMF[:,0], correction, label = "Correction term")

    # Set labels, legends and colorbar
    axs.legend()
    fig.colorbar(plt.cm.ScalarMappable(norm = plt.Normalize(n_start, n_end), cmap = 'gist_earth_r'), ax = axs, label = 'Time (ns)')
    axs.set_xlabel('Pyrimidines distance (nm)')
    axs.set_ylabel('Potential of mean force (kJ/mol)')

Now, we can apply the correction to the PMF data and see how it looks:

.. code:: ipython3

    fig2, axs2 = plt.subplots()

    # Set the start and end data points (in ns)
    n_start = 10
    n_end = 100

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt('awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Load and plot the PMF data from all XVG files
    # We exclude some of the noisy curves from the beginning of the simulation to see our results better
    for i in range(n_start, n_end):
        time = i*1000
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        axs2.plot(PMF[:,0], PMF[:,1] - correction, c = plt.cm.gist_earth_r(i/n_end))

    # Set labels, legends and colorbar
    fig2.colorbar(plt.cm.ScalarMappable(norm = plt.Normalize(n_start, n_end), cmap = 'gist_earth_r'), ax = axs2, label = 'Time (ns)')
    axs2.set_xlabel('Pyrimidines distance (nm)')
    axs2.set_ylabel('Potential of Mean Force (kJ/mol)')

Now, our PMF looks flatter and converges towards zero at larger distances, as we would expect! We can also see a global energy minimum around 0.5 nm and a second, local, minimum around 0.75 nm. These represent more favourable interaction distances for our molecules.

Comparing to results from umbrella sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, let's compare our AWH PMF with the umbrella sampling PMF from the tutorial of the same name. But first, let's quickly remind ourselves what umbrella sampling is. In umbrella sampling, we run many simulations, each restrained at a different position along the reaction coordinate by a harmonic biasing potential (which resembles an upside-down umbrella, thus the name) that hinders the displacement of the system from the chosen state. We can then calculate the free energy difference between each position on the reaction coordinate. Combining all these free energy calculations using WHAM (the Weighted Histogram Analysis Method), we get an estimate of the free energy change along the pathway.

First, let's copy the umbrella sampling reference data from the reference folder to our working directory. If you want to learn how to generate this data yourself, you can run through the `umbrella sampling tutorial `__.

.. code:: ipython3

    !cp ../reference/profileUS.xvg .
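Before plotting, a quick peek at the file we just copied (it uses the same XVG format, so ``np.loadtxt`` works here too):

.. code:: ipython3

    # Check the distance range and resolution of the umbrella sampling profile
    PMF_US = np.loadtxt('profileUS.xvg', comments = ['@', '#'])
    print(f"{PMF_US.shape[0]} points from {PMF_US[0,0]:.2f} to {PMF_US[-1,0]:.2f} nm")

Now we can plot the two PMFs together: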
.. code:: ipython3

    fig3, axs3 = plt.subplots()

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt('awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Load and plot the AWH PMF data from all AWH XVG files
    for i in range(n_start, n_end):
        time = i*1000
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        axs3.plot(PMF[:,0], PMF[:,1] - correction, c = plt.cm.gist_earth_r(i/n_end))

    # Load the US PMF from one US XVG file
    PMF_US = np.loadtxt('profileUS.xvg', comments = ['@', '#'])

    # Extract all the r values from the US simulation (needed since the resolution is different compared to AWH)
    r_US = PMF_US[:,0]
    rm_US = PMF_US[:,0][-1]

    # Calculate the entropically corrected PMF for the US simulation
    corrected_US = PMF_US[:,1] - calculate_correction(r_US, rm_US)

    # Plot the US result
    plt.plot(PMF_US[:,0], corrected_US, color = 'darkorange', label = 'PMF from umbrella sampling')

    # Set labels, legends and colorbar
    fig3.colorbar(plt.cm.ScalarMappable(norm = plt.Normalize(n_start, n_end), cmap = 'gist_earth_r'), ax = axs3, label = 'AWH simulation time (ns)')
    plt.xlabel('Pyrimidines distance (nm)')
    plt.ylabel('Potential of mean force (kJ/mol)')
    plt.legend()
    plt.ylim(-6, 6)

We can see that the shapes of the curves are very similar and that they both approach zero at longer pyrimidine distances. However, for us, the barrier height after the global minimum is the most interesting quantity, so let's align the curves at their minima instead. Remember that we are not after the absolute value of the free energy at each state but the free energy difference between states, which is independent of the path taken between the states!

.. code:: ipython3

    fig4, axs4 = plt.subplots()

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt('awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Load and plot the AWH PMF data from all AWH XVG files
    for i in range(n_start, n_end):
        time = i*1000
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        corrected_PMF = PMF[:,1] - correction
        axs4.plot(PMF[:,0], corrected_PMF - np.min(corrected_PMF), c = plt.cm.gist_earth_r(i/n_end))

    # Load the US PMF from one US XVG file
    US_PMF = np.loadtxt('profileUS.xvg', comments=['@','#'])

    # Extract all the r values from the US simulation (needed since the resolution is different compared to AWH)
    r_US = US_PMF[:,0]
    rm_US = US_PMF[:,0][-1]

    # Calculate the entropically corrected PMF for the US simulation
    corrected_US = US_PMF[:,1] - calculate_correction(r_US, rm_US)

    # Plot the corrected and translated US result
    axs4.plot(US_PMF[:,0], corrected_US - np.min(corrected_US), color = 'darkorange', label = 'PMF from umbrella sampling')

    # Set labels, legends and colorbar
    fig4.colorbar(plt.cm.ScalarMappable(norm = plt.Normalize(n_start, n_end), cmap = 'gist_earth_r'), ax = axs4, label = 'AWH simulation time (ns)')
    plt.xlabel('Pyrimidines distance (nm)')
    plt.ylabel('Potential of mean force (kJ/mol)')
    plt.legend()
    plt.ylim(0, 10)

Now, we can see that we get a similar barrier height of approximately **3.5 kJ/mol** from both AWH and umbrella sampling. This is the energy required to break the interaction between the two pyrimidine molecules.

Discussion
~~~~~~~~~~

Now that we have looked at the plots of the PMFs from AWH, perhaps you noticed something?
Spend some time thinking about these points:

- Why are there no error bars?
- How would you calculate the error for each point?

Running block analysis
----------------------

So far, we have only been analysing a single AWH simulation. While running multiple simulations with the same initial parameters (*replicates* or *replicas*) is helpful for getting an idea of the error of your calculations, one would prefer NOT to have to repeat each simulation to know whether it worked well! What about a *block analysis*? We can divide the 'production' part of the AWH simulation (after leaving the initial stage) into blocks of different sizes and see how they behave! Deal? Let's have a look!

First, let's think about it for a bit: we exit the initial stage at ``9361.8 ps``, so we have the range between 10 ns and 100 ns to analyse. What about a 2 ns block size? Let's try it!

Wait, wait, wait! What do we plot per block??? We are most interested in the location of the global minimum of the free energy, so let's try to plot that!

.. code:: ipython3

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Save the global minima here
    globmin = []

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt('awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Load the PMF data from all XVG files
    for i in range(n_start + 1, int(n_end*0.5) + 1):
        # We now take steps of 2000 ps (2 ns)
        time = i*2000
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        # Get the index that points to the global minimum
        minPMF = np.argmin(PMF[:,1] - correction)
        # Save the value for each block
        globmin.append(minPMF)

    # Plot the values! (All files share the same distance grid,
    # so we can use the last PMF for the x values.)
    plt.scatter(np.arange(n_start + 1, int(n_end*0.5) + 1)*2, PMF[globmin,0])

    # Plot the time of exit from the initial stage as a reference
    plt.plot([9.361]*2, [np.min(PMF[globmin, 0]), np.max(PMF[globmin, 0])], '-.', color = 'gray')

    # Set labels
    plt.xlabel('Time (ns)')
    plt.ylabel('Pyrimidines distance (nm)')

Now an exercise for you: try to plot the error bars corresponding to each block.

*Hint 1:* we are only plotting the value at the end of each block here.

*Hint 2:* Given the right data, ``plt.errorbar()`` should do the trick.

Around the time the simulation exits the initial stage (~10 ns), we notice something interesting - the global minimum briefly shifts to favor the molecules being completely apart! After this point, the global minimum settles at a distance slightly below 0.5 nm. So when did we actually achieve convergence? To answer this question, let's take a look at the PMF around the time the simulation exits the initial stage. We will look at the interval between 1 ns and 20 ns, highlighting 10 ns in red for reference.
.. code:: ipython3

    fig5, axs5 = plt.subplots(nrows = 1, ncols = 2, figsize = (12,4))

    # Set the start and end data points (in ns)
    n_start = 1
    n_end = 20

    # Set the time for exiting the initial stage as 10 ns
    exit_initial_stage = 10

    # Set the color scale
    colors = plt.cm.gist_earth_r(np.linspace(n_start - 1, 1, n_end - exit_initial_stage))

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt('awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Load and plot the PMF data from all XVG files
    for i in range(n_start, n_end):
        time = i*1000
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        # Plot the results in two separate figures - before or after 10 ns
        if i < exit_initial_stage:
            axs5[0].plot(PMF[:,0], PMF[:,1] - correction, c = colors[np.abs(i-10)])
        elif i == exit_initial_stage:
            for ax in axs5:
                ax.plot(PMF[:,0], PMF[:,1] - correction, c = "firebrick", linewidth = 3, label = 't = 10 ns')
        else:
            axs5[1].plot(PMF[:,0], PMF[:,1] - correction, c = colors[np.abs(i-10)])

    # Set labels, legends, and colorbar
    for ax in axs5:
        ax.sharey(axs5[0])
        ax.set_xlabel('Pyrimidines distance (nm)')
        ax.set_ylabel('Potential of mean force (kJ/mol)')
        ax.legend()
    axs5[0].set_title('Simulation time < 10 ns')
    axs5[1].set_title('Simulation time > 10 ns')
    fig5.colorbar(plt.cm.ScalarMappable(norm = plt.Normalize(n_start, n_end - exit_initial_stage), cmap = 'gist_earth_r'), ax = axs5, label = 'Time difference to when initial stage was exited')

Does the PMF at :math:`t=10` ns look *converged*? Maybe, maybe not - we'll leave that discussion to you. What does **convergence** mean anyway? We consider something converged when the estimate of a property of interest is independent of the time at which it is measured; for example, when the PMF estimate no longer changes beyond some error range. In our example, the PMF changes much less after the 10 ns mark. At the very least, we can see that after this point the shape of the curve is stable and the global minimum no longer jumps around along the reaction coordinate. Try increasing ``n_end`` in the code above and see what happens. Would you still consider 10 ns converged?

From this, we realize that the metric we use to study convergence matters a lot! Since what we are really interested in is the free energy difference between the global minimum and the 'maximum' distance in the bulk solution, that difference might be a better metric to use instead! So, let's calculate the free energy difference between the minimum (which we found in the previous example) and the mean of the energy over the last 1 nm.
.. code:: ipython3

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Define the length of the bulk (end) part in nm
    bulk_d = 1

    # Save the difference between the bulk PMF and the global minimum here
    globdiff = []

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt('awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Load the PMF data from all XVG files
    for i in range(n_start + 1, int(n_end*0.5) + 1):
        # We now take steps of 2000 ps (2 ns)
        time = i*2000
        # Load the data
        PMF = np.loadtxt('awh_t%s.xvg' %time, comments = ['@', '#'])
        # Get the index that points to the global minimum
        minPMF = np.argmin(PMF[:,1] - correction)
        # Get the index of the beginning of the bulk section
        idx_bulk_d = np.where(PMF[:,0] == (PMF[:,0][-1] - bulk_d))[0][0]
        # Calculate the mean PMF for the bulk part (beginning of bulk section to end of reaction coordinate)
        mean_d = np.mean(PMF[idx_bulk_d:,1] - correction[idx_bulk_d:])
        # Save the value for each block (both terms entropically corrected)
        globdiff.append(mean_d - (PMF[minPMF,1] - correction[minPMF]))

    # Plot the values!
    plt.scatter(np.arange(n_start + 1, int(n_end*0.5) + 1)*2, globdiff)

    # Plot the time of exit from the initial stage as a reference
    plt.plot([9.361]*2, [np.min(globdiff), np.max(globdiff)], '-.', color='gray')

    # Add labels
    plt.xlabel('Time (ns)')
    plt.ylabel('PMF difference (kJ/mol)')

Is this approach better? When is convergence reached now? While you think about it for a minute, you can re-run the whole procedure with a different block size. What happens then?

In case you want to use smaller blocks, you will need to re-run ``gmx awh``. We already wrote out the AWH results every 100 ps to the ``AWH.edr`` file, so why not use all of our data points? This will allow you to use even smaller blocks if you want. Uncomment the first line of the next cell if you wish to do this, and make sure you are in the right working directory (check with ``%pwd`` and navigate with ``%cd``, if needed).

.. code:: ipython3

    #!gmx awh -f AWH.edr -s AWH.tpr -o analysis/awh

Running replicates
------------------

So what if we have multiple simulations, or replicates? How would we analyse our data? You are lucky! We already ran replicates for you! You will find them among the reference data as ``AWH1``, ``AWH2`` and ``AWH3``. If you ran your own AWH simulation previously, you now have a sample of 4! Lucky you!

There is a lot of analysis that can be done with multiple replicates; however, we will now focus on one point that came up earlier in the tutorial: **error estimation**. We said that to get an error estimate we can use replicates of the same run, which act as technical replicates. A discussion of technical versus biological replicates is not in the cards today, but let's say that we trust that our simulation is converged - how do we know that it would always converge to **this** specific solution? Replicates!

Let's have a quick look at the pre-calculated replicates that you can find in ``AWH1``, ``AWH2`` and ``AWH3`` in the ``reference`` folder. If you ran your own calculations, you can add them to the error estimation too! First, let's check our current working directory and enter the data folder again.

.. code:: ipython3

    # Check where we are
    !pwd

    # Enter the data folder
    %cd $tutorial_path/data

Now, let's copy all the replicates from the ``reference`` folder.
The observant reader might realise that we already copied one of these replicates earlier (the one specified by the ``AWH_dataset`` variable set in the beginning of the tutorial), unless you ran your own AWH simulation, of course. We will still copy all three datasets: ``AWH1``, ``AWH2``, and ``AWH3``. If you have already worked on one of these datasets, some of the data will be overwritten. But don't worry: only the input data should be present in these folders, and we made no changes to those files. If you are worried about overwriting data, you can instead copy the folders one by one (excluding the one you have already been working on).

.. code:: ipython3

    # Copy the reference data
    %cp -r reference/AWH? .

    # Let's check our working directory
    !ls

.. code:: ipython3

    # Include all the data folders you want in the analysis here
    data_folders = ['AWH1', 'AWH2', 'AWH3']

    # Loop over all the datasets and run gmx awh in each folder
    for dataset in data_folders:
        %cd $dataset
        print("\n-------- Analysing dataset: ", dataset, "--------")
        !gmx grompp -f ../input/AWH.mdp -c ../input/conf.gro -p ../input/topol.top -o AWH.tpr -n ../input/index.ndx
        !gmx awh -f AWH.edr -s AWH.tpr -o awh -skip 10
        %cd ..

Now that we have used ``gmx awh`` to compute the PMF for all the replicates, let's plot the results!

.. code:: ipython3

    fig, axs = plt.subplots(ncols = 3, sharey = True, figsize = (14, 4))

    # Set the start and end data points (in ns)
    n_start = 0
    n_end = 100

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt(data_folders[0] + '/awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # For each dataset, read in all the PMF data and plot it
    for k, dataset in enumerate(data_folders):
        for i in range(n_start, n_end):
            time = i*1000
            PMF = np.loadtxt(dataset + '/awh_t%s.xvg' %time, comments = ['@', '#'])
            axs[k].plot(PMF[:,0], PMF[:,1] - correction, c = plt.cm.gist_earth_r(i/n_end))
        axs[k].set_xlabel('Pyrimidines distance (nm)')

    # Add shared label and colorbar
    axs[0].set_ylabel('Potential of mean force (kJ/mol)')
    fig.colorbar(plt.cm.ScalarMappable(norm = plt.Normalize(n_start, n_end), cmap = 'gist_earth_r'), ax = axs, label = 'Time (ns)')

We can see that, qualitatively, we get the same PMF curves for each replicate! That's good news! It means our simulations converge towards the same solution, and we can draw similar biological conclusions from all the replicates. However, looking closer, we can see that there are still some deviations. How large are these deviations and how will they affect our final conclusions? To answer these questions, we can quantify the discrepancies with error bars. Let's try it!
.. code:: ipython3

    # Calculate the correction using data from an arbitrary XVG file
    PMF = np.loadtxt(data_folders[0] + '/awh_t0.xvg', comments = ['@', '#'])
    correction = calculate_correction(PMF[:,0], PMF[:,0][-1])

    # Read in the PMF data at the final frame (100 ns) for all replicates
    last_PMFs = []
    for k, dataset in enumerate(data_folders):
        last_PMF = np.loadtxt(dataset + '/awh_t100000.xvg', comments = ['@', '#'])[:,1] - correction
        # Save the PMFs and translate the curves so the global minimum is at 0 kJ/mol
        last_PMFs.append(last_PMF - np.min(last_PMF))

    # Calculate the average curve from the replicates
    average = np.mean(last_PMFs, axis = 0)
    # Calculate the standard deviations at each point over the replicates
    std = np.std(last_PMFs, axis = 0)

    # Plot the average PMF with error bars
    plt.errorbar(PMF[:,0], average, yerr = std, c = 'tab:blue', ecolor = 'tab:green')

    # Add labels
    plt.xlabel('Pyrimidines distance (nm)')
    plt.ylabel('Potential of mean force (kJ/mol)')

Analysing the replicas with this error estimation, we can estimate the strength of the pyrimidine binding to :math:`3.52\pm0.18` kJ/mol when using the three reference simulations. If you added your own simulation, did you get a different value? How does this result compare to the umbrella sampling example? Why do you think the errors are larger at longer pyrimidine distances? How would more replicas affect your results?

Closing thoughts
----------------

You have now learned how to run AWH simulations along one reaction coordinate and how to analyse the results. Congratulations! There are a lot of delicate decisions to make when using any enhanced sampling technique, but we hope that this tutorial will empower you to start making these decisions more confidently.

P.S. If you don't feel that your AWH thirst has been quenched, look no further than the next tutorial, ``AWH-advanced``, to continue on your climb! In that tutorial you will learn how to handle more than one reaction coordinate, among other things. **Ad maiora!**

Do you have any questions? Have a look at the user discussions on the `GROMACS forums `__.

--------------