.. code:: ipython3

    %run src/init_notebooks.py
    hide_toggle()
    check_notebook()

# Solvation free energy of ethanol using AWH method |ethanol-water.png|

.. |ethanol-water.png| image:: ./images/ethanol-water.png

::

   Author:   Alessandra Villa 
   Goal:     Learn how to run a solvation free energy computations for a small solute using alchemical transformation and accelerated weighted histogram (AWH) method
   Prerequisites: Know how to run an MD simulation, basis knowledge of thermodynamics
   Time:      90 minutes
   Software:  GROMACS 2022 
   Version:   1.0 

## Background Change in free energy is a very important physical
quantity that determines the direction of many biological processes. For
instance, the solubility of a molecule in a solvent is determined by the
solvation free energy - a change in free energy when taking a solute
molecule from vacuum into the solvent. Binding free energy is very
important when investigating the binding affinity of ligands (or drugs)
to a specific binding site on the receptor. In this tutorial we will
learn how to use accelerated weighted histogram (AWH) to evaluate
difference in free energy of two states - alchemical free energy
evaluation. To know about more the method, watch `the webinar: Applying
the Accelerated Weight Histogram method to alchemical
transformations <https://youtu.be/E5nGLcbyqTQ>`__.

For simplicity, we will focus on the free energy of solvation for a
small molecule (ethanol), but similar procedures are used to calculate
binding free energy. We will start with some background on how the free
energies can be calculated in molecular dynamics simulations, and how
the evaluation of free energy of solvation is one of the steps,
necessary to calculate free energy of binding. Then, we will focus on
the practicalities of doing such a calculation in GROMACS.

Calculating free energy
-----------------------

Assume that we need to compute the difference in free energy of two
states :math:`A` and :math:`B`. Let us also assume that :math:`A` and
:math:`B` are well separated in the phase space and direct comparison
between them is problematic. The trick then is to construct a set of
intermediate states that are close to one another in the phase space.
The proximity of the states allows one to compare them directly,
computing the free energy difference between two consequent intemediate
states. Then, going from one state to another, one can complete the full
path between the states :math:`A` and :math:`B` and the sum of the
differences between consequentive intermediate states will give the
difference in free energies between :math:`A` and :math:`B`.

For example, to calculate the binding free energy of the ligand to a
receptor, we ultimately need to compare the state, where the ligand is
bound to the receptor, to the state where both the ligand and the
receptor are separated in solution:

.. figure:: ./images/ligand-binding.png
   :alt: ligand-binding.png

   ligand-binding.png

This can be calculated directly, for example by slowly dragging the
ligand away from the receptor and computing the dependence of average
force between them. If the system will have enough time to explore all
possible states while the ligand is pulled away, the integral of the
force over the distance will give the free energy profile (usually
called potential of mean force). However, the simulation times needed
for convergence of this method are usually very long and when
convergence is not reached, the fluctuations in forces can be very
large. This makes the potential of mean force methods much more
expensive than using *free energy perturbation* or *alchemical free
energy computation* methods such as the Bennett Acceptance Ratio (BAR)
or accelerated weighted histogram (AWH).

Remember that a free energy difference between two states :math:`A` and
:math:`B` (:math:`F_B - F_A`) determines their relative probability
:math:`p_A` and :math:`p_B`,

.. math::  \frac{p_A}{p_B} = exp \frac{F_B - F_A}{k_BT}

where :math:`k_B` is Boltzmann’s constant that relates thermal energy to
the temperature (:math:`k_B = 1.38·10^{-23}` J/K), and :math:`T` is the
temperature. We could, in principle, calculate a free energy difference
by observing the system long enough, and measuring how often the system
is in which state (i.e. measuring probabilities :math:`p_A` and
:math:`p_B`). The free energy differences, however, are often of the
order of tens of kJ/mol. For example, the free energy of solvation of
ethanol at 298K is -20.1 kJ/mol, which is equivalent to -8.1
:math:`k_BT` or to a relative probability of being in one of the states
of :math:`3·10^{-4}`. We would need to wait a long time for that
transition to occur spontaneously, and even longer to get good
statistics on it. To shorten the time needed for convergence, free
energy methods rely on one basic idea: to force the system to where it
doesn’t want to be, and then measure by how much it doesn’t want to be
there. In potential of mean force, we apply mechanical force to decouple
ligand and receptor. In alchemical free energy computation methods, we
transform the system through a set of unphysical (alchemical) states,
that connect two physical states we are interested in. To do so, the
interaction strength between a molecule of interest and the rest of the
system is coupled to a variable :math:`\lambda`:

.. math::  E_{total} = E_{ligand-ligand} + E_{rest-rest} + \lambda E_{ligand-rest}  

and we slowly turn :math:`\lambda` from 1 to 0. This means we can
effectively turn off a ligand, and pretend that it is no longer coupled
to a system (at :math:`\lambda=0`). This way we force the system to
where it does not want to be (either to the solvated or to the vacuum
state, depending on what the sign of the free energy difference is).

The method of coupling and de-coupling the receptor and ligand can help
us with calculating the free energy of binding, because now we can
create a two-step path: |fep.png| where we first de-couple the ligand
from the solvent (top panel), and then re-solvate the ligand in the
presence of the receptor (bottom panel). The free energy of binding is
thus

.. math::  \Delta G_{binding} = \Delta G_1 + \Delta G_2  

and the simulation is split into two parts: one is calculating the
de-solvation free energy, and the other is calculating the free energy
of introducing the ligand into the binding site on the receptor. The
first simulation is the inverse of a free energy of solvation. That last
simulation couples the ligand from :math:`\lambda=0` where it doesn’t
interact with the system, to the state with :math:`\lambda=1`, where the
ligand is bound to the receptor. In this tutorial we will perform only
the first step of calculating the solvation energy. Partly for
computational performance reasons: because there is no protein involved,
the simulation box size can be small and the simulations will be fast.

.. |fep.png| image:: ./images/fep.png

## Free energy of solvation

To calculate a free energy of solvation, we calculate
:math:`-\Delta G_1` in the picture above, or, equivalently,
:math:`\Delta G_{solv}` in this picture: |fe-solvation.png| We’ll do
this by transforming the system through a set of unphysical (alchemical)
step, that connect two physical states and use AWH to move dynamically
between the states. (see `J. Chem. Phys. 154, 204103
(2021) <https://doi.org/10.1063/5.0044352>`__ for details). In a single
simulation :math:`\lambda` moves dynamically using Monte Carlo, a bias
potential is added to :math:`\lambda` to achieve uniform sampling.
Hamiltonian difference are computed to all the other :math:`\lambda`
values

.. |fe-solvation.png| image:: ./images/fe-solvation.png

The free energy difference can be calculated directly if the
:math:`\lambda` states are close enough. The term ‘close enough’ here
means that switching between the two states should be possible in both
directions and some of the configurations of the system should be
allowed in both states (i.e. the states should share some parts of phase
space).

The most obvious value for first and last :math:`\lambda` point would be
:math:`\lambda=0` and :math:`\lambda=1`. These end states, however,
usually have very few system conformations in common: they share very
little phase space. Because of this, we will never see enough
transitions from one state to the other and the free energy computations
will never converge in this case. That is why the transition is usually
split into several intermediate unphysical states: |fep-coupling.png|
with as many :math:`\lambda` points as possible for the convergence
between two nearby states (i.e. values of :math:`\lambda`). We will
therefore effectively ‘slowly’ turn on (or off) the interactions between
the solute and the solvent. Then we will process the data from all these
simulations, comparing the nearby values of :math:`\lambda`.

Here we will use 13 :math:`\lambda` points: 0, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.85, 0.9, 0.95 and 1. Combining the results of all the
:math:`\lambda`-point will give us the total difference in free energy
(i.e. the difference between the states with :math:`\lambda=0` and
:math:`\lambda=1`).

.. |fep-coupling.png| image:: ./images/fep-coupling.png

Compare to the state-of-the-art way of doing the free energy
perturbation simulations, we will take one shortcut in this tutorial. We
will turn off both the electrostatic (Coulomb) interactions and the Van
der Waals (Lennard-Jones) interactions at the same time. For
high-quality results, these stages are normally separated, but here we
will do both of them at the same time for expediency. GROMACS uses
‘soft-core’ interactions to make sure that while the normal
(Lennard-Jones and Coulomb) interactions are being turned off, there
will never be two point charges sitting on top of each other: this is
achieved by turning on an interaction that effectively repels particles
at intermediate :math:`\lambda` points (in such a way that it cancels
out from the free energy difference). For details on the soft-core
interactions see
`here <http://manual.gromacs.org/current/reference-manual/functions/free-energy-interactions.html#feia>`__.

## Preparing the system Look for a file named **topol.top**, and a very
basic coordinate file named **ethanol.gro** in the input files folder.
This topology uses the `OPLS force
field <https://pubs.acs.org/doi/full/10.1021/ja9621760>`__ and defines
an ethanol molecule, and includes the definitions for SPC/E water, that
will be used for solvation.

.. code:: ipython3

    !cat input_files/topol.top

.. code:: ipython3

    !cat input_files/ethanol.gro

**Question**: Take a look at the topology file **topol.top**. For the
ethanol molecule definition, can you find which atoms are there, and how
they are connected? Note, that we use standard OPLS atom types for the
atoms in ethanol. These parameters were borrowed from the threonine
side-chain, which has the same structure as the ethanol molecule. For
more complex organic molecules one will have to look or to generate
appropriate topology and coordinate files.

Before we start preparing the system, let us introduce a small piece of
code that will allow us to interact with the command shell and
filesystem:

.. code:: ipython3

    import os,shutil
    
    #simple function to write mdp files
    def write_mdp(mdp_string, mdp_name, directory):
        mdp_filename=os.path.join(directory,mdp_name)
        mdp_filehandle=open(mdp_filename,'w')
        mdp_filehandle.write(mdp_string)
        mdp_filehandle.close()
    
    #get the path to the working directory
    pwd=os.getcwd()
    
    #path to gromacs files
    input_files=os.path.join(pwd,'input_files')
    output_files=os.path.join(pwd,'output_files')
    if not os.path.isdir(output_files):
        os.mkdir(output_files)
    shutil.copy(os.path.join(input_files,'ethanol.gro'),os.path.join(output_files,'ethanol.gro'))
    shutil.copy(os.path.join(input_files,'topol.top'),os.path.join(output_files,'topol.top'))
    %cd output_files

The write_mdp(…) function, defined here will be used to create mdp
configuration files directly from this notebook. With all input files
copied to the working directory, we can safely start working. Note, that
you can always reset this tutorial by deleting the ‘output_files’ folder
and executing the box above. Below, all the lines starting with !
indicate the shell commands.

First, we need to define the simulation box, which later will be filled
with water solvent. The box should fully incorporate the solute with
some space to spare:

.. code:: ipython3

    !gmx editconf -f ethanol.gro -o box.gro -bt dodecahedron -d 1.2

This sets up the simulation box as a rhombic dodecahedron with a minimum
distance between the solute (the ethanol molecule) and the box edge of
1.2 nm. The rhombic dodecahedron shape of the box provides more
effective packing of periodic images in comparison to rectangular boxes
and hence, fewer water molecules are needed to separate the periodic
images of the ethanol molecule. See the [GROMACS manual]
(http://manual.gromacs.org/current/reference-manual/algorithms/periodic-boundary-conditions.html)
for illustrations of this box shape and how its periodic images are
arranged.

The gmx editconf command defines the box, without adding the solvent to
it. The later is done by the gmx solvate command below.

.. code:: ipython3

    !gmx solvate -cp box.gro -cs -o solvated.gro -p topol.top

This will fill the box we created on the previous step with water
molecules (roughly 500 of them). The solute will be taken from the file
listed in the -cp option. The -cs option can be used to specify the
template system that contais solvent molecules only. Leaving it blank
will make GROMACS use the default solvent layout. The command will add
the coordinates for the water molecules to the end of the gro file and
will update the topology. You are strongly encouraged to investigate
corresponding files.

The initial placement of water molecules is not ideal and may contain
atoms that are too close to one another or small patches of empty
spaces. The coordinates of the solute atoms may come from the
experiment, where the placement of atoms is not precise. These, very
small discrepancies in the atom positioning can carry a very substantial
excess of potential energy and, as a result, the system will have very
high interatomic forces. So high that the numerical integration of
equation of motion will be unstable even for a very short timesteps. To
overcome this and make the system configuration suitable for simulation,
we will first perform the energy minimization. Executing the following
cell will create the mdp file with the instructions to GROMACS to
perform the energy minimization. The file will then be preprocessed by
grompp into a **em.tpr** run file - the file that contains all the data
necessary to perform the minimization, including coordinates of the
atoms, topology and simulation protocol. Running the mdrun on the last
line will execute the energy mimimization simulations.

.. code:: ipython3

    em_mdp=""";minimal mdp options for energy minimization
    integrator               = steep
    nsteps                   = 500
    coulombtype              = pme
    """
    write_mdp(em_mdp,'em.mdp',output_files)
    !gmx grompp -f em.mdp -c solvated.gro -o em.tpr
    !gmx mdrun -v -deffnm em 

## Equilibration of energy We are now ready to perform energy
equilibration, a step necessary to achieve thermal equilibrium and
uniformly distributh the energy over the system. For this step we will
turn on pressure and temperature coupling: we are about to compute the
difference in Gibbs free energy, and for that, the system must maintain
both constant temperature and pressure, while the ethanol molecule is
being de-coupled. For that, we will be using the velocity rescale
thermostat, and the C-rescale barostat (here more on
`thermostat <https://manual.gromacs.org/documentation/current/reference-manual/algorithms/molecular-dynamics.html#temperature-coupling>`__
and
`barostat <https://manual.gromacs.org/documentation/current/reference-manual/algorithms/molecular-dynamics.html#pressure-coupling>`__).
The global equilibration is done with **equil.mdp** config file,
constructed below. As with the mimization, the tpr file is then
constructed and passed to GROMACS to run simulations.

.. code:: ipython3

    equil_mdp=""";equilibration mdp options
    integrator               = md
    nsteps                   = 100000
    dt                       = 0.002
    nstenergy                = 100
    rlist                    = 1.1
    nstlist                  = 10
    rvdw                     = 1.1
    coulombtype              = pme
    rcoulomb                 = 1.1
    fourierspacing           = 0.13
    constraints              = h-bonds
    tcoupl                   = v-rescale
    tc-grps                  = system
    tau-t                    = 0.5
    ref-t                    = 300
    pcoupl                   = C-rescale
    ref-p                    = 1
    compressibility          = 4.5e-5
    tau-p                    = 1
    gen-vel                  = yes
    gen-temp                 = 300
    """
    write_mdp(equil_mdp,'equil.mdp',output_files)
    !gmx grompp -f equil.mdp -c em.gro -o equil.tpr
    !gmx mdrun -deffnm equil 

Here you can use the mdrun option **-ntmpi 1** to specify only using one
mpi rank since the system is so small. For larger systems more mpi ranks
would be appropriate. For very small systems such as this one, it would
also be possible to use **-nt 1** to specify only using one processor.
You may want to adjust the number of cores used, but given the small
system size, even one core should be sufficient to complete the run in
about a minute. After this step we should be ready with a *hopefully*
equilibrated configuration of ethanol in water, saved in **equil.gro**.

**Question**: You should check whether the system has been equilibrated.
How could you do this?

Note: if you do not want to wait, but go on with the tutorial, copy the
data from the /reference directory into the output_files directory. To
do this from within this notebook, remove the comment characters (#) in
the following cell

.. code:: ipython3

    ## ONLY execute the lines below if you do not want to run and wait for the simulations to finish. 
    ## Before executing the command below, check that you are in output_files directory
    #!cp ../reference/equil.edr ./ 
    #!cp ../reference/equil.gro ./


Creating the :math:`\lambda` points
-----------------------------------

Now the system preparation is complete and we are ready to do a
production simulations. First, we need to define different
:math:`\lambda` points, each of which indicate the alchemical state
along the transition between solvated and *in vacuo* states. We define
those states in the mdp configuration file below with consecutive values
of :math:`\lambda`. The free energy settings state the following: take
the molecule ethanol, and couple it to our variable :math:`\lambda`.
This is done so that :math:`\lambda=0` means that the molecule is
de-coupled, and :math:`\lambda=1` means that the molecule is fully
coupled. Take a look at the couple-lambda0 and couple-lambda1 options in
the file. There, vdwq means that both Lennard Jones and Coulomb will be
affected by the value of :math:`\lambda`, hence the solute molecule will
be fully decoupled. What is decoupled is be selected by the
couple-moltype option (ethanol in our case). The **sc-power**,
**sc-sigma** and **sc-alpha** settings control the ‘soft-core’
interactions that prevent the system from having overlapping particles
as it is being de-coupled. In this tutorial we use Beutler’s softcore
developed by `Beutler and
coworkers <https://manual.gromacs.org/documentation/2021/reference-manual/functions/free-energy-interactions.html#soft-core-interactions>`__.

A full description of the simulation parameters for free energy
calculation in GROMACS can be found
`here <http://manual.gromacs.org/current/user-guide/mdp-options.html#free-energy-calculations>`__.

.. code:: ipython3

    fep_awh_mdp="""; we'll use the sd integrator (an accurate and efficient leap-frog stochastic dynamics integrator)
    integrator               = sd
    nsteps                   = 700000 ; equilivate to 100000 step for 7 lambda points
    dt                       = 0.002
    nstenergy                = 1000
    nstcalcenergy            = 50 ; should be a divisor of nstdhdl 
    nstlog                   = 5000
    ; cut-offs at 1.0nm
    rlist                    = 1.1
    rvdw                     = 1.1
    ; Coulomb interactions
    coulombtype              = pme
    rcoulomb                 = 1.1
    fourierspacing           = 0.13
    ; Constraints
    constraints              = h-bonds
    ; set temperature to 300K
    tc-grps                  = system
    tau-t                    = 2.0
    ref-t                    = 300
    ; set pressure to 1 bar with a thermostat that gives a correct
    ; thermodynamic ensemble
    pcoupl                   = C-rescale
    ref-p                    = 1.0
    compressibility          = 4.5e-5
    tau-p                    = 5.0
    
    ; and set the free energy parameters
    free-energy              = yes
    couple-moltype           = ethanol
    nstdhdl                  = 50 ; frequency for writing energy difference in dhdl.xvg, 0 means no ouput, should be a multiple of nstcalcenergy. 
    ; these 'soft-core' parameters make sure we never get overlapping
    ; charges as lambda goes to 0
    ; soft-core function
    sc-power                 = 1
    sc-sigma                 = 0.3
    sc-alpha                 = 1.0
    ; we still want the molecule to interact with itself at lambda=0
    couple-intramol          = no
    couple-lambda1           = vdwq
    couple-lambda0           = none
    ; here we provide the lambda values we want to start
    init-lambda-state        = 12                
    calc-lambda-neighbors    = -1 ; A value of -1 means all lambda points will be written out
    ; NOTE for BAR a value of 1 is sufficient, 
    ; while for MBAR or AWH -1 should be used.
    ;
    ; These are the lambda states at which we simulate
    ; for separate LJ and Coulomb decoupling, use
    fep-lambdas              = 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.85 0.9 0.95 1.0
    ;
    ; awh setting
    awh                        = yes
    ; we apply  potential bias by controlling the position of an 
    ; harmonic potential using Monte-Carlo sampling. 
    awh-potential              = umbrella ; 
    awh-nstout                 = 50000 ; save frequency to energy file (multiple of nstenergy)
    awh-nbias                  = 1  ; n. of bias
    awh-nstsample              = 50 ; multiple of nstcalcenergy
    awh-nsamples-update        = 10
    ; initial biasing rate is determited by awh1-error-init together with 
    ; the given diffusion constant(s) awh1-dim1-diffusion  
    awh1-error-init            = 10.0 ; 
    awh1-dim1-diffusion        = 0.001 ; as high as possible (10^2 to 10^4 ps^(-1))
    ;
    awh1-growth                = exp-linear ; 
    awh1-equilibrate-histogram = no ; 
    ; bias tuned towards a constant (uniform) coordinate distribution
    awh1-target                = constant ; 
    ; awh1-user-data             = no
    ; Number of dimensions of the coordinate
    awh1-ndim                  = 1
    awh1-dim1-coord-provider   = fep-lambda  ; free energy lambda state = reaction coordinate
    awh1-dim1-coord-index      = 1 ; might not be necessary with fep-lambda
    awh1-dim1-start            = 0
    awh1-dim1-end              = 12    
    """
    write_mdp(fep_awh_mdp,'fep_awh.mdp',output_files)

Note a simple way of parallelizing sampling with AWH is to launch
multiple simultaneous trajectories, so called “walkers”, and have them
share a common bias and free energy estimate. This can reduce the length
of the initial stage significantly. To enable multiwalker, use the
option awh-share-multisim=yes and awh1-share-group = positive in the mdp
file and gmx mdrun with the option -multidir. But this is not necessary
for the system of our size. You may want to use more than one core
though.

.. code:: ipython3

    !gmx grompp -f fep_awh.mdp -c equil.gro -o fep_awh.tpr

.. code:: ipython3

    !gmx mdrun -deffnm fep_awh

Now we have a look at the files we got:

.. code:: ipython3

    !ls

Here we have obtained the information on all the lambda points one
simulation of 1400 ps, while when we have run the solvation free energy
tutorial using BAR, we have a directory for each lambda point and we
have run 200 ps for each point.

**Question**: Did we use the same lambda points between the two
approaches? and what is the difference in the total simulation time?

*Note: if you do not want to wait, but look at some of the results
directly, copy the data from the /reference directory into the
output_files directory. To do this from within this notebook, remove the
comment characters (#) in the following cell*

.. code:: ipython3

    ## ONLY execute the lines below if you do not want to run and wait for the simulations to finish. Before executing the command below, check that you are in output_files directory
    #!cp ../reference/fep_awh.edr ./ 


Post-processing: extracting the free energy
-------------------------------------------

Data directly related to AWH is extracted using the GROMACS tool `gmx
awh <https://manual.gromacs.org/2021/onlinehelp/gmx-awh.html>`__. The
data is stored in the energy file, ener.edr. The default output file
name is of the format awh_t.xvg; there will be one file for each AWH
output time, or less if you use the -skip flag. Let’s dump the output
into a separate directory.

.. code:: ipython3

    !ls awh-data

.. code:: ipython3

    !mkdir -p awh-data
    !gmx awh -quiet -s fep_awh.tpr -more -f fep_awh.edr -o awh-data/awh.xvg

Above, the -more flag tells gmx awh to extract more than the PMF from
the AWH files, e.g. histograms. The default gmx unit of energy is
kJ/mol, but with the flag -kT we get units of kT.

Let’s have a quick look at the contents of one of these files (from time
1000 ps). You can use xmgrace on your local machine to plot the file, by
removing the comment character (#) in the following cell to do that.

For plotting and data analysis we provide a Python function to read the
.xvg file format into this notebook.

.. code:: ipython3

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Function that reads data from an .xvg file
    def read_xvg(fname):
        data=[]
        with open(fname) as f: 
            for line in f:
                # Lines with metadata or comments start with #, @
                if not line.startswith(("@","#")):
                    data.append(np.array([float(s) for s in line.split()]))
        data = np.vstack(data)
        return data

.. code:: ipython3

    t=1000 # ps
    # -nxy to plot multicolumn data
    awh_data = read_xvg("awh-data/awh_t{}.xvg".format(t)).T
    
    fig,ax = plt.subplots()
    ax.plot(awh_data[0],awh_data[1])
    ax.set_xlabel("#lambda")
    ax.set_ylabel("(kJ/mol)")
    
    # uncomment the line below if you have xmgrace installed
    # !xmgrace -free -nxy awh-data/awh_t{t}.xvg # wait for it... 

Now we look closer at the time-evolution of the free energy (you can do
the same for either variable in the awh.xvg file).

.. code:: ipython3

    # A list of all AWH files
    fnames = !ls awh-data/awh_t*xvg
    
    # Extract time of each file
    times = [float(re.findall('awh_t(.+?).xvg', fname)[0]) for fname in fnames]
    
    # Sort the files chronologically
    fnames = [f for (t,f) in sorted(zip(times, fnames))]
    times.sort()
    print("Number of files:", len(fnames))
    print("Time in between files ", times[1] - times[0], 'ps')
    print("Last time", times[-1], 'ps')
    
    # Plot the PMF from first files/times
    labels=[]
    istart = 0 # Start plotting from this file index
    nplot = 15  # Number of files to plot
    for fname in fnames[istart:istart+nplot]:
        data=read_xvg(fname)
        labels.append(re.findall('awh_t(.+?).xvg', fname)[0] + ' ps') # use the time as label
        plt.plot(data[:,0], data[:,1])
    plt.xlabel('#lambda')
    plt.ylabel('(kJ/mol)')
    plt.xlim([0,12])
    plt.ylim([0,40])
    plt.legend(labels);

Try increasing the istart variable above to see how the free energy
estimates are changing (less and less) over time. Note: this convergence
does not easily translate into error estimates for the solvation free
energy. To get such error bars the simplest is to run multiple
(independent) AWH simulations and calculate standard errors from that.

Something that you may want to also check is the friction.

.. code:: ipython3

    !gmx awh -fric awh-data/friction.xvg -b 1400 -f fep_awh.edr -s fep_awh.tpr -quiet

.. code:: ipython3

    !ls -alt awh-data/friction*

.. code:: ipython3

    t=1400 # ps
    awh_data = read_xvg("awh-data/friction_t{}.xvg".format(t)).T
    
    fig,ax = plt.subplots()
    ax.plot(awh_data[0],awh_data[1])
    ax.set_xlabel("#lambda")
    ax.set_ylabel("friction")
    
    # uncomment the line below if you have xmgrace installed
    # -nxy to plot multicolumn data
    # !xmgrace -free awh-data/friction_t{t}.xvg # wait for it... 

To obtain the solvation free energy we have to substruct the values at
first lambda point (lambda=0) to the values at last lambda point
(lambda=1) at time 1400 ps (last point in time).

.. code:: ipython3

    !cat awh-data/awh_t1400.xvg

.. code:: ipython3

    lambda_1 =    #copy here number from lambda point 12 - value in the 2nd column
    lambda_0 =   #copy here number from lambda point 0 - value in the 2nd column
    number = lambda_1 - lambda_0
    print(number)       # solvation free energy in kJ/mol

Look for the final number above, it should be close to the experimental
value. With appropriate sampling you should get a value of approximately
-19.1 +/- 0.3 kJ/mol (see for example `J. Phys. Chem. B
2006 <https://pubs.acs.org/doi/full/10.1021/jp0641029>`__).

**Question**: you can run multiple (independent) AWH simulations and
calculate standard errors for the free energy. To run indipendent
simulations you can for example use different starting velocities
distribution. Try it. What is the standard error? How could it be
improved? Also you can try to longer runs to get better convergency.

**Question**: Why can there sometimes be a significant (larger than the
standard error) difference between the experimental result and the
simulation result? How could this be improved?

## Topology example for relative free energy of solvation As example we
take the alchemical transformation from ethanol to ethanethiol in water
solution. Below we look at the topology file that describes the
transformation from ethanol (state A) to ethanethiol (state B

.. code:: ipython3

    #Before executing the command below, check that you are in output_files directory
    !cat ../input_files/ethanol_Sethane.top

Note that atom mass has to be kept constant between state A and B for
technical reason. That will not affect your free energy difference

## Where to go from here After calculating the free energy of solvation,
we’ve solved the first part of the free energy of binding of the earlier
equations. The second part involves coupling a molecule into (or out of)
a situation where it is bound to a protein. This introduces one
additional complexity: we end up with a situation where a weakly coupled
ligand wanders through our system: |weak-couple.png| which is bad
because this is a poorly reversible situation: there are suddenly very
few states that map from a weakly coupled to a more strongly coupled
molecule, which will drastically reduce the accuracy of the free energy
calculation.

This situation can be remedied by forcing the ligand to stay at a
specific position relative to the protein. This can be done with the
GROMACS ‘pull code’, which allows the specification of arbitrary forces
or constraints onto with respect to centers of mass of any chosen set of
atoms onto any other group of atoms. With a pull type of ‘umbrella’, we
can specify that we want a quadratic potential to this specified
location, forcing the ligand to stay at its native position even when it
has been fully de-coupled.

One way find out where to put the center of the force is by choosing a
group of atoms in the protein close to the ligand, and doing a
simulation with full ligand coupling, where the pull code is enabled,
but with zero force. The pull code will then frequently output the
coordinates of the ligand, from which an average position and an
expected deviation can be calculated. This can then serve as a reference
point for the location of the center of force for the pull code during
the production runs, and the force constant of the pull code.

Once the free energy has been calculated, care must be taken to correct
for the fact that we have trapped our molecule. This can easily be done
analytically.

**Optional Question**: Given a measured standard deviation in the
location of the center of mass of our ligand, how do we choose the force
constant for the pull code?

**Optional Question**: How do we correct for using the pull code: what
is the contribution to the free energy of applying a quadratic potential
to a molecule?

.. |weak-couple.png| image:: ./images/weak-couple.png