.. code:: ipython3

    # Execute this first 
    #
    #  * trigger notebook styling
    #  * check if notebook had been modified since its distribution
    # 
    # Note: executing any cells before this modifies the notebook.
    # 
    #%run src/init_notebooks.py
    #hide_toggle()
    #check_notebook()

Molecular dynamics simulation of a small protein using GROMACS
==============================================================

::

   authors  : Cathrine Bergh, Maryam Majdolhosseini and Alessandra Villa.    
   goal     : learn step-by-step how to run a molecular dynamics simulation of a small protein using GROMACS 
   time     : 90 minutes
   software : GROMACS 2024 (2023), python modules: numpy, matplotlib, re, nglviewer, md_traj, pandas 
   optional software: visualization software [VMD](https://www.ks.uiuc.edu/Research/vmd), Xmgrace plotting tool
   tutorial source: tutorials.gromacs.org
   version  : release - doi:10.5281/zenodo.11198375 https://zenodo.org/records/14803238

Preparations to run this notebook
=================================

Before we can start running this notebook, there are a few Python
libraries we will need to import. These are used for structure and
trajectory visualizations (NGLview), reading and plotting data files
(pandas), and handling MD trajectory data in Python (mdtraj). If you
prefer to work in the terminal, commands for using Xmgrace for plotting
and VMD for visualization will also be provided throughout the tutorial.

.. code:: ipython3

    import nglview as ng
    import pandas as pd
    import mdtraj as md

We will also need to change working directory into a prepared folder
called “data”:

.. code:: ipython3

    # Change to the data directory
    # Note that executing this command twice will result in an error you can ignore
    %cd data

.. code:: ipython3

    %ls

In this folder all our generated data will be stored. All prepared input
files are stored in the “input” folder and pre-computed reference
simulation results in the “reference” folder.

Now, we are all set to start running the tutorial!

Obtaining the input for a simulation
====================================

The starting point for every simulation is a molecular structure file.
In this tutorial, we will simulate Factor Xa - a protein playing a
critical role in the formation of blood clots. The 3D structure is
available from the RCSB website, https://www.rcsb.org/ with PDB code
1FJS. You can find the PDB file for the crystal structure in the “input”
directory as a file called “ifjs.pdb”.

Now, we visualize the structure:

.. code:: ipython3

    view = ng.show_structure_file("input/1fjs.pdb")
    view
    # click and drag to rotate, zoom with your mouseweel 
    # for more infor on this viewer have a look at https://github.com/nglviewer/nglview

Alternatively, you can use VMD to visualize the structure on your local
machine. To run VMD from within this notebook, remove the comment
character (#) in the following cell and VMD should pop up:
**``close the VMD window after you are done looking at the protein to continue with this notebook``**

.. code:: ipython3

    #!vmd input/1fjs.pdb

Cleaning the input structure
----------------------------

Once you’ve had a look at the molecule, you are going to want to strip
out all the atoms that do not belong to the protein (e.i crystal waters,
ligands, etc). To delete those atoms (labelled “HETATM” in the PDB file)
and eventually their connectivity, either use a plain text editor like
vi, emacs (Linux/Mac), or Notepad (Windows). Do not use word processing
software! Alternatively, you can use grep to delete these lines very
easily:

.. code:: ipython3

    !grep -v HETATM input/1fjs.pdb > 1fjs_protein_tmp.pdb
    !grep -v CONECT 1fjs_protein_tmp.pdb > 1fjs_protein.pdb

The cleaned structure now looks like this:

.. code:: ipython3

    view = ng.show_structure_file("1fjs_protein.pdb")
    view
    # click and drag to rotate, zoom with your mouseweel 
    # for more infor on this viewer have a look at https://github.com/nglviewer/nglview

In alternative, you can use VMD to visualize the structure on your local
machine. To run VMD from within this notebook, remove the comment
character (#) in the following cell and VMD should pop up:

.. code:: ipython3

    #!vmd 1fjs_protein.pdb

Note *Such a procedure is not universally appropriate (e.g., the case of
a tightly bound ligand or otherwise functional active-site water
molecule).*

Always check your .pdb file for entries listed under the comment
MISSING, as these entries indicate either atoms or whole residues that
are not present in the crystal structure. Terminal regions may be
absent, and may not present a problem for dynamics.

.. code:: ipython3

    !grep MISSING input/1fjs.pdb

Generating a topology
---------------------

Now, that we have verified that our protein is whole and doesn’t contain
any uwanted atoms, we are ready to use it as input to GROMACS (see the
GROMACS documentation ). The first GROMACS tool we will use is
```gmx pdb2gmx`` <https://manual.gromacs.org/current/onlinehelp/gmx-pdb2gmx.html>`__.
The purpose of ``gmx pdb2gmx`` is to generate three files:

-  The topology for the molecule (.itp)
-  A position restraint file (.itp)
-  A post-processed structure file (.gro)

The topology (topol.top by default) contains all the information
necessary to define the molecule within a simulation. This information
includes nonbonded parameters (atom types and charges) as well as bonded
parameters (bonds, angles, dihedrals and atom connectivity). We will
take a more detailed look at the topology once it has been generated.

.. container:: alert alert-block alert-info

   NOTE: Incomplete internal sequences or any amino acid residues that
   have missing atoms will cause ``gmx pdb2gmx`` to fail. These missing
   atoms/residues must be modeled in using other software packages. Also
   note that ``gmx pdb2gmx`` is not magic. It cannot generate topologies
   for arbitrary molecules, just the residues defined by the force field
   (in the \*.rtp files - generally proteins, nucleic acids, and a very
   finite amount of cofactors, like NAD(H) and ATP).

Execute ``gmx pdb2gmx`` by the following command:

.. code:: ipython3

    !gmx pdb2gmx -f 1fjs_protein.pdb -o 1fjs_processed.gro -water tip3p -ff "charmm27"

Here, we made an important decision for the course of the simulation in
choosing the CHARMM27 all-atom force field. The force field will contain
the information that will be written to the topology. This is a very
important choice! You should always read thoroughly about each force
field and decide which is most applicable to your situation. A prompt
with more force field options (shown below) will open when running
``gmx pdb2gmx`` without the ``-ff`` flag.

::

   Select the Force Field:
   From '/usr/local/gromacs/share/gromacs/top':
    1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
    2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
    3: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
    4: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
    5: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
    6: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
    7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
    8: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)
    9: GROMOS96 43a1 force field
   10: GROMOS96 43a2 force field (improved alkane dihedrals)
   11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
   12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
   13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
   14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40,, 843-856, DOI: 10.1007/s00249-011-0700-9)
   15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)

There are many other options that can be passed to ``gmx pdb2gmx`` (see
the
`manual <http://manual.gromacs.org/documentation/current/onlinehelp/gmx-pdb2gmx.html>`__).
Some commonly used ones are listed here:

+-----------------------------------+-----------------------------------+
| Option                            | Effect                            |
+===================================+===================================+
| -water                            | Water model to use: none, spc,    |
|                                   | spce, tip3p, tip4p, tip5p,        |
|                                   | tips3p.                           |
+-----------------------------------+-----------------------------------+
| -ignh                             | Ignore H atoms in the PDB file;   |
|                                   | especially useful for NMR         |
|                                   | structures. Otherwise, if H atoms |
|                                   | are present, they must be in the  |
|                                   | named exactly how the force       |
|                                   | fields in GROMACS expect them to  |
|                                   | be. Different conventions exist,  |
|                                   | so dealing with H atoms can       |
|                                   | occasionally be a headache! If    |
|                                   | you need to preserve the initial  |
|                                   | H coordinates, but renaming is    |
|                                   | required, then the Linux sed      |
|                                   | command is your friend.           |
+-----------------------------------+-----------------------------------+
| -ter                              | Interactively assign charge       |
|                                   | states for N- and C-termini.      |
+-----------------------------------+-----------------------------------+
| -inter                            | Interactively assign charge       |
|                                   | states for Glu, Asp, Lys, Arg,    |
|                                   | and His; choose which Cys are     |
|                                   | involved in disulfide bonds.      |
+-----------------------------------+-----------------------------------+

A peek at the generated files
-----------------------------

When running ``gmx pdb2gmx`` four new files were generated:
1fjs_processed.gro, topol.top, topol_Protein_chain_X.itp and
posre_Protein_chain_X.itp. You now have the following files in your data
folder:

.. code:: ipython3

    !ls

1fjs_processed.gro is a GROMACS-formatted structure file that contains
all the atoms defined within the force field (i.e., H atoms have been
added to the amino acids in the protein). The topol.top file is the
system topology (more on this in a minute). The posre files contain
information used to restrain the positions of heavy atoms (more on this
later).

.. container:: alert alert-block alert-info

   NOTE: Many users assume that a .gro file is mandatory. This is not
   true. GROMACS can handle many different file formats, with .gro
   simply being the default for commands that write coordinate files. It
   is a very compact format, but it has limited precision. If you prefer
   to use, for instance, PDB format, all you need to do is to specify an
   appropriate file name with .pdb extension as your output. The purpose
   of ``gmx pdb2gmx`` is to produce a force field-compliant topology;
   the output structure is largely a side effect of this purpose and is
   intended for user convenience. The format can be just about anything
   you like, see
   https://manual.gromacs.org/documentation/current/reference-manual/file-formats.html
   for different format.

Understanding molecule “topologies”
===================================

First, let’s have a look at the output topology file (topol.top). Using
a plain text editor, we inspect its contents:

.. code:: ipython3

    !cat topol.top

The file begins with many comments (lines beginning with ; ), which
specify some meta data regarding the file, e.g. by who, when and how it
was created. The first un-commented line calls the parameters of the
chosen force field. It is at the beginning of the file, indicating that
all subsequent parameters are derived from this force field.

After that, we include the topologies of our two protein molecules that
were generated by ``pdb2gmx``. First, we take a look at the
``[ moleculetype ]`` section of the protein itp file:

.. code:: ipython3

    !grep "moleculetype" -A 3 topol_Protein_chain_A.itp

and

.. code:: ipython3

    !grep "moleculetype" -A 3 topol_Protein_chain_L.itp

The name “Protein_chain_A” defines the molecule name, based on the fact
that the protein was labeled as chain A in the PDB file. There are 3
exclusions for bonded neighbors. More information on exclusions can be
found in the GROMACS manual.

Atoms in a topology
-------------------

The next section defines the ``[ atoms ]`` in the protein. The
information is presented as columns:

.. code:: ipython3

    ! grep "atoms" -A 4 topol_Protein_chain_A.itp

The interpretation of this information is as follows:

+-----------------------------------+-----------------------------------+
| Field                             | description                       |
+===================================+===================================+
| nr                                | Atom number                       |
+-----------------------------------+-----------------------------------+
| type                              | Atom type                         |
+-----------------------------------+-----------------------------------+
| resnr                             | Amino acid residue number         |
+-----------------------------------+-----------------------------------+
| residue                           | The amino acid residue name- Note |
|                                   | that this may be different from   |
|                                   | .rtp entry.                       |
+-----------------------------------+-----------------------------------+
| atom                              | Atom name                         |
+-----------------------------------+-----------------------------------+
| cgnr                              | Charge group number - Not used    |
|                                   | anymore                           |
+-----------------------------------+-----------------------------------+
| charge                            | Self-explanatory - The “qtot”     |
|                                   | descriptor is a running total of  |
|                                   | the charge on the molecule        |
+-----------------------------------+-----------------------------------+
| mass                              | Also self-explanatory             |
+-----------------------------------+-----------------------------------+
| typeB, chargeB, massB             | Used for free energy perturbation |
|                                   | (not discussed here)              |
+-----------------------------------+-----------------------------------+

Bonds and other interactions
----------------------------

Subsequent sections include ``[ bonds ]``, ``[ pairs ]``,
``[ angles ]``, and ``[ dihedrals ]``. Some of these sections are
self-explanatory (bonds, angles, and dihedrals). More information about
these interactions can be found
`here <http://manual.gromacs.org/current/reference-manual/functions.html>`__
and special 1-4 interactions are included under
`pairs <http://manual.gromacs.org/current/reference-manual/functions/interaction-methods.html#exclusions-and-1-4-interactions>`__.
For formats associated with function types `see topology file
table <http://manual.gromacs.org/documentation/2020-beta1/reference-manual/topologies/topology-file-formats.html>`__
in the GROMACS manual.

Next, we take a look at the different types of interactions listed in
the topology file. For now, we only look at chain A, although the file
for the other chain will look very similar.

Bonded interactions:

.. code:: ipython3

    !grep "bonds" -A 2 topol_Protein_chain_A.itp

Pair interactions

.. code:: ipython3

    !grep "pairs" -A 2  topol_Protein_chain_A.itp

Angle interactions

.. code:: ipython3

    !grep "angles" -A 2 topol_Protein_chain_A.itp

Dihedral interactions

.. code:: ipython3

    !grep "dihedrals" -A 2 topol_Protein_chain_A.itp

In the commands above we only listed one interaction per type. If you
look at the full data file, there will of course be many more
interactions!

Water and position restraints
-----------------------------

The remainder of the file involves including a few other
useful/necessary topologies, starting with position restraints. The
“posre.itp” file was generated by ``gmx pdb2gmx``; it defines a force
constant used to keep atoms in place during equilibration (more on this
later).

.. code:: ipython3

    !grep "posre" topol*.itp

.. code:: ipython3

    !cat posre_Protein_chain_A.itp

Here ends the “Protein_chain_A” moleculetype definition. Then the
“Protein_chain_L” moleculetype definition starts.

The remainder of the topology file is dedicated to defining other
molecules and providing system-level descriptions. The next moleculetype
(by default) is the solvent, in this case TIP3P water. Other typical
choices for water include SPC, SPC/E, and TIP4P. We chose the water
model TIP3P by passing ``-water tip3p`` to ``gmx pdb2gmx`` and we can
see it has been included in the topology file together with a definition
of its positions restraints:

.. code:: ipython3

    !grep "Include water topology" -A 8 topol.top

For an excellent summary of the many different water models, click
`here <http://www1.lsbu.ac.uk/water/water_models.html>`__, but be aware
that not all of these models available within GROMACS.

Ions and other parameters
-------------------------

In the next section of the topology file we find parameters related to
ions:

.. code:: ipython3

    !grep "ions" topol.top

System level definitions
------------------------

Finally come system-level definitions. The ``[ system ]`` directive
gives the name of the system that will be written to output files during
the simulation. The ``[ molecules ]`` directive lists all of the
molecules in the system.

.. code:: ipython3

    !tail -8 topol.top

Some things to note regarding the ``[ molecules ]`` directive:

-  The order of the listed molecules must exactly match the order of the
   molecules in the coordinate file (in this case, .gro).
-  The names listed must match the ``[ moleculetype ]`` name for each
   species, not residue names or anything else.

If you fail to satisfy these requirements at any time, you will get
fatal errors from ``grompp`` (which will be discussed later) about
mismatched names or molecules not being found.

Now that we have examined the contents of a topology file, we can
continue building our system.

Solvating the simulation system
===============================

Now that you are familiar with the contents of the GROMACS topology
file, it is time to continue building our system. In this example, we
are going to be simulating a simple aqueous system. It is possible to
simulate proteins and other molecules in different solvents, provided
that good parameters are available for all species involved.

To solvate our protein, two steps will be required:

-  Define the box dimensions using the
   ```gmx editconf`` <https://manual.gromacs.org/current/onlinehelp/gmx-editconf.html>`__
   tool.
-  Fill the box with water using the
   ```gmx solvate`` <https://manual.gromacs.org/current/onlinehelp/gmx-solvate.html>`__
   tool.

You are now presented with a choice as to how to treat the unit cell.
For the purpose of this tutorial, we will use the rhombic dodecahedron,
as its volume is ~71% of the cubic box of the same periodic distance,
thus saving on the number of water molecules that need to be added to
solvate the protein.

Defining the simulation box
---------------------------

Let’s define the box using ``gmx editconf``:

.. code:: ipython3

    !gmx editconf -f 1fjs_processed.gro -o 1fjs_newbox.gro -c -d 1.0 -bt dodecahedron

The above command centers the protein in the box (``-c``), and places it
at least 1.0 nm from the box edge (``-d 1.0``). The box type is defined
as a rhombic dodecahedron (``-bt dodecahedron``). Have a look at this
`section <http://manual.gromacs.org/current/reference-manual/algorithms/periodic-boundary-conditions.html#pbc>`__
in the manual for more on periodic boundary conditions and
dodecahedrons.

The distance to the edge of the box is an important parameter. A protein
should never interact with its periodic image (minimum image
convention), otherwise the forces calculated will be spurious. The
minimum image convention implies that the distance between two periodic
images of the protein should be larger than the cut-off radius used to
truncate non-bonded interactions. Here we will use a cut-off radius of
1.2 nm (see below). Specifying a solute-box distance of 1.0 nm will mean
that there are at least 2.0 nm between any two periodic images of a
protein. We expect that at that distance all protein-protein
interactions are negligible.

Filling the box with water
--------------------------

.. code:: ipython3

    !gmx solvate -cp 1fjs_newbox.gro -cs spc216.gro -o 1fjs_solv.gro -p topol.top

The configuration of the protein (``-cp``) is contained in the output of
the previous ``gmx editconf`` step, and the configuration of the solvent
(``-cs``) is part of the standard GROMACS installation. We are using
“spc216.gro”, which is a generic equilibrated 3-point solvent model box.
You can use spc216.gro as the solvent configuration for SPC, SPC/E, or
TIP3P water, since they are all three-point water models. The output is
called 1fjs_solv.gro, and we tell solvate the name of the topology file
(“topol.top”) so it can be modified. Note the changes to the
``[ molecules ]`` directive of topol.top:

.. code:: ipython3

    !tail topol.top

``gmx solvate`` kept track of how many water molecules it has added,
which it then writes to your topology. Note that if you use any other
(non-water) solvent,\ ``gmx solvate`` will not make these changes to
your topology! Its compatibility with updating water molecules is
hard-coded. Also note that ``gmx solvate`` *adds* this data to the
topology file - it doesn’t overwrite the topology file like it does for
the coordinate file. If you run ``gmx solvate`` multiple times, you
might therefore get a mismatch between the number of water molecules in
your topology and coordinate file!

Finally, let’s see what our system looks like:

.. code:: ipython3

    view = ng.show_structure_file("1fjs_solv.gro")
    view.add_representation(repr_type='ball+stick', selection='SOL')
    view.camera='orthographic'
    view

We can see that water molecules have been added inside the box we
created previously. Since many visualization programs visualize whole
molecules by default, it looks like the protein sticks out of the box as
it enters its periodic image.

Adding Ions
===========

We now have a solvated system that contains a charged protein. The
output of ``pdb2gmx`` told us that the protein has a net charge of -2e
(based on its amino acid composition). If you missed this information in
the ``pdb2gmx`` output, look at the last line of each [ atoms ]
directive in topology file; it should read “qtot 1.” for chain A and
“qtot -3.” for chain L. Since life does not exist at a net charge, we
must add ions to our system. Further, we aim to approximate
physiological conditions and therefore use a NaCl concentration of 0.15
M.

Preparing the input for “gmx genion”
------------------------------------

The tool for adding ions within GROMACS is called
```gmx genion`` <https://manual.gromacs.org/current/onlinehelp/gmx-genion.html>`__.
What ``gmx genion`` does is read through the topology and replace water
molecules with the ions that the user specifies. The input is called a
run input file, which has an extension of .tpr; this file is produced by
the GROMACS grompp module (GROMACS pre-processor), which will also be
used later when we run our first simulation. ``gmx grompp`` processes
the coordinate file and topology (which describes the molecules) to
generate an atomic-level input (.tpr). The .tpr file contains all the
parameters for all of the atoms in the system.

To produce a .tpr file with
```gmx grompp`` <https://manual.gromacs.org/current/onlinehelp/gmx-grompp.html>`__,
we will need an additional input file, with the extension .mdp
(molecular dynamics parameter file); ``gmx grompp`` will assemble the
parameters specified in the .mdp file with the coordinates and topology
information to generate a .tpr file.

An .mdp file is normally used to run energy minimization or an MD
simulation, but in this case it is simply used to generate an atomic
description of the system. We can proceed with a completely empty .mdp
file in this case since its only role is to create the .tpr file.

.. code:: ipython3

    !touch ions.mdp

Assemble your .tpr file with the following:

.. code:: ipython3

    !gmx grompp -f ions.mdp -c 1fjs_solv.gro -p topol.top -o ions.tpr

Be aware that there are some notes in the output. In all cases other
than generating an output tpr file for ``gmx genion``, these are very
important and should not be ignored. Here, however, we only need the
atomic-level description of our system in the binary file ions.tpr. We
will pass this file to genion:

.. container:: alert alert-block alert-info

   NOTE: Many GROMACS tools promt for input at the command line. As this
   does not play well with the Jupyter notebook setup, we feed the input
   to the command line with a print command and a newline
   :raw-latex:`\n `substituting enter.

.. code:: ipython3

    !printf "SOL\n" | gmx genion -s ions.tpr -o 1fjs_solv_ions.gro -conc 0.15 -p \
    topol.top -pname NA -nname CL -neutral

We chose group “SOL” for embedding ions. You do not want to replace
parts of your protein with ions.

.. container:: alert alert-block alert-info

   NOTE: Make sure to run ``gmx genion`` only once. ``gmx genion`` edits
   the topology “in-place” and does not know if there are already Cl or
   Na ions in the system. If you run it over again, ions will be added
   until there is no water left to replace with ions.

In the ``gmx genion`` command, we provide the structure/state file (-s)
as input, generate a .gro file as output (-o), process the topology (-p)
to reflect the removal of water molecules and addition of ions, define
positive and negative ion names (-pname and -nname, respectively), and
tell ``gmx genion`` to add ions necessary to neutralize the net charge
on the protein by adding the correct number of negative ions (-neutral,
which in this case will add 2 Na+ ions to offset the -2 charge on the
protein). We further use genion to add a specified concentration of ions
in addition to simply neutralizing the system by specifying the -neutral
and -conc options in conjunction. Refer to the genion manual page for
information on how to use these options.

The names of the ions specified with -pname and -nname GROMACS are
standardized and not dependent on the force-field. The specified ion
names are always the elemental symbol in all capital letters, which is
the [ moleculetype ] name that is then written to the topology. Residue
or atom names may or may not append the sign of the charge (+/-),
depending on the force field. Do not use atom or residue names in the
genion command, or you will encounter errors in subsequent steps.

Your [ molecules ] directive should now be similar to

::

   [ molecules ]
   ; Compound      #mols
   Protein_chain_A    1
   Protein_chain_L    1
   SOL           11703
   NA              38
   CL              36

Let’s check the last lines of the topology file

.. code:: ipython3

    !tail -6 topol.top

If you see multiple lines of NA and CL, you most likely ran genion
multiple times. Start over with ``pdb2gmx``.

Finally, let’s take a look at what our system looks like now:

.. code:: ipython3

    view = ng.show_structure_file("1fjs_solv_ions.gro")
    view.add_representation(repr_type='spacefill', selection='NA')
    view.add_representation(repr_type='spacefill', selection='CL')
    view.add_representation(repr_type='ball+stick', selection='SOL')
    view.camera='orthographic'
    view

We can see that some water molecules have been replaced with sodium
(purple) and chloride (green) ions.

Energy minimisation
===================

The solvated, electroneutral system is now assembled. Before we can
begin dynamics, we must ensure that the system has no steric clashes or
inappropriate geometry. The structure is relaxed through a process
called energy minimization (EM).

To perform energy minimization, we are once again going to use
``gmx grompp`` to assemble the structure, topology, and simulation
parameters into a binary input file (.tpr), then we will use GROMACS MD
engine, mdrun, to run the energy minimization.

Assemble the binary input using ``gmx grompp`` using the .mdp parameter
file, as input. The simulation parameter file (.mdp) determines how the
simulation shall be run. Find more information on all the options in the
`manual <http://manual.gromacs.org/documentation/current/user-guide/mdp-options.html>`__
or in the following
`webinar <https://bioexcel.eu/webinar-a-walk-through-simulation-parameter-options-mdp-files-for-gromacs-2019-12-05/>`__.
There are a lot of parameters that can be set, here we ony set the
elemental parametes and leave everything else as default. Let’s have a
look at the input file:

.. code:: ipython3

    !cat input/emin-charmm.mdp

.. code:: ipython3

    !gmx grompp -f input/emin-charmm.mdp -c 1fjs_solv_ions.gro -p topol.top -o em.tpr

Make sure you have been updating your topol.top file when running genbox
and genion, or else you will get lots of nasty error messages (“number
of coordinates in coordinate file does not match topology,” etc).

Once run, we will find the energy-minimized structure in a file called
``em.gro``. Additionally, we will find more information on the run in an
ASCII-text log file of the EM process, ``em.log``, a file for storage of
energy, ``em.edr`` and a binary full-precision trajectory ``em.trr``.

We are now ready to run mdrun to carry out the energy minimisation:

.. code:: ipython3

    !gmx mdrun -v -deffnm em -ntmpi 1 -ntomp 1

The -v flag is for the impatient: it makes ``gmx mdrun`` verbose, such
that it prints its progress to the screen at every step. Never use this
flag if run in background, on a HPC center or on a local cluster - it
prints a lot of unnecessary data to the standard output file. The
-deffnm flag will define the file names of the input and output. So, if
you did not name your grompp output “em.tpr,” you will have to
explicitly specify its name with the ``gmx mdrun`` -s flag. The flag
-ntpmi and -ntomp tell how many MPI theads and OpenMP threads to use.

*Note: if you do not want to wait, but look at some of the results
directly, copy the data from the /reference directory into the current
directory. To do this from within this notebook, remove the comment
characters (#) in the following cell:*

.. code:: ipython3

    ## ONLY execute the lines below if you do not want to run and wait for the simulation to finish
    #!cp reference/em_charmm.edr em.edr
    #!cp reference/em_charmm.gro em.gro

Determining if the run was successful
-------------------------------------

There are two very important factors to evaluate whether the EM was
successful or not.

The first is the potential energy (printed at the end of the EM process,
even without -v). Epot should be negative and (for a simple protein in
water) on the order of 100000 kJ/mol, depending on the system size and
number of water molecules.

The second important property is the maximum force, Fmax, the target for
which was set in minim.mdp - “emtol = 1000.0” - indicating a target Fmax
no greater than 1000 kJ/(mol nm). It is possible to arrive at a
reasonable Epot with Fmax > emtol. If this happens, your system may not
be stable enough for simulation. Then, you will need to identify the
underlying cause and perhaps change your minimization parameters
(integrator, emstep, etc).

Analysing the run results
-------------------------

Let’s do a bit of analysis. The em.edr file contains all of the energy
terms that GROMACS collects during EM. You can analyze any .edr file
using the GROMACS energy module.

To analyse or visualize simulation data in Python or Jupyter notebooks,
we can output a simplified xvg format from gmx-analysis tools with the
option ``-xvg none``

.. code:: ipython3

    !printf "Potential\n0\n" | gmx energy -f em.edr -o potential.xvg -xvg none

You will be shown the average of Epot, and a file called “potential.xvg”
will be written. To plot this data file, you can use the script below.
The resulting plot should show a steady convergence of Epot.

.. code:: ipython3

    df = pd.read_csv('potential.xvg', sep='\\s+', header=None, names=['step','energy'])
    df.plot('step')

Alternatively, you can plot the data using the Xmgrace plotting tool,
which might be more conventient if you work directly in the terminal. To
use Xmgrace from this notebook, remove the comment character (#) in the
following cells:

First, generate the xvg file:

.. code:: ipython3

    #!printf "Potential\n0\n" | gmx energy -f em.edr -o potential.xvg

Plot with Xmgrace:

.. code:: ipython3

    #!xmgrace potential.xvg

Now that our system is at an energy minimum, we can begin real dynamics.

Position restraints
===================

EM reduced the largest fores by slightly moving and rotating atoms, but
the system as a whole is by no means at its equilibrium state yet.
Before we begin real dynamics, we must therefore equilibrate the solvent
and ions around the protein. If we were to attempt unrestrained dynamics
at this point, the system could collapse. When we used ``gmx solvate``,
water molecules were placed in semi-ordered fashion (look at the
visualization after we filled the box with water) and ions were placed
by replacing some of these waters. This semi-ordered configuration is
not something that would be represented by the force field, so we need
to let the molecules move a bit to enter into a such a configuration.

Remember that posre.itp file which ``pdb2gmx`` generated a long time
ago? We’re going to use it now. The purpose of posre.itp is to apply a
position-restraining force on the heavy atoms in the protein (anything
that is not a hydrogen). Movement is permitted, but only after
overcoming a substantial energy penalty. The advantage of position
restraints is that they allow us to relax our solvent and ions around
our protein, without letting the protein move. The origin of the
position restraints (the coordinates at which the restraint potential is
zero) is provided via a coordinate file passed via the -r option of
``grompp``.

To use position restraints we need to add "define = -DPOSRES" to the
simulation parameter file, .mdp, (more details on the simulation
parameters below). Have a look at the .mdp file for this run:

.. code:: ipython3

    !head -5 input/nvt-charmm.mdp 

When using position restraints, a file with restraint coordinates must
be supplied to ``gmx grompp`` through the -r option (see below). It can
be the same file as supplied for -c. 

Equilibration run - temperature
===============================

Equilibration is often conducted in two phases. The first phase is
conducted under an NVT ensemble (constant Number of particles, Volume,
and Temperature). This ensemble is also referred to as
“isothermal-isochoric” or “canonical.” The timeframe for such a
procedure is dependent upon the contents of the system. Typically,
100-200 ps should suffice, and we will conduct a 100-ps NVT
equilibration for this exercise. Depending on your machine, this may
take a while (just under an hour if run in parallel on 16 cores or so).

The point of the NVT equilibration is get our system to the temperature
we wish to simulate by applying a thermostat, which usually works by
modifying the atom velocities according to the kinetic energy
ditributions predicted at the given temperature. Another point, which we
discussed in the previous section, is to let the solvent molecules move
and orient themselves around the protein surface, potentially entering
pockets or crevices. Depending on the protein structure, this aspect can
greatly influence how much equilibration time is needed.

To run the equilibration, we will call ``grompp`` and ``mdrun`` as in
the EM step, but this time using the energy minimised structure as input
and a different .mdp parameter file. Let’s look at the mdp file for this
run:

.. code:: ipython3

    !cat input/nvt-charmm.mdp

Compared to the mdp file used for energy minimisation, we replace the
energy tolerance with a timestep size and a total number of steps to
take (which gives us a 2 fs timestep \* 50000 steps = 100 ps long
simulation). We also need to set a temperature. Take note of a few
parameters in the .mdp file:

-  ``gen_vel = yes``: Initiates velocity generation. Using different
   random seeds (gen_seed) gives different initial velocities, and thus
   multiple (different) simulations can be conducted from the same
   starting structure.
-  ``tcoupl = V-rescale``: The velocity rescaling thermostat is an
   improvement upon the Berendsen weak coupling method, which did not
   reproduce a correct kinetic ensemble.
-  ``pcoupl = no``: Pressure coupling is not applied.

A full explanation of the parameters used can be found in the GROMACS
`manual <http://manual.gromacs.org/documentation/current/user-guide/mdp-options.html>`__.

Now, we’re ready to run:

.. code:: ipython3

    !gmx grompp -f input/nvt-charmm.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr 
    !gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm nvt

Your computer should now be working at full speed to finish this
simulation; it should be done in 5 to 10 minutes.

*Note: if you do not want to wait, but look at some of the results
directly, copy the data from the /reference directory into the current
directory. To do this from within this notebook, remove the comment
characters (#) in the following cell*

.. code:: ipython3

    ## ONLY execute the lines below if you do not want to run and wait for the simulation to finish
    #!cp reference/nvt_charmm.edr nvt.edr
    #!cp reference/nvt_charmm.gro nvt.gro
    #!cp reference/nvt_charmm.cpt nvt.cpt

Let’s analyze the temperature as a function of time, again using
``gmx energy``:

.. code:: ipython3

    !echo "Temperature" | gmx energy -f nvt.edr -o temperature.xvg -xvg none -b 20

.. code:: ipython3

    df = pd.read_csv('temperature.xvg', sep='\\s+', header=None, names=['time','temperature'])
    df.plot('time')

Or for visualization with Xmgrace, remove the comment characters (#) in
the following cells:

.. code:: ipython3

    #!echo "Temperature" | gmx energy -f nvt.edr -o temperature.xvg -b 20

.. code:: ipython3

    #!xmgrace temperature.xvg

From the plot, it is clear that the temperature of the system quickly
reaches the target value (300 K), and remains stable over the remainder
of the equilibration. For this system, an equilibration period (on the
order of 50 ps) may be adequate.

Equilibration run - pressure
============================

In the previous step, NVT equilibration stabilized the temperature of
the system. Prior to data collection, we must also stabilize the
pressure (and thus also the density) of the system. Equilibration of
pressure is conducted under an NPT ensemble, wherein the Number of
particles, Pressure, and Temperature are held constant. The ensemble is
also called the “isothermal-isobaric” ensemble, and most closely
resembles experimental conditions.

The .mdp file used for a 100-ps NPT equilibration can be found in the
input file folder. It is not drastically different from the parameter
file used for NVT equilibration, but note the addition of the pressure
coupling section. We will use the `stochastic cell
rescaling <https://manual.gromacs.org/current/reference-manual/algorithms/molecular-dynamics.html#stochastic-cell-rescaling>`__
as barostat. The stochastic cell rescaling algorithm is a variant of the
Berendsen algorithm that allows correct fluctuations to be sampled.

Let’s have a look at the mdp file:

.. code:: ipython3

    !cat input/npt-charmm.mdp

We made a few more changes compared to the NVT run:

-  ``continuation = yes``: We are continuing the simulation from the NVT
   equilibration phase
-  ``gen_vel = no``: Velocities are read from the trajectory (see below)

We will call ``grompp`` and ``mdrun`` just as we did for the NVT
equilibration. Note that we are now using the ``-t`` flag to include the
checkpoint file outputted from the NVT equilibration. This file contains
all the necessary state variables to continue our simulation from a
previous one. To conserve the velocities produced during the NVT run, we
must include the final coordinate file outputted from the the NVT
simulation, using the ``-c`` option.

.. code:: ipython3

    !gmx grompp -f input/npt-charmm.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr

Now, we’ll start the NPT MD simulation - this one will take a few
minutes.

.. code:: ipython3

    !gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm npt

*Note: If the simulation takes too long to run for this tutorial, you
may use the pre-computed data by copying it to the current directory
(just remove the comment characters, #, below). But remember that you
can always check/analyse the output files while the simulation is still
running.*

.. code:: ipython3

    ##ONLY execute the lines below if you do not want to run and wait for the simulation to finish
    #!cp reference/npt_charmm.edr npt.edr
    #!cp reference/npt_charmm.gro npt.gro
    #!cp reference/npt_charmm.cpt npt.cpt

Let’s analyze the pressure as a function of time, again using
``gmx energy``:

.. code:: ipython3

    !echo "Pressure" | gmx energy -f npt.edr -o pressure.xvg -xvg none

.. code:: ipython3

    df = pd.read_csv('pressure.xvg', sep='\\s+', header=None, names=['time','pressure'])
    df.plot('time')

Alternatively, for visualization using Xmgrace, remove the comment
characters (#) in the following cells:

.. code:: ipython3

    #!echo "Pressure" | gmx energy -f npt.edr -o pressure.xvg

.. code:: ipython3

    #!xmgrace pressure.xvg

The pressure value fluctuates widely over the course of the 100-ps
equilibration phase, but this behavior is not unexpected. The reference
pressure was set to 1 bar, so is this outcome acceptable? Pressure is a
quantity that fluctuates widely over the course of an MD simulation, as
is clear from the large root-mean-square fluctuation (in the order of
100 bar), so statistically speaking, one cannot distinguish a difference
between the obtained average and the target/reference value (1 bar). The
important point is that the average pressure remains on the same order
of magnutude as the target pressure.

Let’s take a look at density using ``gmx energy``:

.. code:: ipython3

    !echo "Density" | gmx energy -f npt.edr -o density.xvg -xvg none

.. code:: ipython3

    df = pd.read_csv('density.xvg', sep='\\s+', header=None, names=['time','density'])
    df.plot('time')

Alternatively, for visualization using Xmgrace, remove the comment
characters (#) in the following cells:

.. code:: ipython3

    #!echo "Density" | gmx energy -f npt.edr -o density.xvg

.. code:: ipython3

    #!xmgrace density.xvg

The average value is close to the experimental value of 1000 kg m-3 and
the expected density of the TIP3P model of 1001 kg m-3. The parameters
for the TIP3P water model closely replicate experimental values for
water. However, since we also have a protein in the box we can expect
the density to be a little bit higher than that of pure water, which is
also what we see from our plot. The density values are very stable over
time, indicating that the system is now well-equilibrated with respect
to pressure and density.

.. container:: alert alert-block alert-info

   NOTE: Pressure-related terms are slow to converge, so a longer NPT
   equilibration might be needed for other systems.

The “production” run
====================

Upon completion of the two equilibration phases, the system is now
well-equilibrated at the desired temperature and pressure. We are now
ready to release the position restraints of the protein atoms and launch
our production MD simulation for data collection. The process is just
like we have seen before, but we will now continue the simulation from
the checkpoint file ouputted from the NPT run. We will now run a 1-ns MD
simulation. In the mdp-file, we have added a section that controls the
output frequency for data in the log file (.log), energy file (.edr),
trajcotry file (.trr), and compressed trajectory file (.xtc). Writing
data too frequently will lead to significantly longer run times and
larger output files, while too infrequent writes will result in loss of
detail in the final trajectories. It is thus a good idea to think about
how much time resolution your planned analysis will require before
launching the simulation.

Let’s take a look at the mdp file for the production run:

.. code:: ipython3

    !cat input/md-charmm.mdp

Here, we use velocity-rescaling temperature coupling as thermostat and
stochastic cell rescaling as barostat. A full explanation of the
available thermostats and barostats in GROMACS can be found in the
manual (see
`here <http://manual.gromacs.org/documentation/current/reference-manual/algorithms/molecular-dynamics.html#temperature-coupling>`__
for thermostats and
`here <http://manual.gromacs.org/documentation/current/reference-manual/algorithms/molecular-dynamics.html#pressure-coupling>`__
for barostats).

Note that for thermostats and barostats, we need to compute the
temperature or pressure of the system. This requires global
communication and is currently not done on the GPU. To reduce the
computational cost,
`nsttcouple <https://manual.gromacs.org/current/user-guide/mdp-options.html#mdp-nsttcouple>`__
and
`nstpcouple <https://manual.gromacs.org/current/user-guide/mdp-options.html#mdp-nstpcouple>`__
(frequency for coupling temperature and pressure) are set to 100 by
default in GROMACS 2023 (see the webinar `What’s new in
GROMACS2023 <https://www.gromacs.org/tutorial_webinar.html>`__. The
recommendation is to use tau_t = 1 ps for V-rescale and tau_p = 5 ps for
C-rescale.

``gmx grompp`` will print an estimate for generated data and PME load.
The PME load number tells us how many processors should be dedicated to
the PME calculation, and how many for the PP (everything except for PME)
calculations. Refer to the
`manual <http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#parallelization-schemes>`__
for details.

.. code:: ipython3

    !gmx grompp -f input/md-charmm.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr

.. code:: ipython3

    !gmx mdrun -ntmpi 1 -ntomp 8 -v -deffnm md

As before, this run will take some time - probably more than planned for
going through this tutorial. The flag -ntpmi provides the number of
thread-MPI ranks to start and -ntomp the number of OpenMP threads per
MPI rank to start. The optimal values for those flag depends on the
resource you use to run the simulation. If you want to proceed without
waiting, copy the pre-computed data to your working directory by
removing the comment characters (#):

.. code:: ipython3

    ##ONLY execute the lines below if you do not want to run and wait for the simulation to finish
    !cp reference/md_charmm.log md.log
    !cp reference/md_charmm.edr md.edr
    !cp reference/md_charmm.gro md.gro
    !cp reference/md_charmm.xtc md.xtc

Analysis
========

Now that we have a simulation of our protein, we can start analyzing it!
What kind of data is important? This is a relevant question to ask
before launching the simulation, so you should have some ideas about the
types of data you will want to collect from your simulation beforehand.
In this tutorial, we will introduce a few basic GROMACS tools for
analyzing simulations, but there are of course many more out there.

```gmx trjconv`` <https://manual.gromacs.org/current/onlinehelp/gmx-trjconv.html>`__
is a useful trajectory post-processing tool that can strip out
coordinates, correct for periodicity, or manually alter the trajectory
in various ways (setting time units, altering frame frequency, etc).
Here you can find a suggested
`workflow <https://manual.gromacs.org/2021/user-guide/terminology.html?highlight=periodic%20boundary>`__
for processing trajectories with ``gmx trjconv``.

There are many options to use with ``gmx trjconv``, let’s list them by
entering ``-h``:

.. code:: ipython3

    !gmx trjconv -h

In this exercise, we will use the ``-center`` and ``-pbc`` options to
account for any periodicity in the system. During the simulation, the
protein will diffuse across the unit cell, and may appear “broken” or
“jump” from one side of the simulation box to the other. To center the
protein in the box, we run the following:

.. code:: ipython3

    !printf "1\n1\n" | gmx trjconv -s md.tpr -f md.xtc -o md_center.xtc -center -pbc mol

Select 1 (“Protein”) as the group to be centered and 1 (“Protein”) for
output. We will conduct all our analyses on this “corrected” trajectory.
Now, let’s look at the post-processed trajectory using NGLview:

.. code:: ipython3

    traj = md.load("md_center.xtc", top="1fjs_newbox.gro")
    view = ng.show_mdtraj(traj)
    view

Alternatively, remove the comment character (#) to use VMD to visualize
the simulation:

.. code:: ipython3

    #!vmd 1fjs_newbox.gro md_center.xtc

Check the minimum image convention
----------------------------------

Next, we will calculate the distance between the protein and its
periodic image. The protein should never interact with its periodic
image (minimum image convention), or else the forces calculated will be
spurious. We will now use the tool
```gmx mindist`` <https://manual.gromacs.org/current/onlinehelp/gmx-mindist.html>`__
with the option ``-pi`` to validate that the distance is large enough:

.. code:: ipython3

    !printf "1\n" | gmx mindist -s md.tpr -f md_center.xtc -pi -od mindist.xvg 
    #!xmgrace mindist.xvg

The distance between the protein and its periodic image should not be
smaller than the cut-off used to describe non-bonded interactions.

Evaluate structural stability with RMSD
---------------------------------------

Now, let’s look at structural stability. GROMACS has a built-in tool for
RMSD calculations called
```gmx rms`` <https://manual.gromacs.org/current/onlinehelp/gmx-rms.html>`__.
To calculate the RMSD relative to the crystal structure, we can run the
following command:

.. code:: ipython3

    !printf "4\n1\n" | gmx rms -s em.tpr -f md_center.xtc -o rmsd_xray.xvg -tu ns -xvg none

.. container:: alert alert-block alert-info

   NOTE: It is important that the reference structure has the same
   number of atoms as the trajectory file, or otherwise the calculation
   will fail. This can be easy to miss when using experimental
   structures rather than simulation frames as reference.

Here, we chose 4 (“Backbone”) as the group for both the least-squares
fit and RMSD calculation. The ``-tu`` flag allows us to set the time
unit of the output and we chose ns to get nicer numbers as the original
trajectory output was written in ps. Setting the time unit is
particularly useful when dealing with longer simulations. The output
plot will show the RMSD relative to the structure from the original pdb
file:

.. code:: ipython3

    df = pd.read_csv('rmsd_xray.xvg', sep='\\s+', header=None, names=['time','RMSD'])
    df.plot('time')

Or for visualization using Xmgrace, remove the comment characters (#) in
the following cells:

.. code:: ipython3

    #!printf "4\n1\n" | gmx rms -s em.tpr -f md_center.xtc -o rmsd_xray.xvg -tu ns

.. code:: ipython3

    #!xmgrace rmsd_xray.xvg 

The plot shows that the RMSD levels off around 0.15 nm (1.5 Å),
indicating that the simulated structure is stable.

Measure compactness with radius of gyration
-------------------------------------------

The radius of gyration (Rg) of a protein is a measure of its
compactness. If a protein is stably folded, it will likely maintain a
relatively steady value of Rg. If a protein unfolds, its Rg will change
over time. Let’s analyze the radius of gyration for the protein in our
simulation using
```gmx gyrate`` <https://manual.gromacs.org/current/onlinehelp/gmx-gyrate.html>`__:

.. code:: ipython3

    !echo "1" | gmx gyrate -f md_center.xtc -s md.tpr -o gyrate.xvg -xvg none

.. code:: ipython3

    df = pd.read_csv('gyrate.xvg', sep='\\s+', header=None, names=['time','Rg'], usecols=[0, 1])
    df.plot('time')

Or for visualization using Xmgrace, remove the comment characters (#) in
the following cells:

.. code:: ipython3

    #!echo "1" | gmx gyrate -f md_center.xtc -s md.tpr -o gyrate.xvg 

.. code:: ipython3

    #!xmgrace gyrate.xvg

The reasonably invariant Rg values indicates that the protein remains
very stable in its compact (folded) form over the course of the 1 ns
simulation. This result is not unexpected, but illustrates an advanced
capacity of GROMACS analysis that comes built-in.

Index files for more specific atom selection
--------------------------------------------

So far, we have focused our analysis on the whole protein, but if we
wish to select specific sets of atoms for analysis we need to create
index groups. Common selections such as the system, protein, solvent,
ions etc. are provided as default options through promps in most GROMACS
tools, but for more complex selections we need to use
``gmx make_ndx``\ or the newer ``gmx_select`` to create an index (.ndx)
file.

First, we will use ``gmx make_ndx`` to split the protein into its two
chains. This tool has its own atom selection language which can be
listed through typing ``h`` into the promt. The command ``q`` will close
the tool:

.. code:: ipython3

    !printf "h\nq\n" | gmx make_ndx -f nvt.tpr -o

To split the protein into chains we will use the selection
``splitch 1``, where ``1`` represents the protein:

.. code:: ipython3

    !printf "splitch 1\nq\n" | gmx make_ndx -f nvt.tpr -o chains_make_ndx.ndx

An alternative way to create index groups is to use the newer tool
``gmx select``, which allows for dynamic selections in addition to
almost all the functionality of ``gmx make_ndx``. The selection syntax
is a bit different compared to that of ``gmx make_ndx``, but a detailed
guide is available in the `GROMACS
manual <https://manual.gromacs.org/current/onlinehelp/selections.html>`__.

Now, we will create the index file defining the two chains using
``gmx select``.

Note that a “chain” selection is available in ``gmx select``, but can
only be applied if chain identifiers exist in the topology file. In most
gro or tpr files the chain information is lost, so we select the
separate protein molecules instead:

.. code:: ipython3

    !printf "group "Protein" and mol 1\ngroup "Protein" and mol 2" | gmx select -s nvt.tpr -on chains_select.ndx

Now, we can calculate the number of hydrogen bonds between the two
protein chains using the tool ``gmx hbond`` (see the
`manual <https://manual.gromacs.org/current/onlinehelp/gmx-hbond.html>`__).
The ``-num`` option provides the number of hydrogen bonds as a function
of time. We can use the index file obtained from ``gmx make_ndx`` to
specify the chains Then we use the index file obtained from
``gmx select`` to specify the chains:

.. code:: ipython3

    !printf "17\n18\n"| gmx hbond -f md.xtc -s md.tpr  -n chains_make_ndx.ndx -num hbnum_ndx.xvg -xvg none

Or we use the index file obtained from ``gmx select`` to specify the
chains:

.. code:: ipython3

    !printf "0\n1\n"| gmx hbond -f md.xtc -s md.tpr -n chains_select.ndx -num hbnum.xvg -xvg none

Now we plot both graph

.. code:: ipython3

    df = pd.read_csv('hbnum_ndx.xvg', sep='\\s+', header=None, names=['time','H-bonds'], usecols=[0, 1])
    df.plot('time')
    df = pd.read_csv('hbnum.xvg', sep='\\s+', header=None, names=['time','H-bonds'], usecols=[0, 1])
    df.plot('time')


Or for visualization using Xmgrace, remove the comment characters (#) in
the following cells:

.. code:: ipython3

    #!printf "17\n18\n"| gmx hbond -f md.xtc -s md.tpr -n chains_make_ndx.ndx -num hnum_xvg.xvg

.. code:: ipython3

    #!xmgrace hbnum_xvg.xvg

Report methods
==============

Now that we have run the simulation, it might be useful report or recall
the settings used to produce it, e.g. for publication. The tool
``gmx report-methods`` (see the
`manual <https://manual.gromacs.org/current/onlinehelp/gmx-report-methods.html>`__)
can be useful for this purpose. ``gmx report-methods`` prints out basic
system information on a performed run. It can also provide an
unformatted text (with the option ``-o``) or a LaTeX-formatted output
file with through the ``-m`` option.

Let’s try it:

.. code:: ipython3

    !gmx report-methods -s md.tpr

Improve performance with longer timesteps
=========================================

In this tutorial, we have used a time step of 2 fs which is necessary to
properly resolve the fastest motions in the simulation, primarily
originating from hydrogen atoms. The need for a short timestep
constitutes a major challenge in MD simulations since the timescales of
interest often are orders of magnitude longer, leading to a high
computational cost of typical simulations. There are a few different
algorithms in GROMACS that can allow for a longer timestep and thus
“faster simulations”, but beware that they always come with some
trade-offs to the dynamics.

In this section, we will discuss and provide mdp files for two methods:
\* Mass repartitioning \* Multiple time-stepping

If you want to test these methods yourself, just replace the mdp file in
the ``gmx grompp`` command for the production MD run with the updates
presented in this section.

First, let’s discuss **mass repartitioning**. The general idea is to
increase the mass of lighter atoms to slow down the fastest fluctuations
in the system. This way, we can use a longer timestep to resolve all
motions. For typical atomistic systems only the masses of hydrogen atoms
are scaled.

Let’s take a look at the mdp file for running with mass repartitioning:

.. code:: ipython3

    !cat input/md-mass-charmm.mdp

Compared to the mdp file used previously in this tutorial, we have added
the ``mass-repartition-factor = 3`` and doubled the timestep to 4 fs
(``dt = 0.004``). This means we scale the lightest masses by a factor 3
by moving this extra mass from the heavier atom it is bound to. Exactly
which numbers to use depends on the force field, but the settings used
here usually works well for most systems with constrained hydrogen bonds
(note the line ``constraints = h-bonds``).

.. code:: ipython3

    !gmx grompp -f input/md-mass-charmm.mdp -c npt.gro -t npt.cpt -p topol.top -o md-mass.tpr

.. code:: ipython3

    !gmx mdrun -v -deffnm md-mass -ntmpi 1 -ntomp 8

Next, we will take a look at the **multiple time-stepping** algorithm.
The idea is to compute some types of forces less frequently than than
every timestep, thereby reducing the computational cost. Exactly which
combinations of forces and timesteps to use is still an area under
investigation and also depends on the rest of the force field.
Therefore, it is always a good idea to be conservative and to assess
stability and energy drifts when using this method.

Let’s take a look at the mdp file when using multiple time-stepping:

.. code:: ipython3

    !cat input/md-mts-charmm.mdp

We can find the mdp options related to multiple time-stepping at the
bottom of the file, all with the prefix “mts”. First, turn on multiple
time-stepping with ``mts = yes``. In theory, it is possible to use
multiple time-stepping on more than two levels, but only two level are
currently implemented in GROMACS, so we use only ``mts-levels = 2``.
Then, we specify which types of forces to calculate less frequently (on
the second mts-level) through the ``mts-level2-forces``.
``mts-level2-forces`` supports the following entries
longrange-nonbonded, nonbonded, pair, dihedral, angle, pull and awh.
Note longrange-nonbonded can not be omitted

Here, we select long-range non-bonded interactions. The
``mts-level2-factor`` specifies the interval for computing the forces in
the second level. We set it to 2, which means we calculate these forces
half as often. Note the mdp parameter ``nstlist`` should be a multiple
of ``mts-level2-factor``.

.. code:: ipython3

    !gmx grompp -f input/md-mts-charmm.mdp -c npt.gro -t npt.cpt -p topol.top -o md-mts-charmm.tpr
    !gmx mdrun -v -deffnm md-mts-charmm -ntmpi 1 -ntomp 8

If we run simulations with either hydrogen mass repartitioning or
multiple time-stepping we get the following performance improvements on
8 CPU (ADM EPYC-Milan):

================================= ====================
Simulation                        Performance (ns/day)
================================= ====================
Regular MD                        26.9
Mass repartitioning               52.3
Multiple time-stepping            
``longrange-nonbonded``           28.3
``nonbonded longrange-nonbonded`` 50.8
================================= ====================

For multiple time-stepping we report results only for two combinations
``longrange-nonbonded`` and ``nonbonded longrange-nonbonded``. To get
the best combination for a system, different combinations of entries for
multiple time-stepping can test. The setting is also force field
dependent.

In conclusion, we can get performance improvements by using mass
repartitioning and multiple time-stepping, but keep in mind they also
slightly alter the dynamics of the system.

Do you have any questions? Have a look at the user discussions on the `GROMACS forum <htttp://forums.gromacs.org>`__
--------------------------------------------------------------------------------------------------------------------