.. code:: ipython3

    %run src/init_notebooks.py
    #check_notebook()
    hide_toggle()

Density-fit simulation
======================

Authors : Tatiana Shugaeva, Nandan Haloi, John Cowgill and Alessandra
Villa

Goal : learn step-by-step how to run a density-fit simulation using
GROMACS

Time : 60 minutes

Software : GROMACS 2024, python modules: nglviewer, MDAnalysis, pandas,
matplotlib.

Source: `GROMACS tutorials
@gromacs.org <https://tutorials.gromacs.org/>`__

version : release `doi:10.5281/zenodo.13938906 <https://zenodo.org/records/13938906>`__

Introduction to density fitting simulations
-------------------------------------------

Recent advances in single-particle cryoelectron microscopy (cryo-EM)
have enabled us to obtain near-atomic resolution structures of
challenging protein targets including large protein complexes and
membrane proteins. However, in many cases, one often ends up in
low-resolution densities that are not sufficient to confidently build
models. This mainly occurs due to the inherent dynamics of the protein
or bound-ligands such as lipids, small molecules; and the averaging of
each EM micrograph during 3-dimensional density construction leads to
low resolution averaged map for these dynamic parts.

Several molecular dynamics (MD) simulations based methods have been
developed to refine all-atom models to EM maps, e.g. correlation-driven
molecular dynamics (CDMD) simulations that uses EM density maps to add
potential to guide an atomic model to the target [Igaev et al. eLife
2019;8:e43542], molecular dynamic flexible fitting (MDFF) that applies
forces proportional to the gradient of the EM density map [Singharoy et
al. eLife 2016;5:e16105].

Here, we use the `GROMACS <http:\www.gromacs.org>`__ to perform
density-fit (or density-guided) simulations. In density-guided
simulations, additional forces are applied to atoms that depend on the
gradient of similarity between a simulated density and a reference
density. By applying these forces protein structures can be “fitted” to
densities coming from, e.g., cryo electron-microscopy (see details in
the `GROMACS
manual <https://manual.gromacs.org/current/reference-manual/special/density-guided-simulation.html>`__.
The implemented approach extends the ones described in `Orzechowski et
al. <https://doi.org/10.1529/biophysj.108.139451>`__ and `Igaev et
al.} <https://doi.org/10.7554/eLife.43542>`__

More on the density guided approach implemented in GROMACS and a
recently developed tool within GROMACS that can fit structures via
Bayes’ approach can be found in `Blau et al. bioRxiv
2022 <https://doi.org/10.1101/2022.09.30.510249>`__.

Preparations to run this notebook
---------------------------------

First of all, we need to install a tool that allow us to automatically
align density to the system coordinates:
`rigidbodyfit <https://pypi.org/project/rigidbodyfit/>`__. We will use
this tool in the pre-processing part.

.. code:: ipython3

    ! pip install rigidbodyfit

Then we import the follow post-processing tools from MDAnalysis

.. code:: ipython3

    import warnings
    # suppress some MDAnalysis warnings about PSF files
    warnings.warn('ignore')
    import nglview as ng
    import MDAnalysis as mda
    from MDAnalysis.analysis import rms
    from MDAnalysis.analysis import align
    from MDAnalysis.analysis.rms import RMSF
    import pandas as pd
    import matplotlib.pyplot as plt

Finally, we move to the working directory ``data``

.. code:: ipython3

    # Change to the data directory
    # Note that executing this command twice will result in an error you can ignore
    %cd data

Molecular system description
----------------------------

In this tutorial, we will use density fitting to fit the structure of
the `G-protein coupled calcitonin receptor-like receptor
(CLR) <https://en.wikipedia.org/wiki/Calcitonin_gene-related_peptide>`__
to its cryo-EM density. The receptor has two well-defined states, the
main difference between them is in the position of several transmembrane
alpha-helices. We will take the density file of the active state of the
receptor (EMD-20906) and fit an atomistic model of the protein into the
receptor’s density.

Note that any type of 3D structure in PDB format can be used as an input
for this procedure, ranging from homology models to experimentally
derived models or outcomes from neural network-based conformation
predictions. In this tutorial we will use a model generated by
AlphaFold2 for the fitting. Below, you can take a look at it:

.. code:: ipython3

    ## Initial structure of CLR generated by AlphaFold2
    view = ng.NGLWidget()
    view.add_component("input/CLR_initial_model.pdb")
    view
    #Click and drag to rotate, zoom with your mouseweel 
    #For more information on this viewer have a look at https://github.com/nglviewer/nglview

The density file ``EMD-20906`` was downloaded from `Electron Microscopy
Data Bank <https://www.ebi.ac.uk/emdb/EMD-20906>`__. The original file
contains electron density for a big multisubunit protein complex.
`Chimera <https://www.cgl.ucsf.edu/chimera/>`__ was used to extract and
save the density corresponding to CLR in a separated file. A `tutorial
on density extraction using
Chimera <https://youtu.be/oEtLkbApQmk?si=OpkhOs8K-zTXxSAk>`__ is
available on YouTube for further details. Below, you can see the
resulting density file:

.. code:: ipython3

    view = ng.NGLWidget()
    view.add_component('input/CLR_active_state_map_rotated.mrc')
    view

System preparation
------------------

Now we prepare the system for the simulation. First of all we need to
generate the structural and topology file to energy minimize the
structure before performing density-fit simulation. We start by
generating the topology file for the receptor using GROMACS
pre-processor tool ``pdb2gmx`` and ``CLR_initial_model.pdb`` file as
input. You find the file in the directory ``input``. In this tutorial we
use CHARMM36 force field. The force field is not distributed with
GROMACS and be downloaded . Force field directory
``charmm36-jul2022.ff`` can be found in the working directory and is
automatically read by the pre-processing tool.

All needed commands are listed below, for more detaild explanation of
the commands please refer to GROMACS tutorial: `Introduction to
Molecular
Dynamics <https://tutorials.gromacs.org/md-intro-tutorial.html>`__.

.. code:: ipython3

    # prepare .gro file and topology file.
    !gmx pdb2gmx -f input/CLR_initial_model.pdb -o model.gro -water tip3p -ff "charmm36-jul2022"

.. code:: ipython3

    # prepare the box
    !gmx editconf -f model.gro -o model_box.gro -c -d 1.0 -bt cubic

For tutorial purpose, the simulations are performed in water solution,
even if CLR is a membrane protein. Note we can still obtain accurate
results by using simulations in water (`Linnea Yvonnesdotter et
al. <https://doi.org/10.1016/j.bpj.2023.05.033>`__).

.. code:: ipython3

    # add water molecules
    !gmx solvate -cp model_box.gro -o model_solv.gro -p topol.top

Now we add ions to neutralize the system’s charge. We achieve this using
``-neutral`` option in ``gmx genion``.

.. code:: ipython3

    #adding ions
    !touch ions.mdp
    !gmx grompp -f ions.mdp -c model_solv.gro -p topol.top -o solv.tpr

.. code:: ipython3

    !echo 13 | gmx genion -s solv.tpr -o model_solv.gro -p topol.top -pname NA -nname CL -neutral  

After setting up the complete system, the next steps are energy
minimization and equilibration. For guidance, you can refer to the
`Introduction to Molecular
Dynamics <https://tutorials.gromacs.org/md-intro-tutorial.html>`__
tutorial, which wementioned earlier. You can practice system preparation
using the commands from the reference tutorial or proceed directly with
the provided ``start.gro`` file.

.. code:: ipython3

    # Please remove the comment character (#) to execute the command below. 
    # It copies prepared file start.gro from the input folder.
    #!cp input/start.gro ./

.. code:: ipython3

    view = ng.NGLWidget()
    view.add_component('start.gro')
    view.add_representation("ball+stick",selection="NA or CL")
    ## water
    view.add_representation("licorice",selection="TIP3P", opacity=0.3)
    view

Density alignment
-----------------

To run a density-guided simulation using GROMACS we need not only a
structural file (``model_solv.gro``), a topology file (``topol.top``)
describing the molecular system, but also a file containing information
on the density to map (``CLR_active_state_map_rotated.mrc``) and a
simulation parameter file (``df.mdp``). You can find the two files in
the directory ``input``

If we examine the density, we will notice that it is not aligned with
the protein structure in ``start.gro``:

.. code:: ipython3

    view = ng.NGLWidget()
    view.add_component("start.gro")
    view.add_component('input/CLR_active_state_map_rotated.mrc')
    view

To align the density and the structure, we use the rigidbodyfit script.
Here, there are two important parameters to consider:

1. sampling-depth: Increasing this value will result in a more rigorous
   alignment. Practise indicates ``9`` as a reasonable values. However,
   if you’re dissatisfied with the alignment result, consider increasing
   it to 11-12. Be aware that higher values may significantly extend the
   calculation time.

2. output-transform: This option adds a transformation matrix to the
   output. We will use it in our md parameter file ``df.mdp``.

.. code:: ipython3

    # script works with .pdb file format, so we have to convert prepared .gro to .pdb
    ! gmx editconf -f start.gro -o start.pdb

.. code:: ipython3

    # The rigidbodyfit script will run much faster locally
    ! rigidbodyfit --structure start.pdb --density input/CLR_active_state_map_rotated.mrc --sampling-depth 9 --output-transform


To check if the alignmnt was successfull we check how fitted.pdb looks
compared to the density. The process is stochastic, so if you see a
poorly aligned result, increase the ``--sampling-depth`` value and rerun
the cell above.

.. code:: ipython3

    view = ng.NGLWidget()
    view.add_component("fitted.pdb")
    view.add_component('input/CLR_active_state_map_rotated.mrc')
    view

Most parts of the structure are aligned. Note that some residues
(195-203) are slightly shifted compared to the density, so density
fitting simulation is needed.

Now, we import the information from ``transform.json`` to the df.mdp

.. code:: ipython3

    !cp input/df.mdp .

.. code:: ipython3

    mdp_shift = 'density-guided-simulation-shift-vector                  = '
    mdp_rotate = 'density-guided-simulation-transformation-matrix         = '
    mdp_name = '\ndensity-guided-simulation-reference-density-filename    = CLR_active_state_map_rotated.mrc\n'
    
    with open('transform.json') as f:
        d = json.load(f)
        a = [str(i) for i in d['shift_in_nm']]
        out_str = mdp_shift + ' '.join(a) + '\n'
    
        rot = ''
        for el in d['orientation_matrix']:
            s = [str(i) for i in el]
            rot = rot + ' ' + ' '.join(s)
    
    with open('df.mdp', 'a') as f2:
        f2.write(mdp_name)
        f2.write(out_str)
        f2.write(mdp_rotate+rot + '\n')       


.. code:: ipython3

    !cat df.mdp

More information on the mdp file option for density fit simulations can
be found in `GROMACS user
guide <https://manual.gromacs.org/documentation/current/user-guide/mdp-options.html#density-guided-simulations>`__,
in the `BioExcel
webinar <https://bioexcel.eu/webinar-density-guided-simulations-combining-cryo-em-data-and-molecular-dynamics-simulation-2020-04-28/>`__
given by Christian Blau. Some math behind the method is reported in
`GROMACS
manual <https://manual.gromacs.org/current/reference-manual/special/density-guided-simulation.html>`__.

We check that the density reference file is the working directory.

.. code:: ipython3

    !ls 

If it is not, we copy the reference file in the working directory.

.. code:: ipython3

    !cp input/CLR_active_state_map_rotated.mrc ./

Density guided simulation
-------------------------

Now we are ready to run the simulation.

.. code:: ipython3

    !gmx grompp -f df.mdp -c start.gro -p topol.top -o df.tpr

Check for warnings. If you want for time reason, you can skip the
simulation and copy the output files from the directory reference.

.. code:: ipython3

    # Please remove the comment character (#) to execute the command below. 
    # It takes long to get the result. You can stop the kernel and use the pre-calculated simulation.
    #!gmx mdrun -s df.tpr -deffnm df

Note: if you do not want to wait, but go on with the tutorial, copy the
data from the ``reference`` directory into the output_files directory.
To do this within this notebook, remove the comment characters (#) in
the following cell

.. code:: ipython3

    ## ONLY execute the lines below if you do not want to run and wait for the simulations to finish. 
    ## remove the comment characters (#) to execute the commands below
    #! cp reference/df.xtc .


Simulation analysis
-------------------

First, let’s examine the simulation results. Since the output trajectory
is not pre-aligned to the density (the transformation matrix in the
parameter file accounts for this during the simulation), we now need to
align it explicitly. Gromacs trjconv command with “-fit rot+trans”
option. We align the trajectory to the ``fitted.pdb`` file that we
generated using ``rigidbodyfit``

.. code:: ipython3

    ! echo 1 0 | gmx trjconv -f df.xtc -s fitted.pdb -fit rot+trans -o df_fitted.xtc

Basic manipulations with trajectories can be done using the `MDAanalysis
python library <https://doi.org/10.1002/jcc.21787>`__. We load
trajectory information from TPR and XTC files. Notice how residues
198-203 shift into the density (use the slider to navigate through the
frames more quickly). At which frame do the residues appear
well-aligned?

.. code:: ipython3

    # read trajectory
    u = mda.Universe('start.gro', 'df_fitted.xtc')
    view = ng.NGLWidget()
    view.add_trajectory(u)
    view.add_component('input/CLR_active_state_map_rotated.mrc')
    view

To visualize the progress of the simulation, we can plot RMSD (root mean
square deviation of atomic positions). We calculate the RMSD of each
frame compared to the input structure. In MDAnalysis, this can be done
using the align.AlignTraj class. For more details, please refer to the
`MDAnalysis
documentation <https://docs.mdanalysis.org/2.7.0/documentation_pages/analysis/rms.html>`__.

.. code:: ipython3

    # align trajectory to the first frame and calculate RMSD. RMSD values are stored in prealigner.results.rmsd
    prealigner = align.AlignTraj(u, u, select="protein and name CA",
                                 in_memory=True).run()

.. code:: ipython3

    # create list with time in ps
    time = [num*2 for num, el in enumerate(prealigner.results.rmsd)]
    
    # plot RMSD
    %matplotlib inline
    %config InlineBackend.figure_format='retina'
    plt.rcParams["figure.figsize"] = (6, 4)
    plt.rcParams["font.size"] = 14
    plt.plot(time, prealigner.results.rmsd)
    plt.xlabel('Time, ps')
    plt.ylabel('RMSD (C-alpha), Å')
    plt.show()

For the purposes of the tutorial we used a system for which structure
has already been resolved (PDBid:
`6UVA <https://www.rcsb.org/structure/6UVA>`__). We can assess the
accuracy of our simulation by comparing the RMSD between the resolved
structure and our simulation frames.

.. code:: ipython3

    # load reference structure and pick only backbone
    reference_state = mda.Universe('input/CLR_reference.pdb') 
    ref_bb = reference_state.select_atoms('name CA')
    ref_bb_positions = ref_bb.positions
    
    # Select tha same atoms that are available in the reference structure 
    bb = u.select_atoms('name CA and (resid 1:193 or resid 199:224 or resid 232:272)')
    bb.write('selected_atoms_traj.pdb', frames='all')
    
    part_atoms = mda.Universe('selected_atoms_traj.pdb')
    
    # Align and calculate RMSD
    reference_alignment = align.AlignTraj(part_atoms,  # trajectory to align
                    ref_bb,  # reference
                    select='all',  # selection of atoms to align
                    filename='aligned_traj.pdb',  # file to write the trajectory to
                    match_atoms=True,  # whether to match atoms based on mass
                   ).run()

.. code:: ipython3

    plt.plot(time, reference_alignment.results.rmsd)
    plt.rcParams["figure.figsize"] = (6, 4)
    plt.title('Comparison to the reference structure')
    plt.xlabel('Time, ps')
    plt.ylabel('RMSD (C-alpha), Å')
    plt.show()

To assess the fitting quality without reference structures, we are
plotting a metric such as cross-correlation, which represents the
structure’s similarity to the density. The closer the value is to 1, the
better the atoms fit into the density. Cross-correlation can be computed
using `Chimera <https://www.cgl.ucsf.edu/chimera/>`__ fitmap function
(аn example script ``cross_correlation_with_chimera.py`` can be found in
the ``data/additional_scripts`` folder).

Due to the adaptive force scaling scheme the force to the protein atoms
becomes extremely high towards the end of the simulations, resulting in
a crash. Therefore, to prevent distortion of the protein structure and
integritiy of the stereochemistry in the final mode, from this high
force, we will use a generalized orientation-dependent all-atom
potential `GOAP <https://doi.org/10.1016/j.bpj.2011.09.012>`__ scoring
function; originally developed to assess protein structure prediction.

GOAP score source code was downloaded at `https:
//sites.gatech.edu/cssb/goap/ <https://sites.gatech.edu/cssb/goap/>`__

Operating these tools falls beyond the scope of this tutorial. However,
we have precalculated the cross-correlation (``cc.csv``) and GOAP-score
(``goap.csv``) values for our density simulation.

.. code:: ipython3

    goap = pd.read_csv('reference/goap.csv', index_col=0)['goap_score'].to_list()
    cc = pd.read_csv('reference/cc.csv', index_col=0)['cross-correlation'].to_list()
    
    plt.rcParams["figure.figsize"] = (6, 4)
    fig, ax1 = plt.subplots()
    
    color = 'tab:red'
    ax1.set_xlabel('Time, ps')
    ax1.set_ylabel('Cross-correlation', color=color)
    ax1.plot(time[1:], cc[1:], color=color)
    ax1.tick_params(axis='y', labelcolor=color)
    
    fig.tight_layout()
    #plt.xlim(0, 45)
    #ax1.set_title('Cross-correlation')
    plt.subplots_adjust(top=0.9)
    
    plt.show() 

.. code:: ipython3

    averages = []
    window_size = 20
    for i in range(len(goap) - window_size + 1):
        window = goap[i:i + window_size]
        window_average = sum(window) / window_size
        averages.append(window_average)
    
    plt.rcParams["figure.figsize"] = (6, 4)
    fig, ax1 = plt.subplots()
    
    ax2 = ax1  # instantiate a second axes that shares the same x-axis
    
    color = 'tab:blue'
    ax2.set_ylabel('GOAP score', color=color)
    ax2.plot(time[1:], goap[1:], color=color)
    ax2.plot(time[:-(window_size)], averages[1:], color='indigo')
    ax2.tick_params(axis='y', labelcolor=color)
    
    fig.tight_layout()
    #plt.xlim(0, 45)
    #ax1.set_title('GOAP-score')
    plt.subplots_adjust(top=0.9)
    
    plt.show() 

The plot above shows the increase in cross-correlation over time. The
GOAP-score remains fairly steady during the first half of the simulation
but begins to rise after 600 ps.

Now that we have performed the density-fit simulation, we should report
or recall about what type of simulation we have per formed. The GROMACS
tool
```gmx report-methods`` <https://manual.gromacs.org/current/onlinehelp/gmx-report-methods.html>`__
can be useful for this purpose. gmx report-methods print out basic
system information on a performed run. It can also provide an unformatte
d text (with the option -o) or a LaTeX formatted output file with the
option -m.

For an additional challenge, try setting up density fitting for a more
complex system. You can find one in the ``data/input_ion_channel``
directory, that contains experimental electron density maps from the ABC
transporter TmrAB (doi:10.1038/s41586-019-1391-0). Is the density
aligned to the starting structure? Which regions need the most fitting?

Do you have any questions? Have a look at the user discussions on
`GROMACS forums <htttp://forums.gromacs.org>`__