%run src/init_notebooks.py
#check_notebook()
hide_toggle()

Density-fit simulation¶

Authors : Tatiana Shugaeva, Nandan Haloi, John Cowgill and Alessandra Villa

Goal : learn step-by-step how to run a density-fit simulation using GROMACS

Time : 60 minutes

Software : GROMACS 2024, python modules: nglviewer, MDAnalysis, pandas, matplotlib.

Source: GROMACS tutorials @gromacs.org

version : release doi:10.5281/zenodo.13938906

Introduction to density fitting simulations¶

Recent advances in single-particle cryoelectron microscopy (cryo-EM) have enabled us to obtain near-atomic resolution structures of challenging protein targets including large protein complexes and membrane proteins. However, in many cases, one often ends up in low-resolution densities that are not sufficient to confidently build models. This mainly occurs due to the inherent dynamics of the protein or bound-ligands such as lipids, small molecules; and the averaging of each EM micrograph during 3-dimensional density construction leads to low resolution averaged map for these dynamic parts.

Several molecular dynamics (MD) simulations based methods have been developed to refine all-atom models to EM maps, e.g. correlation-driven molecular dynamics (CDMD) simulations that uses EM density maps to add potential to guide an atomic model to the target [Igaev et al. eLife 2019;8:e43542], molecular dynamic flexible fitting (MDFF) that applies forces proportional to the gradient of the EM density map [Singharoy et al. eLife 2016;5:e16105].

Here, we use the GROMACS to perform density-fit (or density-guided) simulations. In density-guided simulations, additional forces are applied to atoms that depend on the gradient of similarity between a simulated density and a reference density. By applying these forces protein structures can be “fitted” to densities coming from, e.g., cryo electron-microscopy (see details in the GROMACS manual. The implemented approach extends the ones described in Orzechowski et al. and Igaev et al.}

More on the density guided approach implemented in GROMACS and a recently developed tool within GROMACS that can fit structures via Bayes’ approach can be found in Blau et al. bioRxiv 2022.

Preparations to run this notebook¶

First of all, we need to install a tool that allow us to automatically align density to the system coordinates: rigidbodyfit. We will use this tool in the pre-processing part.

! pip install rigidbodyfit

Then we import the follow post-processing tools from MDAnalysis

import warnings
# suppress some MDAnalysis warnings about PSF files
warnings.warn('ignore')
import nglview as ng
import MDAnalysis as mda
from MDAnalysis.analysis import rms
from MDAnalysis.analysis import align
from MDAnalysis.analysis.rms import RMSF
import pandas as pd
import matplotlib.pyplot as plt

Finally, we move to the working directory data

# Change to the data directory
# Note that executing this command twice will result in an error you can ignore
%cd data

Molecular system description¶

In this tutorial, we will use density fitting to fit the structure of the G-protein coupled calcitonin receptor-like receptor (CLR) to its cryo-EM density. The receptor has two well-defined states, the main difference between them is in the position of several transmembrane alpha-helices. We will take the density file of the active state of the receptor (EMD-20906) and fit an atomistic model of the protein into the receptor’s density.

Note that any type of 3D structure in PDB format can be used as an input for this procedure, ranging from homology models to experimentally derived models or outcomes from neural network-based conformation predictions. In this tutorial we will use a model generated by AlphaFold2 for the fitting. Below, you can take a look at it:

## Initial structure of CLR generated by AlphaFold2
view = ng.NGLWidget()
view.add_component("input/CLR_initial_model.pdb")
view
#Click and drag to rotate, zoom with your mouseweel
#For more information on this viewer have a look at https://github.com/nglviewer/nglview

The density file EMD-20906 was downloaded from Electron Microscopy Data Bank. The original file contains electron density for a big multisubunit protein complex. Chimera was used to extract and save the density corresponding to CLR in a separated file. A tutorial on density extraction using Chimera is available on YouTube for further details. Below, you can see the resulting density file:

view = ng.NGLWidget()
view.add_component('input/CLR_active_state_map_rotated.mrc')
view

System preparation¶

Now we prepare the system for the simulation. First of all we need to generate the structural and topology file to energy minimize the structure before performing density-fit simulation. We start by generating the topology file for the receptor using GROMACS pre-processor tool pdb2gmx and CLR_initial_model.pdb file as input. You find the file in the directory input. In this tutorial we use CHARMM36 force field. The force field is not distributed with GROMACS and be downloaded . Force field directory charmm36-jul2022.ff can be found in the working directory and is automatically read by the pre-processing tool.

All needed commands are listed below, for more detaild explanation of the commands please refer to GROMACS tutorial: Introduction to Molecular Dynamics.

# prepare .gro file and topology file.
!gmx pdb2gmx -f input/CLR_initial_model.pdb -o model.gro -water tip3p -ff "charmm36-jul2022"

# prepare the box
!gmx editconf -f model.gro -o model_box.gro -c -d 1.0 -bt cubic

For tutorial purpose, the simulations are performed in water solution, even if CLR is a membrane protein. Note we can still obtain accurate results by using simulations in water (Linnea Yvonnesdotter et al.).

# add water molecules
!gmx solvate -cp model_box.gro -o model_solv.gro -p topol.top

Now we add ions to neutralize the system’s charge. We achieve this using -neutral option in gmx genion.

#adding ions
!touch ions.mdp
!gmx grompp -f ions.mdp -c model_solv.gro -p topol.top -o solv.tpr

!echo 13 | gmx genion -s solv.tpr -o model_solv.gro -p topol.top -pname NA -nname CL -neutral

After setting up the complete system, the next steps are energy minimization and equilibration. For guidance, you can refer to the Introduction to Molecular Dynamics tutorial, which wementioned earlier. You can practice system preparation using the commands from the reference tutorial or proceed directly with the provided start.gro file.

# Please remove the comment character (#) to execute the command below.
# It copies prepared file start.gro from the input folder.
#!cp input/start.gro ./

view = ng.NGLWidget()
view.add_component('start.gro')
view.add_representation("ball+stick",selection="NA or CL")
## water
view.add_representation("licorice",selection="TIP3P", opacity=0.3)
view

Density alignment¶

To run a density-guided simulation using GROMACS we need not only a structural file (model_solv.gro), a topology file (topol.top) describing the molecular system, but also a file containing information on the density to map (CLR_active_state_map_rotated.mrc) and a simulation parameter file (df.mdp). You can find the two files in the directory input

If we examine the density, we will notice that it is not aligned with the protein structure in start.gro:

view = ng.NGLWidget()
view.add_component("start.gro")
view.add_component('input/CLR_active_state_map_rotated.mrc')
view

To align the density and the structure, we use the rigidbodyfit script. Here, there are two important parameters to consider:

sampling-depth: Increasing this value will result in a more rigorous alignment. Practise indicates 9 as a reasonable values. However, if you’re dissatisfied with the alignment result, consider increasing it to 11-12. Be aware that higher values may significantly extend the calculation time.
output-transform: This option adds a transformation matrix to the output. We will use it in our md parameter file df.mdp.

# script works with .pdb file format, so we have to convert prepared .gro to .pdb
! gmx editconf -f start.gro -o start.pdb

# The rigidbodyfit script will run much faster locally
! rigidbodyfit --structure start.pdb --density input/CLR_active_state_map_rotated.mrc --sampling-depth 9 --output-transform

To check if the alignmnt was successfull we check how fitted.pdb looks compared to the density. The process is stochastic, so if you see a poorly aligned result, increase the --sampling-depth value and rerun the cell above.

view = ng.NGLWidget()
view.add_component("fitted.pdb")
view.add_component('input/CLR_active_state_map_rotated.mrc')
view

Most parts of the structure are aligned. Note that some residues (195-203) are slightly shifted compared to the density, so density fitting simulation is needed.

Now, we import the information from transform.json to the df.mdp

!cp input/df.mdp .

mdp_shift = 'density-guided-simulation-shift-vector                  = '
mdp_rotate = 'density-guided-simulation-transformation-matrix         = '
mdp_name = '\ndensity-guided-simulation-reference-density-filename    = CLR_active_state_map_rotated.mrc\n'

with open('transform.json') as f:
    d = json.load(f)
    a = [str(i) for i in d['shift_in_nm']]
    out_str = mdp_shift + ' '.join(a) + '\n'

    rot = ''
    for el in d['orientation_matrix']:
        s = [str(i) for i in el]
        rot = rot + ' ' + ' '.join(s)

with open('df.mdp', 'a') as f2:
    f2.write(mdp_name)
    f2.write(out_str)
    f2.write(mdp_rotate+rot + '\n')

!cat df.mdp

More information on the mdp file option for density fit simulations can be found in GROMACS user guide, in the BioExcel webinar given by Christian Blau. Some math behind the method is reported in GROMACS manual.

We check that the density reference file is the working directory.

!ls

If it is not, we copy the reference file in the working directory.

!cp input/CLR_active_state_map_rotated.mrc ./

Density guided simulation¶

Now we are ready to run the simulation.

!gmx grompp -f df.mdp -c start.gro -p topol.top -o df.tpr

Check for warnings. If you want for time reason, you can skip the simulation and copy the output files from the directory reference.

# Please remove the comment character (#) to execute the command below.
# It takes long to get the result. You can stop the kernel and use the pre-calculated simulation.
#!gmx mdrun -s df.tpr -deffnm df

Note: if you do not want to wait, but go on with the tutorial, copy the data from the reference directory into the output_files directory. To do this within this notebook, remove the comment characters (#) in the following cell

## ONLY execute the lines below if you do not want to run and wait for the simulations to finish.
## remove the comment characters (#) to execute the commands below
#! cp reference/df.xtc .

Simulation analysis¶

First, let’s examine the simulation results. Since the output trajectory is not pre-aligned to the density (the transformation matrix in the parameter file accounts for this during the simulation), we now need to align it explicitly. Gromacs trjconv command with “-fit rot+trans” option. We align the trajectory to the fitted.pdb file that we generated using rigidbodyfit

! echo 1 0 | gmx trjconv -f df.xtc -s fitted.pdb -fit rot+trans -o df_fitted.xtc

Basic manipulations with trajectories can be done using the MDAanalysis python library. We load trajectory information from TPR and XTC files. Notice how residues 198-203 shift into the density (use the slider to navigate through the frames more quickly). At which frame do the residues appear well-aligned?

# read trajectory
u = mda.Universe('start.gro', 'df_fitted.xtc')
view = ng.NGLWidget()
view.add_trajectory(u)
view.add_component('input/CLR_active_state_map_rotated.mrc')
view

To visualize the progress of the simulation, we can plot RMSD (root mean square deviation of atomic positions). We calculate the RMSD of each frame compared to the input structure. In MDAnalysis, this can be done using the align.AlignTraj class. For more details, please refer to the MDAnalysis documentation.

# align trajectory to the first frame and calculate RMSD. RMSD values are stored in prealigner.results.rmsd
prealigner = align.AlignTraj(u, u, select="protein and name CA",
                             in_memory=True).run()

# create list with time in ps
time = [num*2 for num, el in enumerate(prealigner.results.rmsd)]

# plot RMSD
%matplotlib inline
%config InlineBackend.figure_format='retina'
plt.rcParams["figure.figsize"] = (6, 4)
plt.rcParams["font.size"] = 14
plt.plot(time, prealigner.results.rmsd)
plt.xlabel('Time, ps')
plt.ylabel('RMSD (C-alpha), Å')
plt.show()

For the purposes of the tutorial we used a system for which structure has already been resolved (PDBid: 6UVA). We can assess the accuracy of our simulation by comparing the RMSD between the resolved structure and our simulation frames.

# load reference structure and pick only backbone
reference_state = mda.Universe('input/CLR_reference.pdb')
ref_bb = reference_state.select_atoms('name CA')
ref_bb_positions = ref_bb.positions

# Select tha same atoms that are available in the reference structure
bb = u.select_atoms('name CA and (resid 1:193 or resid 199:224 or resid 232:272)')
bb.write('selected_atoms_traj.pdb', frames='all')

part_atoms = mda.Universe('selected_atoms_traj.pdb')

# Align and calculate RMSD
reference_alignment = align.AlignTraj(part_atoms,  # trajectory to align
                ref_bb,  # reference
                select='all',  # selection of atoms to align
                filename='aligned_traj.pdb',  # file to write the trajectory to
                match_atoms=True,  # whether to match atoms based on mass
               ).run()

plt.plot(time, reference_alignment.results.rmsd)
plt.rcParams["figure.figsize"] = (6, 4)
plt.title('Comparison to the reference structure')
plt.xlabel('Time, ps')
plt.ylabel('RMSD (C-alpha), Å')
plt.show()

To assess the fitting quality without reference structures, we are plotting a metric such as cross-correlation, which represents the structure’s similarity to the density. The closer the value is to 1, the better the atoms fit into the density. Cross-correlation can be computed using Chimera fitmap function (аn example script cross_correlation_with_chimera.py can be found in the data/additional_scripts folder).

Due to the adaptive force scaling scheme the force to the protein atoms becomes extremely high towards the end of the simulations, resulting in a crash. Therefore, to prevent distortion of the protein structure and integritiy of the stereochemistry in the final mode, from this high force, we will use a generalized orientation-dependent all-atom potential GOAP scoring function; originally developed to assess protein structure prediction.

GOAP score source code was downloaded at https: //sites.gatech.edu/cssb/goap/

Operating these tools falls beyond the scope of this tutorial. However, we have precalculated the cross-correlation (cc.csv) and GOAP-score (goap.csv) values for our density simulation.

goap = pd.read_csv('reference/goap.csv', index_col=0)['goap_score'].to_list()
cc = pd.read_csv('reference/cc.csv', index_col=0)['cross-correlation'].to_list()

plt.rcParams["figure.figsize"] = (6, 4)
fig, ax1 = plt.subplots()

color = 'tab:red'
ax1.set_xlabel('Time, ps')
ax1.set_ylabel('Cross-correlation', color=color)
ax1.plot(time[1:], cc[1:], color=color)
ax1.tick_params(axis='y', labelcolor=color)

fig.tight_layout()
#plt.xlim(0, 45)
#ax1.set_title('Cross-correlation')
plt.subplots_adjust(top=0.9)

plt.show()

averages = []
window_size = 20
for i in range(len(goap) - window_size + 1):
    window = goap[i:i + window_size]
    window_average = sum(window) / window_size
    averages.append(window_average)

plt.rcParams["figure.figsize"] = (6, 4)
fig, ax1 = plt.subplots()

ax2 = ax1  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('GOAP score', color=color)
ax2.plot(time[1:], goap[1:], color=color)
ax2.plot(time[:-(window_size)], averages[1:], color='indigo')
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout()
#plt.xlim(0, 45)
#ax1.set_title('GOAP-score')
plt.subplots_adjust(top=0.9)

plt.show()

The plot above shows the increase in cross-correlation over time. The GOAP-score remains fairly steady during the first half of the simulation but begins to rise after 600 ps.

Now that we have performed the density-fit simulation, we should report or recall about what type of simulation we have per formed. The GROMACS tool `gmx report-methods <https://manual.gromacs.org/current/onlinehelp/gmx-report-methods.html>`__ can be useful for this purpose. gmx report-methods print out basic system information on a performed run. It can also provide an unformatte d text (with the option -o) or a LaTeX formatted output file with the option -m.

For an additional challenge, try setting up density fitting for a more complex system. You can find one in the data/input_ion_channel directory, that contains experimental electron density maps from the ABC transporter TmrAB (doi:10.1038/s41586-019-1391-0). Is the density aligned to the starting structure? Which regions need the most fitting?

Do you have any questions? Have a look at the user discussions on GROMACS forums