%run src/init_notebooks.py
#check_notebook()
hide_toggle()
Density-fit simulation¶
Authors : Tatiana Shugaeva, Nandan Haloi, John Cowgill and Alessandra Villa
Goal : learn step-by-step how to run a density-fit simulation using GROMACS
Time : 60 minutes
Software : GROMACS 2024, python modules: nglviewer, MDAnalysis, pandas, matplotlib.
Source: GROMACS tutorials @gromacs.org
version : release `doi:10.5281/zenodo.13938906 <>`__
Introduction to density fitting simulations¶
Recent advances in single-particle cryoelectron microscopy (cryo-EM) have enabled us to obtain near-atomic resolution structures of challenging protein targets including large protein complexes and membrane proteins. However, in many cases, one often ends up in low-resolution densities that are not sufficient to confidently build models. This mainly occurs due to the inherent dynamics of the protein or bound-ligands such as lipids, small molecules; and the averaging of each EM micrograph during 3-dimensional density construction leads to low resolution averaged map for these dynamic parts.
Several molecular dynamics (MD) simulations based methods have been developed to refine all-atom models to EM maps, e.g. correlation-driven molecular dynamics (CDMD) simulations that uses EM density maps to add potential to guide an atomic model to the target [Igaev et al. eLife 2019;8:e43542], molecular dynamic flexible fitting (MDFF) that applies forces proportional to the gradient of the EM density map [Singharoy et al. eLife 2016;5:e16105].
Here, we use the GROMACS to perform density-fit (or density-guided) simulations. In density-guided simulations, additional forces are applied to atoms that depend on the gradient of similarity between a simulated density and a reference density. By applying these forces protein structures can be “fitted” to densities coming from, e.g., cryo electron-microscopy (see details in the GROMACS manual. The implemented approach extends the ones described in Orzechowski et al. and Igaev et al.}
More on the density guided approach implemented in GROMACS and a recently developed tool within GROMACS that can fit structures via Bayes’ approach can be found in Blau et al. bioRxiv 2022.
Preparations to run this notebook¶
First of all, we need to install a tool that allow us to automatically align density to the system coordinates: rigidbodyfit. We will use this tool in the pre-processing part.
! pip install rigidbodyfit
Then we import the follow post-processing tools from MDAnalysis
import warnings
# suppress some MDAnalysis warnings about PSF files
warnings.warn('ignore')
import nglview as ng
import MDAnalysis as mda
from MDAnalysis.analysis import rms
from MDAnalysis.analysis import align
from MDAnalysis.analysis.rms import RMSF
import pandas as pd
import matplotlib.pyplot as plt
Finally, we move to the working directory data
# Change to the data directory
# Note that executing this command twice will result in an error you can ignore
%cd data
Molecular system description¶
In this tutorial, we will use density fitting to fit the structure of the G-protein coupled calcitonin receptor-like receptor (CLR) to its cryo-EM density. The receptor has two well-defined states, the main difference between them is in the position of several transmembrane alpha-helices. We will take the density file of the active state of the receptor (EMD-20906) and fit an atomistic model of the protein into the receptor’s density.
Note that any type of 3D structure in PDB format can be used as an input for this procedure, ranging from homology models to experimentally derived models or outcomes from neural network-based conformation predictions. In this tutorial we will use a model generated by AlphaFold2 for the fitting. Below, you can take a look at it:
## Initial structure of CLR generated by AlphaFold2
view = ng.NGLWidget()
view.add_component("input/CLR_initial_model.pdb")
view
#Click and drag to rotate, zoom with your mouseweel
#For more information on this viewer have a look at https://github.com/nglviewer/nglview
The density file EMD-20906
was downloaded from Electron Microscopy
Data Bank. The original file
contains electron density for a big multisubunit protein complex.
Chimera was used to extract and
save the density corresponding to CLR in a separated file. A tutorial
on density extraction using
Chimera is
available on YouTube for further details. Below, you can see the
resulting density file:
view = ng.NGLWidget()
view.add_component('input/CLR_active_state_map_rotated.mrc')
view
System preparation¶
Now we prepare the system for the simulation. First of all we need to
generate the structural and topology file to energy minimize the
structure before performing density-fit simulation. We start by
generating the topology file for the receptor using GROMACS
pre-processor tool pdb2gmx
and CLR_initial_model.pdb
file as
input. You find the file in the directory input
. In this tutorial we
use CHARMM36 force field. The force field is not distributed with
GROMACS and be downloaded . Force field directory
charmm36-jul2022.ff
can be found in the working directory and is
automatically read by the pre-processing tool.
All needed commands are listed below, for more detaild explanation of the commands please refer to GROMACS tutorial: Introduction to Molecular Dynamics.
# prepare .gro file and topology file.
!gmx pdb2gmx -f input/CLR_initial_model.pdb -o model.gro -water tip3p -ff "charmm36-jul2022"
# prepare the box
!gmx editconf -f model.gro -o model_box.gro -c -d 1.0 -bt cubic
For tutorial purpose, the simulations are performed in water solution, even if CLR is a membrane protein. Note we can still obtain accurate results by using simulations in water (Linnea Yvonnesdotter et al.).
# add water molecules
!gmx solvate -cp model_box.gro -o model_solv.gro -p topol.top
Now we add ions to neutralize the system’s charge. We achieve this using
-neutral
option in gmx genion
.
#adding ions
!touch ions.mdp
!gmx grompp -f ions.mdp -c model_solv.gro -p topol.top -o solv.tpr
!echo 13 | gmx genion -s solv.tpr -o model_solv.gro -p topol.top -pname NA -nname CL -neutral
After setting up the complete system, the next steps are energy
minimization and equilibration. For guidance, you can refer to the
Introduction to Molecular
Dynamics
tutorial, which wementioned earlier. You can practice system preparation
using the commands from the reference tutorial or proceed directly with
the provided start.gro
file.
# Please remove the comment character (#) to execute the command below.
# It copies prepared file start.gro from the input folder.
#!cp input/start.gro ./
view = ng.NGLWidget()
view.add_component('start.gro')
view.add_representation("ball+stick",selection="NA or CL")
## water
view.add_representation("licorice",selection="TIP3P", opacity=0.3)
view
Density alignment¶
To run a density-guided simulation using GROMACS we need not only a
structural file (model_solv.gro
), a topology file (topol.top
)
describing the molecular system, but also a file containing information
on the density to map (CLR_active_state_map_rotated.mrc
) and a
simulation parameter file (df.mdp
). You can find the two files in
the directory input
If we examine the density, we will notice that it is not aligned with
the protein structure in start.gro
:
view = ng.NGLWidget()
view.add_component("start.gro")
view.add_component('input/CLR_active_state_map_rotated.mrc')
view
To align the density and the structure, we use the rigidbodyfit script. Here, there are two important parameters to consider:
sampling-depth: Increasing this value will result in a more rigorous alignment. Practise indicates
9
as a reasonable values. However, if you’re dissatisfied with the alignment result, consider increasing it to 11-12. Be aware that higher values may significantly extend the calculation time.output-transform: This option adds a transformation matrix to the output. We will use it in our md parameter file
df.mdp
.
# script works with .pdb file format, so we have to convert prepared .gro to .pdb
! gmx editconf -f start.gro -o start.pdb
# The rigidbodyfit script will run much faster locally
! rigidbodyfit --structure start.pdb --density input/CLR_active_state_map_rotated.mrc --sampling-depth 9 --output-transform
To check if the alignmnt was successfull we check how fitted.pdb looks
compared to the density. The process is stochastic, so if you see a
poorly aligned result, increase the --sampling-depth
value and rerun
the cell above.
view = ng.NGLWidget()
view.add_component("fitted.pdb")
view.add_component('input/CLR_active_state_map_rotated.mrc')
view
Most parts of the structure are aligned. Note that some residues (195-203) are slightly shifted compared to the density, so density fitting simulation is needed.
Now, we import the information from transform.json
to the df.mdp
!cp input/df.mdp .
mdp_shift = 'density-guided-simulation-shift-vector = '
mdp_rotate = 'density-guided-simulation-transformation-matrix = '
mdp_name = '\ndensity-guided-simulation-reference-density-filename = CLR_active_state_map_rotated.mrc\n'
with open('transform.json') as f:
d = json.load(f)
a = [str(i) for i in d['shift_in_nm']]
out_str = mdp_shift + ' '.join(a) + '\n'
rot = ''
for el in d['orientation_matrix']:
s = [str(i) for i in el]
rot = rot + ' ' + ' '.join(s)
with open('df.mdp', 'a') as f2:
f2.write(mdp_name)
f2.write(out_str)
f2.write(mdp_rotate+rot + '\n')
!cat df.mdp
More information on the mdp file option for density fit simulations can be found in GROMACS user guide, in the BioExcel webinar given by Christian Blau. Some math behind the method is reported in GROMACS manual.
We check that the density reference file is the working directory.
!ls
If it is not, we copy the reference file in the working directory.
!cp input/CLR_active_state_map_rotated.mrc ./
Density guided simulation¶
Now we are ready to run the simulation.
!gmx grompp -f df.mdp -c start.gro -p topol.top -o df.tpr
Check for warnings. If you want for time reason, you can skip the simulation and copy the output files from the directory reference.
# Please remove the comment character (#) to execute the command below.
# It takes long to get the result. You can stop the kernel and use the pre-calculated simulation.
#!gmx mdrun -s df.tpr -deffnm df
Note: if you do not want to wait, but go on with the tutorial, copy the
data from the reference
directory into the output_files directory.
To do this within this notebook, remove the comment characters (#) in
the following cell
## ONLY execute the lines below if you do not want to run and wait for the simulations to finish.
## remove the comment characters (#) to execute the commands below
#! cp reference/df.xtc .
Simulation analysis¶
First, let’s examine the simulation results. Since the output trajectory
is not pre-aligned to the density (the transformation matrix in the
parameter file accounts for this during the simulation), we now need to
align it explicitly. Gromacs trjconv command with “-fit rot+trans”
option. We align the trajectory to the fitted.pdb
file that we
generated using rigidbodyfit
! echo 1 0 | gmx trjconv -f df.xtc -s fitted.pdb -fit rot+trans -o df_fitted.xtc
Basic manipulations with trajectories can be done using the MDAanalysis python library. We load trajectory information from TPR and XTC files. Notice how residues 198-203 shift into the density (use the slider to navigate through the frames more quickly). At which frame do the residues appear well-aligned?
# read trajectory
u = mda.Universe('start.gro', 'df_fitted.xtc')
view = ng.NGLWidget()
view.add_trajectory(u)
view.add_component('input/CLR_active_state_map_rotated.mrc')
view
To visualize the progress of the simulation, we can plot RMSD (root mean square deviation of atomic positions). We calculate the RMSD of each frame compared to the input structure. In MDAnalysis, this can be done using the align.AlignTraj class. For more details, please refer to the MDAnalysis documentation.
# align trajectory to the first frame and calculate RMSD. RMSD values are stored in prealigner.results.rmsd
prealigner = align.AlignTraj(u, u, select="protein and name CA",
in_memory=True).run()
# create list with time in ps
time = [num*2 for num, el in enumerate(prealigner.results.rmsd)]
# plot RMSD
%matplotlib inline
%config InlineBackend.figure_format='retina'
plt.rcParams["figure.figsize"] = (6, 4)
plt.rcParams["font.size"] = 14
plt.plot(time, prealigner.results.rmsd)
plt.xlabel('Time, ps')
plt.ylabel('RMSD (C-alpha), Å')
plt.show()
For the purposes of the tutorial we used a system for which structure has already been resolved (PDBid: 6UVA). We can assess the accuracy of our simulation by comparing the RMSD between the resolved structure and our simulation frames.
# load reference structure and pick only backbone
reference_state = mda.Universe('input/CLR_reference.pdb')
ref_bb = reference_state.select_atoms('name CA')
ref_bb_positions = ref_bb.positions
# Select tha same atoms that are available in the reference structure
bb = u.select_atoms('name CA and (resid 1:193 or resid 199:224 or resid 232:272)')
bb.write('selected_atoms_traj.pdb', frames='all')
part_atoms = mda.Universe('selected_atoms_traj.pdb')
# Align and calculate RMSD
reference_alignment = align.AlignTraj(part_atoms, # trajectory to align
ref_bb, # reference
select='all', # selection of atoms to align
filename='aligned_traj.pdb', # file to write the trajectory to
match_atoms=True, # whether to match atoms based on mass
).run()
plt.plot(time, reference_alignment.results.rmsd)
plt.rcParams["figure.figsize"] = (6, 4)
plt.title('Comparison to the reference structure')
plt.xlabel('Time, ps')
plt.ylabel('RMSD (C-alpha), Å')
plt.show()
To assess the fitting quality without reference structures, we are
plotting a metric such as cross-correlation, which represents the
structure’s similarity to the density. The closer the value is to 1, the
better the atoms fit into the density. Cross-correlation can be computed
using Chimera fitmap function
(аn example script cross_correlation_with_chimera.py
can be found in
the data/additional_scripts
folder).
Due to the adaptive force scaling scheme the force to the protein atoms becomes extremely high towards the end of the simulations, resulting in a crash. Therefore, to prevent distortion of the protein structure and integritiy of the stereochemistry in the final mode, from this high force, we will use a generalized orientation-dependent all-atom potential GOAP scoring function; originally developed to assess protein structure prediction.
GOAP score source code was downloaded at https: //sites.gatech.edu/cssb/goap/
Operating these tools falls beyond the scope of this tutorial. However,
we have precalculated the cross-correlation (cc.csv
) and GOAP-score
(goap.csv
) values for our density simulation.
goap = pd.read_csv('reference/goap.csv', index_col=0)['goap_score'].to_list()
cc = pd.read_csv('reference/cc.csv', index_col=0)['cross-correlation'].to_list()
plt.rcParams["figure.figsize"] = (6, 4)
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.set_xlabel('Time, ps')
ax1.set_ylabel('Cross-correlation', color=color)
ax1.plot(time[1:], cc[1:], color=color)
ax1.tick_params(axis='y', labelcolor=color)
fig.tight_layout()
#plt.xlim(0, 45)
#ax1.set_title('Cross-correlation')
plt.subplots_adjust(top=0.9)
plt.show()
averages = []
window_size = 20
for i in range(len(goap) - window_size + 1):
window = goap[i:i + window_size]
window_average = sum(window) / window_size
averages.append(window_average)
plt.rcParams["figure.figsize"] = (6, 4)
fig, ax1 = plt.subplots()
ax2 = ax1 # instantiate a second axes that shares the same x-axis
color = 'tab:blue'
ax2.set_ylabel('GOAP score', color=color)
ax2.plot(time[1:], goap[1:], color=color)
ax2.plot(time[:-(window_size)], averages[1:], color='indigo')
ax2.tick_params(axis='y', labelcolor=color)
fig.tight_layout()
#plt.xlim(0, 45)
#ax1.set_title('GOAP-score')
plt.subplots_adjust(top=0.9)
plt.show()
The plot above shows the increase in cross-correlation over time. The GOAP-score remains fairly steady during the first half of the simulation but begins to rise after 600 ps.
Now that we have performed the density-fit simulation, we should report
or recall about what type of simulation we have per formed. The GROMACS
tool
`gmx report-methods
<https://manual.gromacs.org/current/onlinehelp/gmx-report-methods.html>`__
can be useful for this purpose. gmx report-methods print out basic
system information on a performed run. It can also provide an unformatte
d text (with the option -o) or a LaTeX formatted output file with the
option -m.
For an additional challenge, try setting up density fitting for a more
complex system. You can find one in the data/input_ion_channel
directory, that contains experimental electron density maps from the ABC
transporter TmrAB (doi:10.1038/s41586-019-1391-0). Is the density
aligned to the starting structure? Which regions need the most fitting?
Do you have any questions? Have a look at the user discussions on GROMACS forums