# Execute this first
#  * trigger notebook styling
#  * check if notebook had been modified since its distribution
# Note: executing any cells before this modifies the notebook.
%run src/init_notebooks.py

A Beginner’s Guide to Perform Molecular Dynamics Simulation of a Membrane Protein using GROMACS

Authors : Farzaneh Jalalypour, Maryam Majdolhosseini, Alessandra Villa*

Goal : Learn step-by-step how to run a molecular dynamics simulation of a simple membrane protein using GROMACS
Reading time : 40 minutes
Software : GROMACS 2022 python modules: numpy, matplotlib, re, nglviewer, md_traj, panda.
Optional Software : visualization software VMD, Xmgrace plotting tool
Tutorial Source : tutorials.gromacs.org
Version : Final

Small talk with the audience

This tutorial is designed for beginner GROMACS users and assumes you understand the basics of MD simulations and are familiar with GROMACS files and how to generate and execute them. Don’t worry if you are not yet acquainted. We have already developed an “Introduction to Molecular Dynamics” tutorial that covers all you need to know to get started with this one.

Please note that this tutorial is for educational purposes and and won’t go through every step in details. For your system, you must change the settings accordingly (see GROMACS documentation).

Preparations to run this notebook

# Change to the data directory
# Note that executing this command twice will result in an error you can ignore
%cd data

Model system for this tutorial

For this tutorial, we will utilize Malporin, a trimeric protein located at the outer membrane of Gram-negative bacteria that facilitates the translocation of the polysaccharide maltodextrin. You can download the 3D structure of malporin (PDB code 1MAL) from RCSB website https://www.rcsb.org/structure/1MAL or use the previously downloded file, “1mal.pdb” in the “input” directory.

Visualize the structure

import nglview as ng
view = ng.show_structure_file("input/1mal.pdb")
#click and drag to rotate, zoom with your mouseweel
#for more infor on this viewer have a look at https://github.com/nglviewer/nglview

System preparation for this tutorial

Building the protein-membrane system in CHARMM-GUI

We are now ready to embed the protein structure in the membrane in the proper location and orientation and construct the membrane composition we desire. To do this, we utilized the CHARMM-GUI input Generator, a handy web-based tool to generate GROMACS inputs for the protein-membrane system. In order to access CHARMM-GUI (http://www.charmm-gui.org), a web browser such as Chrome, Firefox, etc. is required.

In this Tutorial, rather than providing an in-depth discussion of how to create input via CHARMM-GUI, we will focus on how to perform a correct simulation using GROMACS. However, you can find useful information on Membrane Builder Intro and in this Membrane Proteins Tutorial .

CHARMM-GUI generates all required input files as a zip file. You can unzip the file using the command below, or directly use the previously unzipped folder named “charmm-gui-1MAL”. This folder contains a diverse set of files.

## command to unzip the zip file. The command is executable when you remove the hashtags
#import shutil
#print("Archive file unpacked successfully.")
#ls input

What we require, though, is in a “gromacs” folder. Let’s take a closer look:

%ls input/charmm-gui-1MAL/gromacs/

In a prior tutorial titled “Introduction to Molecular Dynamics”, you may get an explanation for each file including structure and topology files. Some of these mdp files may need to be changed to fit the system as further discussed in the next section.

Now lets create a new folder, called “run”. Here we will perform the minimization and equilibration steps. Ensure that you are always in the correct working directory, you can use the pwd command, which stands for “print working directory”. The command can be used to determine what directory you are currently working in.

# Creat working directory named run
%mkdir run
# Change to the directory "run"
%cd run
# Make sure you are in the right folder. This will show you the path :)

Modified MDP files

Before starting the simulations we need the topology, parameter, and structure files. We will copy them from the CHARMM-GUI-1MAL folder to the run directory. Please note that mdp files are customized to fit the requirements of the system and they are located in the input/mdp folder.

##copy required files to a folder
!scp    ../input/charmm-gui-1MAL/gromacs/step5_input.gro          .
!scp    ../input/charmm-gui-1MAL/gromacs/step5_input.pdb          .
!scp    ../input/charmm-gui-1MAL/gromacs/topol.top                .
!scp    ../input/charmm-gui-1MAL/gromacs/index.ndx                .
!scp -r ../input/charmm-gui-1MAL/gromacs/toppar                   .
#updated mdp files
!scp    ../input/mdp/*.mdp                                        .

Because GROMACS always assumes a default value for each parameter, simulations may be conducted with an entirely empty mdp file. Hence, you must double-check each option to ensure that the default settings are acceptable. If not, you can modify the mdp file. GROMACS is user-friendly and prepared an intriguing list of possibilities that is clearly explained in mdp options webpage. To make your life easier, in all mdp files of this tutorial, a small explanation is included next to each parameter.

Now we have all required files to start minimization and equilibration parts:


Lets start! Although it will take time, you can run the commands below one by one to generate files in every step. Fortunately, we have prepared output of runs ready to use if you prefer the quick route (see input/reference folder).

Energy minimisation

Energy minimisation (EM) is used to relax the protein-membrane system by removing steric conflicts and improper geometry, e.g. to reduce excessive forces caused by too-close particle interaction in the system. We will first generate a binary (.tpr) file which assemble the structure(.gro), topology(.top), and simulation parameters(.mdp). Then we run energy minimization using gmx mdrun.

But first we go through the parameters in the minimization.mdp file:

#revised minimization.mdp
!cat ../input/mdp/step6.0_minimization.mdp
## EM-step1  > to generate binary file
!gmx grompp -f step6.0_minimization.mdp -o minimization.tpr -c step5_input.gro  -r step5_input.gro -p topol.top

Note: From now on, if you want to run the commands on your system consider cells labelled with an odd number ##{number]## and remove one hashtags from each line. But, if you want to use pre-prepared outputs remove hashtags of cells with even numbers.

##1## EM-step2 > to invoke mdrun
#!gmx mdrun -v -deffnm minimization

##The command is executable when you remove one of the hashtags

Five files (tpr,gro,trr,edr,log) are generated after you run the minimization command. You can always find the previously prepared files for each step in the reference folder:

##2## EM-step2 > To transfer minimization output from the reference directory to working directory

#!scp ../reference/minimization* .

##remove one hashtag to transfer pre-prepared minimization files to working directory
#folder of pre-prepared outputs
%ls minimization*

Lets check if the minimization criteria are reached by looking at the last lines of the log file:

!tail -19 ../reference/minimization.log

There are two very important factors to evaluate to determine if energy minimization was successful. The first is the potential energy (printed at the end of the EM process, even without -v). The second important feature is the maximum force, Fmax, the target for which was set in minim.mdp - “emtol = 1000.0” - indicating a target Fmax of no greater than 1000 kJ/(mol nm). It is possible to arrive at a reasonable Epot with Fmax > emtol. If this happens, your system may not be stable enough for simulation. Evaluate why it may be happening, and perhaps change your minimization parameters (integrator, emstep, etc).

## Minimized structure
import nglview as ng
view = ng.show_structure_file("../reference/minimization.gro")
## Lipids
## water
view.add_representation("licorice",selection="TIP3P", opacity=0.9)
#Click and drag to rotate, zoom with your mouseweel
#For more information on this viewer have a look at https://github.com/nglviewer/nglview

In alternative, remove the comment character (#) to use VMD:

#!vmd ../reference/minimization.gro

Let’s do a bit of analysis. The minimization.edr file contains all of the energy terms that GROMACS collects during EM. To extract energy information you can use the tool gmx energy

!printf "Potential\n0\n" | gmx energy -f minimization.edr -o potential.xvg -xvg none
import pandas as pd
df = pd.read_csv('potential.xvg', sep='\s+', header=None, names=['step','energy(kJ Mol-1)'])

Note: to analyse or visualize simulation data in python or jupyter notebooks, we can output a simplified xvg format from gmx-analysis tools with the option -xvg none

Equilibration run - NVT

position restraints

For free proteins in solution, e.g. cytosolic proteins, position restraints are not as crucial as membrane proteins, where different parts of a system must be restrained in different levels and released gradually in several steps of minimization and equilibration. Otherwise you will end up in serious problems such as undesired change in system size and conformation.

To set position restraints we need to use the define option in the simulation parameter file, .mdp, which switch on by -DPOSRES flag. Have a look at the .mdp files for the following runs:

!head -4 ../input/mdp/step6.1_equilibration_NVT_step1.mdp
!head -4 ../input/mdp/step6.2_equilibration_NVT_step2.mdp
!head -4 ../input/mdp/step6.3_equilibration_NPT_step1.mdp
!head -4 ../input/mdp/step6.4_equilibration_NPT_step2.mdp
!head -4 ../input/mdp/step6.5_equilibration_NPT_step3.mdp
!head -4 ../input/mdp/step6.6_equilibration_NPT_step4.mdp
!head -4 ../input/mdp/step7_production_revised.mdp

Let’s use chain A (segment PROA) as an example to demonstrate how it works. The position restrain is defined in the last part of the itp file. DPOSRES_FC_BB and POSRES_FC_SC specify the type of restrains that apply to the different part of the protein. BB stands for the potein BackBone and SC for the protein Side Chains! Please execute the command below to see the position_restraints set for chain A:

!sed -n '59914,59932p'  ../input/charmm-gui-1MAL/gromacs/toppar/PROA.itp

Hint: The number of atoms may be misleading. Please follow the PDB file to ensure which atoms are selected. Here you can find the atom numbers for the first two residues:

!head -n 30 ../input/charmm-gui-1MAL/gromacs/step5_input.pdb

As you can see, the 1st and 5th atoms are N and CA protein backbone atoms of the first residue, respectively, and the value is defined by POSRES_FC_BB in PROA.itp file. The 7th and 9th are atom indexes of CB and CG which belong to the protein side chain and specify with POSRES_FC_SC. Take some moment to find the rest of the atoms and understand the concept by comparing itp and pdb files.

The P atoms of the lipid headgroups are constrained by DPOSRES_FC_LIPID. As the value is only determined for the Z direction, POPC lipid molecules can only move in the X-Y plane of the bilayer, maintaining the bilayer’s thickness. i is the atom index of the P atom of the first POPC.

!sed -n '1271,1276p'  ../input/charmm-gui-1MAL/gromacs/toppar/POPC.itp

In POPC lipid molecules, two dihedral angles are typically restrained to preserve the cis-isomer. Here, the angles formed by C1-C3-C2-O21 and C28-C29-C210-C211 atoms are described by the atom indexes of the first POPC 25-36-28-30 and 60-63-65-67, respectively, and the value is defined by DIHRES_FC.

!sed -n '1278,1285p'  ../input/charmm-gui-1MAL/gromacs/toppar/POPC.itp

Equilibration run - NVT

To perform NVT equilibration we need two steps. In each step we will remove the restrains gradually to avoid undesired change in system size and conformation.

As discussed before, by removing the first hashtag (#) from the beginning of each line in odd-numbered cells, you may “uncomment” and execute them on your system, but be warned that it will take a while. As an alternative, you can transfer the pre-prepared outputs by executing even-numbered cells to be able to countinue.

##1## equilibration-NVT6.1-step1 > To run
#!gmx grompp -f step6.1_equilibration_NVT_step1.mdp -o step6.1_equilibration_NVT_step1.tpr -c minimization.gro -r step5_input.gro -p topol.top -n index.ndx
#!gmx mdrun -v -deffnm step6.1_equilibration_NVT_step1

As an alternative, you can transfer the pre-prepared outputs by executing even-numbered cells to be able to countinue.

##2## equilibration-NVT6.1-step1 > To transfer
#!scp    ../reference/step6.1_equilibration_NVT_step1*  .

Now we run step 2 using results from step 1 as input

##3## equilibration-NVT6.2-step2 > To run
#!gmx grompp -f step6.2_equilibration_NVT_step2.mdp -o step6.2_equilibration_NVT_step2.tpr -c step6.1_equilibration_NVT_step1.gro -r step5_input.gro -p topol.top -n index.ndx
#!gmx mdrun -v -deffnm step6.2_equilibration_NVT_step2

As an alternative, you can transfer the pre-prepared outputs by executing even-numbered cells to be able to countinue.

##4## equilibration-NVT6.2-step2 > To transfer
#!scp    ../reference/step6.2_equilibration_NVT_step2*  .
#Equilibration - NVT outputs
%ls ../reference/*NVT_step*

It is crucial to evaluate every step to ensure that the simulation is going right. For this purpose, we should check the tempurature using gmx energy tool. To plot the data, we used python data analysis library (pandas), but there are several alternatives out there such as Xmgrace tool.

Evaluate temperature

## check the temperature > To run
!echo 17|gmx energy -f step6.2_equilibration_NVT_step2.edr -o  NVT_S2-temp.xvg -xvg none
import pandas as pd
df = pd.read_csv('NVT_S2-temp.xvg', sep='\s+', header=None, names=['time(ps)','temperature(K)'])
##Alternative##Plotting data by Xmgrace tool
#!xmgrace ../run/NVT_S2-temp.xvg

Evaluate the final structure

It is also strongly recommended to visualise the final structure of each step:

import nglview as ng
view = ng.show_structure_file("../reference/step6.1_equilibration_NVT_step1.gro")
## Lipids
## water
view.add_representation("licorice",selection="TIP3P", opacity=0.9)
#click and drag to rotate, zoom with your mouseweel
#for more information on this viewer have a look at https://github.com/nglviewer/nglview
import nglview as ng
view = ng.show_structure_file("../reference/step6.2_equilibration_NVT_step2.gro")
## Lipids
## water
view.add_representation("licorice",selection="TIP3P", opacity=0.9)

Inside the mdp files

As you may realize, the mdp files for EM and NVT are not the same. The integrator has been updated to md:

!sed -n '8,8p'  ../input/mdp/step6.1_equilibration_NVT_step1.mdp
!sed -n '7,7p'  ../input/mdp/step6.0_minimization.mdp

And a temperature coupling section has been added:

!sed -n '42,55p' ../input/mdp/step6.1_equilibration_NVT_step1.mdp
## For more details and compare at the entire files remove one hashtag
#!cat ../input/mdp/step6.0_minimization.mdp
#!cat ../input/mdp/step6.1_equilibration_NVT_step1.mdp

We used the v-rescale thermostat, which is the modified version of Berendsen having a stochastic term that assures the generation of a correct canonical ensemble. Sometimes you need to define groups, for instance to couple them separately. Here, SOLU_MEMB stands for protein and membrane, whereas SOLV stands for water and ions. If you’re wondering about how we suggest these names, go through the index files. You can utilize pre-defined groups in the index file or even alter the index file to establish more groups if necessary. Typically, CHARMM-GUI provides five categories in the index file, as shown below:




[ SOLU ]



[ MEMB ]



[ SOLV ]


ions and water



protein and POPC




The interpretation of this information is as follows:

!printf "q\n" | gmx make_ndx -f step5_input.gro -n index.ndx -o

You may also notice a variation in values if you look at the first lines of NVT mdp files. The sole difference between the prior stage (step6.1_equilibration_NVT_step1) and the final stage of NVT (step6.2_equilibration_NVT_step2) is that position restrains are gradually decreased and relaxed:

!head -6 ../input/mdp/step6.1_equilibration_NVT_step1.mdp
!head -4 ../input/mdp/step6.2_equilibration_NVT_step2.mdp

Equilibration run - NPT

Pressure equilibration takes place in an NPT ensemble. To keep the pressure stable, we turn the barostat on. We chose the semisotropic pressure coupling, which is isotropic in the x and y directions but not in the z, and is suitable for membrane simulations. Because Berendsen Barostat is outdated and Parrinello-Rahman is better to utilize ‘only’ in production runs when the structure is ‘properly equilibrated’, the c-rescale algorithm is appropriate for this part:

Pressure coupling is on

pcoupl = c-rescale

pcoupltype = semiisotropic

tau_p = 5.0

compressibility = 4.5e-5 4.5e-5

ref_p = 1.0 1.0

refcoord_scaling = com

In order to minimize unintended changes in system size and conformation, NPT equilibration is carried out in four phases, with each step including a progressive removal of restraint.

##9## equilibration-NPT6.3-step1 > To run
#!gmx grompp -f step6.3_equilibration_NPT_step1.mdp -o step6.3_equilibration_NPT_step1.tpr -c step6.2_equilibration_NVT_step2.gro -r step5_input.gro -p topol.top -n index.ndx
#!gmx mdrun -v -deffnm step6.3_equilibration_NPT_step1

In alternative:

##10## equilibration-NPT6.3-step1 > To transfer
#!scp    ../reference/step6.3_equilibration_NPT_step1*  .

In the next phases, we gradually reduce the constraints:

##11## equilibration-NPT6.4-step2 > To run
#!gmx grompp -f step6.4_equilibration_NPT_step2.mdp -o step6.4_equilibration_NPT_step2.tpr -c step6.3_equilibration_NPT_step1.gro -r step5_input.gro -p topol.top -n index.ndx
#!gmx mdrun -v -deffnm step6.4_equilibration_NPT_step2

In alternative:

##12## equilibration-NPT6.4-step2 > To transfer
#!scp    ../reference/step6.4_equilibration_NPT_step2*  .
##13## equilibration-NPT6.5-step3 > To run
#!gmx grompp -f step6.5_equilibration_NPT_step3.mdp -o step6.5_equilibration_NPT_step3.tpr -c step6.4_equilibration_NPT_step2.gro -r step5_input.gro -p topol.top -n index.ndx
#!gmx mdrun -v -deffnm step6.5_equilibration_NPT_step3

In alternative:

##14## equilibration-NPT6.5-step3 > To transfer
#!scp    ../reference/step6.5_equilibration_NPT_step3*  .
##15## equilibration-NPT6.6-step4 > To run
#!gmx grompp -f step6.6_equilibration_NPT_step4.mdp -o step6.6_equilibration_NPT_step4.tpr -c step6.5_equilibration_NPT_step3.gro -r step5_input.gro -p topol.top -n index.ndx
#!gmx mdrun -v -deffnm step6.6_equilibration_NPT_step4

In alternative:

##16## equilibration-NPT6.6-step4 > To transfer
#!scp    ../reference/step6.6_equilibration_NPT_step4*  .

Here you can find the output of NPT steps:

##Equilibration - NPT outputs
%ls ../reference/*NPT_step*

Evaluate the Pressure

Attention! I hope you have not forget to evaluate the temperature in each step so far. In NPT equilibration steps, we need to check at extra parameter, pressure:

## check the pressure > To run
!echo 18|gmx energy -f step6.6_equilibration_NPT_step4.edr -o  NPT_S4_Press.xvg -xvg none
import pandas as pd
import statistics
df = pd.read_csv('NPT_S4_Press.xvg', sep='\s+', header=None, names=['time(ps)','pressure(bar)'])
##Alternative##Plotting data by Xmgrace tool
#!xmgrace NPT_S4_Press.xvg

As you can see, pressure fluctuate more than temprature which is totally ok :)

The system’s temperature should rises to the desired level (~303 K) and stay steady for the rest of the equilibration. While, pressure is the parameter which fluctuates alot during the MD simulation, and its average should be statistically comparable with the pre-defined pressure (1 bar).


Invoke mdrun for production (one single simulation)

The system is now properly equilibrated and reached the appropriate temperature and pressure. The position restrictions may now be lifted, to start the production MD and begin data gathering.For production run, we will first generate the binary tpr filr using the following mdp parameters:

!cat step7_production_revised.mdp
##19.1## step7-production > To run
##generate tpr file
#!gmx grompp -f step7_production_revised.mdp -o step7_production.tpr -c step6.6_equilibration_NPT_step4.gro -t step6.6_equilibration_NPT_step4.cpt -p topol.top -n index.ndx

The completion of a production run (cell 19.2) might take hours or days. Dont execute this command here and try it on the terminal of the system you choose to perform the simulation.

!Warning: This run will generate roughly 9028 Mb of data

##19.2## step7-production(one single simulation) > To run
#!gmx mdrun -s step7_production -cpi

A 100ns simulation was performed. Every 30000 of the 50000000 steps were recorded as the trajectory frames. The trr file is compressed into an xtc file. For convenience, we saved every 100 frames of the final trajectory and put the outputs in the reference folder.

##20## step7-production > To transfer
#!scp    ../reference/step7_production*  .
##step7-production outputs
%ls *step7_production*

Visualize the final structure

import nglview as ng
view = ng.show_structure_file("../reference/step7_production.gro")
## Lipids
view.add_representation("spacefill",selection="POPC", opacity=0.5)
## to see water box uncommand below:

HINT How to generate replicas? You may need more than one replica. You can generate several tpr file using random seed option for the velocity and temperature in the ‘mdp’ file.

How to run several replicas? You dont need to perform simulations separately, because GROMACS has a simple solution for that:

step 1: make folders (here we set four replicas) #mkdir Rep1 Rep2 Rep3 Rep4

step 2: copy each tpr file in a different folder and execute mdrun by command below: #mpirun -np 4 gmx_mpi mdrun -v -s step7_production.tpr -multidir Rep1 Rep2 Rep3 Rep4 -cpi

Quiz Why we need to run several replicas? Do you know how they differ from one another? Exactly the same?


RMSD analysis

!echo 4 4 |gmx rms -s step7_production.tpr -f step7_production_traj_comp_skip100.xtc -o production_rmsd.xvg -bin production.rmsd.dat -tu ns -fit rot+trans -xvg none
import pandas as pd
df = pd.read_csv('production_rmsd.xvg', sep='\s+', header=None, names=['time(ns)','RMSD(nm)'])

The root mean square deviation (RMSD) is a practical parameter to compare the backbones of a protein from its initial to final state, which illustrates the dynamics of structure during the simulation. Here, the protein structure is pretty stable and fluctuates within expected margins(1-2 Å).

##Alternative##Plotting data by Xmgrace tool
#!xmgrace production_rmsd.xvg


##check the energy
!echo 13|gmx energy -f step7_production.edr -o production_Etot.xvg -xvg none
import pandas as pd
df = pd.read_csv('production_Etot.xvg', sep='\s+', header=None, names=['time(ps)','Energy(kJ mol-1)'])
##Alternative##Plotting data by Xmgrace tool
#!xmgrace production_Etot.xvg


#check the tempurature
!echo 15|gmx energy -f step7_production.edr -o  production_temp.xvg -xvg none
import pandas as pd
df = pd.read_csv('production_temp.xvg', sep='\s+', header=None, names=['time(ps)','Temperature*(K)'])
##Alternative##Plotting data by Xmgrace tool
#!xmgrace production_temp.xvg


#check the pressure
!echo 16|gmx energy -f step7_production.edr -o production_press.xvg -xvg none
import pandas as pd
df = pd.read_csv('production_press.xvg', sep='\s+', header=None, names=['time(ps)','Pressure(bar)'])
##Alternative##Plotting data by Xmgrace tool
#!xmgrace production_press.xvg

Trajectory Visualization

import nglview as ng
import pytraj as pt
Trajectory = pt.load('step7_production_traj_comp_skip100.xtc', top='step5_input.pdb')
view = ng.show_pytraj(Trajectory)

Now, place the mouse cursor over the protein press the play button. You will see the protein wiggling and jiggling!

view.add_spacefill('not TIP and not protein', opacity=0.6)
view.camera = 'orthographic'

In this Part, lipids (gray and red spheres), and ions (green and purpule spheres) are illustrated. You can find the name of the atoms by simply placing the cursor on it. Try to find chloride ions.

Easy ha? Now try the tutorial on your protein and let us know if it is useful. if you have any question write in the forum and we will answer as soon as possible: http://forums.gromacs.org

References 1.