Full example

This section of the tutorial will present an example of a real molecular dynamics simulation using the protein Crambin (PDB ID 1CBN). You will be able to download the input scripts as you read along. You should run these on your own computer to get a feel for the type of output you can expect to see from CHARMM. Suggested exercises are given to encourage you to modify the scripts in various ways. It is highly recommended that you download the tarball containing all of the inputs, outputs, and other files needed before beginning to work through this example. (BTM: I am still assembling the tarball, I will upload it when I'm done).

Reading in the Crambin structure
The first step of any simulation is getting your structure read into CHARMM. Unfortunately, it is not always straightforward to just get a PDB file from the Protein Data Bank and load it up. This is mainly because the PDB uses different naming conventions than CHARMM for some atoms and residues. Fortunately, there are a number of tools that can help with this such as the conv_pdb.pl script in MMTSB or the standalone PDB parser from CHARMMing (ToDo: provide download link). In this case, we will start with a version of 1CBN that has been run through CHARMMing's parser. In order to keep things simple, we will use only the protein atoms and not the bound ligand (the hetero atoms). The parsed PDB with the HETATMs removed can be found here.

Reading in the PDB file
The first input script to be used is setup.inp, which reads the sequence and coordinates from the PDB file and creates the other data structures needed by CHARMM. Most of this script should be reasonably straightforward, however there are a couple of commands that have not been seen previously:


 * READ SEQUence tells CHARMM to read the sequence for the macromolecule from the file.
 * The GENErate command is used to create a new segment in the protein structure file. In this script, we do not read in a PSF, so the GENErate command effectively creates the PSF.
 * The setup option to GENErate tells CHARMM to create the internal coordinate (IC) tables
 * a-pro is the identifier we give to the segment that we are generating (this is called the SEGID and is limited to 5 characters)
 * "first nter last cter" determine the terminal group patchings for the segments. NTER and CTER are the defaults for proteins.
 * The REWInd command tells CHARMM to begin reading a file again from the beginning (handy when you want to read two types of data from a file as we do here)!
 * The IC commands in this file tell CHARMM to fill in missing data in the internal coordinate table with information from the parameter sets (bonds, angles, etc. being placed at their minimum). This data can then be used to build cartesian coordinates. However, since the PDB file already has cartesian coordinates for all of the atoms except the hydrogens, it is not really needed here (but it does no harm to have it).
 * The HBUIld command places the hydrogens relative to the heavy atom positions. Many (most?) PDB files do not have hydrogen positions, so this command is used within CHARMM to build them. Note that the hydrogen positions may not be optimal, so it is best to minimize them.

Once we have all of the coordinates in place, we go ahead and save out our newly created PSF and coordinate files.

Making the final structure
Now that we have the protein segment read in, there is still a little bit more to do to get a really correct structure that we can work with. In many cases, a protein will have more than one segment and they will all need to be prepared separately, read in, and appended together to make a final structure. This can be done with the APPEnd option to READ PSF and READ COOR. In this case, however, we only have a single segment to read in so this step is somewhat superfluous. It is presented here only because in a lot of cases you will need to do it. However, there is one important thing we must do now! If you look at the original 1CBN PDB from the Protein Data Bank, you will notice some lines in the header that begin with "SSBOND". These lines indicate that there are disulfide bridges present in the protein, which can have a substantial effect on the behavior of participating residues. The structure must therefore be modified, which is done via the PATCh command in CHARMM. There are other types of patches we could include as well -- these are mostly used to change the protonation state of a residue. For example, the ASPP patch adds and extra protein to an Aspertine residue, giving it a net +1 charge.

In this case, however, we just want to apply the disulfide bridge patches. The input script we have to do all of this is append.inp. When you run it, be careful to note the output from the PATCh command to make sure that the patch was applied successfully. CHARMM does not always stop processing when a patch application fails, which can lead to strange problems in later scripts. There are a couple of syntactic points to note as well:


 * Note the syntax of the PATCh command: PATCh  . In this case DISUL (for disulfide) is the name of the patch, 2 selections are needed (since a disulfide bridge links two residues, and the SETUp options is used again because the IC table is modified and thus needs to be re-created.
 * We delete the hydrogen atom coordinates and re-place them once the patching is done. In this case this is not really necessary, however when appending multiple segments it is a good idea to re-create the hydrogen atoms coordinates with the whole structure in place. This will reduce bad contacts between the hydrogens and heavy atoms.
 * The IOFOrm EXTEnded command tells CHARMM to print the coordinate files out in high precision (note: older versions of CHARMM may have toruble reading these in).

Basic notes
Now that the structure has been prepared for use with CHARMM, the next step is to minimize it. The input script may be found here and you may also download the output file. As mentioned in the Minimization section of the tutorial, there are several different methods for performing minimization. This script implements a fairly straightforward routine of 100 steps of steepest descent followed by 1000 steps of ABNR with a gradient tolerance of 0.01. After the minimization is completed, the COOR RMS command is used to show the difference between the initial and final structures (the original coordinates are copied to the COMP set via the COOR COPY COMP command that is issued prior to minimization; by default COOR RMS prints out the root mean square distance between the coordinates in the main and comp sets).

The PSF and initial coordinates are read infrom the files created by the append script. We only need to write out a new coordinate file since minimizing the structure does not change the coordinates.

Exercises

 * 1) Modify the minimization script to do three separate rounds of minimization: (1) minimize only the hydrogens, fixing the heavy atoms, (2) minimize with the backbone C, N, and O atoms fixed and all other atomd free to move, (3) minimize with all atoms free. Which minimization reduces system energy the mose?
 * 2) Use COOR DIFF to get the atom by atom difference between the initial coordinates and final coordinates. Which atoms moved the most? The least?

Solvation and neutralization
The first simulation we'll set up for this tutorial will be molecular dynamics in explicit solvent. In order to continue on, we need to place a water structure around the protein. Although Crambin is neutral, we will also place it in a 0.15 molar solution of potassium chloride to illustrate the process of neutralization.

Solvation
The input file for solvation is Solvate.inp (the corresponding output file can be downloaded here).