Basic Input Scripts

Image of molecule

The features and capabilities of CHARMM can all be used through input scripts. The CHARMMing web site, http://www.charmming.org, contains an integrated set of tools for uploading structures, performing simulations, and viewing the results. In order to provide the best possible user experience, it incorporates a number of freely available tools such as Jmol for visualization and an automatic residue topology file (RTF) generator (GENRTF) which generates the necessary information for atoms and residues that are currently not supported by the CHARMM force fields. Below is a partial list of functionality that currently is incorporated into CHARMMing:

  • PDB/CRD reader and input script generator

  • Integrated molecular graphics

  • Integrated simulation tools (i.e. minimization, solvation, dynamics)

  • Automatic topology generation

This tutorial will show how the CHARMM scripts that accomplish these tasks work.

Preparation: The PDB that will be used along with this tutorial has the PDB.org ID: 1YJP. Uploading this PDB to CHARMMing will separate it into two parts: one containing the protein and the other containing the waters ("good" HETATM file). The user can also split the pdb manually. The files that are used in these forthcoming scripts can be found on this tutorial's website: http://www.charmmtutorial.org/.

First, the tutorial will cover dealing with the protein part of the PDB because waters must be treated differently. To start off the script, the title lines are needed.

* Title
*

The next step is to read in the files containing the residue topology (RTF) and parameters (PARAm). Either the user can supply their own topology and parameter files or use CHARMMs supplied files. Either way, the pathway to the files must be explicitly stated in the input script.

After the title, start off the line with "OPEN" and the next step is to assign a unit number to the "OPEN" command. This can be done by "UNIT" followed by a space and number. The unit number acts as a pointer to the file. This will be familiar to users who have done Fortran programming. Next, follow the unit number by "CARD" as this tells CHARMM that the file is to be interpreted as formatted ASCII text as opposed to binary data (in fact "FORM" or FORMatted can be used as a synonym for "CARD"). For example, when reading coordinates from a formatted .crd file, CHARMM knows that the X coordinates are in column 5. It is important to realize that columns are defined by character position instead of white space, and therefore it is imperative that files be formatted correctly. Tell CHARMM to read the file by using the command "READ NAME" followed by the path of the topology file. The whole line should now look like this:

OPEN UNIT 1 CARD READ NAME /charmm/c34b2/toppar/top_all27_prot_na.rtf

Now that CHARMM has opened a file and assigned a pointer to it (UNIT 1), it is time to to read the residue topology information. Use the "READ" command with the RTF keyword to tell CHARMM to read data from the file. The "CARD" keyword and "UNIT" number should again be provided. The complete command should be:

READ RTF CARD UNIT 1

CHARMM now reads the data in from the file, and will print out the title of the file it read (assuming there is a title) onto standard output. Since a new file was recently opened, it is now time to close the file and the unit number can be reused. To do this, use a "CLOSE UNIT 1" statement. The RTF importation should now look like this:

*CHARMM Tutorial
*

!Read in Topology and Parameter files
OPEN UNIT 1 CARD READ NAME /charmm/c32b2/toppar/top_all27_prot_na.rtf
READ RTF CARD UNIT 1
CLOSE UNIT 1

(Note: An exclamation mark (!) tells CHARMM to disregard any other character to its right. Exclamation marks are used to indicate a comment)

For the parameter file, do the same thing except change RTF to PARA and reference the correct parameter file. The line should now read:

OPEN UNIT 1 CARD READ NAME /charmm/c34b2/toppar/par_all27_prot_na.prm

READ PARA CARD UNIT 1

Once the parameter and topology files have been read in by CHARMM, the coordinate file must be read in. PDB files (typical starting point) contain the sequence of residues (the amino acids making up the protein) as well as the coordinates of the heavy atoms (the coordinates of the hydrogens are generally not available). Opening the file follows the same procedures as the RTF and Parameter files; however, reading in the files from a .pdb is slightly different. The PDB must be read twice, the first time for the sequence and the second time for the coordinates. To do the first half of this, use the "SEQU" (short for sequence) after the "READ" command. CHARMM has no way of knowing the user is reading in a PDB file without the user explicitly saying so; therefore, follow "SEQU" with "PDB". The line should now look like this:

READ SEQU PDB UNIT 1

Now that the sequence has been read in the protein structure file (PSF) has to be generated. The "GENErate" command is used in this process as it builds topology information and the constructs the internal coordinate (IC) table. There must be a label (segment ID) CHARMM can associate with this protein and that follows the "GENErate" command. The "SETUp" command creates the IC table. The line should now look like this:

GENErate A SETUp

Once the sequence is read in and the internal coordinate table is generated, it is necessary to move the file pointer back to the beginning of the file so that the coordinates can be read "from the top." This can be done by closing and reopening the file or alternatively with the rewind command:

REWInd UNIT 1

With the file rewound the coordinates can be read by telling CHARMM to read PDB formatted Cartesian coordinates:

READ COOR PDB UNIT 1

After a script is done reading a file, it should be closed. Closing a PDB file follows the same procedure as the topology and parameter files.

The script should now resemble:

*CHARMM Tutorial
*

!Read in RTF and Parameter files
OPEN UNIT 1 CARD READ NAME /charmm/c32b2/toppar/top_all27_prot_na.rtf
READ RTF CARD UNIT 1
CLOSE UNIT 1

OPEN UNIT 1 CARD READ NAME /charmm/c32b2/toppar/par_all27_prot_na.prm
READ PARA CARD UNIT 1
CLOSE UNIT 1


!Read in the pdb
OPEN UNIT 1 CARD READ NAME /path/to/pdbfile.pdb
READ SEQU PDB CARD UNIT 1

GENErate A SETUp

REWInd UNIT 1

READ COOR PDB UNIT 1
CLOSE UNIT 1

Now that CHARMM knows about the Cartesian coordinates, the internal coordinate table, which was generated above, can be filled in with useful information. The "IC FILL" command fills the internal coordinates based on the information from the Cartesian coordinates. These, however, may not be sufficient to construct all of the internal coordinates so the "IC PARAM" command should be used to fill in information based on the parameters. Now unknown Cartesian coordinates for non-hydrogens can be built from the internal coordinates by a "IC BUILD" command. The commands look like the following:

!Involves known atomic coordinates and fills in missing coordinates
IC FILL


!Will retrieve standard coordinates from the parameter file
IC PARAmeters

!Will find the Cartesian coordinates previously unknown
IC BUILd

To be sure that all the hydrogens (including waters) have coordinates, use the "HBUIld" command. Usually, coordinates for the hydrogens in the system will not be provided in the PDB and parameter files. The "HBUIld" is the "quality control" system that can be used to build these coordinates.

Any undetermined coordinates will be written out as 9999.0. These can be displayed with:

PRINt COORdinate SELEct .NOT. INITialized END

After the "IC" and "HBUIld" commands have been called, the newly-generated PSF and coordinates can be written into files. Coordinates can be written out as PDB files or as CHARMM formatted coordinate (CRD) files. First, the files must be opened using the "OPEN" command. Next, a "WRITe" command will tell CHARMM the intent of the user. Since a file has been opened, it must have an assigned pointer by a unit number, which works the same way as for a read command. The card format is also specified during the open, just as for a read and work in exactly the same way. After the write command is initiated, include "CARD". Following "CARD", and just like reading in a file, a "NAME" command is used followed by the path to where the user wants the PSF file to be saved.

After the open command, use the "WRITe" command followed by "PSF" then "CARD" and finally, point to "UNIT 1". The open and write lines for the Protein Structure File should now look like this:

OPEN UNIT 1 WRITe CARD NAME /path/to/new/file.psf

WRITe PSF CARD UNIT 1

Immediately after the write command a title for the newly written file should be given in the usual format. All together, the write statement for a PSF file should resemble the following:

OPEN UNIT 1 WRITe CARD NAME /path/to/new/file.psf WRITe PSF CARD UNIT 1

Immediately after the write command a title for the newly written file should be given in the usual format. All together, the write statement for a PSF file should resemble the following:

OPEN UNIT 1 WRITe CARD NAME /PATH/TO/NEW/FILE.PSF
WRITe PSF CARD UNIT 1
*PSF File
*
CLOSE UNIT 1

Terminate the script with a simple a simple "STOP".

The input file should now look like the following:

*CHARMM Tutorial
*

!Read in RTF and Parameter files
OPEN UNIT 1 CARD READ NAME /charmm/c32b2/toppar/top_all27_prot_na.rtf
READ RTF CARD UNIT 1
CLOSE UNIT 1

OPEN UNIT 1 CARD READ NAME /charmm/c32b2/toppar/par_all27_prot_na.prm
READ PARA CARD UNIT 1
CLOSE UNIT 1

!Read in the pdb
OPEN UNIT 1 CARD READ NAME /path/to/pdbfile.pdb
READ SEQU PDB CARD UNIT 1

!Time for CHARMM to make the Internal Coordinates table
GENErate A SETUp

REWInd UNIT 1
READ COOR PDB UNIT 1
CLOSE UNIT 1

!Involves known atomic coordinates and fills in missing coordinates
IC FILL
!Will retrieve standard coordinates from the parameter file
IC PARAmeters
!Will find the Cartesian coordinates previously unknown
IC BUILd

OPEN UNIT 1 WRITe CARD NAME /path/to/new/protein_file.psf
WRITe PSF CARD UNIT 1
*PSF File
*
CLOSE UNIT 1
OPEN UNIT 1 WRITe CARD NAME /path/to/new/protein_file.pdb
WRITe COOR PDB UNIT 1
*PDB with Coordinates
*
CLOSE UNIT 1

OPEN UNIT 1 WRITe CARD NAME /path/to/new/protein_file.crd
WRITe COOR CARD UNIT 1
*CRD File
*
CLOSe UNIT 1
STOP

Adding the waters to the system: The 1YJP PDB contains two segments, one containing the atoms that are actually part of the protein and the other containing embedded waters. Biological systems (e.g. proteins) generally are occur in solution and water molecules often become embedded in pockets and are subsequently picked up during NMR and X-Ray crystallography. A file called 1yjp_waters.pdb containing the positions of the water molecules has been created and can be downloaded from http://www.charmmtutorial.org. A new input script will be made to read these in. The title, topology, and parameter files should all be read in as done above; however, this time read in the PDB file containing the waters rather than the protein PDB.

Because the water PDB file is now being dealt with, there are no bond angles to deal with explicitly (they are implicit within the water model), and no dihedral angles at all (as H2O has only three atoms). Therefore, replace the protein input scripts "GENErate" command with:

GENErate A NOANgle NODIhedral SETUp

An additional modification to the read command must be made. It should be changed to READ COOR PDB UNIT 1 since CHARMM will want to read in the coordinates straight from the PDB file. If this is not done, CHARMM will try to generate its own coordinates as though the water was bonded to one another, which would then result in CHARMM creating undefined coordinates. A PRINt COOR statement should also be added for the users benefit to make sure everything went well.

Also, the "IC" commands can be removed, but leave "HBUIld" as the water model only contains the position of the Oxygen atom for each water, requiring that the hydrogen positions still be explicitly built. The write commands should be similar to the ones above but rename the output files so they are not overwritten.

The CHARMM water input file should resemble the following:

* Run Segment Through CHARMM
*
bomlev -2

! Read in Topology and  Parameter files
OPEN UNIT 1 CARD READ name top_all27_prot_na.rtf
READ RTF CARD UNIT 1
CLOSe UNIT 1

OPEN UNIT 1 CARD READ NAME par_all27_prot_na.prm
READ PARA CARD UNIT 1
CLOSe UNIT 1

!Read sequence from the PDB coordinate file
OPEN UNIT 1 CARD READ NAME 1yjp-a-goodhet.pdb
READ SEQU PDB UNIT 1
CLOSe UNIT 1

!now generate the PSF and also the IC table (SETU keyword)
GENErate A NOANg NODIhedral SETUp

!Read protein coord from the PDB coordinate file
OPEN UNIT 1 CARD READ NAME 1yjp-a-goodhet.pdb
READ COOR PDB UNIT 1
CLOSe UNIT 1

HBUILd SELEct ALL END

!write out the protein structure file (psf) and
!the coordinate file in pdb and crd format.

OPEN UNIT 1 CARD WRITe NAME 1yjp_waters.pdb
WRITe COOR PDB UNIT 1
* PDB Coords
*

OPEN UNIT 1 CARD WRITe NAME filename.crd
WRITe COOR CARD UNIT 1
* Coords
*

OPEN UNIT 1 CARD WRITe NAME filename.psf
WRITe PSF CARD UNIT 1
* PSF
*

STOP

Now a PSF has been generated for both the waters and the protein and it is necessary to construct an input file appending the two PSFs together to make one unified system. The proteins PSF and CRD files should be read in and then the water PSF and CRD should be read in. When reading in the water PSF, the "APPEnd" keyword should be used. This will append the water PSF to the protein PSF. For the coordinate read command use:

READ COOR IGNOre UNIT 1 SELEct RESN TIP3 END

The IGNOre keyword turns off CHARMMs checking feature which would normally throw warnings since the PSF and CRD files for the waters and the protein do not match. Next, something must be selected using the SELEct command. Since all of the residue names are TIP3 in the water file, the RESName to select can be used. Finish off the script with commands to write out the new unified PSF, PDB, and CRD files.

The append script should resemble the following:

* Append the PDBs
*
bomlev -2

! Read in Topology and  Parameter files
OPEN UNIT 1 CARD READ NAME top_all27_prot_na.rtf
READ RTF CARD UNIT 1
CLOSe UNIT 1

OPEN UNIT 1 CARD READ NAME par_all27_prot_na.prm
READ PARA CARD UNIT 1
CLOSe UNIT 1

! Read PSF from file
OPEN UNIT 1 FORM READ NAME 1yjp.psf
READ PSF CARD UNIT 1
CLOSe UNIT 1

OPEN UNIT 1 FORM READ NAME 1yjp_waters.psf
READ PSF CARD APPEnd UNIT 1
CLOSe UNIT 1

! Read protein coord from the coordinate file
OPEN UNIT 1 CARD READ NAME 1yjp.crd
READ COOR CARD UNIT 1 RESID
CLOSE UNIT 1

OPEN UNIT 1 CARD READ NAME 1yjp_waters.crd
READ COOR RESId IGNOre UNIT 1 SELEct SEGId A .AND. RESN TIP3 END
CLOSE UNIT 1

! redo hydrogen coordinates for the complete structure
COOR INIT SELE HYDRogen END
HBUILd

ENERgy

OPEN UNIT 1 CARD WRITe NAME 1yjp-full.psf
WRITe PSF CARD UNIT 1
* PSF
*

OPEN UNIT 1 CARD WRITe NAME 1yjp-full.pdb
WRITe COOR PDB UNIT 1
* Coords
*

OPEN UNIT 1 CARD WRITe NAME 1yjp-full.crd
WRITe COOR CARD UNIT 1
* Coords
*

stop