# The Energy Function

## A brief introduction to force fields

This section of the tutorial deals with how CHARMM calculates the potential energy of a molecular system and its first derivative (the force or gradient). CHARMM supports methods for evaluating molecular systems at varying levels of theory; this section introduces the purely classical CHARMM force field (which is distinct from the program itself). CHARMM the program includes support for several other force fields as well as mechanisms to calculate *ab initio* and semiempirical quantum mechanical energies and for coarse-grained modeling. These topics are beyond the scope of this tutorial.

In computational chemistry, a force field consists of two things: (1) a functional form defining how to compute the energies and forces on each particle of the system, and (2) a set of parameters that define how positional relationships between atoms determine their energy (e.g. how the bonded potential energy changes as a bond stretches or contracts, that is, as the distance between two bonded atoms increases or decreases).

A full treatment of force fields and their development is beyond the scope of this tutorial. Interested readers are encouraged to consult the Wikipedia entry on force fields and the molecular modeling references given in the introduction (particularly chapter 2 of Becker *et al.*).

## Extended background on calculating energies and forces with CHARMM

### Energy calculation

#### The force field terms

The molecular mechanics force field used by CHARMM and similar computer programs is a simplified, parametrized representation of reality. Interactions between chemically bonded nearest neighbors are handled by special terms, the so-called bonded energy terms: e.g., bond stretching, angle bending, dihedral and improper dihedral energy terms. Interactions beyond (chemically bonded) nearest neighbors are represented by the two so-called non-bonded energy terms (Lennard-Jones (LJ) and Coulomb interactions). Very early versions of the force-field contained a special term for hydrogen bonds; the code is still in CHARMM, but it is, in general, not used anymore.

Thus, the energy function of the latest CHARMM force fields (CHARMM27, CHARMM36, CGenFF, etc.) has the following form (J. Phys. Chem. B, 1998, 102, 3586; J. Comput. Chem., 2004, 25, 1400; J. Comput. Chem., 2000, 21, 86, ibid. 105ff):

where consists of the following terms,

with

,

,

,

,

, and

consists of two terms,

with

,

.

The values of the naught terms above (e.g. in , in etc.), the various force constants (, etc.), as well as partial charges and LJ parameters (, ) are taken from the force field parameters. The atomic LJ parameters , are combined with the appropriate mixing rules for a pair of atoms i, j. In the current CHARMM force fields, these are (geometric mean) and (arithmetic mean). Note that other force fields may use other mixing rules, such as the geometric instead of the arithmetic mean for

Both non-bonded terms are normally modulated by a shifting or switching function (however, periodic boundary conditions may be used in place of this modulation, particularly for electrostatics), see below.

The above terms are the standard terms of molecular mechanics force fields just described, with two exceptions that require some explanation. The first is the so-called "Urey-Bradley" () term in addition to the standard bond stretching and angle bending terms . The Urey-Bradley term is a harmonic term in the distance between atoms 1 and 3 of (some) of the angle terms and was introduced on a case by case basis during the final optimization of vibrational spectra. This term turned out to be important "for the in-plane deformations as well as separating symmetric and asymmetric bond stretching modes (e.g., in aliphatic molecules)" (J. Phys. Chem. B 1998, 102, 3586).

A more recent addition to the CHARMM force field is the CMAP procedure to treat conformational properties of protein backbones (J. Comput. Chem. 2004, 25, 1400). The so-call CMAP term is a cross-term for the (backbone dihedral angle) values, realized by grid based energy correction maps. It can, in principle, be applied to any pair of dihedral angles, but is used in the current CHARMM force field to improve the conformational properties of protein backbones. Several programs claiming to be able to use the CHARMM force field(s) lack support for this term, although it is being incorporated into an increasing number of programs.

We note in passing that CHARMM can compute energies and forces with several non-CHARMM forcefields, including converted versions of the AMBER and OPLS-AA forcefield, but also more generic force fields, such as the Merck force field. [**TODO: add refs**] Use of these force fields is beyond the scope of this tutorial.

#### Calculation of non-bonded energies: cut-offs, shifting, switching etc.

The most time consuming part of an energy or force computation is the evaluation of the LJ and electrostatic interactions between (almost) all pairs of particles.

While straightforward in principle (and in practice energy/force calculation in CHARMM is as simple as

ENERgy

issuing a single command), numerous options influence the calculation of non-bonded interactions, and so it is very important that you understand their meaning and have the necessary background to understand the rationale for them in the first place.

To compute the interaction of N atoms with each other, one needs in principle NN steps. Bonded interactions only involve next neighbors, so they are cheap in terms of computer time. For the non-bonded energy (LJ, electrostatic) it is customary to introduce a cut-off beyond which interactions are ignored. This is a reasonable approximation for the van der Waals (= LJ) interactions, which decay rapidly for large distances, but a bad one for electrostatic interactions which go to zero as . For periodic systems (the typical situation found when simulating solvated proteins), the so-called Ewald summation method (usually employed in its fast incarnation, particle-mesh-Ewald (PME)) circumvents this particular difficulty; nevertheless, cut-offs are ever present in molecular mechanics energy/force calculations and their implications need to be understood.

**Non-bonded exclusions:** Before continuing on the subject of cut-offs, we need to introduce the concept of *non-bonded exclusions*. To avoid computing interactions between chemically bonded atoms (1-2, 1-3 and, possibly, 1-4 neighbors) with multiple different force field terms (specific bonded terms, as well as the "unspecific" LJ and electrostatic term), these pairs are excluded from the non-bonded energy calculation. When using the CHARMM force field, one never calculates non-bonded interactions between 1-2 and 1-3 neighbors. LJ and electrostatic interactions between 1-4 neighbors are calculated, but can be modified compared to more distant pairs (different LJ interactions, scaling of electrostatic interactions). The details depend on the specific force field used. E.g., in the older polar hydrogen force field ("param19"), electrostatic interactions and forces between 1-4 neighbors were scaled by 0.4, whereas in the current all-atom force field(s) ("param22", "param27"), electrostatic energies and forces are used as is. This scaling factor is controlled by one of the many options to the `ENERgy` command, `E14Fac`. Thus, for calculations with the old, polar-hydrogen "param19" force field, `E14Fac` should be set to 0.4; for calculations with the current "param22/27" force field it should be set to 1.0. **IMPORTANT HINT:** *E14Fac is an example of a user settable option which should never be set by a normal user*. When you read in the respective parameter file, the correct default value is automatically set. In fact, many of the options to the

`ENERgy`command are of this type --- you should understand these, but, again, you should never change them (unless you are an expert and develop a new force field etc. in which case you won't need this tutorial!). We'll return to the issue of specifying all possible energy options as opposed to relying on defaults later on; unfortunately, this is less clear-cut than it ought to be.

**Back to cut-offs:** The CHARMM keyword to set the cut-off for electrostatic and LJ interactions is `CTOFnb`. Thus, the statement

ENERgy CTOFnb 12.0 ! it's not recommended to have ! a blanket cut-off, switching ! or shifting should be used

discards LJ and electrostatic interactions between all pairs of particles that are greater than 12.0 angstroms apart (of course any non-bonded exclusions are always honored!). The straightforward application of a cut-off criterion as in the above (bad) example introduces a discontinuity that can adversely affect the stability of a molecular dynamics calculation. Consider a pair of atoms that are separated by approximately the cut-off distance. At one time-step they interact and contribute to the energy and forces in the system. Then they move relative to each other by a tiny amount and suddenly they are more than the cut-off distance apart. Although very little has changed, they now do not contribute to the energy and forces at all.

To avoid these abrupt changes, the LJ and electrostatic energy functions in CHARMM are (always) modulated by *switching* or *shifting* functions (it actually takes some tricks to coax CHARMM into doing, effectively, a plain cut-off for you!). For the full details, in particular the detailed functional forms of switching and shifting functions, see J. Comput. Chem. 1983, 4, 187; Proteins 1989, 6, 32; J. Comput. Chem. 1994, 15, 667. The effect of a switching function on a LJ interaction is illustrated by the following figure.

At `CTOFnb` 12., the interaction (and its derivative, i.e., the forces) has to be zero as shown. Up to an *inner* cut-off (`CTONnb`), interactions between the two particles are normal. Between `CTONnb` and `CTOFnb`, the normal interaction is switched off (hence the name) by a sigmoidal function. At distances greater than `CTOFnb`, interactions and forces are zero. The effect of a switching function is shown for two different values of the inner cut-off, 8 angstroms (red) and 10 angstroms (green).

A *shifting function* also modulates the normal interaction between two particles. The design goal is that the original interaction is modified so that it approaches zero continuously at the cut-off distance (`CTOFnb`); in addition, forces are required to vanish continuously as well at the cut-off distance. Visually, it looks as if the original interaction were "shifted" so that it became zero at the cut-off distance. *The actual function is more complicated because of the requirement of continuity of potential and forces at the cut-off distance!!* With shifting functions, only `CTOFnb` is meaningful (as the cut-off radius), whereas an inner cut-off (`CTONnb`) is ignored. Switching and shifting can be activated separately for LJ and electrostatic interactions; i.e., the keywords `VSHIft` and `VSWItch` control whether a shifting or switching function is applied to LJ interactions; similarly, `SHIFt` and `SWITch` choose between shifting and switching function for electrostatic interactions. To illustrate this by a realistic example, let's take a look at the options that correspond to the default force field choices for the current CHARMM all-atom force field (param22/27)

ENERgy SHIFt VSWItch CTOFnb 12. CTONnb 10.

The line indicates that electrostatic options are shifted to zero using a cut-off value of 12 angstroms (`CTOFnb`) and that LJ interactions are modulated by a switching function between 10 (`CTONnb`) and 12 angstroms (`CTOFnb`). Any interactions beyond 12 angstroms are discarded. Note that the `CTONnb` parameter is ignored for the shifted electrostatic interactions.

**Beyond SHIFt/SWITch:** In molecular dynamics, the determining factor are the forces, not the potential energy. Thus, it has been suggested to apply shifting/switching functions to the forces, rather than the potential. Such methods are available in CHARMM as well, driven by keywords `VFSWitch` (for LJ) and `FSWItch`/`FSHIFt` (for electrostatics). The `FSHIFt` option is usually the default in c27 and later (non-protein) force fields. For the gory details see nbonds.doc of the CHARMM documentation. The recently introduced, so-called Isotropic periodic Sum (IPS) method provides yet another approach to handle cut-offs in a smooth, continuous fashion. However, all these options go beyond the scope of this tutorial.

**GROUp vs. ATOM:** A cut-off criterion can actually be applied in two ways. The default is to apply it on an atom by atom basis (keywords `ATOM`, `VATOm`, the option with V in front pertains to the van der Waals (=LJ) energy computation, that without to the electrostatic energy computation). In addition, there is the possibility of a group based cut-off (keywords GROUp, VGROup). In the CHARMM force field topologies, the partial charges of molecules / residues are grouped into small clusters that are usually neutral, or, for charged residues, carry a net charge of +/-1, +/-2 etc. When GROUp/VGROup is set CHARMM first computes the center of geometry for each of these charge groups. If the distance between the centers of geometry for a pair of groups are separated by more than the cut-off distance (`CTOFnb`), then no interaction between all atoms of the two groups is computed. By contrast, in the atom by atom case, some pairs of atoms in different groups might be inside and others might be outside the cut-off distance. Thus, in general the two options give different results. (Note: you cannot mix group and atom based approaches; i.e., the combinations ATOM/VGRO and GROU/VATO are illegal!) There are some theoretical arguments that would indicate that the group by group based cut-off criterion ought to be preferred. This is particularly the case if all groups have an overall charge of zero, which is the case, e.g., for water. Contrary to this, **it has been shown for the CHARMM parameters that the group by group based cut-off criterion gives inferior results**; in addition, it is much slower, explaining
that ATOM/VATOm is the CHARMM default.

**Some remaining options:** Having said this, we can take a look at the full default options of the current all-atom force field

ENERgy NBXMod 5 ATOM CDIEl SHIFt VATOm VDIStance VSWItch - CUTNb 14.0 CTOFnb 12.0 CTONnb 10.0 EPS 1.0 E14Fac 1.0 WMIN 1.5

The strings which you don't know yet are `NBXMod 5`, `CDIEl`, `CUTNB 14.0`, `EPS 1.0`, and `WMIN 1.5`. Of these, the only important one is `CUTNB` since this is a value you may indeed want to change. It is the cut-off radius used to generate the non-bonded pair list and is discussed in detail in the next section. `NBXMod 5` controls the handling of non-bonded exclusions (cf. above), and setting it to 5 means skipping 1-2, 1-3 interactions and treating 1-4 interactions specially. The `WMIN` distance of 1.5 is used to print out interaction pairs that are closer than 1.5 Å. At shorter distances, very high LJ repulsion would result, and it's useful to be warned since such pairs could cause problems in MD simulations. `CDIEl` and `EPS` are two related options, which nowadays are a left-over from a distant past and should be left alone. A brief explanation: The solvent (water) is a very important determinant when studying the properties of biomolecules. When computers were slower, one often tried to avoid simulations in explicit solvent, and the two parameters can be used to mimic the influence of solvent. Water has a high dielectric constant (DC), which screens electrostatic interactions. To mimic the presence of water in gas phase simulations, CHARMM allows the user to change the value of the DC () by the `EPS` keyword (default value is 1), i.e., the electrostatic energy is computed according to . The `CDIEl` / `RDIEl` option is another attempt to mimic water in gas phase simulations. `CDIE` stands for constant (uniform) dielectric, `RDIE` means a distance-dependent dielectric, , where on the right hand side means the number set by `EPS`. Effectively, when you use `RDIE`, you compute a modified electrostatic energy according to . *Nowadays when using explicit solvent, you should always use CDIE and leave EPS set to 1 (i.e., leave them at the default values!).* Should the use of explicit solvent be too expensive computationally, CHARMM nowadays offers several

*implicit solvent*models.

**A final remark on ENERgy options:** So, what non-bonded options should you set? In fact, if you read in the default param22/27 parameter file (e.g, par_all27_prot_na.prm), the options just discussed get set for you automatically. At first glance, there is no need to set any of these options explicitly; unfortunately, however, this turns out not to be true. This does not mean that you should modify the default options "just for fun". Remember, changing some values (e.g. CDIE or EPS, or replacing (V)ATOM by (V)GROUP , or changing E14Fac) is outright dangerous/false. You may want, e.g., to use some of the force based shifting/switching methods, but you should be experienced enough to understand thoroughly what you are doing and why you are doing it since the parameters were developed with the options shown above. In practice, what you'll want to change/add is to request the use of Ewald summation (PME) for solvated systems (see below); some PME related options need to be adapted depending on your system size. Similarly, for performance reasons you may want to choose a different non-bonded list cutoff `CUTNB` (but we need to understand more about non-bonded lists first). Normally, you would not want to change the cut-off radii `CTOFnb`, `CTONnb`, since they are part of the parameterization. *Unfortunately, this is where it becomes tricky:* Suppose you work with the param22/27 parameters, and, thus, correct default non-bonded options have been set. You decide (for reasons that will become clear in the next subsection) to increase CUTNB from 14 to 16 Å. Thus, you specify

ENERgy CUTNB 16. ! WARNING: may not do what you want!

and assume that everything else is left alone. Unfortunately, for the above CHARMM command another default mechanism kicks in. If you change `CUTNB`, but do not set `CTOFnb` / `CTONnb` explicitly, the latter get changed according to CTOFnb = CUTNB - 2. and CTONnb = CTOFnb -2., hence you would suddenly be calculating with CUTNB 16. CTOFnb 14. CTONnb 12., which likely is not what you had in mind. It is, therefore, a bit dangerous to rely on defaults set by the parameter file. Although unsatisfactory, we therefore recommend to set the non-bonded options *explicitly* before doing any real work. The respective energy line should be identical to the defaults set in the parameter file, with the exception of individual parameters you might want to change, such as a modified `CUTNB` or replacing `SHIFt` by `FSHIft`.

In addition, there is one case were you *have* to change `CTOFnb` (and hence `CTONnb`): If you simulate small periodic systems, the minimum image criterion dictates that the cut-off radius must be smaller or equal to half the box-length (cubic PBC). For `CTOFnb 12`., this means that your cubic periodic box should have a side length of at least 24 Ang. Thus, for smaller systems, `CTOFnb` (and, hence, `CTONnb`) have to be reduced. Interestingly, some known workers in the field (notably W. van Gunsteren and his group) find this such a bad choice (viz. reducing the cut-off radius below the force field default), so that the smallest boxes they use always have side lengths of twice the default cut-off radius of their force field. (Again, for the CHARMM all atom force field this would mean no boxes smaller than 24 Å!)

### Non-bonded lists

#### Background on non-bonded lists

In molecular mechanical energy/force calculations a cut-off criterion is normally used to limit the number of non-bonded interactions. Energy minimizations and especially molecular dynamics simulations require hundreds and thousands (if not millions) of energy/force calculations. To avoid checking for excluded pairs (1-2, 1-3 and, possibly, 1-4 neighbors) and, much more importantly, the cut-off criterion during every single energy calculation, lists are generated that contain all pairs for which the non-bonded interactions really have to be computed. This is an unnecessary overhead for a single calculation, but saves a lot of time during repeated computations.

In fact, any energy calculation in CHARMM consists really of two steps: (1) Calculation of the non-bonded list, (2) calculation of energy and forces using this non-bonded list. The time saving in repeated energy/force calculations (e.g., minimization, MD) comes from the fact that the non-bonded list is not generated for every energy calculation, but instead the list is kept for several consecutive steps (energy calculations). Now suppose the non-bonded list were generated with the cut-off radius used in the energy calculation (`CTOFnb`). The forces calculated in the first energy calculation are used to generate new particle positions (regardless whether we are doing minimization or MD). Because of the movement of the atoms, the original non-bonded list may not be correct anymore, i.e., it may contain pairs that are separated by more than `CTOFnb` and, worse, there may now be pairs less than CTOFnb apart for which no interactions are computed since they are not part of the list. For this reason, the non-bonded list should always be generated with a cut-off criterion/radius larger than `CTOFnb`, and for this reason CHARMM provides the separate CUTNB parameter used exclusively in list generation. With CUTNB > CTOFnb (by at least 1-2 Å), we can be certain that the non-bonded list is valid at least for a certain number of steps. This now explains the hierarchy of cut-off options in the param22/27 ENERgy example above, where we had

ENERgy ... CUTNB 14. CTOFnb 12. CTONnb 10.

These are the recommended cut-off values for the standard CHARMM force field! `CUTNB` controls the list generation, whereas `CTOFnb` and `CTONnb` are used for the energy calculation as described above (`CTONnb` only for those interactions modulated by a switching function). Bear in mind that when periodic boundary conditions (PBC) are used for electrostatics, `CTONnb` and `CTOFnb` only affected the van der Waals energy calculation. Graphically, the relationship between the different cut-offs can be illustrated as in the following scheme (where `CUTNB 16.` instead of 14. was chosen):

The difference between `CUTNB` and `CTOFnb`, sometimes referred to as "skin", governs the frequency with which the non-bonded list needs to be updated. Provided the update frequency is appropriate, the choice of `CUTNB` is a true choice for the user, which should not affect the results, but which may affect performance. A larger skin means more work for a single non-bond list generation, but the possibility of a larger update frequency. The respective performance from smaller / larger skins depend on the computer hardware, the system size and whether the computation is carried out in parallel or not. No general rule can be given, but frequently larger skins (e.g., 4 Å) perform somewhat better than the default skin size of 2 Å; for large scale production calculations it definitely is worth the time to experiment and benchmark to find an optimal value for `CUTNB`.

A natural question that arises from this discussion is how often the non-bonded list is updated. CHARMM allows the user to specify an interval at which the list is updated, or the program can update the list automatically as needed. If the former option is chosen, the keyword `INBFrq` is used, and `INBFrq n`, with n a positive integer tells CHARMM to update the non-bonded list every n energy calculations. The correct choice of n is left to the user and depends on the size of the skin (cf. above). Alternatively, `INBFrq -1` (or any negative integer) tells CHARMM to watch the particle positions and update the non-bonded list automatically. The algorithm is guaranteed to err on the side of safety, so with `INBFrq -1` one should always have a correct non-bonded list. For quite some time, `INBFrq -1` (= heuristic update) has been the default (the old default of +50 was horribly inadequate and almost guaranteed wrong results!); nevertheless, this is one parameter that there is no harm in setting explicitly.

Finally, as is so often the case, CHARMM offers more than one algorithm to construct non-bonded lists; all of them, however, deliver a list in the same format (so for the energy routines it is irrelevant how the non-bonded list was generated!). Only two will be mentioned here. The old, default method is labeled `BYGRoup`, and `BYGR` is the (optional) keyword that chooses this method (again, it's the default). When performance is needed (large systems, running on a parallel machine), then one should use `BYCB`. It supports normal MD of solvated systems under periodic boundary conditions, but cannot be used for some more advanced usage scenarios. Also, BYGR can generate non-bonded lists suitable for group and atom based cut-offs; BYCB only supports atom based cutoffs.

### What the ENERgy command really does -- or how to break ENERgy into its individual steps

From the above section, it should be clear that the `ENERgy` command is actually doing more than computing the energy since it also takes care of computing the non-bonded list. In fact, it is relatively smart and computes a non-bonded list if none exists (or if it is deemed outdated), but uses an up to date non-bonded list, should one be available. Thus, the `ENERgy` command is actually encapsulates a lot of the core functionality of CHARMM. Not only does invoking it cause CHARMM to calculate the total energy of the system and the forces acting on each atom, but it also makes CHARMM take care of any necessary bookkeeping work such as regenerating the non-bond and image atom lists (we'll talk more about image atoms below).

If you look again at the options we have discussed so far, then they fall into two groups, (1) those that control details of the energy calculation (e.g., `CTOFnb`), and (2) those thar control the non-bonded list generation (`CUTNB`, `INBFRQ`, `BYGR`/`BYCB`). If you go beyond a single point energy calculation (minimization, MD), then you have a third class of options controlling details of the minimization or MD.

The various steps hiding underneath the ENERgy command can actually be broken up; we show this for pedagogical purposes; also, the availability of the `NBONds`, `UPDAte` and `GETE` commands is occasionally useful in practice as well. `NBONds` parses and stores the energy related options, as well as list generation options. The `UPDAte` command generates the actual list (optionally, one could also here specify list generation and energy options). Finally, `GETEnergy` just computes the energy, the existence of a *valid* non-bonded list is assumed (otherwise your CHARMM job will crash). First, the standard way using ENERgy

! read RTF file ! read param file ! read PSF ! read coordinates ! compute energy of system, explicitly specifying the (default) ! options ENERgy NBXMod 5 ATOM CDIEl SHIFt VATOm VSWItch - CUTNb 14.0 CTOFnb 12.0 CTONnb 10.0 EPS 1.0 E14Fac 1.0 WMIN 1.5

Instead, the same can also be accomplished in steps:

! read RTF file ! read param file ! read PSF ! read coordinates ! parse non-bonded options NBONds NBXMod 5 ATOM CDIEl SHIFt VATOm VDIStance VSWItch - CTOFnb 12.0 CTONnb 10.0 EPS 1.0 E14Fac 1.0 WMIN 1.5 - INBFrq -1 CUTNb 14. ! list options ! do generate list UPDAte ! calculate "just" the energy GETE

As mentioned, the `MINImiziation` and (molecular) `DYNAmics` commands understand all options controlling non-bonded list generation and non-bonded energy / force calculation, as well as options specific to the individual task (minimization, MD). In particular for DYNA, the cumulative number of options (list generation, energy calculation, control of MD) can easily amount to 30-50. Such a command line can be become quickly very confusing. Given that list generation and energy options are remembered, it is therefore recommended to split the setting of options as shown below:

... ! PSF and full set of coordinates have been read ! optionally, periodic boundary conditions (see below) have been set up ENERgy <list options> <energy options> DYNA <only dynamics specific options>

Instead of ENERgy, one could also use NBONds/UPDAte.

### Periodic boundary conditions (PBC): IMAGe/CRYStal

#### The textbook view of PBCs

We trust that you have read some background material on molecular mechanics and molecular dynamics simulations, and, thus, expect you to have some familiarity with periodic boundary conditions (PBC). The following is intended as a refresher with emphasis on pointing out where things may not be so obvious for macromolecular systems.

Even with today's computers, the system sizes we can handle are microscopically small. Thus, without any special precautions we would be simulating "infinitesimal droplets", the properties of which would be dominated by surface effects. It is standard practice to work around this by employing periodic boundary conditions (PBC), i.e., by assuming that the central simulation system is surrounded in all spatial directions by exact copies (this assumes, of course, that you have a space-filling geometry; obviously, you can't have PBC for a spherical system. Anticipating CHARMM terminology, we refer to these surrounding boxes as images (image boxes). Unless otherwise stated, we'll assume a cubic simulation box (note, though, that CHARMM can handle any valid crystallographic space group for PBC!)

The role of the surrounding boxes (images) is to replace particles that leave the central simulation box. Consider a minimization or MD simulation, when the positions were just updated. You know the basic picture: as a particle leaves the central (cubic) box, say, to the right, it is replaced by the corresponding particle from the image of the central box on the left. Thus, working with PBCs entails making sure that your particle coordinates always fall within the central simulation box. This can be accomplished with the following pseudo-code (which only handles periodic boundaries in the x direction) computer code along the lines (the center of the box is supposed to be at the origin, x is the x coordinate of the particle of interest, and xsize is the size of the periodic box!)

if (periodicx) then if (x < -xsize/2.0) x=x+xsize if (x >= xsize/2.0) x=x-xsize endif

So, for example, if we have a particle with a x position of 4 and xsize is 10, the particle would be within the periodic box, however, if the particle's x coordinate was 6, it would be outside of the box and it's x coordinate would be changed to -4 (effectively, when a particle leaves a box on one edge, it reappears on the opposite edge). Analogous code snippets apply to the y and z coordinate, and for a cube `xsize = ysize = zsize=L`

There is, however, a second, more important component to PBC. Consider the interaction (LJ and/or electrostatic) between two particles i and j. If the replicas of the central box were real, we would have interactions between i and j in the central box, but also of i with j in all of the (infinite) number of images of the central box. This is, of course, not what we want to have; instead, we choose the interaction between i and j having the shortest distance. This may be the interaction between i and j in the central box, but it may also be the interaction between i and a particular image atom of j. In pseudo-code this so-called *minimum image convention* can be expressed as

if (periodicx) then dx = x(j) - x(i) if (dx > xsize/2.0) dx = dx - xsize if (dx <= -xsize/2.0) dx = dx + xsize endif

where x(i) and x(j) are the x coordinates of i and j, respectively, and analogous statements for y and z. If you consider the above realization of the minimum image convention carefully, you'll note that the longest possible inter-particle distance is half the box-length, L/2. Thus, if you used a cut-off radius greater than L/2, your effective cut-off radius would still be L/2. As we shall see, CHARMM handles PBC rather differently, and for the minimum image convention to be obeyed the cutoff radius must not be larger than L/2, i.e., it is the user's responsibility to shorten CTOFnb if necessary!

Again, we assume that you have heard all this before. The problem with the above illustrative examples of PBC/minimum image convention is that the code assumes a collection of atoms. Consider, e.g., a box of waters or a box containing a protein surrounded by water. Suppose one water edges out of the box, i.e., the oxygen and one hydrogen are still within the central box, but the second hydrogen is out of the box. Then naively applying periodic boundary conditions will tear the molecule apart, i.e., one has to make sure that a water is shifted periodically only as a complete molecule. Similar considerations apply of course to the protein, or any molecule present in the system. Also, the PBC/minimum image checks have to be carried out for each energy calculation. For cubic boxes, this is not too bad, and by optimizing the above code snippets, the overhead is small. However, for more complex periodic boxes (truncated octahedron, rhombic dodecahedron), this may quickly become difficult to calculate, however CHARMM will handle this for you as long as you choose one of the periodic boxes that it supports (although you can define your own space group, this is not necessary for setting up basic dynamics simulations).

#### How CHARMM does PBCs

Thus, CHARMM does things differently than most textbooks (the notable exception being Rapaports' *Art of Molecular Dynamics Simulation*, where the general approach taken by CHARMM is outlined), and, actually, differently than most comparable programs. Here we describe the underlying algorithmic approach, whereas in the next subsection we describe how things work from the user perspective and what pitfalls one has to be aware of.

CHARMM takes periodic images "seriously", i.e., (a subset of) the atoms of the periodic images surrounding the central box are generated. The name of the routine that does this is MKIMAT (that's a subroutine name, not a CHARMM command!); keep it in mind, we'll have to comment on this routine somewhat more later on. First consequence: the actual number of atoms in your system with PBC is (much) larger than the number of atoms in your central box. This apparent "waste of memory" is put to good use for the sake of generality and also performance. Contrast this with the textbook approach, the image boxes are not really present, as an image atom "enters", a "real" atom "leaves/vanishes". The number of image atoms is kept as low as possible by two factors. First, one does not have to generate the atoms from all image cells. In 3D a central cubic box is surrounded by 26 images. Of these, only, those to the right, top, and the back of the central cubic system are needed along with those at the vertices and edges, reducing this number to 14. Second, for large boxes (box lengths of solvated protein systems are easily 60 - 100 Angstroms!), one does not need to replicate the full box, instead, one needs only those image atoms which are within the cutoff radius, or more correctly, the cutoff radius used to generate the non-bonded list(s).

A periodic simulation system in CHARMM consists of the central box (e.g., a cube), and a (partial) layer of image atoms with width `CUTNB` (or actually `CUTNB` + some safety distance). Now, one generates *two* non-bonded list, one for atoms in the central box ("primary atoms"), and a second one between primary atoms and image atoms. Non-bonded interactions are computed twice, once with the primary-primary list, and once with the primary-image list. In the CHARMM output you'll find the energies listed separately, i.e., the total LJ and electrostatic energy of a system is now the sum of two terms each. You'll probably have to draw some diagrams (preferably in 2D) to convince yourself that this works. There is one border line case, and that is the case of small boxes. Here you have to ensure that the cutoff radius for the energy calculation (CTOFnb) is less than half the box length. As long as this is true, even if entries for two particles (call them i and j) exist in *both'* the primary-primary and primary secondary lists, only one of the two distances can be lower than half of the box size. If, on the other hand, `CTOFNB` is greater than half the box length then both distances could be lower than half the box length and the interaction energy will be double-counted **(BTM: check this with SB or Rick!!)**. By choosing a short enough cut-off, you ensure that the minimum image convention is implicitly obeyed. *Unfortunately*, CHARMM provides no warning if you overlook such a case, and this is one of the pitfalls lurking when using PBC in CHARMM. More in the next subsection. In general, it is simply best to avoid small boxes, since reducing `CUTNB` brings its own set of problems.

#### Use of CRYStal / IMAGe in its simplest form

Before dwelling on pitfalls, let's look at the practical aspects of setting up PBC. The user interface for setting up PBC in CHARMM is provided by two modules, `IMAGe` and `CRYStal`. `IMAGE` and `CRYSTAL` provide similar capabilities and complement each other. One may also view `CRYSTAL` as an interface to make `IMAGE` more user-friendly, and this is the way it is usually employed nowadays.

Before trying the following snippets, You should have read in RTF, parameters, PSF and coordinates for your system. Also, assuming a protein / water system, we assume that you have DEFIned two atom selections, *protein* containing all your protein atoms, and *water*, containing all your water molecules (which are often simulated in CHARMM via the TIP3 water model). Finally, let's assume that you have a box length of 60 Ang., and you are using / planning to use the default cutoff parameters (CUTNb 14. CTOFnb 12. CTONnb 10., which are our recommended cut-off parameters, although it does no harm to increase CUTNB to 16.0 as we do in some of our scripts, so long as the other parameters are given explicitly) Then, the following four lines set up PBC for a cubic box:

CRYSTal DEFIne CUBIc 60. 60. 60. 90. 90. 90. CRYSTal BUILd CUTOff @XO NOPE 0 IMAGE BYREsidue XCEN 0.0 YCEN 0.0 ZCEN 0.0 SELE water END IMAGE BYSEgment XCEN 0.0 YCEN 0.0 ZCEN 0.0 SELE protein END

The first line defines the crystal symmetry (CUBIc in our example), and gives the necessary information, side lengths A, B, C and the three angles , , , which in our case are A = B = C = 60 Å, and , respectively. The generic form of the command would be

CRYS BUILd DEFIme <type> A B C

(It is particularly easy to build rhombic dodecahedrons or truncated octahedrons, which are often preferred over cubic simulation boxes. We generally prefer to use rhombic dodecahedrons for globular systems and hexagonal prisms for long, thin macromolecules.)

The second line initiates the building of image atoms. Since CHARMM knows about cubic boxes, no further information about the crystal is needed, and `NOPEr` (the number of crystal operations) is set to 0. More important is the `CUTOff` parameter, which actually indicates how deep to construct the layer of image atoms. In order to work with the non-bonded list approach of CHARMM, the variable @XO (as we call it here, any variable name is fine, alternatively, you can give the number directly) has to be as large as where is the length of the unit cell. This is particularly important if there is a significant amount of vacuum space within the unit cell.

The meaning of the third and fourth line ("raw" `IMAGe` commands for a change) become clear once you understand the CHARMM way of handling PBC during a minimization or MD simulation when the coordinates of all particles change continuously. Eventually, particles in the primary box will drift out of the box, and atoms in the image layer will enter the primary region (or "diffuse" further away). This is no need for immediate concern, since the "skin" in our non-bonded lists gives us a safety net (just as in the absence of PBC). Eventually, however, the central box and the image layer will have to be rebuilt (by the MKIMAT routine). The obvious time to do so is when the non-bonded lists are recomputed. At this point all atoms are essentially subjected to a PBC like test, but as outlined above, one has to avoid breaking molecules into two. The two IMAGe lines tell CHARMM to apply periodic shifts to water on a residue by residue (= molecule by molecule) basis (option BYREsidue, line 3), whereas the protein is shifted as a whole (BYSEgment, line 4) -- in the case of proteins, shifting by residues could pull off individual amino acids or small groups of them!

This is it, or rather, this should be it. One would assume that once PBC is set up (via `CRYStal` / `IMAGe` as shown), any non-bonded list update would update both the primary-primary and primary-image list, and that `CUTNB` would be understood as being applicable to the generation of both lists. Alas, not so ... For (lets assume) historical reasons, the update frequency for the two lists (primary-primary, primary-image) are controlled by two different parameters, `INBFrq` and `IMGFrq`; similarly, there are two cut-off radii, `CUTNB` for the primary-primary and `CUTIM` for the primary-image non-bonded lists. Obviously, `INBFrq` should always equal `IMGFrq`, and `CUTNb`. `CUTIM` is often set to be equal to `CUTNB`, but it may be set larger if computer memory permits to reduce the frequency of list building. To add insult to injury, there is no mechanism to ensure that reasonable values have been chosen for these parameters. For example, while `INBFrq` defaults to -1 (heuristic update, which is good), `IMGFrq` has a default of 50, which is nonsensical! Thus, it is a sad fact of life that * it is the user's responsibility to ensure meaningful values for INBFrq, IMGFrq, CUTNb and CUTIm*, otherwise rubbish can and will be produced.

So, for the sake of good practice, let us show the necessary steps to set up PBC for a cubic system and ensure correct energy / force calculations. For good measure, we use a large than default `CUTNB`/`CUTIM`, and we request `BYCB`:

! read RTF ! read parameters (we assume param22/27, which in principle sets up ! most defaults correctly, i.e., what we get at this point ! corresponds to ! NONB nbxmod 5 atom cdiel shift vatom vdistance vswitch - ! cutnb 14.0 ctofnb 12.0 ctonnb 10.0 eps 1.0 e14fac 1.0 - ! wmin 1.5 ! as discussed ! read psf ! read coordinates set rc 16. ! value we want for CUTNB/CUTIM CRYSTal DEFI CUBI 60. 60. 60. 90. 90. 90. CRYSTal BUILd CUTOff 30 NOPE 0 ! cutoff is half the unit cell length IMAGE BYRE XCEN 0.0 YCEN 0.0 ZCEN 0.0 SELE water END IMAGE BYSE XCEN 0.0 YCEN 0.0 ZCEN 0.0 SELE protein END ! now issue an ENERgy command to (re)set all defaults! ENER INBFrq -1 IMGFrq -1 CUTNb @RC CUTIm @RC BYCB - NBXMod 5 ATOM CDIEl SHIFt VATOm VDIStance VSWItch - CTOFnb 12.0 CTONnb 10.0 EPS 1.0 E14Fac 1.0 WMIN 1.5 ! Note: in practice we would usually replace SHIFte ! electrostatics by PME Ewald, see below

The above should guarantee that all subsequent minimization and or MD calculations generate both non-bonded lists at correct intervals with sane cut-off radii. As noted in the comment, the one unrealistic aspect of the above code snippet is that PME is not set up. Normally, the ENERgy command above would also be used to replace SHIFted electrostatics by PME; however, we first need to introduce Ewald summation and PME.

### Ewald summation / Particle-Mesh-Ewald (PME)

#### Ewald summation and PME for dummies

Some more rigorous background can be found on Wikipedia. In the following I comment in (over)simplified form on some aspects of Ewald / PME, which I find is surprisingly often misunderstood in practice.

Ewald summation (ES) is a method to sum up (= calculate) electrostatic interactions in an infinite lattice, or, alternatively to compute electrostatic interactions under periodic boundary conditions. If you want to look at it that way: with ES you take these periodic images "seriously". In symbols, the starting point of ES is to compute the (infinite) sum (the lattice sum)

Here , are the charges of particles i, j, is the vector between particle positions i, j, and is the lattice vector pointing into all periodic images of the primary box. The prime in the sum over the is used to indicate that for (primary box), the case i=j is excluded.

has the unpleasant mathematical property of being conditionally convergent, i.e., its value depends on the order of summation.

In addition to being conditionally convergent, direct computation of the above lattice sum would be extremely tedious since the convergence is also *slow*. ES is a trick to accelerate the convergence; as a by-product the singularity of the Coulomb interactions responsible for the conditional convergence is avoided. (Thus, ES may be viewed as a
modified lattice sum).

In ES the above sum is split into two sums, according to

Here, `erf` and `erfc` are the error and complementary error functions, respectively. For a suitable choice of , the first term on the right hand side is short-ranged, whereas the second term contains all the long-range interactions. More specifically, if is chosen sufficiently large, for the first term only interactions in the primary simulation box need to be taken into account (think of the erfc term acting as a "shifting" function, damping the long range 1/r potential rapidly to zero. Thus, the kernel labelled *real* leads to the *real space sum of electrostatic interactions* which is computed normally (using the complementary error function erfc(x) as a "shifting" function),

The kernel labelled *reciprocal*, on the other hand, is long-ranged, and summation here would have to run over all lattice vectors **n**. Long-ranged interactions, however, become short-ranged in reciprocal space (hence the name), and essentially by Fourier transformation the sum resulting from the *reciprocal* kernel becomes a rapidly converging sum in reciprocal space, referred to as a "reciprocal sum" or "k-sum".

The exact mathematics adds a few additional terms and corrections, which CHARMM computes correctly, and hence we can afford to ignore. The computationally intensive parts are the real space and reciprocal space sums just discussed. The former, as pointed out, can be computed within the normal framework of non-bonded interactions, whereas the second is a somewhat funny term, entailing a double sum over atom positions and reciprocal space vectors. We spare you the details since the k-sum is nowadays usually computed by the so-called *particle-mesh-Ewald* (PME) technique. (*Note: one sometimes gets the impression that ES and PME are considered different methods. This is not so: PME is just a fast, efficient way of approximating one term of the ewald sum, namely the k-sum!*)

The basic trick of PME consists of the fact that the k-sum is not computed for atom positions, but for fixed positions on a grid. This allows CHARMM to pre-compute a lot of stuff *and* to use fast fourier transforms (FFT) to speed things up further. The atomic charges are smeared out on a grid in each step, potential (and forces) computed on the grid and are back-transformed to their atomoc positions. By using so-called B-splines for the "smearing", it suffices to compute the potential; the forces are obtained by differentiation of the splines (this avoids separate FFTs for potential and force calculations). Traditional ES scales approximately as for *N* atoms; PME scales as , which makes a big difference for large systems.

CHARMM implements both traditional ES as well as PME to compute the k-sum. Historically, the two methods were coded independently, and the older implementation of ES has a number of limitations. Since it is also much slower than PME, only PME is discussed subsequently.

#### Setting up PME

Ewald summation is requested by the `EWALd` option to `ENERgy`, and PME is requested by adding `PMEWald` as an option. While the original ES method had numerous options to fine-tune the calculation of k-vectors, for PME only 4 additional options are relevant. The first is the choice of , the others specify the number of grid points in each of the three spatial directions. A full energy call requesting PME looks like

! set up CRYStal / IMAGes (= PBC), otherwise the following will fail ENERgy ... EWALd PMEWald KAPPa 0.43 FFTX 64 FFTY 64 FFTZ 64

The above sets to 0.43, and requests a grid. As a rule of the thumb, the grid spacing in each spatial direction should be approximately 1 Å. One has to keep in mind, however, that the FFT implementation of CHARMM has certain limitations concerning the choice of prime factors; essentially, only powers of 2, 3 and 5 are allowed, e.g., FFTX 27 (= 3x3x3) is OK, but FFTX 28 (= 2x2x7) is not. Further, the more powers of 2 relative to powers of 3 and 5, the better (i.e. faster). CHARMMing uses an external Perl script to calculate optimal grid dimensions for a given system size.

The trickier parameter is . Since the real space sum is calculated by the normal non-bonded routines, the cut-off radius `CTOFNB` is applied. It, therefore, has to be sufficiently large as to ensure proper damping so that at distances interactions and forces are effectively zero. If the damping is too low ( too small), then you miss interactions and introduce a severe error in your calculations.
There are several rules of the thumb to check the correct choice of ; below is the exact criterion to choose For your value of (`CTOFNB`) and
evaluate (e.g., Mathematica!)

To be on the safe side, if your , increase (ideally, ).

Note that as long you use the the default `CTOFNB` (12 Å), you can simply use the default of 0.43 as it has been chosen for this value of `CTOFNB`. In this case it is unnecessary to check the value via the above equation. A less precise rule of thumb (from ewald.doc) is to set to .

**Final note for running MD with PME:** The implementation of the PME in CHARMM (as well as in most other comparable programs) does not conserve center of mass translation (sum of all momenta is not exactly zero). If no provisions are taken, your system might start to pick up a net translational velocity component. To compensate / avoid this artifact, CHARMM enforces that during MD you take out center of mass translation by specifying a non-zero `NTRFrq` (reasonable values are from 500 to 5000, although it does no harm to set it smaller).

**A theoretical PS:** ES is mostly discussed in the context of summing up electrostatic interactions in an infinite lattice (see, e.g., the presentation in Allen/Tildesley). A different point of view can be found in the presumably first publication in which ES was used in the context of a computer simulation (Brush, Sahlin and Teller, JCP 45, 2102 (1966), yes *the* H-bomb Teller!). The Ewald potential is simply the solution to the Poisson equation under periodic boundary conditions.

A demonstration of how to use particle mesh ewald is given on the complete example page.

## Suggested Nonbond cutoff scheme

In general, it is a good idea to go with the nonbond cutoff scheme given in the parameter file of the force field that you will be using. For example, the CHARMM27 Protein/Nucleic acid force field gives default nonbond parameters of:

NONBONDED nbxmod 5 atom cdiel shift - vatom vdistance vswitch - cutnb 14.0 ctofnb 12.0 ctonnb 10.0 - eps 1.0 e14fac 1.0 wmin 1.5

These are good default values for most simulations. In some cases when running in vacuum, it might be necessary to use `VSHIft` instead of `VSWItch`. There are a couple other points to remember:

- Always set
`INBFrq`to -1 so nonbond updates are done heuristically. - You can safely increase
`CUTNb`to decrease the frequency of nonbond list updates (at the expense of a higher memory requirement for the nonbond list). - In general, a nonbond cut off of less than 12 angstroms should not be used (the errors are too great).
- In Ewald calculations, the electrostatic cut-off method (
`SHIFt`vs.`SWITch`is ignored as the Ewald summation is used to calculate long-range electrostatics. Replace this with the Ewald parameters described above.

As an example, nonbond setup for a simulation using PBC might look like:

nonbond nbxmod 5 atom cdiel - elec ewald pme kappa 0.34 spline order 6 - vdw vatom vswitch - cutnb 14.0 ctofnb 12.0 ctonnb 10.0

Be very careful to check compatibility of your nonbond specification with the force fields and implicit solvent models you're using (if any)! This is critically important to the correctness of your simulation!