Difference between revisions of "CHARMMing's PDB Reader"

From CHARMM tutorial

Revision as of 18:11, 25 January 2010

CHARMMing's PDB reader exists to take a raw PDB file from the Protein Data Bank and convert it into a CHARMM-formatted PDB file. CHARMM has its own syntactic expectations for PDB files, so many PDB files will have problems when loaded into CHARMM (some of these problems are quite minor while others are potentially serious). This section of the tutorial is an explanation of how CHARMMing's PDB reader modifies a file from the PDB. It is not meant to be an exhaustive exploration of the PDB file format.

Basic algorithm

The basic algorithm used by CHARMMing is:

  1. Remove all lines that do not start with ATOM or HETATM
  2. Handle multi-model atoms. Some PDBs separate different models cleanly into different sections, but others list alternate configurations for individual atoms or residues within a single model. These are identified via the following procedure:
    1. The line is checked to see if it belongs to an alternate model. This is done by looking at the PDB B-factor, which is usually < 1 when multiple models are present, and by checking whether an 'A' has been prepended to the residue type. (In the standard PDB column layout, a value below 1 is typical of the occupancy field in columns 55-60 rather than the temperature factor in columns 61-66, and the prepended letter is the alternate location indicator in column 17.) If the line is not part of a multi-model description, the algorithm skips to the last step.
    2. The line is stored in a two-state (empty or full) buffer. If the buffer is empty, store the current line in it and move on to the next line (if no corresponding multi-model line is found, the buffer is flushed and the line is printed).
    3. If the buffer is full, compare the current line to the one in the buffer to see whether they are multiple configurations of the same atom. If not, flush the buffer and store the current line in it (it starts a multi-model section for a new atom).
    4. If the current line matches the line in the buffer, look at the B-factor weight. Whichever line has the lower B-factor (less likely to be the correct structural data) is tagged for removal (it will not be written out when the multi-model buffer is flushed); then go on to the next line.
    5. If the buffer is full and the current line does not refer to the atom in the multi-model buffer, flush the buffer (i.e. collect the lines in it except those tagged for removal).
  3. Sort the lines that have been read in by segment ID, residue number, and atom number, in that order (this groups all segments together, orders residues within segments, and orders atoms within residues).
  4. Group the segments together (i.e. make a separate data structure for each independent segment) and decide whether each segment is a protein, a nucleic acid, composed of "good hetero-atoms", or composed of "bad hetero-atoms" (accomplished by looking at the residues present within the segment).
  5. Rename the terminal oxygen atom type in each protein segment from OXT to OT2 (most PDB files use OXT for the terminal oxygen, but CHARMM expects OT2).
  6. Decide whether each nucleic acid segment is DNA or RNA (by testing for the presence of uracil or thymine residues).
  7. Change the residue names and atom types to what CHARMM expects.
    1. fill in
  8. Re-index atom numbers and residue IDs so that atom and residue numbering is continuous, starting from 1.
  9. Write all segments out in separate PDB files.
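
Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CHARMMing's actual code: the function names are hypothetical, the column positions are taken from the standard PDB fixed-width layout, a line is treated as an alternate configuration when its alternate location column (column 17) is non-blank, two lines are treated as the same atom when chain ID, residue number, and atom name agree, and the weight compared is the value in columns 55-60. Instead of tagging losers for removal, the buffer here simply holds the best candidate seen so far, which gives the same output.

```python
def is_atom_record(line):
    """Step 1: keep only ATOM and HETATM records."""
    return line.startswith(("ATOM", "HETATM"))


def alt_loc(line):
    """Alternate location indicator (PDB column 17); blank if none."""
    return line[16]


def weight(line):
    """Value in PDB columns 55-60, used to rank alternate configurations."""
    return float(line[54:60])


def atom_key(line):
    """Assumed identity of an atom: chain ID, residue number, atom name."""
    return (line[21], line[22:26], line[12:16])


def resolve_alternates(lines):
    """Step 2: keep one line per atom, preferring the higher weight."""
    kept = []
    buffered = None  # two-state buffer: None (empty) or one line (full)
    for line in lines:
        if not is_atom_record(line):
            continue
        if alt_loc(line) == " ":
            # Not an alternate configuration: flush the buffer, keep the line.
            if buffered is not None:
                kept.append(buffered)
                buffered = None
            kept.append(line)
        elif buffered is None:
            buffered = line
        elif atom_key(line) == atom_key(buffered):
            # Same atom: the lower-weight line is discarded (ties keep the first).
            if weight(line) > weight(buffered):
                buffered = line
        else:
            # Alternate section for a new atom: flush and start again.
            kept.append(buffered)
            buffered = line
    if buffered is not None:
        kept.append(buffered)
    return kept
```

Calling `resolve_alternates` on the lines of a PDB file returns the ATOM and HETATM lines with at most one configuration per atom, in the original order.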

Comments

  • The PDB parser removes all lines other than ATOM and HETATM ones. A separate routine is called beforehand to parse out title, author, journal, and disulfide bond information.
  • PDBs with multiple MODEL segments are passed to a separate routine to be split into their constituent models.
  • We can tell if two atoms are the same by looking at their segment IDs, residue IDs, and atom numbers.
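
The identity test in the last comment uses the same three fields as the sort in step 3, so both can be built on one key function. The sketch below is a hedged illustration rather than CHARMMing's implementation: the names `atom_id`, `same_atom`, `sort_atoms`, and `renumber` are hypothetical, the column positions follow the standard PDB layout, and falling back to the chain ID when the segment ID columns are blank is an assumption. `renumber` illustrates the re-indexing of step 8 for the lines of one segment.

```python
def atom_id(line):
    """(segment ID, residue number, atom number) for one ATOM/HETATM line."""
    # Assumption: if the segment ID (columns 73-76) is blank, use the chain ID.
    seg = line[72:76].strip() or line[21]
    return (seg, int(line[22:26]), int(line[6:11]))


def same_atom(a, b):
    """Two lines describe the same atom if all three fields agree."""
    return atom_id(a) == atom_id(b)


def sort_atoms(lines):
    """Order by segment, then residue within segment, then atom within residue."""
    return sorted(lines, key=atom_id)


def renumber(segment_lines):
    """Make atom and residue numbering continuous, starting from 1."""
    out = []
    new_res = 0
    last_res = None
    for new_serial, line in enumerate(segment_lines, start=1):
        res = line[22:27]  # residue number plus insertion code
        if res != last_res:
            new_res += 1
            last_res = res
        # Rewrite columns 7-11 (atom number) and 23-27 (residue number),
        # leaving every other column untouched.
        out.append(f"{line[:6]}{new_serial:>5}{line[11:22]}{new_res:>4} {line[27:]}")
    return out
```

Because `atom_id` returns a tuple, Python's default tuple ordering gives exactly the segment, residue, atom precedence described in step 3.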