The program conn creates the connectivity file, one of the cornerstones of any MD program. This file includes all the information needed for energy calculations on a specific molecule. Other modules (e.g. mini_pwl, which minimizes the energy of the molecule) read a prepared connectivity file before starting their computations. We start with a description of the input to conn and then describe the internal structure of this important program. A sample input that creates a connectivity file for valine dipeptide is:
~
~ input for connectivity (valine dipeptide)
~
file poly name=(VALD.POLY) unit=10 read
file mono name=(ALL.MONO) unit=11 read
file prop name=(ALL.PROP) unit=12 read
file wcon name=(VALD.WCON) unit=13 wovr
action
*EOD
~
The above information can be typed interactively or stored in a file (conn.inp) and
redirected to the conn program:
conn < conn.inp > conn.out
The output of conn can likewise be redirected to a text file (conn.out) or displayed directly on the terminal screen.
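When several molecules have to be processed, the input file can be generated rather than typed. The following is a minimal sketch in Python that reproduces the layout of the sample input; the function name conn_input is hypothetical, and the unit numbers and keywords are simply copied from the sample, not taken from a formal specification of the conn input.

```python
def conn_input(poly, mono, prop, wcon, title="input for connectivity"):
    """Return the text of a conn input file laid out like the sample above.

    Unit numbers (10-13) and the read/wovr keywords are copied from the
    valine dipeptide sample; they are assumptions for any other case.
    """
    lines = [
        "~",
        f"~ {title}",
        "~",
        f"file poly name=({poly}) unit=10 read",
        f"file mono name=({mono}) unit=11 read",
        f"file prop name=({prop}) unit=12 read",
        f"file wcon name=({wcon}) unit=13 wovr",
        "action",
        "*EOD",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Regenerate the valine dipeptide input and write it to conn.inp.
    text = conn_input("VALD.POLY", "ALL.MONO", "ALL.PROP", "VALD.WCON",
                      title="input for connectivity (valine dipeptide)")
    with open("conn.inp", "w") as handle:
        handle.write(text)
```

The generated file is then used exactly as before: conn < conn.inp > conn.out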
Here is the polymerization file VALD.POLY, which tells MOIL the name of the molecule, how many monomers are joined and in what sequence:
~
MOLC=(VALD) #mon=3
NTR0 VAL CTR0
*EOD
~
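Since the monomer count and the monomer list must agree, a small consistency check can catch typing errors before conn is run. The sketch below parses a single-molecule polymerization file of the form shown above. The function name parse_poly is hypothetical, and the handling of "~" comment lines and of a single MOLC entry are assumptions based on this one example.

```python
import re


def parse_poly(text):
    """Parse a one-molecule poly file into (molecule name, monomer list)."""
    tokens = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("~"):
            continue                      # blank or comment line (assumed)
        if line.startswith("*EOD"):
            break                         # end of data
        tokens.extend(line.split())
    header = " ".join(tokens)
    name = re.search(r"MOLC=\((\w+)\)", header)
    count = re.search(r"#mon=(\d+)", header)
    if name is None or count is None:
        raise ValueError("missing MOLC=(...) or #mon= entry")
    monomers = [t for t in tokens if not t.startswith(("MOLC=", "#mon="))]
    if len(monomers) != int(count.group(1)):
        raise ValueError(
            f"#mon={count.group(1)} but {len(monomers)} monomers are listed")
    return name.group(1), monomers


if __name__ == "__main__":
    sample = "~\nMOLC=(VALD) #mon=3\nNTR0 VAL CTR0\n*EOD\n~\n"
    print(parse_poly(sample))
```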
The files ALL.PROP and ALL.MONO contain the atom properties (van der Waals parameters, bonds, angles) and the monomer descriptions, respectively.
The conn program extracts the information it needs from these databases. Only the small part of ALL.MONO that is relevant to the connectivity of this molecule is shown below (all other records are not processed for this configuration). Every monomer block starts with the keyword MONO=(...). The keyword DONE marks the end of the description of the monomer's atoms. If the monomer consists of more than one atom, the section BOND ... DONE lists its covalent bonds.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ Monomers file : ALL.MONO
~
...
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ uncharged N-terminus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~uncharged N-terminus (1)~~~~~~~~~~~~~~~~~~~~~~~~(OPLS)~~~
~
MONO=(NTR0) #prt=6 chrg=0.0
~
~                X    O
~                :    :
~   ME - C - N - CA...C...
~        |   |
~        O   H
~
UNIQ=(N) PRTC=(NH) NEXT
UNIQ=(H) PRTC=(HN) NEXT
UNIQ=(CA) PRTC=(CAH) NEXT
UNIQ=(ME) PRTC=(CH3)
UNIQ=(C) PRTC=(CO)
UNIQ=(O) PRTC=(OC)
DONE
BOND
C-N* C-O C-ME
DONE
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ VAL (OPLS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MONO=(VAL) #prt=9 chrg=-0.570
~
~               O
~               |
~      N - CA - C...N
~      |   |
~      H   CB - CG2
~          |
~          CG1
~
~why not CH3 instead of CH3G (maybe because of improper torsion)?
~
~
UNIQ=(N) PRTC=(NH)
UNIQ=(H) PRTC=(HN)
UNIQ=(CA) PRTC=(CAH)
UNIQ=(CB) PRTC=(CBH)
UNIQ=(CG1) PRTC=(CH3G)
UNIQ=(CG2) PRTC=(CH3)
UNIQ=(C) PRTC=(CO)
UNIQ=(O) PRTC=(OC)
UNIQ=(N) PRTC=(NH) NEXT
DONE
BOND
N-H N-CA CA-CB CB-CG1 CB-CG2
CA-C C-O C-N*
DONE
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ uncharged C-terminus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~uncharged C-terminus (1)~(not gly,pro,BU)~~~~~~~(OPLS)~~~
~
MONO=(CTR0) #prt=6 chrg=0.2
~
~ X
O H
~ : : |
|
~ ...N...CA - C - N -ME
~
UNIQ=(C) PRTC=(CO) PREV
UNIQ=(O) PRTC=(OC) PREV
UNIQ=(CA) PRTC=(CAH) PREV
UNIQ=(N) PRTC=(NH)
UNIQ=(H) PRTC=(HN)
UNIQ=(ME) PRTC=(CH3T)
DONE
BOND
N-H N-ME
DONE
~
~
...
*EOD
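To make the structure of a monomer block concrete, the sketch below reads one MONO=(...) block into a small dictionary: the atom list (UNIQ name, PRTC particle type and the optional NEXT/PREV flag) and the bond list. It is an illustration of the layout shown above, not the parser used inside conn; the function name parse_monomer is hypothetical. The trailing "*" on a bonded atom name is kept verbatim; in the listing it appears on atoms that are flagged NEXT.

```python
import re

MONO_RE = re.compile(r"MONO=\((\w+)\)\s+#prt=(\d+)\s+chrg=(\S+)")
UNIQ_RE = re.compile(r"UNIQ=\((\w+)\)\s+PRTC=\((\w+)\)(?:\s+(NEXT|PREV))?")


def parse_monomer(block):
    """Parse one monomer block; lines starting with '~' are skipped."""
    mono = {"name": None, "nprt": 0, "charge": 0.0, "atoms": [], "bonds": []}
    in_bonds = False
    for raw in block.splitlines():
        line = raw.strip()
        if not line or line.startswith("~"):
            continue
        header = MONO_RE.match(line)
        if header:
            mono["name"] = header.group(1)
            mono["nprt"] = int(header.group(2))
            mono["charge"] = float(header.group(3))
        elif line == "BOND":
            in_bonds = True
        elif line == "DONE":
            in_bonds = False
        elif in_bonds:
            for token in line.split():        # e.g. "CA-CB" or "C-N*"
                first, second = token.split("-")
                mono["bonds"].append((first, second))
        else:
            atom = UNIQ_RE.match(line)
            if atom:
                # (unique name, particle type, None | "NEXT" | "PREV")
                mono["atoms"].append(atom.groups())
    return mono
```

For the VAL block above this yields nine atoms, matching #prt=9, and eight bonds.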
The file ALL.PROP contains five major sections that describe specific properties of the molecules. Each section starts with its own keyword: PRTC, BOND, ANGLE, TORSION and IMPROPER.
Each of these sections is closed by the keyword DONE, and the keyword *EOD must be present at the end of the database file. The program conn picks only the records that involve atoms from the relevant monomers of ALL.MONO; other records are ignored. Only a part of the file ALL.PROP is displayed below:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ Properties file : ALL.PROP
~
PRTC
PNAM=(NH) PMAS=14. PCHG=-0.570
PEPS=0.17 PSGM=3.250
PNAM=(HN) PMAS=1. PCHG=0.370
PEPS=0.0498 PSGM=0.30
PNAM=(CO) PMAS=12. PCHG=0.50
PEPS=0.105 PSGM=3.750
PNAM=(OC) PMAS=16. PCHG=-0.5
PEPS=0.21 PSGM=2.960
PNAM=(CAH) PMAS=13. PCHG=0.20
PEPS=0.080 PSGM=3.800
PNAM=(CH3T) PMAS=15. PCHG=0.200
PEPS=0.170 PSGM=3.800
PNAM=(CH3) PMAS=15. PCHG=0.00
PEPS=0.16 PSGM=3.9100
PNAM=(CH3G) PMAS=15. PCHG=0.00
PEPS=0.160 PSGM=3.910
PNAM=(CBH) PMAS=13. PCHG=0.00
PEPS=0.080 PSGM=3.850
...
DONE
~
BOND
CAH CO 317.0 1.522
CBH CO 317.0 1.522
CO OC 570.0 1.229
CAH CBH 260.0 1.526
CAH NH 337.0 1.449
NH HN 434.0 1.010
CO CH3 317.0 1.522
CO NH 490.0 1.335
CBH CH3 260.0 1.526
CBH CH3G 260.0 1.526
CH3T NH 337.0 1.449
CAH CH3 260.0 1.526
...
DONE
~
ANGLE
CH3 CO OC
80.0 120.4
OC CO NH
80.0 122.9
CO NH HN
35.0 119.8
CO NH CAH
50.0 121.9
CAH NH HN
38.0 118.4
NH CAH CO
63.0 110.1
CAH CO OC
80.0 120.4
CO NH CH3T
50.0 121.9
CH3T NH HN
38.0 118.4
CAH CO NH
70.0 116.6
CH3 CO NH
70.0 116.6
NH CAH CBH
80.0 109.7
CAH CBH CH3
63.0 111.5
CAH CBH CH3G
63.0 111.5
CH3 CBH CH3G
63.0 111.5
CBH CAH CO
63.0 111.1
...
DONE
~
TORSION
CH3 CO NH HN 0.0 2.5 0.0 2 -1.0
OC CO NH HN 0.0 2.5 0.0 2 -1.0
CH3 CO NH CAH 0.0 2.5 0.0 2 -1.0
OC CO NH CAH 0.0 2.5 0.0 2 -1.0
OC CO NH CH3T 0.0 2.5 0.0 2 -1.0
CAH CO NH CH3T 0.0 2.5 0.0 2 -1.0
CAH CO NH HN 0.0 2.5 0.0 2 -1.0
OC CO CAH NH 0.0 0.0 0.1 3 -1.0
CO NH CAH CBH 0.0 0.0 0.0 3 0.0
HN NH CAH CBH 0.0 0.0 0.0 3 0.0
CO NH CAH CO 0.0 0.0 0.0 3 0.0
HN NH CAH CO 0.0 0.0 0.0 3 0.0
NH CAH CBH CH3G 0.0 0.0 0.5 3 1.0
OC CO CAH CBH 0.0 0.0 0.1 3 -1.0
NH CO CAH CBH 0.0 0.0 0.0 2 0.0
...
DONE
~
IMPROPER
NH CAH CO HN 45.0 0.0
NH CH3T CO HN 45.0 0.0
CO CAH OC NH 100.0 0.0
CO CH3 OC NH 100.0 0.0
CAH NH CO CBH 55.0 35.26
CBH CAH CH3 CH3G 55.0 35.26
...
DONE
~
*EOD
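The PEPS and PSGM values of the PRTC section reappear in the connectivity file in combined form, in the columns epsgm6 and epsgm12 of the particle list. The numbers printed in the VALD.WCON listing are consistent with epsgm6 = 2 sqrt(PEPS) PSGM^3 and epsgm12 = 2 sqrt(PEPS) PSGM^6. This relation is inferred from the listed values only, not from the conn source code, and it is checked here for heavy atoms only (the hydrogen entries are listed as zero in the connectivity file). A short sketch that reproduces the listed values:

```python
import math


def lj_coefficients(eps, sigma):
    """Per-particle Lennard-Jones coefficients (inferred relation).

    Returns (2*sqrt(eps)*sigma**3, 2*sqrt(eps)*sigma**6), which matches the
    epsgm6 / epsgm12 columns printed for heavy atoms in VALD.WCON.
    """
    root = 2.0 * math.sqrt(eps)
    return root * sigma ** 3, root * sigma ** 6


if __name__ == "__main__":
    # (particle type, PEPS, PSGM) taken from the PRTC section above
    for name, eps, sigma in [("CH3", 0.16, 3.91),
                             ("CO", 0.105, 3.75),
                             ("OC", 0.21, 2.96),
                             ("CAH", 0.08, 3.80)]:
        a6, a12 = lj_coefficients(eps, sigma)
        print(f"{name:4s} epsgm6={a6:.5E} epsgm12={a12:.5E}")
```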
Here is the connectivity file (VALD.WCON) produced by the program conn:
~ CONNECTIVITY FILE FOR MOLECULES:
~ totmon npt nb nangl ntors nimp totex totspe lestyp NBULK
3 14 13 18 13 6 31 20 0 1
VALD
~ Pointers to last particle of BULK
14
~ Monomer names
NTR0 VAL CTR0
~ Pointers to last particle of monomer
3 11 14
~ Properties of particles list :
~pt mono ptid lesid ptnm ptms ptchg epsgm6 epsgm12 ptwei
1 1 13 0 ME 15.00 .00000 .47821E+02 .28586E+04 .10000E+01
2 1 9 0 C 12.00 .50000 .34176E+02 .18022E+04 .10000E+01
3 1 10 0 O 16.00 -.50000 .23769E+02 .61644E+03 .10000E+01
4 2 7 0 N 14.00 -.57000 .28308E+02 .97175E+03 .10000E+01
5 2 8 0 H 1.00 .37000 .00000E+00 .00000E+00 .10000E+01
6 2 11 0 CA 13.00 .20000 .31040E+02 .17032E+04 .10000E+01
7 2 49 0 CB 13.00 .00000 .32282E+02 .18422E+04 .10000E+01
8 2 26 0 CG1 15.00 .00000 .47821E+02 .28586E+04 .10000E+01
9 2 13 0 CG2 15.00 .00000 .47821E+02 .28586E+04 .10000E+01
10 2 9 0 C 12.00 .50000 .34176E+02 .18022E+04 .10000E+01
11 2 10 0 O 16.00 -.50000 .23769E+02 .61644E+03 .10000E+01
12 3 7 0 N 14.00 -.57000 .28308E+02 .97175E+03 .10000E+01
13 3 8 0 H 1.00 .37000 .00000E+00 .00000E+00 .10000E+01
14 3 12 0 ME 15.00 .20000 .45249E+02 .24829E+04 .10000E+01
~ Bonds list:
~ ib1 ib2 kbond req
1 2 317.0000 1.5220
2 3 317.0000 1.2290
2 4 317.0000 1.3350
4 6 260.0000 1.4490
4 5 337.0000 1.0100
6 7 260.0000 1.5260
6 10 317.0000 1.5220
7 9 260.0000 1.5260
7 8 260.0000 1.5260
10 12 490.0000 1.3350
10 11 570.0000 1.2290
12 14 337.0000 1.4490
12 13 434.0000 1.0100
~ Angles list:
~ iangl1 iangl2 iangl3 kangl angleq
1 2 3 80.00000 120.40000
1 2 4 50.00000 121.90000
3 2 4 80.00000 122.90000
2 4 5 35.00000 119.80000
2 4 6 50.00000 121.90000
5 4 6 38.00000 118.40000
4 6 7 80.00000 109.70000
4 6 10 63.00000 110.10000
7 6 10 63.00000 111.10000
6 7 8 63.00000 111.50000
6 7 9 63.00000 111.50000
8 7 9 63.00000 111.50000
6 10 11 80.00000 120.40000
6 10 12 70.00000 116.60000
11 10 12 80.00000 122.90000
10 12 13 35.00000 119.80000
10 12 14 50.00000 121.90000
13 12 14 38.00000 118.40000
~ Torsions list:
~ itor1 itor2 itor3 itor4 period ktors1 ktors2 ktors3 phase
1 2 4 5 2 .0000 2.5000 .0000 -1.000
1 2 4 6 2 .0000 2.5000 .0000 -1.000
3 2 4 5 2 .0000 2.5000 .0000 -1.000
3 2 4 6 2 .0000 2.5000 .0000 -1.000
4 6 7 8 3 .0000 .0000 .5000 -1.000
4 6 7 9 3 .0000 .0000 .5000 1.000
4 6 10 11 3 .0000 .0000 .1000 -1.000
8 7 6 10 3 .0000 .0000 .5000 1.000
9 7 6 10 3 .0000 .0000 .5000 1.000
6 10 12 13 2 .0000 2.5000 .0000 -1.000
6 10 12 14 2 .0000 2.5000 .0000 -1.000
11 10 12 13 2 .0000 2.5000 .0000 -1.000
11 10 12 14 2 .0000 2.5000 .0000 -1.000
~ Improper torsion properties:
~iimp1 iimp2 iimp3 iimp4 kimp impeq
2 1 3 4 .10000000E+03 .00000000E+00
4 6 2 5 .45000000E+02 .00000000E+00
6 4 10 7 .55000000E+02 .35260000E+02
7 6 9 8 .55000000E+02 .35260000E+02
10 6 11 12 .10000000E+03 .00000000E+00
12 14 10 13 .45000000E+02 .00000000E+00
~ Exclusion list 1-2 1-3, set as followed:
~ atom number, number of exclusions and list
1 3
2 3 4
2 4
3 4 5 6
3 1
4
4 4
6 5 7 10
5 1
6
6 6
7 10 8
9 11 12
7 3
9 8 10
8 1
9
10 4
12 11 13 14
11 1
12
12 2
14 13
13 1
14
~ Special list 1-4 set as followed:
~ atom number, number of exclusions and list
1 2
5 6
2 2
7 10
3 2
5 6
4 4
8 9 11 12
5 2
7 10
6 2
13 14
7 2
11 12
8 1
10
9 1
10
11 2
13 14
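The exclusion list (1-2 and 1-3 neighbours) and the special list (1-4 neighbours) at the end of the connectivity file follow directly from the bond list. The sketch below rebuilds both lists from the 13 bonds of valine dipeptide with a breadth-first search limited to three bonds, storing each pair once under the lower atom number as in the listing. How conn itself builds these lists is not shown here, so this is an independent illustration; the function name neighbour_lists is hypothetical.

```python
from collections import deque

# Bond list of valine dipeptide, copied from the connectivity file above.
BONDS = [(1, 2), (2, 3), (2, 4), (4, 6), (4, 5), (6, 7), (6, 10),
         (7, 9), (7, 8), (10, 12), (10, 11), (12, 14), (12, 13)]


def neighbour_lists(bonds, natoms):
    """Return (exclusions, specials) as dicts: atom -> sorted partner list.

    exclusions: partners 1 or 2 bonds away (1-2 and 1-3 pairs)
    specials:   partners exactly 3 bonds away (1-4 pairs)
    Only partners with a higher atom number are stored.
    """
    adjacency = {i: set() for i in range(1, natoms + 1)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    exclusions = {i: [] for i in adjacency}
    specials = {i: [] for i in adjacency}
    for start in adjacency:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if distance[current] == 3:
                continue                  # do not search beyond 1-4 pairs
            for nxt in adjacency[current]:
                if nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)
        for other, d in distance.items():
            if other > start:
                (exclusions if d <= 2 else specials)[start].append(other)
        exclusions[start].sort()
        specials[start].sort()
    return exclusions, specials


if __name__ == "__main__":
    excl, spec = neighbour_lists(BONDS, 14)
    print("totex  =", sum(len(v) for v in excl.values()))
    print("totspe =", sum(len(v) for v in spec.values()))
```

The two totals can be compared with the totex and totspe entries in the header of the connectivity file.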
To perform an energy evaluation, the connectivity file must be supplemented by an atomic coordinate file. MOIL uses the CHARMM coordinate file format. A single configuration of the valine dipeptide (Cartesian coordinates in CHARMM format) is given below.
*
* initial structure, valine dipeptide in helix conformation
*
14
1 1 NTR0 ME -0.02717 3.41564 0.00488 VALD 1 15.03500
2 1 NTR0 C 0.04827 1.91708 -0.24090 VALD 1 12.01100
3 1 NTR0 O 0.67142 1.46396 -1.20000 VALD 1 15.99940
4 2 VAL N -0.58751 1.13019 0.63714 VALD 1 14.00670
5 2 VAL H -1.05774 1.57408 1.37031 VALD 1 1.00800
6 2 VAL CA -0.63665 -0.33703 0.58146 VALD 1 13.01900
7 2 VAL CB -1.49790 -0.83846 -0.61850 VALD 1 15.03500
8 2 VAL CG1 -2.91731 -0.28641 -0.56404 VALD 1 0.00000
9 2 VAL CG2 -1.57439 -2.35741 -0.74809 VALD 1 0.00000
10 2 VAL C 0.70303 -1.08091 0.68144 VALD 1 12.01100
11 2 VAL O 0.92822 -1.80836 1.64793 VALD 1 15.99940
12 3 CTR0 N 1.61106 -0.91246 -0.28244 VALD 1 14.00670
13 3 CTR0 H 1.39819 -0.28295 -1.00480 VALD 1 1.00800
14 3 CTR0 ME 2.93848 -1.59696 -0.26438 VALD 1 15.03500
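A quick sanity check of a coordinate file is to compare a bonded distance with the equilibrium length stored in the connectivity file. The sketch below reads atom records of the kind listed above and computes the ME-C distance of the N-terminus. It assumes that each atom record sits on a single line and that the fields are separated by whitespace; the real CHARMM format is column-based, so this is a convenience reader, not a complete one. The function names are hypothetical.

```python
import math


def parse_crd(text):
    """Read whitespace-separated CHARMM-style atom records into dicts."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        # Skip title lines ("*"), blank lines and the atom-count line.
        if len(fields) < 10 or fields[0].startswith("*"):
            continue
        atoms.append({
            "index": int(fields[0]),
            "monomer": int(fields[1]),
            "resname": fields[2],
            "name": fields[3],
            "xyz": tuple(float(value) for value in fields[4:7]),
        })
    return atoms


def distance(atom_a, atom_b):
    """Euclidean distance between two parsed atoms."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


if __name__ == "__main__":
    sample = """*
* initial structure, valine dipeptide in helix conformation
*
14
1 1 NTR0 ME -0.02717 3.41564 0.00488 VALD 1 15.03500
2 1 NTR0 C 0.04827 1.91708 -0.24090 VALD 1 12.01100
"""
    me, c = parse_crd(sample)
    print(f"ME-C distance: {distance(me, c):.3f}")
```

For the two atoms above the distance comes out at about 1.52, close to the equilibrium length 1.5220 listed for bond 1-2 in the connectivity file.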
MOIL does not provide a module that generates coordinate files automatically. It can, however, process data from other sources. One important source of coordinate files is the Protein Data Bank. The coordinates are stored in the PDB format, which requires adjustments. Both the CHARMM and PDB formats use fixed-length records, so transforming the coordinates is not difficult, but some PDB atom or residue names must be edited to be consistent with the MOIL databases (files ALL.PROP and ALL.MONO).
Database information about proteins is stored in text files in which every record (line) begins with a keyword. A full description of all records can be found at the PDB site of the Research Collaboratory for Structural Bioinformatics (RCSB) [http://www.rcsb.org/index.html]. Below we consider one complicated example: converting the PDB data file of myoglobin into CHARMM format. The myoglobin PDB data file, pdb1mbd.ent, contains the deoxy form. To extract the coordinate data, only the records that begin with the ATOM and HETATM keywords are important. All other records are ignored.
There are several steps in the preparation of the data for MOIL:
Duplicate records of the same atom (alternate locations) can be removed from a PDB file manually or with a program (for example pdb2puth). Records with the labels 2, 3, ... or B, C, ... in column 17 should be deleted, and the label 1 or A should be replaced by a space character. We create a new coordinate file, 1mbd_edit.ent, from the source file pdb1mbd.ent. The records with atom numbers 106, 108, 688, 690, 714, 716, 1204, 1206, 1208, 1210, 1212, 1214 and 1216 were deleted because they repeat previous records, and the respective previous records were edited to have a space character in column 17. Terminal residues are a separate issue that must be addressed before using MOIL. Before the first atom we added a record that describes the N-terminus of the protein molecule.
The coordinates of the N-terminus record have a special value (9999.99):
ATOM 0 HX2 NTER 0 9999.99 9999.99 9999.99 1.00 28.18
which tells the next program (puth) to add the correct ones. The index of the monomer NTER was also set to 0 so that it differs from the next one (VAL 1). To add the C-terminus of the protein molecule, in the record ATOM 1230 the monomer name was changed from GLY to CTRG and the monomer index was changed to 0 (any other number that differs from the previous and next residue numbers would also do):
ATOM 1230 OX2 CTRG 0 -3.627 24.845 -7.563 8.00 44.30 1MBD1309
The next PDB record (keyword TER) was deleted because the edited record before it already specifies the type of C-terminus.
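The removal of duplicate atom records described above (labels in column 17) is mechanical and can be scripted. The following sketch keeps the first copy of each atom (label A or 1), blanks its label, and drops the other copies. It illustrates the rule only and is not the implementation used by pdb2puth; the function name is hypothetical. The terminal records (NTER, CTRG) still have to be added separately.

```python
def strip_alternate_locations(lines, keep=("A", "1")):
    """Drop alternate-location copies from PDB ATOM/HETATM records.

    Column 17 (string index 16) holds the label. Records labelled with one
    of `keep` survive with the label blanked; other labelled records are
    removed. All remaining lines are passed through unchanged.
    """
    cleaned = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            label = line[16]
            if label != " ":
                if label not in keep:
                    continue              # a 2,3,... or B,C,... copy
                line = line[:16] + " " + line[17:]
        cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    # File names follow the text above.
    with open("pdb1mbd.ent") as source:
        records = source.read().splitlines()
    with open("1mbd_edit.ent", "w") as target:
        target.write("\n".join(strip_alternate_locations(records)) + "\n")
```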
Water molecules included in the PDB file can be removed or transformed to the MOIL format. The transformation means that all oxygen atoms are renamed to OH2 and the water molecules to TIP3. It is also necessary to remove all duplicate water records (in our case HETATM 1371 and 1374).
The removal of duplicate records described above can be done interactively with the Swiss-PdbViewer (http://www.expasy.ch/spdbv/). However, the NTER and CTRG records must be added manually afterwards, because this program removes such records completely from the PDB file.
The next step is editing some of the atom and residue (monomer) names from the original source. The residue name SO4 should be changed to SUL (HETATM 1620-1624). In the monomer HEM the FE atom name should be shifted one position to the right (ATOM 1625). The names "N A", "N B", "N C" and "N D" (ATOM 1630, 1641, 1649, 1657) should be changed to NA, NB, NC and ND to avoid spaces in atom names (MOIL does not accept spaces in names). The residue name HEM was also changed to HEM1 (HETATM 1625-1667), which is the internal name in the ALL.MONO file for the monomer in the deoxy form. HEME should be used for the form bonded to a ligand.
Note: As an alternative to manual editing, we provide a program (pdb2puth) that should perform most of the above operations automatically, given the input PDB file. To get the same result as described above, call:
pdb2puth < 1mbd_pdb2puth.inp > 1mbd_pdb2puth.log
After processing, the output file 1mbd.ent contains the same information as the manually edited 1mbd_edit.ent.
To obtain the connectivity file one needs to create a new file with the monomer information, 1mbd.poly, or check the file 1mbd.poly created by the program pdb2puth. The number and names of the monomers must be exactly the same as the number and names of the residues in the edited 1mbd_edit.ent or the transformed 1mbd.ent PDB file. The input format of *.poly files is free, so it does not matter how many monomers are on a line, but they must come in the same order as in the PDB file. When 1mbd.poly is created manually, the sequence records of the PDB file (keyword SEQRES) are useful.
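The requirement that the poly file lists the residues in the same order as the PDB file suggests deriving one from the other. The sketch below collects the residue sequence from the ATOM and HETATM records of an edited PDB file and writes it in the layout of VALD.POLY. Several points are assumptions: the residue name is read from columns 18-21 (to accommodate four-character MOIL names such as NTER or CTRG), a new residue is recognised by a change in name or in columns 22-27, and the molecule name MBD is purely illustrative. Whether pdb2puth proceeds in the same way is not documented here.

```python
def residue_sequence(pdb_lines):
    """Return the residue names of ATOM/HETATM records in file order."""
    sequence, last_key = [], None
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[17:21].strip()        # columns 18-21 (assumed width)
        key = (name, line[21:27])         # chain, number, insertion code
        if key != last_key:
            sequence.append(name)
            last_key = key
    return sequence


def poly_text(molecule, monomers, per_line=10):
    """Format a polymerization file like VALD.POLY (free-format lines)."""
    lines = [f"MOLC=({molecule}) #mon={len(monomers)}"]
    for start in range(0, len(monomers), per_line):
        lines.append(" ".join(monomers[start:start + per_line]))
    lines.append("*EOD")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("1mbd_edit.ent") as source:
        monomers = residue_sequence(source.read().splitlines())
    # "MBD" is a hypothetical molecule name chosen for this example.
    with open("1mbd.poly", "w") as target:
        target.write(poly_text("MBD", monomers))
```

Comparing the generated monomer count with the number of residues expected from the SEQRES records, plus terminal groups, heme, sulfate and waters, is a useful check before running conn.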
PDB files usually do not contain the coordinates of hydrogen atoms. These records can be added to the protein and to the water molecules with another MOIL utility, the program puth, which also converts the PDB file into the CHARMM coordinate format. To run this utility, however, the connectivity file is needed. The connectivity file can be obtained with the program conn:
conn < 1mbd_conn.inp > 1mbd_conn.log
The program conn reads all the information about the files directly from its input (or from a redirected text file, 1mbd_conn.inp) and writes every processing step to standard output (or to a redirected text file, 1mbd_conn.log). The input file must specify at least the monomer database ALL.MONO, the properties database ALL.PROP and the description of the structure in terms of monomers, 1mbd.poly. The result is the file 1mbd.wcon, which incorporates all the necessary information about the protein molecule.
puth is a program that reads a PDB format file and detects and places missing hydrogens as defined by the connectivity file. The algorithm is based on the covalent structure only, not on hydrogen bonds between donors and acceptors. For example, the hydrogens of water molecules are placed in random directions (while satisfying the covalent requirements of the individual molecules). This seems to be quite satisfactory, and bad contacts are corrected by a short minimization.
Note that the program will halt if an atom other than hydrogen is missing.
Atom numbers are not important and are not used. Sequential monomer numbers are not important either, but different monomers must have different monomer numbers. The order of the monomers must of course be the same as in the poly file; the order of the particles within a monomer is not important.
puth < 1mbd_puth.inp > 1mbd_puth.log
The file 1mbd_puth.inp should specify only the connectivity file 1mbd.wcon and the PDB file 1mbd.ent. The database files ALL.MONO and ALL.PROP no longer need to be included, since the connectivity file 1mbd.wcon contains all the data about monomers and atom properties.
The program puth creates the new CHARMM coordinate file 1mbd.crd, which has the same number of monomers (residues) but a larger number of atoms (because new hydrogen atom records are added to the protein and water molecules).
Together, the connectivity file 1mbd.wcon and the coordinate file 1mbd.crd are enough to start any dynamics simulation; however, minimization is highly recommended once the placement of hydrogens is complete.
Another useful utility, the program solvatecrd, is provided for fully solvating the molecule of interest. This program reads a CHARMM format coordinate file and adds the number of water molecules needed to fill the required rectangular box.
solvatecrd < 1mbd_solv.inp > 1mbd_solv.log
The input file 1mbd_solv.inp contains four records, which should be given in the following order:
To start dynamics simulations it is necessary to create a new connectivity file that corresponds to the new structure, which consists of the protein molecule and the surrounding water molecules. The program conn is run again with a new input:
conn < 1mbd_conn_solv.inp > 1mbd_conn_solv.log
The resulting connectivity file 1mbd_solv.wcon, together with the previously created CHARMM coordinate file 1mbd_solv.crd, can be used for energy minimization or dynamics simulations.
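The whole preparation sequence for myoglobin can be collected in one driver script. The sketch below runs the five commands given in this section in order, each with its input redirected from the .inp file and its output written to the .log file. It assumes that the MOIL executables are on the search path and that a failed step returns a non-zero exit status; neither is guaranteed by this document. The input files themselves must already exist.

```python
import subprocess

# (program, input file, log file), in the order used in the text above.
STEPS = [
    ("pdb2puth", "1mbd_pdb2puth.inp", "1mbd_pdb2puth.log"),
    ("conn", "1mbd_conn.inp", "1mbd_conn.log"),
    ("puth", "1mbd_puth.inp", "1mbd_puth.log"),
    ("solvatecrd", "1mbd_solv.inp", "1mbd_solv.log"),
    ("conn", "1mbd_conn_solv.inp", "1mbd_conn_solv.log"),
]


def as_shell(steps=STEPS):
    """Return the equivalent shell command lines, for logging or review."""
    return [f"{program} < {inp} > {log}" for program, inp, log in steps]


def run_step(program, inp, log):
    """Run one program with stdin and stdout redirected to files."""
    with open(inp) as stdin, open(log, "w") as stdout:
        # check=True stops the pipeline if the program reports failure
        # through its exit status (an assumption about the MOIL tools).
        subprocess.run([program], stdin=stdin, stdout=stdout, check=True)


def run_all(steps=STEPS):
    """Run every step in order, stopping at the first failure."""
    for program, inp, log in steps:
        run_step(program, inp, log)


if __name__ == "__main__":
    for command in as_shell():
        print(command)
    run_all()
```

Because each step consumes files written by the previous one (1mbd.ent, 1mbd.wcon, 1mbd.crd, 1mbd_solv.crd), the order of the list matters; inspecting each log file before continuing remains advisable.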