file: tutorial.three
Conformational search using the EDMC method
-------------------------------------------
We are going to use the Electrostatically Driven Monte Carlo (EDMC) method to
produce a conformational path corresponding to a series of energy minima of the
potential energy surface. During a very long run, the method is likely to find
the conformation corresponding to the global energy minimum (at least for
relatively small sequences with no more than 30 amino acid residues). In this example,
however, we are going to produce a short test run with only five accepted
conformations.
We are going to use a ten-residue chain of L-alanine to show how to carry
out a conformational search with ECEPPAK. In this case, the amino acid sequence is
shorter than those used in tutorials one and two in order to speed up the
computations.
1.- Generate an input file with suffix "inp" (e.g., ten_ala_edmc.inp) that will
contain the instructions for ECEPPAK.
2.- Include in ten_ala_edmc.inp the $CNTRL data group to define the type of run
(EDMC) and ask the program to write the coordinates corresponding to each
accepted conformation in PDB format.
$CNTRL
runtyp = edmc ! run type is an EDMC run
PRINT_CART ! print Cartesian coordinates
OUTFORMAT =PDB ! format of the Cartesian file is PDB
FILE = A10edmc ! prefix of the PDB file is A10edmc
$END
3.- Include the $SEQ data group with the amino acid sequence in the
ten_ala_edmc.inp file. Let's try using a single-letter code this time. Since
this form of sequence specification is not the default, we have to add an extra
keyword to the $CNTRL data group: "res_code= one_letter".
Now let's input the sequence. We'll use again the AMINO-COCH3 and CARBOXYL-NHCH3
end groups at the N- and C-terminus, respectively. The two end groups must also
be specified using a one-letter code (see the manual for the one-letter and
three-letter codes for residues and end groups).
The $SEQ data group should read,
$SEQ
A
AAAAAAAAAA
C
$END
4.- We must include in the ten_ala_edmc.inp file the $GEOM data group with the set of
dihedral angles defining the conformation of the polypeptide chain. Let's
specify an extended conformation as input.
The $GEOM data group reads,
$GEOM
180.000 180.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
180.000
$END
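For longer chains, typing one row per residue by hand is error-prone. The
following is a minimal Python sketch that writes the same $GEOM layout as above
for any number of residues. It rests on assumptions that are not confirmed in
this tutorial: that each angle occupies a fixed field of eight characters with
three decimals (suggested by the run-together "-160.000-140.000"), and that the
four values in a residue row can be labeled phi, psi, omega and chi1. The
function name geom_block is hypothetical. Check the manual for the exact format
before relying on it.

```python
def geom_block(n_residues, phi=-160.0, psi=-140.0, omega=180.0, chi1=60.0):
    """Return the lines of a $GEOM data group for an extended poly-Ala chain."""

    def fmt(*angles):
        # Assumption: fixed-width fields, 8 characters wide, 3 decimals.
        return "".join(f"{angle:8.3f}" for angle in angles)

    lines = ["$GEOM"]
    lines.append(fmt(180.0, 180.0))               # N-terminal end group line
    for _ in range(n_residues):
        lines.append(fmt(phi, psi, omega, chi1))  # one row per residue
    lines.append(fmt(180.0))                      # C-terminal end group line
    lines.append("$END")
    return lines


block = geom_block(10)
print("\n".join(block))
```

The angle labels are only keyword names in the sketch; the values written are
exactly those used in the extended conformation above.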
5.- The conformational search protocol is defined through a set of specific
keywords. These keywords must be included in the data group $EDMC. Most of
the EDMC keywords (see manual) are assigned default values. We are going to
enter a few of them to indicate how they can be used to specify certain aspects
of the conformational search run.
Length of the run: One possible way of specifying the length of the Monte
Carlo run is to define the maximum number of conformations accepted by the Monte
Carlo criterion. This is done using the keyword MAXIT. Since we
want five (5) accepted conformations, we include inside the $EDMC data group:
MAXIT=5
Random numbers: since the EDMC procedure uses random numbers, we need to
initialize the random number generator by providing an integer (positive or
negative). This is done using the keyword SEED:
SEED= -5555
Temperature: A parameter associated with the temperature (in kelvin) for
the simulation is defined using the keyword TEMP:
TEMP=300
Whenever the search is trapped in some region of the conformational space, the
method attempts to overcome the barriers by generating conformations with major
conformational changes and by relaxing the criterion of acceptance through an
increase of the temperature parameter.
There are a few alternative procedures to change the temperature. One of them,
indicated by the keyword THERMAL_SHOCK, is to produce a sudden jump in the
temperature. The high temperature is defined by the keyword T_UP.
Let's use this procedure in our example:
THERMAL_SHOCK T_UP = 5000
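To see why raising the temperature parameter relaxes the acceptance criterion,
the following Python sketch compares the acceptance probability of the same
uphill move at TEMP=300 and at T_UP=5000. This tutorial does not spell out the
acceptance rule, so the sketch assumes the standard Metropolis criterion and
energies expressed in kcal/mol. The function name and the 3 kcal/mol energy
difference are hypothetical, chosen only for illustration.

```python
import math

# Gas constant in kcal/(mol*K); assumes energies are given in kcal/mol.
R_KCAL = 1.987e-3


def acceptance_probability(delta_e, temperature):
    """Metropolis probability of accepting an energy change delta_e."""
    if delta_e <= 0.0:
        return 1.0  # downhill (or equal-energy) moves are always accepted
    return math.exp(-delta_e / (R_KCAL * temperature))


delta_e = 3.0  # hypothetical uphill step, kcal/mol
p_normal = acceptance_probability(delta_e, 300.0)   # TEMP
p_shock = acceptance_probability(delta_e, 5000.0)   # T_UP
print(f"T=300:  {p_normal:.4f}")
print(f"T=5000: {p_shock:.4f}")
```

Under these assumptions the move is accepted well under 1% of the time at 300
and roughly three times out of four at 5000, which is what allows the search
to leave a region where it is trapped.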
Generation of conformations: the EDMC method utilizes different protocols for
generating new conformations. These conformations can be generated by random
predictions or by using electrostatic predictions.
The following keyword is used to control the generation process:
RAND_TO_ELEC defines the ratio of randomly- to electrostatically-generated
conformations. Let's use a ratio of 3:10:
RAND_TO_ELEC=0.3
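Note that a ratio of 3:10 is not the same as 30% of all generated
conformations. The short Python sketch below does the arithmetic, assuming only
what is stated above (RAND_TO_ELEC is the ratio of random to electrostatic
conformations). How ECEPPAK applies the ratio internally is not described here,
and the function name is hypothetical.

```python
from fractions import Fraction


def random_fraction(rand_to_elec):
    """Fraction of all generated conformations that are random, given the
    ratio of random to electrostatic conformations."""
    ratio = Fraction(str(rand_to_elec))  # "0.3" -> 3/10, avoids float noise
    return ratio / (1 + ratio)


frac = random_fraction(0.3)
print(frac, float(frac))
```

With RAND_TO_ELEC=0.3, 3 out of every 13 generated conformations are random,
i.e. about 23%.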
The $EDMC data group reads,
$EDMC
MAXIT=5
SEED= -5555
TEMP= 300
THERMAL_SHOCK T_UP = 5000
RAND_TO_ELEC=0.3
$END
NOTE: In the present test, we are going to start the search from the initial
conformation whose geometry is provided in the data group $GEOM. However, it is
possible (and quite common) to override this option by requesting a starting
conformation with dihedral angles generated at random. This can be easily done
using the RAND_START keyword.
6.- The complete file now reads,
$CNTRL
runtyp = edmc ! run type is an EDMC run
PRINT_CART ! print Cartesian coordinates
OUTFORMAT =PDB ! format of the Cartesian file is PDB
FILE = A10edmc ! prefix of the PDB file is A10edmc
res_code= one_letter ! use one-letter code to specify input sequence
$END
$EDMC
MAXIT=5
SEED= -5555
TEMP= 300
THERMAL_SHOCK T_UP = 5000
RAND_TO_ELEC=0.3
$END
$SEQ
A
AAAAAAAAAA
C
$END
$GEOM
180.000 180.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
-160.000-140.000 180.000 60.000
180.000
$END
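A common mistake when editing this file is to change the sequence without
updating the geometry. The following Python sketch checks that the number of
residues in $SEQ matches the number of residue rows in $GEOM. It assumes the
layout seen in this example: data groups delimited by "$NAME" and "$END",
comments introduced by "!", one end-group line before and after the residues
in both groups, and one $GEOM row per residue. The function names are
hypothetical and this is not part of ECEPPAK.

```python
def parse_groups(text):
    """Split an input file into {group name: list of content lines}."""
    groups, current = {}, None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.upper() == "$END":
            current = None
        elif line.startswith("$"):
            current = line[1:].upper()
            groups[current] = []
        elif current is not None:
            groups[current].append(line)
    return groups


def count_seq_and_geom(text):
    """Return (residues in $SEQ, residue rows in $GEOM)."""
    groups = parse_groups(text)
    seq_lines = groups["SEQ"][1:-1]    # skip the two end-group lines
    n_residues = sum(len(line.replace(" ", "")) for line in seq_lines)
    n_rows = len(groups["GEOM"]) - 2   # skip the two end-group lines
    return n_residues, n_rows


# The $SEQ and $GEOM groups of this tutorial, rebuilt as a string.
sample = "\n".join(
    ["$SEQ", "A", "AAAAAAAAAA", "C", "$END", "$GEOM", "180.000 180.000"]
    + ["-160.000-140.000 180.000 60.000"] * 10
    + ["180.000", "$END"]
)
n_residues, n_rows = count_seq_and_geom(sample)
print(n_residues, n_rows)
```

For a real file, replace the sample string with the contents of
ten_ala_edmc.inp; the two numbers should be equal.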
7.- Save the file and run ECEPPAK.
In the command line type:
recepp.s EDMC ten_ala_edmc TEN_ALA_EDMC x x 1
8.- As output, the program writes three different types of files:
(a) main_out.TEN_ALA_EDMC, with a description of
the results of the conformational search procedure;
(b) outo.TEN_ALA_EDMC, a file containing all the conformations accepted
by the Monte Carlo procedure (for each of them, the first line
lists the different energy terms, and the next line(s) contain the sequence
in ECEPP format, followed by the list of dihedral angles that describe the
conformation); and
(c) A10edmc###.pdb files (### represents the number of the accepted conformation)
containing the Cartesian coordinates of the conformations accepted
by the Monte Carlo procedure.
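To inspect the accepted conformations in the order in which they were
accepted, the PDB files can be collected and sorted by their numeric suffix.
The Python sketch below assumes the file names consist of the prefix given by
the FILE keyword followed directly by digits and ".pdb"; whether the number is
zero-padded is not stated here, so the sort is numeric rather than
alphabetical. The function name is hypothetical.

```python
import re
from pathlib import Path


def accepted_pdb_files(directory=".", prefix="A10edmc"):
    """Return the prefix###.pdb files in `directory`, sorted by number."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.pdb$")
    found = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    found.sort(key=lambda item: item[0])  # numeric order, not string order
    return [path for _, path in found]


for pdb_path in accepted_pdb_files():
    print(pdb_path.name)
```

Run it from the directory where ECEPPAK wrote its output, or pass that
directory as the first argument.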
As mentioned in tutorial.two, because the minimization process uses parameters
that are machine-specific, different energy values can be obtained on
different computers. In the case of Monte Carlo runs, the reproducibility
of a run across computer systems is reduced even further because
the generation of conformations from minima that are slightly different
may lead to completely different conformations, i.e., the conformational path
followed by the method soon diverges.
This type of calculation is usually intended to search for the global
energy minimum of the potential function, and extensive tests have demonstrated
that in long simulations the EDMC method locates the conformation corresponding
to the global minimum, independently of the conformational path it chooses to
follow.