file: tutorial.three



Conformational search using the EDMC method
-------------------------------------------


We are going to use the Electrostatically Driven Monte Carlo (EDMC) method to 
produce a conformational path corresponding to a series of energy minima of the 
potential energy surface.  During a very long run, the method is likely to find 
the conformation corresponding to the global energy minimum (at least for 
relatively small sequences with no more than 30 amino acid residues). In this 
example, however, we are going to produce a short test run with only five 
accepted conformations.

Here we are going to use a ten-residue chain of L-alanine to show how to carry 
out a conformational search with ECEPPAK. In this case, the amino acid sequence is 
shorter than those used in tutorials one and two in order to speed up the 
computations. 

1.- Generate an input file with suffix "inp" (e.g., ten_ala_edmc.inp) that will 
contain the instructions for ECEPPAK.

2.- Include in ten_ala_edmc.inp the $CNTRL data group to define the type of run 
(EDMC) and ask the program to write the coordinates corresponding to each 
accepted conformation in PDB format.

$CNTRL
runtyp = edmc    ! run type is an EDMC run
PRINT_CART             !  print Cartesian coordinates
OUTFORMAT   =PDB       !  format of the Cartesian file is PDB
 FILE  = A10edmc      !    prefix of the PDB file is  A10edmc
$END

3.- Include the $SEQ data group with the amino acid sequence in the 
ten_ala_edmc.inp file. Let's try using a single-letter code this time.  Since 
this form of sequence specification is not the default, we have to add an extra 
keyword to the $CNTRL data group: "res_code= one_letter". 

Now let's input the sequence. We'll use again the AMINO-COCH3 and CARBOXYL-NHCH3 
end groups at the N- and C-terminus, respectively. The two end groups must also 
be specified using a one-letter code (see the manual for the one-letter and 
three-letter codes for residues and end groups).

The $SEQ data group should  read, 

$SEQ
A
AAAAAAAAAA
C
$END

4.- We must include in the ten_ala_edmc.inp file the $GEOM data group with the 
set of dihedral angles defining the conformation of the polypeptide chain.  Let's 
specify an extended conformation as input. 

The $GEOM data group reads,

$GEOM
 180.000 180.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
 180.000
$END
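
Typing these lines by hand is error-prone because the angles are packed into 
fixed-width columns. As a convenience, the following Python sketch rebuilds the 
same $GEOM data group. It assumes, judging only from the example above, that 
every angle occupies 8 characters with 3 decimals and that each residue takes 
one line; the function names are ours and are not part of ECEPPAK.

```python
def fmt_angles(angles):
    """Format dihedral angles as fixed-width fields (8 characters, 3 decimals)."""
    return "".join(f"{a:8.3f}" for a in angles)


def extended_geom(n_residues):
    """Build the lines of a $GEOM group for an extended poly-alanine chain."""
    lines = ["$GEOM"]
    # Angles of the N-terminal end group (values copied from the tutorial).
    lines.append(fmt_angles([180.0, 180.0]))
    for _ in range(n_residues):
        # One line per alanine, with the same four angles used in the tutorial.
        lines.append(fmt_angles([-160.0, -140.0, 180.0, 60.0]))
    # Angle of the C-terminal end group.
    lines.append(fmt_angles([180.0]))
    lines.append("$END")
    return lines


geom_lines = extended_geom(10)
print("\n".join(geom_lines))
```

For ten residues the printed text should match the data group shown above 
line by line.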
 
5.- The conformational search protocol is defined through a set of specific 
keywords.  These keywords must be included in the data group $EDMC.  Most of 
the EDMC keywords (see manual) are assigned default values.  We are going to 
enter a few of them to indicate how they can be used to specify certain aspects 
of the conformational search run. 

Length of the run: One way of specifying the length of the Monte Carlo run is to 
define the maximum number of conformations accepted by the Monte Carlo 
criterion.  This is done by using the keyword MAXIT. Since we want five (5) 
accepted conformations, we include inside the $EDMC data group: 
MAXIT=5

Random numbers: since the EDMC procedure uses random numbers, we need to 
initialize the random number generator by providing an integer (positive or 
negative). This is done using the keyword SEED:
SEED= -5555  

Temperature: A parameter associated with the temperature (in kelvin) for the 
simulation is defined using the keyword TEMP:
TEMP=300

Whenever the search is trapped in some region of the conformational space, the 
method attempts to overcome the barriers by generating conformations with major 
conformational changes and relaxing the acceptance criterion by increasing the 
temperature parameter. 
There are a few alternative procedures to change the temperature. One of them, 
indicated by the keyword THERMAL_SHOCK, is to produce a sudden jump in the 
temperature. The high temperature is defined by the keyword T_UP.
Let's use this procedure in our example:
 THERMAL_SHOCK  T_UP = 5000
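
To see why a higher temperature relaxes the acceptance criterion, we can look 
at the standard Metropolis criterion. The following Python sketch is only an 
illustration: we assume that an uphill move of size delta_e is accepted with 
probability exp(-delta_e/RT), with energies in kcal/mol. The function name, the 
units of the gas constant and the 3 kcal/mol step are our assumptions; they are 
not taken from the ECEPPAK code, which may apply TEMP and T_UP differently.

```python
import math

# Gas constant in kcal/(mol K); the energy units are an assumption.
R_KCAL = 0.0019872


def acceptance_probability(delta_e, temperature):
    """Metropolis probability of accepting an energy change delta_e."""
    if delta_e <= 0.0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-delta_e / (R_KCAL * temperature))


delta_e = 3.0  # hypothetical uphill step, kcal/mol
p_normal = acceptance_probability(delta_e, 300.0)   # TEMP = 300
p_shock = acceptance_probability(delta_e, 5000.0)   # T_UP = 5000
print(f"T = 300:  {p_normal:.4f}")
print(f"T = 5000: {p_shock:.4f}")
```

Under these assumptions the same uphill step is accepted less than 1% of the 
time at 300 and roughly 74% of the time at 5000, which is how a thermal shock 
can help the search leave a trap.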


Generation of conformations: the EDMC method utilizes different protocols for 
generating new conformations. These conformations can be generated by random 
predictions or by using electrostatic predictions.
The following keyword is used to control the generation process:
RAND_TO_ELEC defines the ratio of randomly- to electrostatically-generated 
conformations. Let's use a ratio of 3:10:

RAND_TO_ELEC=0.3
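
A ratio of 3:10 means that about 3 out of every 13 generated conformations are 
random. The following Python sketch illustrates this arithmetic with a seeded 
random number generator. It assumes that each new conformation is drawn 
independently; how ECEPPAK actually schedules the two kinds of moves is not 
described here, and the function names are hypothetical.

```python
import random


def random_fraction(rand_to_elec):
    """Fraction of random moves implied by a random:electrostatic ratio."""
    return rand_to_elec / (1.0 + rand_to_elec)


def choose_generators(n_moves, rand_to_elec, seed):
    """Pick 'random' or 'electrostatic' for each move, reproducibly."""
    rng = random.Random(seed)  # same seed, same sequence of choices
    p_random = random_fraction(rand_to_elec)
    return [
        "random" if rng.random() < p_random else "electrostatic"
        for _ in range(n_moves)
    ]


moves = choose_generators(13000, 0.3, seed=-5555)
n_random = moves.count("random")
n_elec = moves.count("electrostatic")
print(f"expected random fraction: {random_fraction(0.3):.4f}")
print(f"observed ratio random/electrostatic: {n_random / n_elec:.3f}")
```

Running the sketch twice with the same seed gives the same list of choices, 
which mirrors the role of the SEED keyword.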

The $EDMC data group reads,

$EDMC
MAXIT=5
SEED= -5555
TEMP= 300
THERMAL_SHOCK T_UP = 5000
RAND_TO_ELEC=0.3
$END

NOTE: In the present test, we are going to start the search from the initial 
conformation whose geometry is provided in the data group $GEOM. However, it is 
possible (and quite common) to override this option by requesting a starting 
conformation with dihedral angles generated at random. This can be easily done 
using the RAND_START keyword.  

6.- The complete file now reads,

$CNTRL
runtyp = edmc    ! run type is an EDMC run
PRINT_CART             !  print Cartesian coordinates
OUTFORMAT   =PDB       !  format of the Cartesian file is PDB
 FILE  = A10edmc      !    prefix of the PDB file is  A10edmc
res_code= one_letter   ! use one-letter code to specify input sequence
$END

$EDMC
MAXIT=5
SEED= -5555
TEMP= 300
THERMAL_SHOCK T_UP = 5000
RAND_TO_ELEC=0.3
$END

$SEQ
A
AAAAAAAAAA
C
$END

$GEOM
 180.000 180.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
 180.000
$END
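
Before running, it can be useful to check that the $SEQ and $GEOM data groups 
describe the same number of residues. The following Python sketch does this 
for text laid out like the file above. It assumes that data groups start with 
a line beginning with "$" and end with "$END", that "!" starts a comment, that 
the first and last lines of $SEQ and $GEOM belong to the end groups, and that 
$GEOM has one line per residue. These rules are inferred from this example 
only, and the function names are ours, not part of ECEPPAK.

```python
def parse_groups(text):
    """Split input text into {group name: [lines]} using $NAME ... $END."""
    groups, name = {}, None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].rstrip()  # drop trailing '!' comments
        token = line.strip()
        if not token:
            continue  # skip blank lines
        if token.upper() == "$END":
            name = None
        elif token.startswith("$"):
            name = token.upper()
            groups[name] = []
        elif name is not None:
            groups[name].append(line)
    return groups


def count_residues(groups):
    """Return (residues in $SEQ, residue lines in $GEOM), end groups excluded."""
    seq = [s.strip() for s in groups["$SEQ"]]
    residues = "".join(seq[1:-1])          # first/last lines are end groups
    n_geom = len(groups["$GEOM"]) - 2      # first/last lines are end groups
    return len(residues), n_geom


sample = (
    "$SEQ\nA\nAAAAAAAAAA\nC\n$END\n\n"
    "$GEOM\n 180.000 180.000\n"
    + "-160.000-140.000 180.000  60.000\n" * 10
    + " 180.000\n$END\n"
)
n_seq, n_geom = count_residues(parse_groups(sample))
print(f"residues in $SEQ: {n_seq}, residue lines in $GEOM: {n_geom}")
```

If the two numbers differ, a line was probably lost or duplicated while 
editing the file.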


7.- Save the file and run ECEPPAK.

On the command line, type:
    
     recepp.s EDMC ten_ala_edmc TEN_ALA_EDMC x x  1

8.- As output, the program writes three different types of files:
(a) main_out.TEN_ALA_EDMC, with a description of
the results of the conformational search procedure; 
(b) outo.TEN_ALA_EDMC, a file containing all the conformations accepted
by the Monte Carlo procedure (for each of them, the first line
lists the different energy terms, and the next line(s) contain the sequence
in ECEPP format, followed by the list of dihedral angles that describe the 
conformation); and
(c) A10edmc###.pdb files (### represents the number of the accepted 
conformation) containing the Cartesian coordinates of the conformations 
accepted by the Monte Carlo procedure.
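
After the run, the PDB files can be collected in order of acceptance. The 
following Python sketch lists them by the number embedded in the file name. It 
assumes that "###" is written as plain digits right after the prefix (we do 
not know whether ECEPPAK pads the number with zeros); the function name is 
hypothetical.

```python
import re
from pathlib import Path


def accepted_pdb_files(directory, prefix="A10edmc"):
    """Return (number, path) pairs for <prefix><digits>.pdb, sorted by number."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.pdb$")
    found = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so that 10 comes after 9 even without zero padding.
    return sorted(found, key=lambda item: item[0])


for number, path in accepted_pdb_files("."):
    print(number, path.name)
```

Run it from the directory that contains the output files; it prints nothing 
if no matching file is found.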

As mentioned in tutorial.two, because the minimization process uses parameters 
that are machine-specific, different energy values can be obtained on 
different computers. In the case of Monte Carlo runs, reproducibility across 
computer systems is reduced even further, because generating conformations 
from minima that are slightly different may lead to completely different 
conformations, i.e., the conformational paths followed by the method soon 
diverge.

However, this type of calculation is usually intended to search for the global 
energy minimum of the potential function, and extensive tests have demonstrated 
that in long simulations the EDMC method locates the conformation corresponding 
to the global minimum, independently of the conformational path it chooses to 
follow.