file: tutorial.three

Conformational search using the EDMC method
-------------------------------------------

We are going to use the Electrostatically Driven Monte Carlo (EDMC) method
to produce a conformational path corresponding to a series of energy minima
of the potential energy surface. During a very long run, the method is
likely to find the conformation corresponding to the global energy minimum
(at least for relatively small sequences with no more than 30 amino acid
residues). In this example, however, we are going to produce a short test
run with only five accepted conformations.

Here we use a ten-residue chain of L-alanine to show how to carry out a
conformational search with ECEPPAK. In this case, the amino acid sequence is
shorter than those used in tutorials one and two in order to speed up the
computations.

1.- Generate an input file with suffix "inp" (e.g., ten_ala_edmc.inp) that
will contain the instructions for ECEPPAK.

2.- Include in ten_ala_edmc.inp the $CNTRL data group to define the type of
run (EDMC) and to ask the program to write the coordinates corresponding to
each accepted conformation in PDB format.

$CNTRL
runtyp = edmc        ! run type is an EDMC run
PRINT_CART           ! print Cartesian coordinates
OUTFORMAT =PDB       ! format of the Cartesian file is PDB
FILE = A10edmc       ! prefix of the PDB file is A10edmc
$END

3.- Include the $SEQ data group with the amino acid sequence in the
ten_ala_edmc.inp file. Let's try using a single-letter code this time. Since
this form of sequence specification is not the default, we have to add an
extra keyword to the $CNTRL data group: "res_code= one_letter". Now let's
input the sequence. We again use AMINO-COCH3 and CARBOXYL-NHCH3 as the end
groups at the N- and C-terminus, respectively. The two end groups must also
be specified using a one-letter code (see the manual for the one-letter and
three-letter codes for residues and end groups).
The $SEQ data group should read,

$SEQ
A
AAAAAAAAAA
C
$END

4.- We must include in the ten_ala_edmc.inp file the $GEOM data group with
the set of dihedral angles defining the conformation of the polypeptide
chain. Let's specify an extended conformation as input. The $GEOM data group
reads,

$GEOM
 180.000 180.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
 180.000
$END

5.- The conformational search protocol is defined through a set of specific
keywords. These keywords must be included in the $EDMC data group. Most of
the EDMC keywords (see manual) are assigned default values. We are going to
enter a few of them to show how they can be used to specify certain aspects
of the conformational search run.

Length of the run: one way of specifying the length of the Monte Carlo run
is to define the maximum number of conformations accepted by the Monte Carlo
criterion. This is done using the keyword MAXIT. Since we want five (5)
accepted conformations, we include inside the $EDMC data group:

MAXIT=5

Random numbers: since the EDMC procedure uses random numbers, we need to
initialize the random number generator by providing an integer (positive or
negative). This is done using the keyword SEED:

SEED= -5555

Temperature: a parameter associated with the temperature (in kelvin) of the
simulation is defined using the keyword TEMP:

TEMP=300

Whenever the search is trapped in some region of the conformational space,
the method attempts to overcome the barriers by generating conformations
with major conformational changes and by relaxing the acceptance criterion
through an increase of the temperature parameter. There are a few
alternative procedures to change the temperature.
One of them, indicated by the keyword THERMAL_SHOCK, is to produce a sudden
jump in the temperature. The high temperature is defined by the keyword
T_UP. Let's use this procedure in our example:

THERMAL_SHOCK
T_UP = 5000

Generation of conformations: the EDMC method uses different protocols for
generating new conformations. These conformations can be generated by random
predictions or by electrostatic predictions. The keyword RAND_TO_ELEC
defines the ratio of randomly to electrostatically generated conformations.
Let's use a ratio of 3:10:

RAND_TO_ELEC=0.3

The $EDMC data group reads,

$EDMC
MAXIT=5
SEED= -5555
TEMP= 300
THERMAL_SHOCK
T_UP = 5000
RAND_TO_ELEC=0.3
$END

NOTE: In the present test, we are going to start the search from the initial
conformation whose geometry is provided in the $GEOM data group. However, it
is possible (and quite common) to override this option by requesting a
starting conformation with dihedral angles generated at random. This can be
easily done using the RAND_START keyword.

6.- The complete file now reads,

$CNTRL
runtyp = edmc        ! run type is an EDMC run
PRINT_CART           ! print Cartesian coordinates
OUTFORMAT =PDB       ! format of the Cartesian file is PDB
FILE = A10edmc       ! prefix of the PDB file is A10edmc
res_code= one_letter ! use one-letter code to specify input sequence
$END
$EDMC
MAXIT=5
SEED= -5555
TEMP= 300
THERMAL_SHOCK
T_UP = 5000
RAND_TO_ELEC=0.3
$END
$SEQ
A
AAAAAAAAAA
C
$END
$GEOM
 180.000 180.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
-160.000-140.000 180.000  60.000
 180.000
$END

7.- Save the file and run ECEPPAK. On the command line type:

recepp.s EDMC ten_ala_edmc TEN_ALA_EDMC x x 1

8.- As output, the program writes three different types of files:

(a) main_out.TEN_ALA_EDMC, with a description of the results of the
    conformational search procedure;
(b) outo.TEN_ALA_EDMC, a file containing all the conformations accepted by
    the Monte Carlo procedure (for each of them, the first line lists the
    different energy terms, and the next line(s) contain the sequence in
    ECEPP format followed by the list of dihedral angles that describe the
    conformation);
(c) A10edmc###.pdb files (### represents the number of the accepted
    conformation) containing the Cartesian coordinates of the conformations
    accepted by the Monte Carlo procedure.

As mentioned in tutorial.two, because the minimization process uses
parameters that are machine-specific, different energy values can be
obtained on different computers. In the case of Monte Carlo runs,
reproducibility across computer systems is reduced even further, because
generating conformations from minima that are slightly different may lead to
completely different conformations, i.e., the conformational paths followed
by the method soon diverge.
However, this type of calculation is usually intended to search for the
global energy minimum of the potential function, and extensive tests have
demonstrated that in long simulations the EDMC method locates the
conformation corresponding to the global minimum, independently of the
conformational path it chooses to follow.
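
The input file assembled in the steps of this tutorial can also be produced
by a short script, which is convenient when the chain length or the EDMC
keywords are varied. The following Python sketch is only an illustration and
is not part of ECEPPAK. The function names (geom_lines, build_deck) are
hypothetical. It also assumes that each dihedral angle occupies an
8-character field with three decimals (inferred from entries such as
"-160.000-140.000" in the $GEOM listing) and that the $SEQ data group holds
the N-terminal code, the residues, and the C-terminal code on separate
lines. Check both assumptions against the manual before using the output.

```python
"""Sketch: build an ECEPPAK EDMC input deck like ten_ala_edmc.inp."""


def geom_lines(n_residues):
    """Return the $GEOM lines for an extended poly-alanine chain.

    Assumption: every angle is written in an 8-character field with three
    decimals, as suggested by the tutorial listing ("-160.000-140.000").
    """
    def fmt(angles):
        return "".join(f"{angle:8.3f}" for angle in angles)

    lines = [fmt([180.0, 180.0])]  # first line of the listing (end group)
    # One line of four angles per alanine residue, as in the tutorial.
    lines += [fmt([-160.0, -140.0, 180.0, 60.0])] * n_residues
    lines.append(fmt([180.0]))     # last line of the listing (end group)
    return lines


def build_deck(n_residues=10, maxit=5, seed=-5555, temp=300, t_up=5000,
               rand_to_elec=0.3, prefix="A10edmc"):
    """Return the text of the input file; keywords are those of the tutorial."""
    cntrl = [
        "$CNTRL",
        "runtyp = edmc",
        "PRINT_CART",
        "OUTFORMAT =PDB",
        f"FILE = {prefix}",
        "res_code= one_letter",
        "$END",
    ]
    edmc = [
        "$EDMC",
        f"MAXIT={maxit}",
        f"SEED= {seed}",
        f"TEMP= {temp}",
        "THERMAL_SHOCK",
        f"T_UP = {t_up}",
        f"RAND_TO_ELEC={rand_to_elec}",
        "$END",
    ]
    # Assumption: end-group codes and residues go on separate lines.
    seq = ["$SEQ", "A", "A" * n_residues, "C", "$END"]
    geom = ["$GEOM", *geom_lines(n_residues), "$END"]
    return "\n".join(cntrl + edmc + seq + geom) + "\n"


if __name__ == "__main__":
    # Print the deck; redirect the output to ten_ala_edmc.inp to save it.
    print(build_deck(), end="")
```

Keeping MAXIT, SEED, TEMP, T_UP and RAND_TO_ELEC as function arguments makes
it easy to prepare several test runs that differ only in one keyword.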