ECEPP

************************************************************************
* *
* The ECEPP Package *
* *
*************************************************************************

What the Package Does
---------------------

The program performs the following calculations:
1) Single energy evaluation.
2) Single energy minimization.
3) Energy evaluation of multiple input conformations.
4) Energy minimization of multiple input conformations.
5) Monte Carlo search using a generalized MCM (EDMC) algorithm.
6) Production of an energy map for a pair of dihedral angles.
7) RMS deviation analysis.
8) Variable Target Function procedure for structure determination.

Getting Started and Compiling the Eceppak Package
-------------------------------------------------
See the file "README" in the main eceppak directory.

How To Run this program
-----------------------

- The script that runs the program is called recepp.s. When you source the
cshrc file, an alias is set up that selects the script for the correct
architecture. The scripts are stored in the proper subdirectory of eceppak/Scripts.

To run the program you must supply a set of arguments; the number of
arguments depends on the architecture. To get precise information
about the arguments that should be used, type
recepp.s

IMPORTANT: If "recepp.s" is not recognized, you need to source the cshrc file.
If the command does not execute properly, check your cshrc
file; it may have been set up incorrectly. See the section
"Getting Started and Compiling the Eceppak Package" above for this setup.

What's New
----------

* The old set of ECEPP input files has been replaced by a more flexible
file structure.
* The main input file now contains a series of cards that define the type
of run and its parameters.

* The residue data file has been enhanced.
This file contains the ECEPP/3 residues and other non-standard ones.
It defines 72 residues (including N-methyl residues) and new end
groups.
The file is found under eceppak/Data/Residue/rsdata.

Among the changes introduced in rsdata are:
(a) Data on loop closing pairs was added. The program uses a general
treatment for these pairs (introduced by A.Liwo).
(b) It includes N-methyl residues.
(c) Hydration atom types were added in the description of atoms
(old hrs.data).
(d) Description of 1-4 interactions is included in a more general format.
(e) C' was replaced by C, NP in PRO and HPRO was replaced by N to increase
compatibility with PDB format.
(f) The atom type of protons in COOH groups (ASP, GLU, meASP, meGLU and
the carboxyl end group) was changed to type 1 (as in ECEPP/3; no H-bonding
allowed).

* Hydration parameters for different surface models are provided under
the subdirectory eceppak/data/Hydration_files. The SRFOPT set (srfopt.set)
of parameters is defined as the default. Other sets can be used by
modifying the recepp.s script (eceppak/Script/$ARCHITECTURE/recepp.s).

Examples
--------
The input files provided as examples (directory eceppak/Test)
will give you an idea of the calculations the program is able to do.
There are several subdirectories here, corresponding to the different
types of runs eceppak can perform.

FILE(S) EXPLANATION
------- -----------

enk_sol.inp Calculation of surface solvation energy.
To execute type:
"recepp.s ENERGY enk_sol ENK_sol dummy dummy"

enk_checkgrad.inp Checking Gradient calculation.
"recepp.s CHECKGRAD enk_checkgrad ENKGRAD dummy dummy"

enk_sp.inp Calculate energy using a soft-sphere potential.
"recepp.s ENERGY enk_sp ENKSP dummy dummy"

enk.inp EDMC run.
"recepp.s EDMC enk enk_out dummy dummy"

mebmt.inp Minimization (with output from minimizer).
"recepp.s MINIMIZE mpa1ot MPA1OT dummy dummy"

avian.inp ECEPP/3 and solvation energy.
"recepp.s ENERGY avian AVIAN dummy dummy"

cala6.inp Cyclic peptide and solvation energy.
"recepp.s ENERGY cala6 CALA6 dummy dummy"

hisp1.inp EDMC run with two possible states for PRO (UP and DOWN)
and HIS (HID and HIE) residues.
"recepp.s EDMC hisp1 HISP1 dummy dummy"

cys1.inp Input sequence with 1-letter code.
"recepp.s ENERGY cys1 CYS1 dummy dummy"

three_let.inp Input sequence with 3-letter code.
"recepp.s ENERGY three_let THREE_LET dummy dummy"

CPEP.inp Energy minimization of multiple input conformations.
outo.CPEP set of conformations to be minimized.
"recepp.s MINIMIZE CPEP CPEPout CPEP dummy"

ala_map.inp Energy map.

ala_rms1.inp RMS deviation analysis; generation of a reference
conformation.
outo.ala_rms Input conformations for comparison in ECEPP format.
ala_HELIX.pdb Input for reference conformation generation in PDB format
To execute type:
"recepp.s RMS_FIT ala_rms1 ala_rms1 ala_rms ala_HELIX"
As a result you get, among others, a file xray.ala_HELIX
that can be saved for future use.

ala_rms2.inp RMS deviation analysis; comparison of a conformation
(file in pdb format) with the reference one.
ala135.pdb Input conformation for comparison in PDB format (with end groups).
xray.ala_HELIX Reference conformation for comparison (in ECEPP format).
To execute type:
"recepp.s RMS_FIT ala_rms2 ALA_RMS2 ala135 ala_HELIX"

timbck.inp Calculate upper and lower bounds for distance constraints
tim.pdb runs from a pdb file.
"recepp.s BOUNDS timbck TIMBCK tim"

vtf_tim.inp Example of a run using the Variable Target Function procedure.
outo.vtf_tim Usually constraints come from NMR experiments
bounds.timbck "recepp.s VTF vtf_tim VTFOUT dummy timbck"

tim_sp.inp Example of a Monte Carlo run combining distance constraints
bounds.timbck and a soft-sphere potential (NMR refinement).

Output files for comparison with your results are provided in directory
test_output.
NOTE: We have noticed that large differences can occur between EDMC runs
in different architectures. This appears to be related to machine precision.
In general, a single energy calculation will tell you if the ECEPP/3 energy
function is working correctly. For EDMC runs, check if the program leads
to a sequence of improved energies.

*******************
* TABLE 1 *
*******************
Conventions:
-----------
Residues can be specified using the ECEPP list number, a three-letter code or a
one-letter code.

----------------------------------------------------------------------
RESIDUE | ECEPP LIST No. | ECEPP KIND | 3-letter code | 1-letter code
----------------------------------------------------------------------

ALANINE 1 -1 ALA A
ASPARTIC ACID 2 -2 ASP D
CYSTINE 3 -3 CYS C_
GLUTAMIC ACID 4 -4 GLU E
PHENYLALANINE 5 -5 PHE F
GLYCINE 6 6 GLY G
HISTIDINE (HID) 7 -7 HIS H
ISOLEUCINE 8 -8 ILE I
LYSINE 9 -9 LYS K
LEUCINE 10 -10 LEU L
METHIONINE 11 -11 MET M
ASPARAGINE 12 -12 ASN N
PROLINE-DOWN 13 13 PRO P
GLUTAMINE 14 -14 GLN Q
ARGININE 15 -15 ARG R
SERINE 16 -16 SER S
THREONINE 17 -17 THR T
VALINE 18 -18 VAL V
TRYPTOPHAN 19 -19 TRP W
TYROSINE 20 -20 TYR Y
CYSTEINE 21 -21 CYX C
HYDROXYPRO-DOWN 22 -22 HPD P<
NORLEUCINE 23 -23 NOR N<
ORNITHINE 24 -24 ORN O
HISTIDINE (HIE) 25 -26 HIE H-
BENZYL-ASPARTATE 26 -30 BZD B<
ORNITHINE + 27 -25 OR+ O+
HISTIDINE+ (HIP) 28 -27 HI+ H+
LYSINE + 29 -28 LY+ K+
ARGININE + 30 -29 AR+ R+
ASPARTIC ACID - 31 -31 AS- D-
GLUTAMIC ACID - 32 -32 GL- E-
PROLINE-UP 33 13 PRU P%
AZETIDIN 34 13 AZE P*
HYDROXYPRO-UP 35 -22 HPU P>
TYROSINE - 36 -36 TY- Y-
AMINOBUTYRIC ACI 37 -33 ABU Z<
AMINOISOBUTYRIC 38 -38 AIB Z>
SERINOLA 39 -39 SLA S<
allo-ISOLEUCINE 40 -40 AIL I*
AMINOBUTYRIC LOO 41 -41 ASU U<
SXRAYIN1 42 -42 SXY X
SLLXRAYIN 43 -43 SLX X*
GLUTAMIC LOOP 44 -44 GLP E_
LYSINE LOOP 45 -45 LYP K_
DAB LOOP 46 -46 DAB B_
GLYCINE LOOP 47 47 GYP G_
LEUCINE LOOP 48 -48 LEP L_
ASPARTIC LOOP 49 -49 ASX D_
M-DUMMY50(mGLY) 50 -50 M50 @50
MeALANINE 51 -51 M-A @A
MeASPARTIC ACID 52 -52 M-D @D
MeCYSTINE 53 -53 M-C @C_
MeGLUTAMIC ACID 54 -54 M-E @E
MePHENYLALANINE 55 -55 M-F @F
SARCOSINE 56 -56 SAR @G
MeHISTIDINE 57 -57 M-H @H
MeISOLEUCINE 58 -58 M-I @I
MeLYSINE 59 -59 M-K @K
MeLEUCINE 60 -60 M-L @L
MeMETHIONINE 61 -61 M-M @M
MeASPARAGINE 62 -62 M-N @N
MeDUMMY63 63 -63 M63 @63
MeGLUTAMINE 64 -64 M-Q @Q
MeARGININE 65 -65 M-R @R
MeSERINE 66 -66 M-S @S
MeTHREONINE 67 -67 M-T @T
MeVALINE 68 -68 M-V @V
MeTRYPTOPHAN 69 -69 M-W @W
MeTYROSINE 70 -70 M-Y @Y
Me-BMT 71 -71 BMT @Z
MeORNITHINE 72 -72 MOR @O
----------------------------------------------------------------------

END GROUPS | ECEPP LIST No. | ECEPP KIND | 3-letter code | 1-letter code
----------------------------------------------------------------------

AMINO - H2 1 1 H2N H
AMINO - H3+ 2 2 H3N H+
AMINO -CH3 3 3 CH3 M
AMINO-COCH3 4 -4 ACE A
FORMYL 5 -5 FYL F
END-PRO,CIS-H 6 -6 CHP P-
END-PRO,TRANS-H 7 -7 THP P
END-H2+-PRO 8 -8 AHP P+
PYROGLUTAMIC 9 -9 PGL G
AMINO (CYCLIZING 10 10 HN- H_
CARBOXYL - COOH 11 -11 CXH O
CARBOXYL - O 12 12 OCC O-
CARBOXYL-CH3 13 13 CCC L
CARBOXYL-NH2 14 -14 NCC N
CARBOXYL-NHCH3 15 -15 NME C
N, N - DIMETHYL 16 -16 DME D
METHYL ESTER 17 -17 MES T
ETHYL ESTER 18 -18 EES E
AMINO-T-BOC 19 -9 BOC B
CARBOXYL(CYCLIZI 20 20 CXL O_
MPA (HALF S-S) 21 -21 MPA R_
DMP (HALF S-S) 22 -22 DMP D_
CPP(AX) (HALF S- 23 -23 CPP C_
CARBOXYL-CH2F 24 24 CHF S
OCA(AX) (HALF S- 25 -25 OCA A_
OCA(EQ) (HALF S- 26 -26 OCE E_
SCA(AX) (HALF S- 27 -27 SCA S_
SCA(EQ) (HALF S- 28 -28 SCE T_
CPP(EQ) (HALF S- 29 -29 CPE F_
DANSYL 30 -30 DAN W
CARBOXYL 31 31 CXX X
AMINO-CYNAMONIC 32 -32 CYN Y

________________________________________________________________________
Note:
----
`@' indicates N-methyl residues.
`_' generally indicates a bridging residue (e.g. C_ indicates
CYSTINE).
`+' and `-' indicate a charged residue (e.g. K+ indicates the
charged lysine residue).

Description of the input file:
-----------------------------
The general input to the program is given through a file containing
a set of instructions, which the program reads with a parser.
The parser reads and interprets the first 78 characters of each line. No
distinction is made between lower-case and upper-case letters.
The symbols # and ! mark the beginning of a comment.
When either of these symbols is encountered, the parser ignores the
rest of the line.
Instructions related to a given procedure are collected into
so-called "Data Groups". A Data Group is identified by a main keyword
that has the symbol '$' as its first character, e.g. $EDMC, $CNTRL.
The keyword $end (or $END) must also be present to mark the end of the
Data Group.
Any word between the main keyword and $end is considered an
instruction.
This is an example of a Data Group:

$CNTRL
runtyp=Energy
$end
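
The parsing rules described above can be illustrated with a short Python
sketch. It is not the ECEPPAK parser: the function name parse_data_groups is
hypothetical, and splitting each line into whitespace-separated words is a
simplification (formatted groups such as $GEOM are read differently, and
upper-casing every word would also alter file names).

```python
def parse_data_groups(text):
    """Collect the words of each data group (illustrative helper only)."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw[:78]                      # only the first 78 characters are read
        for marker in "#!":                  # either symbol starts a comment
            line = line.split(marker, 1)[0]
        for word in line.upper().split():    # input is case-insensitive
            if word == "$END":
                current = None               # closes the open data group
            elif word.startswith("$"):
                current = word               # main keyword opens a data group
                groups[current] = []
            elif current is not None:
                groups[current].append(word) # any other word is an instruction
    return groups


example = """
$CNTRL            ! control group
runtyp=Energy     # type of run
$end
"""
print(parse_data_groups(example))  # {'$CNTRL': ['RUNTYP=ENERGY']}
```

Words that appear outside any data group are silently dropped in this sketch;
how the real parser treats them is not stated here.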

The following list contains the Data Groups already defined in ECEPPAK:

$BOUNDS, $BOUND_DEF, $BRIDGE, $CNTRL, $DIST_CONST, $EDMC, $FFIELD,
$GEOM, $GRID, $MINIM, $REGIONS, $RMSFIT, $SCAN, $SELEC_PDB, $SEQ,
$SPEC, $ENERCALC, $VTF, $WINDOWS, $OVERLAP_GRP and $OMCIS.

Three of the Data Groups are considered essential and without them
the program will abort. They are: $CNTRL, $SEQ and $GEOM.

$CNTRL is used, mainly, to indicate the type of calculation the user
wants to perform.

$SEQ provides the sequence of the molecule under study.

$GEOM contains the set of internal variables (dihedral angles) of the
initial conformation.

Description of the Data Groups
------------------------------
$CNTRL
This Data group is used to define the type of calculation the user would like to
carry out. Also, there are a few instructions, common to different modules, that
are defined here. The data group is essential. The program will not proceed
if the data group is not found.

Keywords of this data group are:
KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

RUNTYP = Define the type of calculation.

ENERGY -Compute energy.
CHECKGRAD -Check analytical gradient vs. numerical.
MINIMIZE -Carry out energy minimization.
EDMC -Carry out EDMC/MCM Monte Carlo search.
RMS_FIT -Compute rms deviations and fitting.
BOUNDS -Compute upper and lower bounds from
a reference conformation and generate
a constraint file for future use.
VTF -Carry out a Variable Target Function
study.

VERBOSE Print all information available.

CHISCAN Carry out a systematic search with energy
minimization for low-energy conformations of
the side-chain dihedral angles. The keyword
RUNTYP = MINIMIZE is required. The set of
dihedral angles to be scanned should be
specified using the data group $SCAN. NSTEP
must also be specified.

NSTEP = number Number of steps for the side-chain search using
the CHISCAN option.
E.g. if nstep=6, the angles will be searched
in increments of 60 degrees.
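
The relation between NSTEP and the scan increment can be written out as a
small Python sketch. The function names and the starting value of -180
degrees are assumptions made for illustration; only the increment
(360/NSTEP) follows from the description above.

```python
def chiscan_increment(nstep):
    """Angular increment in degrees for a scan with nstep steps per angle."""
    if nstep <= 0:
        raise ValueError("nstep must be positive")
    return 360.0 / nstep


def chiscan_values(nstep, start=-180.0):
    """Trial values for one dihedral angle (start value is an assumption)."""
    step = chiscan_increment(nstep)
    return [start + k * step for k in range(nstep)]


print(chiscan_increment(6))  # 60.0
print(chiscan_values(6))     # [-180.0, -120.0, -60.0, 0.0, 60.0, 120.0]
```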

PRINT_CART To request printing of Cartesian coord.

OUTFORMAT = Format required for the output file
containing the Cartesian coordinates.
ECEPP ECEPP format.
PDB PDB format.
AMBER AMBER (history) format.
CNDO CNDO format.
CA_PDB PDB format (CA atoms only).
SEL_PDB PDB format (selected atoms only).
These atoms should be specified within
the $SELEC_PDB data group.

FILE = name_of_file Filename of the output Cartesian file.
In the case of multiple conformations, a
sequence of files will be written
as name_of_fileNNN.*, where NNN is an
integer from 000 to 999.

NO_HYDRG_IN_PDB Omit printing H atoms in PDB files

NRES = number Number of residues in the specified
molecule. It is not essential; the
program will compute this value from
the sequence (see $SEQ data group).

RES_CODE = Specifies the input format of the sequence.
ECEPP ECEPP numbers are used. Default.
THREE_LETTER Sequence specified using a three-letter code.
ONE_LETTER Sequence specified using a one-letter code.

VAR_ANGLES = Used to define the set of variable dihedral angles.
ALL All dihedral angles are variable. Default.
BACK Only the backbone dihedral angles are variable.
SIDE Only the side-chain dihedral angles are variable.
SPEC The variable dihedral angles are specified through
the $SPEC data group.
NONE All dihedral angles are fixed.
PHPS Only PHI and PSI Backbone dihedral angles.
BKSD Backbone dihedral angles.

VAR_RES = number Used to define as variables a group of
dihedral angles from specific residues.
VAR_RES represents the number of residues that
contain variable dihedral angles.
The information of the specific residues
(sequence position) is entered through
the $SPEC data group.
The set of dihedral angles to be varied is
defined by selecting a proper value of VAR_ANGLES.
NOTE: Since the keyword VAR_RES works in combination
with VAR_ANGLES, VAR_ANGLES cannot be set to SPEC.

TIME = number Estimated CPU time of the run. Program
will end when this time limit is reached.
Default is 10.0**10 sec.

EMINIMA = number Used to avoid printing high-energy
conformations during multiple evaluations
of energies or minimizations.
Works in conjunction with the data groups $ENERCALC
or $VTF.

NOTE: The following keywords are still accepted in the $CNTRL data group for consistency
with previous versions, but their use here is not recommended. They have been incorporated
into other data groups.

SURFACE_OUT Print exposed surface for atoms.
The keyword SOLVATION= SURFACE must be
specified in datagroup $FFIELD

MULT_CONF = This flag is used to request the energy
evaluation or minimization of multiple
conformations.
READ Conformations are read from file (outo.*).
The name of the input file is passed to
the program through the recepp.s script
as the 4th argument.
RANDOM Generate conformations from random sets
of dihedral angles. In this case, MAXIT
and SEED must be specified.
NOTE: The options of this keyword are
equivalent to keywords READ_CONF and
RAND_START in $ENERCALC and $VTF data groups.

MAXIT = number Maximum no. of randomly generated conformations.
Used with MULT_CONF=RANDOM and SEED.

SEED = number Seed for random number generator. Used with
MULT_CONF=RANDOM and MAXIT.

REFERENCE Used to stop EDMC when the Zimmerman
code of an accepted conformation
matches the one corresponding to the
conformation provided as reference.
This can now be specified in $EDMC.
If used during energy evaluation (or
minimization) or VTF, it will print the
Zimmerman code of the conformations. This
option is also available (recommended use)
in data groups $ENERCALC or $VTF using
the keyword ZIMMERMAN_CODE.

$BOUND_DEF
This data group works in combination with runtyp= BOUNDS (see $CNTRL keyword)
and the data group $BOUNDS.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

TYPE_INPUT =
PDB_NO_ENDG Default. Input file is PDB with
no end groups.
PDB_WITH_ENDG Input file is PDB with end groups.

DELT_R = Upper and lower bounds can be obtained by:
PERCENTAGE A- adding and subtracting a percentage
(PERCENT) of the actual distance (R)
to and from the computed value of R, i.e. upper
bound = R + (PERCENT/100)*R and lower
bound = R - (PERCENT/100)*R. Default.
FIXED B- adding and subtracting a fixed value
(FIXVAL) to and from the actual distances.

FIXVAL = number See explanation for DELT_R.

PERCENT = number See explanation for DELT_R.
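
The two DELT_R options amount to the following arithmetic, shown here as a
Python sketch. The function name distance_bounds is hypothetical, and the
zero defaults for percent and fixval are placeholders, since the program
defaults for PERCENT and FIXVAL are not stated in this document.

```python
def distance_bounds(r, mode="PERCENTAGE", percent=0.0, fixval=0.0):
    """Return (lower, upper) bounds for a reference distance r."""
    if mode == "PERCENTAGE":
        delta = (percent / 100.0) * r   # a percentage of the actual distance
    elif mode == "FIXED":
        delta = fixval                  # the same margin for every distance
    else:
        raise ValueError("mode must be PERCENTAGE or FIXED")
    return r - delta, r + delta


print(distance_bounds(10.0, "PERCENTAGE", percent=10.0))  # (9.0, 11.0)
print(distance_bounds(5.0, "FIXED", fixval=0.5))          # (4.5, 5.5)
```

With PERCENTAGE the margin grows with the distance, whereas FIXED applies the
same absolute tolerance to short and long distances.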

WEIGHT = number Weight associated to the constraints.

IGNORE_H Don't stop if H cannot be identified.

MAXDIST = number Is used to reduce the number of constraints.
Only specified atoms separated by distances
smaller than MAXDIST will be used.
(default is 100000.0).

MINDIST = number Is used to reduce the number of constraints.
Only specified atoms separated by distances
greater than MINDIST will be used
(default is 0.0).

FIRST_RESIDUE = number This keyword allows a portion
of a PDB file to be read and used for the
generation of distance constraints.
FIRST_RESIDUE should correspond to the
PDB number of the first residue in the
sequence. Note: the sequence must be specified
sequentially and no residues should be
missing.

RESIDUE_GAP = number Distances between residues separated in sequence by
RESIDUE_GAP or more residues will be computed
(default is 0).

$BOUNDS
This data group works in combination with runtyp= BOUNDS (see $CNTRL keyword)
and the data group $BOUND_DEF. The group does not have specific keywords. It
is used to enter the names of the atoms for which distance constraints are requested
and the weight assigned to each constraint.
Example: compute bounds between CA atoms and give them a weight of 10.0

CA CA 10.0

$BRIDGE
This data group is used to define the linkage between bridging residues.
The data group requires the specification of pairs of numbers corresponding to
the positions in the sequence of the bridging residues. The program recognizes
residues that form bridges. Consequently, there is no need to specify the
number of them.

$DIST_CONST
This data group is used to define the set of distance constraints.
It works in combination with one of the following keywords:
a- RUNTYP= VTF in the $CNTRL data group, or
b- CONSTR_MOV in $EDMC or $FFIELD data groups.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

N1PAIR = number - Number of bounds read using atom number
as identification. A tedious procedure
but needed from time to time.

N2PAIR = number - Number of bounds read using specific
alpha-numeric characters for the atoms
and corresponding residue.

RESN1_IS_ONE This flag is used to introduce distance constraints
associated to a sequence without end-groups, i. e.
the first full residue is numbered as 1 (usual case
of constraints obtained from a typical PDB file).
ECEPP ALWAYS assumes that the chain has end groups.
Consequently sequence numbering is usually shifted
by one (+1) from the PDB sequencing.
The flag should be omitted (default) if the residue
numbers in the distance-constraint file are the same
as in ECEPP. (The sequence number is used to identify
the atoms in subroutine CLASS).
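
The numbering shift controlled by RESN1_IS_ONE can be pictured with a short
Python sketch. The function name is hypothetical; the sketch assumes, as
described above, that the N-terminal end group occupies position 1 in the
ECEPP sequence, so that residue k of a constraint file without end groups
becomes position k+1.

```python
def ecepp_residue_number(constraint_resnum, resn1_is_one=False):
    """Map a residue number from the constraint file to ECEPP numbering."""
    if resn1_is_one:
        # The constraint file numbers the first full residue as 1,
        # while ECEPP counts the N-terminal end group as position 1.
        return constraint_resnum + 1
    # Default: the file already uses ECEPP numbering.
    return constraint_resnum


print(ecepp_residue_number(1, resn1_is_one=True))  # 2
print(ecepp_residue_number(5))                     # 5
```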

DIST_WEIGHT = number - A constant with units of kcal/mol/A that converts
the "Sum of Squares of Errors" into energy. (WEI)

ADAPT_WEI - This and the following keywords are used by EDMC
method. (experimental) ADAPT_WEI is used to
indicate that the weight assigned to the distance
energy term, EDIS, should be adapted during the
course of a conformational search.
The goal is to control the value of the distance
energy term during a simulation. This keyword
should be specified in combination with:
(a) PERCENT_WEI; or
(b) PERCENT_WEI, DELTA_PERC_WEI, MAX_WEI and
MIN_WEI.

PERCENT_WEI = real_number - Defines the 'expected' ratio between EDIS and the
sum of the remaining energy terms.
If the DELTA_PERC_WEI is omitted, the algorithm
will try to keep this ratio approximately constant
during the run.

DELTA_PERC_WEI = number This flag is used to modulate the effect of the
distance constraint energy term on the search.
Works in the following manner:
DELTA_PERC_WEI/MAXIT will be added or subtracted
from the initial PERCENT_WEI during the course of
the run. In this way the algorithm tries to enforce
the distance constraints (when DELTA_PERC_WEI is
positive) while it proceeds toward lower energies.
The search will be directed toward constraints
satisfaction.
If DELTA_PERC_WEI is negative, on the other hand,
the constraints will be less important as the run
evolves and the search will be guided by the
ECEPP/3 energy terms.
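
One possible reading of the DELTA_PERC_WEI rule is sketched below in Python:
the target value of PERCENT_WEI changes by DELTA_PERC_WEI/MAXIT at every
step, so that the total change over a run of MAXIT steps equals
DELTA_PERC_WEI. This interpretation, the function name target_percent_wei
and the use of the step counter are assumptions; how the program then
adjusts DIST_WEIGHT to reach the target is not covered.

```python
def target_percent_wei(step, percent_wei, delta_perc_wei, maxit):
    """Target EDIS ratio after `step` steps (assumed linear schedule)."""
    if maxit <= 0:
        raise ValueError("maxit must be positive")
    return percent_wei + delta_perc_wei * step / maxit


# Positive DELTA_PERC_WEI: the constraints gain importance during the run.
print(target_percent_wei(100, 10.0, 5.0, 100))   # 15.0
# Negative DELTA_PERC_WEI: the constraints lose importance during the run.
print(target_percent_wei(50, 10.0, -4.0, 100))   # 8.0
```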

MAX_WEI = number Maximum allowed value for DIST_WEIGHT; Works in
conjunction with PERCENT_WEI

MIN_WEI = number Minimum allowed value for DIST_WEIGHT; Works in
conjunction with PERCENT_WEI

SOFT_SWITCH = number Use a linear distance constraint function when
the actual distance, d, is greater than the
upper bound plus the specified number.
From Feng Ni (BRI, Montreal).

SOFT_SLOPE = number Value of the slope on linear function
From Feng Ni (BRI, Montreal).

NUMBER_OF_GROUPS = number Indicate the number of groups (set of protons)
with overlapping resonances. This value, when
specified, should be greater than one (1).
From Feng Ni (BRI, Montreal).

$EDMC
This data group works in combination with runtyp= edmc (see $CNTRL keyword).
This data group is used to define parameters and different alternatives for
the Monte Carlo search.
The EDMC method is a procedure for searching the conformational space of a
polypeptide. It is based on a Monte Carlo approach that combines minimization
of the potential energy with a predictive algorithm that attempts to produce
suitable rotations that lead to better energies.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

MCM - Carries out a Monte Carlo with energy
Minimization search rather than the search
available through the EDMC method.
It is a special case of EDMC, in which all
the perturbations are produced randomly.

MOTION =
CRANKSHAFT - (the default) - backbone dihedral angles
are associated in rotatable pairs
[psi(i-1), phi(i)] (where i is the
position of the residue in the sequence).
When one member of a given pair is selected
for a change, say a rotation 'delta',
an opposite rotation, '-delta', is added
to the second dihedral angle. This type
of movement tends to preserve the global
conformation of a folded polypeptide while
changing the local conformation.

PIELA Varies one backbone angle at a time (makes
large changes)

LAMBDA Varies the angles of rotation of peptide
groups about virtual-bond (CA-CA) axes.
Does not change the backbone shape much, but
rather optimizes the orientation of peptide
groups.
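
The CRANKSHAFT move on the pair [psi(i-1), phi(i)] can be sketched in Python
as follows. This is an illustration, not the program's code: the function
names, the 0-based list indexing and the wrapping of angles into
[-180, 180) are assumptions, and which member of the pair receives +delta
is chosen arbitrarily here.

```python
def wrap(angle):
    """Bring an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def crankshaft_move(phi, psi, i, delta):
    """Apply +delta to psi(i-1) and -delta to phi(i); return new lists."""
    if not 1 <= i < len(phi):
        raise IndexError("residue i must have a predecessor i-1")
    new_phi, new_psi = list(phi), list(psi)
    new_psi[i - 1] = wrap(psi[i - 1] + delta)  # selected member of the pair
    new_phi[i] = wrap(phi[i] - delta)          # compensating opposite rotation
    return new_phi, new_psi


phi = [-60.0, -60.0, -60.0]
psi = [-40.0, -40.0, -40.0]
new_phi, new_psi = crankshaft_move(phi, psi, 1, 30.0)
print(new_psi[0], new_phi[1])  # -10.0 -90.0
```

Because the two rotations cancel, the sum psi(i-1) + phi(i) is unchanged,
which is why this kind of move mostly affects the local conformation.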

CONSTR_MOV Indicates that distance constraints should
be used. See $DIST_CONST keyword to find
out how to introduce distance constraints.

BACKUP = number Time interval in seconds in which restart
information is punched. (default 3600 s)

RESTART Flag to indicate that the program should
continue a previous search. The program will
look automatically for a backup file.

MAXIT = number Maximum number of steps (accepted
conformations) in MCM/EDMC

RAND_START Start from a randomly generated conformation.
This keyword requires SEED to be defined.

OMEGA_180 Works with RAND_START. Keep the omega's at 180.

RAND_TO_ELEC = number Pre-defined ratio of random to electrostatic
sampling; default 0.1. RAND_TO_ELEC=1.0 is
equivalent to the flag MCM.

MAX_REPM = number - Maximum number of repetitions of a
conformation.

MAX_RAND = number - Maximum number of random-prediction trials.

MAX_EL = number - Maximum number of electrostatic-prediction
trials within an iteration.

MAX_THERMAL = number - Maximum number of thermal movements.

EFINAL = number - Target Energy. This represents a way to
stop the search when EFINAL is reached.
default is a very large negative number.

TEMP = number - Temperature used during normal stages
of the search.
The default is doing simulations at
a constant temperature. However, there
are two other alternatives:
'Thermal_shock' and 'adapt_temp'.

THERMAL_SHOCK - Thermal shock Monte Carlo scheme. The
system is suddenly "heated". Keywords that
need to be specified are:

T_LOW = number - lower bound of temperature.

T_UP = number - upper bound of temperature.

NTEMP = number - Number of steps in which the system is
heated from T_LOW to T_UP.
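
As an illustration of how T_LOW, T_UP and NTEMP might fit together, the
Python sketch below builds a heating ramp. The linear spacing of the
temperatures and the function name heating_schedule are assumptions; the
description above only states that the system is heated from T_LOW to T_UP
in NTEMP steps.

```python
def heating_schedule(t_low, t_up, ntemp):
    """Temperatures reached after each of ntemp heating steps (linear ramp)."""
    if ntemp < 1:
        raise ValueError("ntemp must be at least 1")
    return [t_low + (t_up - t_low) * k / ntemp for k in range(1, ntemp + 1)]


print(heating_schedule(300.0, 500.0, 4))   # [350.0, 400.0, 450.0, 500.0]
print(heating_schedule(300.0, 1000.0, 1))  # [1000.0]
```

With a single step the temperature jumps directly from T_LOW to T_UP, which
corresponds to a sudden heating of the system.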

ADAPT_TEMP - Adaptive temperature scheme.
If NHEAT=NCOOL=1, we have THERMAL_SHOCK.

NHEAT = number - Number of heating steps.

NCOOL = number - Number of cooling steps.

T_LOW = number - lower bound of temperature.

T_UP = number - upper bound of temperature.

NPRINT_ELEC = number - printing of electrostatic diagnosis
every NPRINT_ELEC accepted conformations.

OMPROB = number - The a priori probability that the conversion
of a cis peptide bond to a trans bond is
attempted. The default is 5000, which means
that the program will first attempt to
make all the peptide bonds trans.

HISP_CHANGE = number - The probability that in a given iteration
the program attempts to change the
conformations of HIS and PRO in the sequence
from PRO-UP to PRO-DOWN (or vice versa), or
from HIE to HID (or vice versa) (default ??).

CONST_SEQ - The program will not change the protonation
form of histidine and the internal geometry
of proline.

TYPE_BKTK = - Defines the set of dihedral angles altered
during backtracking (during heating of the
system).

BACK - Only backbone dihedral angles can be moved.

ALL - All dihedral angles can be moved.

MAX_VAR_BKTK = number - Maximum number of variables that can be
changed simultaneously during backtrack.

REGION_SAMP = - Use the set of sampling regions specified
for specific amino acid.

UNIFORM - Use uniform sampling through specified
regions

NONUNIFORM - Sample through specified regions using
provided weights.

SEED = number - Initialization of the random-number
generator. Any negative number

PRINT_SAMPLED - Print "extra" information from sampling.

NWIND = number - Number of "windows" containing the
specifications of the "bombing ranges", i.e.
the ranges of the residues whose angles
will be targeted by the random/electrostatic
sampling procedure. The angles of the other
residues will change only during minimizations;
no changes will be made in them
during sampling. This option is useful if
you have made a point mutation in a large
protein and want to establish quickly the
effect of this mutation on conformation.
In such a case it is good to "bomb" only
the mutated residue, instead of wasting
"munitions" on the whole protein. The default
is to "bomb" the whole molecule.

MAX_BCKB_REP = number The maximum number of times that the same
backbone conformation can be accepted. When
this limit is reached, newly generated
conformations having the same Zimmerman code
will be rejected unless they improve on
the current global minimum. Default value is 20.
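
The MAX_BCKB_REP rule can be sketched in Python as a simple counter per
Zimmerman code. The function name accept_backbone and the string codes are
hypothetical, and the sketch covers only the repetition limit, not the
Monte Carlo acceptance test that precedes it.

```python
from collections import Counter


def accept_backbone(code, energy, counts, global_min, max_bckb_rep=20):
    """Apply the repetition limit to a conformation with Zimmerman code `code`."""
    over_limit = counts[code] >= max_bckb_rep
    if over_limit and energy >= global_min:
        return False          # repeated too often and not a new global minimum
    counts[code] += 1         # record one more acceptance of this backbone
    return True


counts = Counter()
results = [accept_backbone("AAEA", -5.0, counts, global_min=-10.0)
           for _ in range(21)]
print(results.count(True))                                         # 20
print(accept_backbone("AAEA", -12.0, counts, global_min=-10.0))    # True
```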

PROMET The omegas of Pro and N-Met residues will be searched
with similar probabilities as for PHIs and PSIs.

NPRINT_CONSTR = number - printing of information about distance constraints
every NPRINT_CONSTR accepted conformations.

REFERENCE Used to stop EDMC when the Zimmerman
code of an accepted conformation
matches the one corresponding to the
conformation provided as reference (initial
conformation in file *.inp).

$ENERCALC
This data group is used to request energy evaluation or energy minimization
of one or many conformations.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

SINGLE_CONF - Carry out the procedure using as input the
conformation provided in data group $GEOM

READ_CONF - Carry out the procedure starting
from the set of conformations provided
in a separate input file (outo format).

RAND_START - Carry out the procedure starting
from a set of randomly generated
conformations.

OMEGA_180 Works with RAND_START. Keep the omega's at 180.

MAXIT = number - Maximum no. of randomly generated conformations.

SEED = number - Seed for the random number generator.

REGION_SAMP = - Use the set of sampling regions specified
for specific amino acid.

UNIFORM - Use uniform sampling through specified
regions

NONUNIFORM - Sample through specified regions using
provided weights.

BACKUP = number - This keyword is intended to allow the
procedure to be stopped cleanly. Not implemented yet.

RESTART - This keyword is intended to allow the
procedure to be restarted. Not implemented yet.

NO_MINIMIZATION - Used to check the energy terms related to the
distance constraints. No VTF minimization
is carried out.

CONSTR_MOV - This keyword is used to indicate that distance
constraints are used in the calculation. The key
can be included, optionally, in the $FFIELD data
group.

ZIMMERMAN_CODE - This option is used to print the Zimmerman Code
of the conformation(s).

$FFIELD
- Specific information about the force field used.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

FORCE_FIELD =

ECEPP - ECEPP/3 force field (the default).
SIMPLE_POTENTIAL - Max Vasquez's quartic potential
for VDW distances.
AMBER - Not implemented yet.
DISCOVER - Not implemented yet.
CHARMM - Not implemented yet.

SOLVATION = - Compute solvation energy.
(the default is NO solvation)

SURFACE -use surface-solvation models
developed by J. Vila and R. Williams.

VOLUME -use volume-solvation model developed by
Joe Augspurger (S_PAR_FILE=volume.set
must be specified).

ELECTROSTATIC - Not implemented yet. Is intended
to compute electrostatic solvation
using the DELPHI program
(B. Honig, Columbia Univ.).

ALL - SURFACE + ELECTROSTATIC

SURFACE_OUT - Print exposed surface for atoms.
The keyword SOLVATION= SURFACE must be
specified.

NO_SOLV_MIN - Used with SOLVATION to indicate
that solvation energy should
be added to the total energy after
energy-minimization of a
conformation, but not used during
the energy minimization process.

RAD_FILE = character_variable - Input file with radii parameters
for different solvation types.

S_PAR_FILE = character_variable - input file with solvation parameters
for different solvation types.
SURFACE-HYDRATION FILES:srfopt.set (default),
jrf.set,oons.set,solprmNW.nmr,optsl27.rall.
VOLUME-HYDRATION FILE:volume.set.

OM_TRANS - Impose a special one-fold potential
on all omega angles to keep them
trans; this goes with the keyword
FORC.

FORC = number - The torsional constant;
the default value is 100

NO_TORSIONALS - Omit torsional terms of the
potential function.

THERMO

TSTART = number

TEND = number

NSTEP = number

CONTACT_ENE = number - Defines the contact energy when
using the simplified potential.
Used with FORCE_FIELD=SIMPLE_POTENTIAL

PH = number - pH value. Not used in the present version.

RES_DBASE = character_variable - Used in some architectures (SUN) to define
the residue data file, or to select a different
file than the default ``rsdata".
Note: In general, the residue data file is
specified in the script file recepp.s.

CUTOFF = - Used to define cutoff in the energy terms.

NONE default.

BLOCK Used when a set of dihedral angles are kept
fixed during the computations. In that case,
the CUTOFF keyword can be used to omit
the calculations of 1-4 and 1-5 interactions
that don't vary during energy minimization.

DISTANCE_CA Not implemented, yet.

OVER_CUTOFF = number - Used to pre-minimize a conformation
using a simple potential function until
every single term of the energy is lower
than the value specified by "number".

NON_OVERLAP_ENER Logical flag to request printing of the
energy of a conformation after relief of
atomic overlaps when the conformation is
subjected to energy minimization using the
simple-potential function.
Should be specified with OVER_CUTOFF or
FORCE_FIELD=SIMPLE_POTENTIAL.

VARDIEL - Use a distance-dependent dielectric
constant. Implementation of Feng
Ni (BRI, Montreal).

NOTE: The use of the following keywords in the $FFIELD data group is kept for consistency
with previous versions but is not recommended. They have been incorporated into other data
groups.

CONSTR_MOV - This keyword is used to indicate
that distance constraints are
used in the calculation. The
key can be included, optionally,
in the $EDMC data group.

$GEOM
This is another essential data group used to define the initial conformation
of the molecule. The program will not proceed if the data group is not found.
The data group should contain the LIST OF DIHEDRAL ANGLES IN A FORMATTED INPUT
(15f8.3). One line per residue (or end group) is necessary or the program will
terminate with an error. Blank lines are permitted. In this case, all dihedral
angles will be set to zero, except when random generation of the starting
conformation is requested.
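
As an illustration of the (15f8.3) layout, the following Python sketch writes
and reads one $GEOM line using 8-character fields. The function names are
hypothetical, and treating a blank field as 0.0 is an assumption made for the
sketch, not a statement about how the program itself reads the line.

```python
def format_geom_line(angles):
    """Write up to 15 dihedral angles in a 15f8.3-style fixed-width layout."""
    return "".join(f"{a:8.3f}" for a in angles[:15])


def parse_geom_line(line, width=8, max_fields=15):
    """Split one fixed-width line into floats, one value per 8 columns."""
    line = line.rstrip("\n")
    values = []
    for start in range(0, min(len(line), width * max_fields), width):
        field = line[start:start + width].strip()
        # Assumption: a blank field is read as 0.0 in this sketch.
        values.append(float(field) if field else 0.0)
    return values
```

Values that are not aligned to the 8-column fields would be misread by this
sketch, which is the usual pitfall with formatted input.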

$GRID
This data group can work in combination with RUNTYP=ENERGY or
RUNTYP= MINIMIZE. It generates an energy grid (a two-dimensional energy map).
If RUNTYP= MINIMIZE is specified, the program will carry out the following
procedure:
1. The dihedral angles you define for the Phi-Psi map are kept fixed
during minimization.
2. The program minimizes the energy with respect to the remaining variable
dihedral angles.

The program scans two dihedral angles (ANG1 and ANG2) starting from
the values specified in FROM1 and FROM2, respectively. There are two
alternative possibilities for specifying the scanning:
a- To give the final values of the dihedral angles (TO1 and TO2) and the
number of steps (N1 and N2).
b- To give the step size (STEP1 and STEP2) and the number of steps (N1 and N2).
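
The two alternatives can be sketched in Python as follows. This is only an
illustration: the function names are hypothetical, and it assumes that the
number of steps counts increments, so that N steps give N + 1 grid values and
the last value equals the final angle in alternative (a).

```python
def grid_values(start, n_steps, end=None, step=None):
    """Return the values scanned for one dihedral angle.

    Give either `end` (alternative a) or `step` (alternative b).
    Assumption: n_steps counts increments, so n_steps + 1 values result.
    """
    if (end is None) == (step is None):
        raise ValueError("give exactly one of end or step")
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    if step is None:
        # Alternative (a): derive the step size from the final value.
        step = (end - start) / n_steps
    return [start + i * step for i in range(n_steps + 1)]


def grid_points(ang1_values, ang2_values):
    """Combine the two scans into the list of (ANG1, ANG2) grid points."""
    return [(a1, a2) for a1 in ang1_values for a2 in ang2_values]
```

For example, scanning from -180 to 180 in 12 steps corresponds to a step size
of 30 degrees under this assumption.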

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

ANG1 = character_variable - Name used to describe the first
dihedral angle. Characters allowed
are: PHI(n),PSI(n),OME(n),CHI(n),
TAU(n) where n is the residue number.

ANG2 = character_variable - Name of the second dihedral angle.

FROM1 = number - Initial value of first dihedral
angle.

FROM2 = number - Initial value of second dihedral
angle.

TO1 = number - Final value of first dihedral
angle.

TO2 = number - Final value of second dihedral
angle.

STEP1 = number - step size of first dihedral angle.

STEP2 = number - step size of second dihedral angle.

N1 = number - Number of steps for first dihedral
angle.

N2 = number - Number of steps for second
dihedral angle.

OMSCAN-OK - Used to confirm scanning over an
omega dihedral angle.

$MINIM
This keyword is used to modify a few parameters in the minimization program
of Gay (SUMSL, SMSNO).

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

MINIMIZER =
SUMSL - Use the unconstrained minimization
solver with analytical gradient.

SMSNO - Use the unconstrained minimization
solver with numerical gradient.

MAXFUN = number - Maximum number of function
evaluations allowed.

MAXIT = number - Maximum number of iterations
allowed.

MAXSTEP = number - Maximum value for V(RADFAC).

VTNER1 = number - Helps decide when to check for
FALSE convergence [V(26)].

ABSTOL = number - The absolute function convergence
tolerance [V(31)].

RELTOL = number - The relative function convergence
tolerance [V(32)]

DSCALE = - Not implemented, yet.

NONE

FIXED

VARIABLE

DVALUE = number - Initialization value of the
scale vector D.

FULL_PRINT - Controls SUMSL printing.

PRINT_RES_XG - Prints out values of X's,
gradient and D's on return.

PRINT_STAT - Prints out summary of statistics.

PRINT_INITIAL_X - Print initial X's and D's.

$OMCIS
This data group is used to define the residues for which the reference
conformation for the peptide bond is cis.
Format:
NRES res1 res2 ... resk
where NRES is the number of residues for which the cis conformation of the
peptide bond is taken as the reference; and res1, res2,....resk are the numbers
representing the position of the residue in the sequence.
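
A minimal Python sketch of reading this format is given below. The function
name is hypothetical; the consistency check between NRES and the number of
listed residues is an addition of the sketch.

```python
def parse_omcis(line):
    """Read 'NRES res1 res2 ... resk' and return the list of residue numbers."""
    tokens = [int(t) for t in line.split()]
    if not tokens:
        raise ValueError("empty $OMCIS line")
    nres, residues = tokens[0], tokens[1:]
    if len(residues) != nres:
        raise ValueError(f"expected {nres} residues, found {len(residues)}")
    return residues
```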

$OVERLAP_GRP
This data group is used in a version under development.
It is used within the VTF procedure to define sets of atoms with
overlapping resonances.
The format is as follows:
IGP GLB NG IR1 G1 IR2 G2 IR3 G3 .........IGn Gn
1 HX1 1 17 HN
2 HX2 5 7 HM0 6 HM1 6 HM2 15 HM1 15 HM2
3 HX3 2 6 HM0 15 HM0
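
The example lines above can be read with the following Python sketch. The
interpretation of the fields is inferred from the examples and is an
assumption: IGP is taken as the group index, GLB as the group label, NG as the
number of atoms in the group, and each (IR, G) pair as a residue number and an
atom name. The function name is hypothetical.

```python
def parse_overlap_group(line):
    """Read 'IGP GLB NG IR1 G1 IR2 G2 ...' into (index, label, atom list)."""
    tokens = line.split()
    igp, label, ng = int(tokens[0]), tokens[1], int(tokens[2])
    rest = tokens[3:]
    if len(rest) != 2 * ng:
        raise ValueError(f"group {label}: expected {ng} (residue, atom) pairs")
    # Assumption: each pair is (residue number, atom name).
    atoms = [(int(rest[i]), rest[i + 1]) for i in range(0, len(rest), 2)]
    return igp, label, atoms
```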

$REGIONS
This data group is used to define the sampling regions for amino acids in
a Monte Carlo type of search.
The sampling can be UNIFORM or NONUNIFORM (these are values of the keyword
REGION_SAMP, defined in data groups $VTF and $EDMC).
If the sampling is UNIFORM the input is specified using the following format:
residue_no. region1 region2 ..... regionM
If the sampling is NONUNIFORM the input is specified as:
residue_no. region1 weight1 region2 weight2 ..... regionM weightM

where:
- residue no. belongs to { 2, inumrs}
- regionI is one of the 16 regions of the PHI-PSI map using Zimmerman's
code (A, A*, ..., H*), or any of the four Popov regions: H-, H+ (HELIX) and
S-, S+ (SHEET).
- weightI is an integer indicating the weight used to generate the sampling
probability for the associated region.

Example of UNIFORM sampling:
3 A A* C C*
Example of NONUNIFORM sampling:
3 A 40 A* 10 C 30 C* 10

A continuation line should be indicated with the symbol '\'
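
The two input formats can be illustrated with the following Python sketch.
The function name is hypothetical, and converting weights to probabilities by
dividing each weight by the sum of the weights is an assumption; the manual
only states that the weights are used to generate the sampling probability.
Continuation lines are not handled here.

```python
def parse_region_line(line, nonuniform=False):
    """Return (residue number, {region: sampling probability})."""
    tokens = line.split()
    residue = int(tokens[0])
    rest = tokens[1:]
    if nonuniform:
        if len(rest) % 2:
            raise ValueError("each region needs a weight")
        regions = rest[0::2]
        weights = [int(w) for w in rest[1::2]]
    else:
        regions = rest
        weights = [1] * len(regions)
    total = sum(weights)
    # Assumption: probability of a region = its weight / sum of all weights.
    probs = {r: w / total for r, w in zip(regions, weights)}
    return residue, probs
```

With the NONUNIFORM example above, region A would be sampled with probability
40/90 under this assumption.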

$RMSFIT
This data group is used for comparison of one or multiple conformations with
a reference one. It works in combination with the keyword RUNTYP= RMS_FIT.
This module calculates atomic rms deviations, rms distance deviations,
radii of gyration, and is able to fit conformations.
The program reads different types of reference and input files. By default,
it tries to read the reference conformation from a file named xray.NAME_REF
(where NAME_REF is a name provided by the user. NAME_REF is passed through an
argument of the script that runs the program).
When this file does not exist, the keyword GENERATE_REF should be used to
generate it. As a default for generation of the reference conformation, the
set of dihedral angles provided as input in the $GEOM data group is used.
If a conformation given in PDB format is going to be used as reference,
the user should use the keyword TYPE_REF with the appropriate argument to
indicate this.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

GENERATE_REF - Used to indicate the generation of a
reference (or target) conformation
in ECEPP format.

TYPE_REF = - Indicates the type of input format of
the file containing the reference (or
target) conformation:

ECEPP ECEPP dihedral angles provided with the
$GEOM data group in *.inp file. Default.

PDB_NO_ENDG Typical PDB where residue No.1 is
the first full residue. No end groups.

PDB_WITH_ENDG Other files written in PDB format
where first and last residues are
end groups.

TYPE_INPUT = - Used to indicate the type of format
of the file to be used as input
(conformation(s) under study).
Acceptable input formats are:

ECEPP ECEPP dihedral angles using `outo'
format.

PDB_NO_ENDG Classical PDB where residue No.1 is
the first full residue. No end groups.

PDB_WITH_ENDG Other files written in PDB format
where first and last residues are
end groups.

IGNORE_H - Used to tell the program to ignore
mismatches in the H atom
names when reading PDB files.

INIT_RES = number - Initial residue used on calculation.
IFIN_RES = number - Final residue used on calculation.

ALL_HVY_ATOMS - Calculate rms of all heavy atoms of
the specified residues.

ALPHA_CARBONS - Calculate rms of CA atoms of the
specified residues.

BACKBONE - Calculate rms of backbone atoms
(including CB) of the specified
residues.

SIDE_CHAIN - Calculate rms of side-chain heavy atoms
of the specified residues.

DISTANCE_RMS - Produces an additional report of the
distance rms deviations for the input
conformation(s) with respect to the
reference conformation.

CA_TRACE - Works in conjunction with the keyword
ALPHA_CARBONS. It is a flag to request
the generation of a series of aligned
pdb files with the CA traces.

PDB_ALIGN_HVY - Write a pdb file using alignment of
all the heavy atoms.

PDB_ALIGN_CA - Write a pdb file using alignment of
the CA atoms.

PDB_ALIGN_BACK - Write a pdb file using alignment of
the backbone atoms.

PDB_ALIGN_SIDE - Write a pdb file using alignment of
the side-chain heavy atoms.

METHOD = - Defines the type of algorithm used
to calculate RMS:

GOLUB - Golub method. The default.

KABSCH - Kabsch method. This requires some
IMSL routines.

FIRST_RESIDUE = number - This keyword is used to indicate that the
first residue of the PDB reference file
is numbered as `number' instead of 1.

ADOPT_REF_SEQ - This keyword is used to instruct the
program to adopt the sequence of the reference
conformation when the conformations read
have a different (or incompatible) sequence.
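
As an illustration of the DISTANCE_RMS report, the following Python sketch
computes a distance rms deviation between a conformation and a reference. The
manual does not give the formula used by the program, so the common definition
(root mean square of the differences between corresponding interatomic
distances, over all atom pairs) is assumed here; the function name is
hypothetical.

```python
import math
from itertools import combinations


def distance_rms(coords, ref_coords):
    """Distance rms deviation between two sets of (x, y, z) coordinates."""
    if len(coords) != len(ref_coords) or len(coords) < 2:
        raise ValueError("need two coordinate sets of equal size (>= 2 atoms)")
    total = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        d_ref = math.dist(ref_coords[i], ref_coords[j])
        total += (d - d_ref) ** 2
        n_pairs += 1
    return math.sqrt(total / n_pairs)
```

Unlike the atomic rms deviation, this quantity needs no superposition of the
two conformations, since it depends only on internal distances.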

$SCAN
Scan carries out a systematic search of a set of specified dihedral
angles. Angles should be specified in the following way (free format).

residue_no. no_of_dih_angles no_first_dieh ... no_last_dieh

$SELEC_PDB
This data group works in combination with the keywords
PRINT_CART OUTFORMAT= SEL_PDB (in $CNTRL data group)
The data group is used to define the set of atoms included in the
output pdb file. A free format is used to enter atom numbers (integer).

$SEQ
This is the last ESSENTIAL data group. It is used to define the sequence
of the molecule. There are three different ways in which the sequence
is defined:
(a) through ECEPP residue numbers (LIST); (b) Using a three-letter code;
or (c) using a one-letter code.
The keyword RES_CODE (in $CNTRL data group) is used to specify the options
described previously. If this keyword is omitted, the program will attempt
to read the sequence as ECEPP residue numbers.

Rules:
-----
(a) ECEPP residue numbers are read using free format (default). Numbers
are integers defined as ECEPP LIST numbers. Check column 2 of Table I
for correct assignment. A blank space is required between numbers.

(b) Three-letter code. These are character variables defined in column 3
of Table I. A blank space is required between words.

(c) One-letter code. These are character variables defined in column 4
of Table I. No blank space is required between descriptors (letters,
usually).
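
The spacing rules for the three cases can be sketched in Python as follows.
The function name and the mode labels ("numbers", "three_letter",
"one_letter") are hypothetical and are not the arguments of RES_CODE; the
sketch only splits the input and does not check the codes against Table I.

```python
def tokenize_sequence(text, mode="numbers"):
    """Split a $SEQ entry according to the spacing rules (a), (b), (c)."""
    if mode == "numbers":
        # (a) integers separated by blanks
        return [int(t) for t in text.split()]
    if mode == "three_letter":
        # (b) words separated by blanks
        return text.split()
    if mode == "one_letter":
        # (c) one descriptor per character; blanks are optional
        return [c for c in text if not c.isspace()]
    raise ValueError(f"unknown mode: {mode}")
```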

$SPEC
This data group is used to specify the set of variable dihedral angles.
The use of this data group depends on the values of the keywords VAR_ANGLES and
VAR_RES ($CNTRL data group).
(a) When VAR_ANGLES = SPEC is specified in the $CNTRL data group,
1- VAR_RES should not be present in the $CNTRL data group.
2- The $SPEC data group is obligatory, and it must contain the following
specifications (in free format and one line per residue):

res_num num_var num_1st_var ... ... num_last_var

where:
res_num is the sequence number of the residue containing variable
dihedral angles;
num_var is the number of variable dihedral angles in the residue;
num_1st_var, ..., num_last_var is a list of numbers (integers) that
point to the specific variable dihedral angles in the residue.
The list must contain `num_var' integers.

(b) When VAR_RES = number_of_residues (number_of_residues is an integer)
is specified in the $CNTRL data group,
1- VAR_ANGLES can be given ANY value (all, back, bksd, etc.) with the
EXCEPTION of `SPEC'.
2- The $SPEC data group is required, and it must contain the sequence numbers
of the residues for which the set of dihedral angles will be defined
as the variables. The residues should be given as a list of integers in
free format.

(c) If VAR_RES and VAR_ANGLES are both omitted in the $CNTRL data group,
or VAR_RES is omitted and VAR_ANGLES is set to a value different from
SPEC, then, the SPEC data group is not required.
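
For case (a), one line of the $SPEC data group can be read and checked with a
Python sketch such as the following. The function name is hypothetical; the
check that the list contains exactly `num_var` integers follows the rule
stated above.

```python
def parse_spec_line(line):
    """Read 'res_num num_var num_1st_var ... num_last_var'."""
    tokens = [int(t) for t in line.split()]
    if len(tokens) < 2:
        raise ValueError("a $SPEC line needs at least res_num and num_var")
    res_num, num_var, angles = tokens[0], tokens[1], tokens[2:]
    if len(angles) != num_var:
        raise ValueError(
            f"residue {res_num}: expected {num_var} angle indices, "
            f"found {len(angles)}"
        )
    return res_num, angles
```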

$VTF
This data group is used to define the parameters for a Variable Target Function
(VTF) calculation (for reference, see Vasquez and Scheraga, J. Biomol. Struct.
Dyn. 5(4), 757-784 (1988)). It works in combination with the keyword
RUNTYP = VTF (data group $CNTRL) and the data group $DIST_CONST.

The specific keywords of this data group are:

KEYWORD ARGUMENT DESCRIPTION
------ ------- -----------

SINGLE_CONF - Carry out the procedure using as input the
conformation provided in data group $GEOM

READ_CONF - Carry out the procedure starting
from the set of conformations provided
in a separate input file (outo format).

RAND_START - Carry out the procedure starting
from the set of randomly-generated
conformations.

OMEGA_180 - Works with RAND_START. Keeps the omega angles at 180.

CONST_SEQ - The program will not change the protonation
form of histidine and the internal geometry
of proline.

MAXIT = number - Maximum no. of randomly generated conformations.

SEED = number - Seed for the random number generator.

RANK_ORDER = number - Determines the way the distances
are ordered for the minimization
steps during an iteration (IORDER).
There are three possibilities:
RANK_ORDER= 0, Order by range,
`a la Braun-Go', i.e.
rank one ==> distance between nearest-neighbor
residues;
rank two ==> distance between second nearest-neighbor
residues; etc.;
RANK_ORDER= 1, Keep same order as in
input distances;
RANK_ORDER= 2, Order by growing from
N-terminus.

MAX_RANK - Parameter used to control the VTF
procedure. Usually, the procedure is
carried out by starting from random
conformations and introducing the
distance constraints up to MAX_RANK
(typically 10). From this run, a set
of conformations is selected and a
second run is carried out with the full
set of distances (i.e. the final rank is
equal to the number of residues in the
chain).

VTF_BY_RANK - Indicates that distances should be included using
the established ranks. Otherwise, the procedure
introduces a few distances per minimization.
NOTE: It is generally recommended to use the
keyword VTF_BY_RANK

STEP_RANK = [-]number - STEP_RANK > 0 defines the increment of the rank
for sequential minimizations within an iteration
of the VTF procedure (IFLOV). Additionally,
STEP_RANK different from zero implies that
torsional energy terms will NOT be included in the
energy minimization and disulfide bridge (DSB)
information will NOT be used.
DSB closing at beginning of the VTF procedure interferes
greatly with the possibility of satisfying distance
constraints for smaller ranks. Consequently, it is
recommended to add the DSB as an extra set of distance
constraints. Since ECEPP assigns very high weights to
force DSB, adding the DSB as additional
distance constraints allows you to experiment with
different values of the weights.
If STEP_RANK < 0 is given, all the distance
constraints will be included at once.
With STEP_RANK=0 the VTF procedure INCLUDES
torsional energy, DSB information and proceeds
including distance constraints with a rank increment of 1.

BIG_VIOLATION = number - Used to determine which conformations are reasonable
(i.e. should be saved) after the whole generation process
is finished. If the maximum
violation is greater than BIG_VIOLATION (+ 10%),
the conformation is rejected. All the conformations VTF
produces should be reasonable.

STEPS_ON_ERROR = number - At every step of the VTF procedure, the program checks whether the maximum
violation exceeds BIG_VIOLATION. After 'n' consecutive
steps of the VTF procedure that exceed BIG_VIOLATION
(with n=STEPS_ON_ERROR), the generation process is aborted
and a new trial is started. The idea is to avoid spending time on
useless energy minimization. A conformation that does not
satisfy the distance criteria after "n" steps has little
chance of reorganizing later, when additional distances are
added. However, STEPS_ON_ERROR should not be very small, since
some conformations with distances that exceed BIG_VIOLATION
at a certain stage of generation can reorganize later on, as
additional distances are included in the minimization.
(From my experience, values for STEPS_ON_ERROR of
15 to 20 seem to work better).

REGION_SAMP = - Use the set of sampling regions specified
for specific amino acid.

UNIFORM - Use uniform sampling through specified
regions

NONUNIFORM - Sample through specified regions using
provided weights.

BACKUP = - This keyword should allow the procedure to be
stopped cleanly. Not implemented, yet.

RESTART - This keyword should allow the procedure to be
restarted. Not implemented, yet.

NO_MINIMIZATION - Used to check energy terms related to the
distance constraints. No VTF minimization
is carried out.

ZIMMERMAN_CODE - This option is used to print the Zimmerman Code
of the conformation(s).
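
The interplay between BIG_VIOLATION and STEPS_ON_ERROR described above can be
sketched in Python as follows. This is a simplified illustration, not the
program's code: the function name is hypothetical, the input is assumed to be
the maximum violation recorded after each VTF step, and applying the 10%
tolerance only to the final check is an assumption.

```python
def keep_trial(max_violations, big_violation, steps_on_error):
    """Return True if a trial conformation would be kept, False otherwise.

    max_violations: maximum distance violation after each VTF step.
    """
    if not max_violations:
        raise ValueError("no VTF steps recorded")
    consecutive = 0
    for violation in max_violations:
        if violation > big_violation:
            consecutive += 1
            if consecutive >= steps_on_error:
                # Abort this trial; the caller would start a new one.
                return False
        else:
            consecutive = 0
    # Final check (assumption: the +10% tolerance applies only here).
    return max_violations[-1] <= 1.10 * big_violation
```

A trial that exceeds BIG_VIOLATION for a few steps and then recovers is kept,
which is why STEPS_ON_ERROR should not be set too low.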

$WINDOWS
This data group contains the ranges of residues whose dihedral angles will be
changed during sampling. There is one non-empty line per range.
Each line contains the following two integers, read in free
format:

iw1 (the first residue of the range); iw2 (the last residue of the range).

Description of the Data included for each Residue in rsdata
_____________________________________________________________

A few changes in the original ECEPP3 residue data file have been included to add
flexibility to the program. The goal is to minimize the coding of instructions
that are residue-specific.
example:

ISOLEUCINE ILE I 0 0 0.000 F
19 4-0.9437972 0.3305252-0.0993245 0.9950551
-8 4
1.35 3 1 4
1.35 3 1 5
1.35 3 1 6
1.35 3 1 7
0.350197-0.548499-0.759283 3 5 3 8 0 0
-0.379217 0.310917-0.871508 5 9 3 11 1 1
0.999962-0.005163-0.007059 5 10 3 14 0 -1
0.350197-0.548498-0.759284 10 16 3 17 0 0
N 14 22 -4.59 11 7 10
-0.4226 0.9063 0.0 HN 2 2 2.27 7 4 6
1.4530 0.0 0.0 CA 9 7 0.82 17 11 16
1.7797 -0.4805 0.9222 HA 1 0 0.26 11 7 10
1.9888 -0.8392 -1.1617 CB 9 7 -0.06 0 17 19
1.9587 1.4440 0.0 C 7 14 5.80 11 8 10
1.1648 2.3835 0.0 O 17 26 -4.95 8 0 0
1.6625 -1.8692 -1.0179 HB 1 0 0.32 17 11 16
1.4086 -0.3635 -2.4951 CG2 6 5 -0.96 17 14 16
3.5188 -0.8471 -1.1725 CG1 6 6 -0.25 0 11 13
0.3225 -0.4528 -2.4713 HG2 1 0 0.32 14 0 0
1.6840 0.6781 -2.6602 HG2 1 0 0.32 14 0 0
1.8059 -0.9770 -3.3037 HG2 1 0 0.32 14 0 0
3.8906 0.1742 -1.2551 HG1 1 0 0.19 0 17 19
3.8906 -1.2463 -0.2289 HG1 1 0 0.19 0 17 19
4.0546 -1.6863 -2.3342 CD1 6 5 -0.96 0 0 0
3.7005 -2.7124 -2.2348 HD1 1 0 0.32 0 0 0
3.7005 -1.2695 -3.2771 HD1 1 0 0.32 0 0 0
5.1444 -1.6749 -2.3183 HD1 1 0 0.32 0 0 0
3.2771 1.5756 0.0

Description of first line:
(TITL(L,I),L=1,4),ARES(I),ONE_LET(I),NFATO(I), QQQ_READ(I),PK0_READ(I), NMETR

(TITL(L,I),L=1,4) Residue name.
- ARES(I) Three-letter-code residue identifier, used for sequence definition.
- ONE_LET(I) One-letter-code residue identifier, used for sequence definition.
- NFATO (I) Indicates that the 3 initial atoms (N, HN, and CA) of the first
full residue should be generated using the data from the amino-end
(NFATO=0), or the data from the residue (NFATO=1). In particular,
this assignment affects the charges of these atoms.
- QQQ_READ(i) Net charge of the ionized residue (used on specific versions
of the code).
- PK0_READ(I) pKa0 of the ionizable group (used on specific versions of the
code).
- NMETHYL Logic variable to indicate if this is an N-methylated residue.

Description of second line:
NATOMS(I),NCHI(I),SNTH2(I),CSTH2(I),SDEL(I), CDEL(I)
Same as in ECEPP/3 manual.

Description of third line:
KNDRES(I),NT,NGEOM(I),NTOR(I)
- KNDRES(I) and NGEOM(I) same as in ECEPP/3 manual.
- NTOR(I) is the number of torsional terms that are associated with EXPLICIT
dihedral angles, while NT is the TOTAL number of the torsional terms associated
with a residue, i.e. including the possible angles of the bridge formed by
this residue. The parameters of the IMPLICIT torsional angles (i.e. those which
will be calculated from the Cartesian coordinates after a bridge is formed) are
stored in the arrays after the parameters of the explicit angles.

Description of 4th to 7th lines:
AR(J,I),NBB(J,I),NSS(J,I),NANG(J,I)
Same as in ECEPP/3 manual.

Description of 8th to 11th lines:
(CHIANG(L,J,I),L=1,3),NDPT1(J,I),NDPT2(J,I), NUM(J,I),LRT1(J,I), IBRNCH(J,KINDI),
ISHFK(J,KINDI)
- CHIANG, NDPT1, NDPT2, NUM and LRT1 same as in ECEPP/3 manual. LRT1 is used in
this program (not in the original ECEPP/3).
- IBRNCH The program now handles more than one branch on the side-chains. If there
is a branch this is defined specifically (IBRNCH =1) for the bond that branches
out. Also, to bring compatibility with the IUPAC conventions (ECEPP reads
the torsional angles following this convention), a variable ISHFK is defined
for each bond to indicate whether there is a shift of the bond definition given
in rsdata. In some cases, like ILE, the organization of the rsdata file for
generation purposes (in ECEPP/3) requires a different arrangement of the
bond numbers. In the specific case of ILE, lines 9 and 10 indicate that
the dihedral angle inputs for bonds 2 and 3 have to be exchanged.

Description of 12th to 31st lines:
(XOORD(L,J-1,I),L=1,3),ALPHA(J,I),LTYPE(J,I), NTYPE(J,I),CHG(J,I),NSN15(J,I),
NSN14(J,I),NFN14(J,I)

- XOORD, ALPHA, LTYPE, CHG, NSN15, NSN14 and NFN14 same as in ECEPP/3 manual.
- NTYPE atom type for surface solvation models.

How to Build a File with Distance and/or Dihedral Angle Constraints (bounds.*)
______________________________________________________________________________

A distance constraint energy term can be used in the calculations.
The algorithm used in this program represents a modification of the one
originally implemented in Max Vasquez's VTF (Vasquez, M. & Scheraga, H. A.
1988. "Variable-Target-Function and Build-up procedures for the calculation
of protein conformation. Application to bovine pancreatic trypsin inhibitor
using limited simulated nuclear magnetic resonance data."
J. Biomol. Struct. Dyn. vol. 5, 757-784).

The functional form is:
Econs= WEI_ENE * Sum [ wei(j)*(| rj - R|)^2];
j in {pairs}

for rj > R when R is an upper bound, or rj < R when R is a lower bound,
where
rj is the actual interproton distance.
R is either, an upper bound or a lower bound.
wei is a factor or weight used to make the constraint more (or less)
relevant with respect to others.
WEI_ENE is a factor that weights the distance energy term with
respect to other energy terms ( like electrostatic, torsional,etc.)
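
The functional form can be illustrated with the following Python sketch. It
is not the program's implementation: the function name and the tuple layout
are hypothetical, and a negative lower bound is simply ignored here (the
VDW-contact case is not modeled).

```python
def constraint_energy(constraints, wei_ene=1.0):
    """Penalty energy for a list of (r, low, up, weight) distance constraints.

    A term contributes only when the distance r violates one of its bounds.
    """
    total = 0.0
    for r, low, up, weight in constraints:
        if r > up:
            violation = r - up          # upper bound exceeded
        elif low >= 0.0 and r < low:
            violation = low - r         # lower bound violated
        else:
            violation = 0.0             # within bounds (or no lower bound)
        total += weight * violation ** 2
    return wei_ene * total
```

For instance, a distance of 6.0 against an upper bound of 5.0 with weight 10.0
contributes 10.0 * (1.0)^2 = 10.0 before scaling by WEI_ENE.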

Distance constraints are included in the calculations in the following
manner.
1.- Use the $DIST_CONST data group (NOT the $BOUNDS) to specify the
number of constraints and set up other parameters.

2.- Generate a file (bounds.FILENAME) containing the information for each
constraint as in the following examples. There are two alternative ways to
describe the constraints:

a.- Using the ecepp number for the specific atoms. The information should
be written in one line per constraint (78 characters or less), and given in
a free format as:
mol1 iatm1 mol2 iatm2 lowb upb weight
where
mol1 is the molecule containing the first atom (integer).
iatm1 is the first atom defining the constraint (integer).
mol2 is the molecule containing the second atom (integer).
iatm2 is the second atom defining the constraint (integer).
lowb lower bound (real).
upb upper bound (real).
weight weighting factor (real).

example
1 34 1 51 1.900 5.000 10.0

b.- Using the residue number and the atom name of each atom, as in:
1 1 HCA 1 1 HCB -1.000 3.000 10.0
mol1 res1 atm mol2 res2 atm low-b upp-b weight
where
mol1 is the molecule containing the first atom
res1 is the residue containing the first atom
If the lower bound is -1.000, then VDW contact is assumed.

example:
mol1 res1 iatm1 mol2 res2 iatm2 lowb upb weight
(the file is a FORMATTED one).
1 1 CA 1 29 CA 7.186 7.942 10.000

will specify: the atoms defining the distance, upper and lower
bounds, and a parameter (a weight) for the constraint.

A line starting with ! is considered as a comment.
See the example file bounds.timbck
You should enter the number of constraints used in $DIST_CONST as
N1PAIR= mmm and N2PAIR= nnn, where N1PAIR and N2PAIR are the number
of constraints specified using format (a) or (b).

NOTE: if a lower-bound is -1.000, then VDW contact is assumed.
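
A minimal Python sketch of reading the distance part of a bounds file is shown
below. It is only an illustration under stated assumptions: the function name
is hypothetical, fields are assumed to be separated by blanks (the manual
notes that format (b) is a formatted one, so real files may rely on fixed
columns), and the two formats are told apart by the number of fields (7 for
format (a), 9 for format (b)). Reading stops at the word DIHEDRAL.

```python
def parse_bounds(lines):
    """Return (format_a, format_b) lists of distance constraints."""
    by_number, by_name = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        if line.upper().startswith("DIHEDRAL"):
            break  # dihedral angle constraints follow; not parsed here
        t = line.split()
        if len(t) == 7:
            # mol1 iatm1 mol2 iatm2 lowb upb weight
            by_number.append((int(t[0]), int(t[1]), int(t[2]), int(t[3]),
                              float(t[4]), float(t[5]), float(t[6])))
        elif len(t) == 9:
            # mol1 res1 atm mol2 res2 atm low-b upp-b weight
            by_name.append((int(t[0]), int(t[1]), t[2],
                            int(t[3]), int(t[4]), t[5],
                            float(t[6]), float(t[7]), float(t[8])))
        else:
            raise ValueError(f"unrecognized constraint line: {raw!r}")
    return by_number, by_name
```

The lengths of the two returned lists would then be the values to enter as
N1PAIR and N2PAIR.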

DIHEDRAL ANGLE CONSTRAINTS can also be included in the simulations.
The functional form for the penalty energy is the same one used
for the distance constraints (formula written above).
The dihedral angle constraints are included in the 'bounds.*' file
as follows:
i. The word DIHEDRAL must come after the last distance constraint.
ii. The next line should contain a number (real) that represents the
conversion factor (or penalty weight), WEIDIH (equivalent to
WEI_ENE in formula above).
iii. Each subsequent line contains a description of a dihedral angle
constraint with the following information:
residue number, dihedral angle number, expected mean value, maximum
deviation, and specified weight value.
example:

DIHEDRAL
100.00
2 1 -40 40 1000.00
5 2 -60 20 1000.00

where:
WEIDIH = 100.00.
Two dihedral angle constraints are included:
first: residue 2 , dihedral angle 1 (phi) is forced to adopt a value of
-40 deg. and the allowed deviation is 40 ( allowed values are those within
the interval [-80,0] )
second: residue 5, dihedral angle 2 (psi) is forced to adopt values within
the interval [-80, -40].
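
The example can be checked with the following Python sketch, which computes
the allowed interval and a penalty of the same quadratic form as the distance
term. The function names are hypothetical, and wrapping the angular difference
into [-180, 180) is an assumption of the sketch; the manual does not state how
periodicity is handled.

```python
def allowed_interval(mean, max_dev):
    """Interval of dihedral angle values that carry no penalty."""
    return (mean - max_dev, mean + max_dev)


def dihedral_penalty(angle, mean, max_dev, weight, weidih):
    """Quadratic penalty for an angle outside mean +/- max_dev (degrees)."""
    # Assumption: the difference is wrapped into [-180, 180).
    diff = (angle - mean + 180.0) % 360.0 - 180.0
    excess = max(0.0, abs(diff) - max_dev)
    return weidih * weight * excess ** 2
```

With the first constraint above (mean -40, deviation 40), any phi between -80
and 0 gives zero penalty, and a value of 10 degrees exceeds the interval by 10.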

Random Number Generators
------------------------
The program uses two random number generators. The serial version
uses the VRND program (Prof. Ken Wilson).

The parallel version uses PRNG (Prof. Mal Kalos).
PRNG (parallel random number generator) is freely available
by anonymous ftp.
It's really easy to install it on any 64-bit machine such as the SGI PC.

CTC staff can get it without ftp'ing.

cp /afs/theory/archive/ftp/pub/utilities/prng.tar.Z to wherever you
want to build it. There are also two Makefiles in the eceppak PRNG
directory, Makefile.DEC8400 and Makefile.IBMSP2.
Either of these makefiles should be adapted to the specific
architecture where the user intends to install the program.

If you are not CTC staff, here's how you can get the tar file:

ftp ftp.tc.cornell.edu
log in as user anonymous
give email address as password
cd pub/utilities
get prng.tar.Z

-----------------------------------------------------------
If IMSL libraries are not available on your computer:

Edit the file orient1.F and comment out the lines:

#ifdef AIX
CALL DEVCSF (3,RTR,3,EIGVAL,T,3)
IJUMP=1
#endif

Also, remove "-limsl" from the "make" file,

LIBS = -L/usr/local/lib -limsl

should read:

LIBS = -L/usr/local/lib

Finally, recompile the program.

Without IMSL, only the "GOLUB" option will work for calculations of rms deviations.