@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ A N A L Y Z E @
@ USER MANUAL @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
----------------------------------
I. The main data file (infile.inp)
----------------------------------
The input is organized in the following fourteen ECEPPAK-style data groups:
1. $TITLE - the title of the run (a single line);
2. $CNTRL - main control variables;
3. $SEQ - sequence data;
4. $BRIDGE - covalent bridge data;
5. $PROPERTY - control data for conformation-dependent property evaluation;
6. $RMSCALC - control data for RMS calculation;
7. $BOUNDS - reading the NMR-derived distance constraints (interactive at
the moment);
8. $CHROMO - control data for calculating inter-chromophore distance;
9. $CLUSTER - control data for cluster analysis;
10. $SUPAT - specifies the atoms being superposed;
11. $NOES - specifies the options in the calculations of NOE spectra and
coupling constants;
12. $MORASS - parameters for theoretical evaluation of the NOE spectra;
13. $COUPLING - specification of the calculation of coupling constants;
14. $MARQUARDT - options in least-squares or maximum-entropy fitting of the
theoretical to the experimental NOE spectra.
-------------------------------------------------------------------------------
1. The $CNTRL data group.
RUNTYP=name where name is one of the following:
PROPERTY - calculate conformation-dependent properties (e.g. the RMS
deviation from a reference structure).
AVE_COORD - Boltzmann-average the Cartesian coordinates of the
supplied conformations.
AVE_DIST - Boltzmann-average the inverse sixth powers of interproton
distances (can be used to evaluate the "average" 2D-NMR
spectrum).
CLUSTER - Do a cluster analysis of the input set of conformations.
MORASS - Calculate NMR spectra and optionally do NMR fitting. This
option is NOT available with the version of the program
that does clustering of large conformation sets.
ANGLES - just calculate the dihedral angles from a supplied pdb file.
MINVAR - use the minimum-variance method (programmed by Dr. Mark D.
Shenderovich of the Latvian Academy of Sciences) instead of the
minimal spanning tree method for clustering. Recommended when the
set is to be divided in order to calculate subset averages or to
pre-select representative conformations, e.g. for NMR fitting.
This option is NOT available with the NMR-fitting program. The
minimal spanning tree method, on the other hand, is recommended for
"taxonomy" of the set.
VERBOSE - print information about the progress of the run on screen.
NRES=number - the number of FULL residues in the chain. This does not need to
be given, because the number of residues is calculated when
reading the $SEQ data group.
RES_CODE=name where name is one of ECEPP, THREE_LETTER, ONE_LETTER; see
the ECEPPAK manual for details.
FULLPRINT - use if you want messages such as "no space left on device".
RES_DBASE=name - replace the residue data base defined in the analyze script
with name. If name contains at least one slash, it is
assumed to contain the absolute search path; otherwise it is
appended to the ${DBASDIR} environment variable to define the
full search path.
T=number - the absolute temperature used in conformational averaging.
The next few keywords apply to RUNTYP=CLUSTER.
PRINT_PDB=number - when doing the cluster analysis the program will print the
Cartesian coordinates (in pdb format) of at most number
lowest-energy conformations in a family that are within
an energy cut-off specified in the $CLUSTER data group.
number=0 means that no pdb files will be created and number=1
that just the leading members of the subsequent families
will be written to pdb files.
INCLUDE_TERM - when present causes all amino (NH2, NH3+, dummy, and cyclizing),
carboxy (COOH, COO-, dummy and cyclizing), as well as the CONH2
group to be merged with the first (last) residue, rather than
counted as separate "residues" in the generated *.pdb files.
This is very useful for the compatibility with AMBER-read pdb
files. Note that otherwise the first full residue ALWAYS has
number 2.
FILE=name - the prefix of the pdb files; default is the first argument of the
analyze script.
VIRT_CHAIN - virtual chains' internal coordinates will be printed for the
leading members of the families.
BETA_TURNS - turn analysis will be carried out.
CHIRALITY - covalent bridges' dihedral angles will be calculated.
H_BONDS - H-bond analysis will be carried out.
NRCLUS1=number, NRCLUS2=number - the first and last residue to be superposed
in clustering; default 1 and INUMRS, respectively.
TREE - construct the minimal tree and then partition it (this usually takes
much longer than partitioning without explicit construction of the
minimal tree, but gives some idea about how best to partition the set
of conformations).
DISTMAT - write the upper triangle of the matrix of RMS deviations between
conformations to a disk file.
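
The Boltzmann averaging requested by RUNTYP=AVE_COORD or RUNTYP=AVE_DIST at the
temperature T can be sketched as follows. This is only an illustration in
Python, not the code used by ANALYZE; the function names, the assumption that
energies are in kcal/mol, and the value of the gas constant are not taken from
the program.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K); assumes energies in kcal/mol


def boltzmann_weights(energies, temperature):
    """Return normalized Boltzmann weights for a list of conformational energies."""
    e_min = min(energies)  # shift by the minimum to avoid overflow in exp()
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def average_inverse_sixth(distances, weights):
    """Boltzmann-average r**-6 of one proton pair over all conformations."""
    return sum(w * r ** -6 for r, w in zip(distances, weights))
```

The averaged value can be converted back to an "effective" distance by raising
it to the power -1/6.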
-------------------------------------------------------------------------------
2. The $PROPERTY data group
ELLIPSE= - calculate the characteristics of the moment of inertia:
AXES - calculate main axes of the moment of inertia;
VOLUME - calculate the volume of the ellipsoid;
RGYR - calculate the radius of gyration;
ALL - calculate all above.
RMS= - calculate the RMS deviation from a reference structure:
OVERALL - superpose all selected atoms of the current structure on
all selected atoms of the reference structure;
ARRAY - superpose the structures fragment by fragment, with fragments of
increasing length; this gives a 2D array of values
whose ij-th element is the RMS deviation of the fragment
comprising residues i through j of the two structures at optimal
superposition.
NMR - calculate the deviation from the interproton distances retrieved from the
NOE data.
CONTACTS= - calculate the number of contacts between the following
types of side chains:
HYDROPHOBIC - hydrophobic side chains;
HYDROPHILIC - hydrophilic side chains;
ALL - all side chains.
The distance between side-chain centroids is taken as the criterion of contact;
for details see A. Liwo, M.R. Pincus, R.J. Wawak, S. Rackovsky, H.A. Scheraga,
Protein Science, 1993, 2, 1715-1731.
CHROM_DIST - calculate inter-chromophore distances. Requires the $CHROMO data
group.
DIST_PROP - estimate the distribution of the property (or properties)
specified above, based on the Boltzmann distribution.
PROP_FILE= - the prefix for the property files. They will have
extensions corresponding to property names:
'ax1','ax2','ax3' - the lengths of the axes of the moment of inertia;
'vol' - the volume of the ellipsoid;
'rms' - the RMS deviation from the reference PDB structure or the RMS
deviation from the NOE distances;
'eng' - energy;
'rgy' - the radius of gyration;
'hpb' - number of hydrophobic/hydrophilic contacts;
'ntc' - total number of contacts;
'cmp' - the fraction of hydrophobic contacts;
'nat' - the fraction of native contacts (a reference contact file is required)
'chr' - inter-chromophore distance.
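
As an illustration of one of the properties listed above, the radius of
gyration (ELLIPSE=RGYR, extension 'rgy') can be sketched in Python as follows.
The function name is hypothetical, and the use of unit masses by default is an
assumption; the manual does not state whether ANALYZE applies mass weighting.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a set of (x, y, z) points.

    Unit masses are assumed when none are supplied (an assumption,
    not necessarily what ANALYZE does).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # mass-weighted center of the point set
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    # mass-weighted mean squared distance from the center
    sq = sum(m * sum((c[k] - center[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)
```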
-------------------------------------------------------------------------------
3. The $CLUSTER data group (required for RUNTYP=CLUSTER).
These data are read in free format.
1st line: Number of RMS cut-off values and the RMS cut-off values.
If a value is negative, it means that clustering will be carried out at
abs(cutoff) cut-off value and for this cut-off the dihedral angles of the
leading members of the families and, optionally, their Cartesian coordinates
will be written to disk files. Up to 10 cut-off values are allowed.
2nd line: Energy cut-off (kcal/mol). At a given RMS cut-off the dihedral
angles of the leading members of the families that are within Ecut above
the lowest-energy family will be written and the pdb files created.
If PRINT_PDB>1 in $CNTRL, the same cut-off will be applied to writing the pdb
files of the conformations within a family.
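
The interpretation of the first line of the $CLUSTER group can be sketched in
Python as below. This is an illustration only: the function name is
hypothetical, and accepting either blanks or commas as separators is an
assumption about what "free format" permits.

```python
def parse_cutoff_line(line, max_cutoffs=10):
    """Parse 'N c1 c2 ... cN' into a list of (cutoff, write_output) pairs.

    A negative value means: cluster at abs(value) and write the leading
    members of the families for this cut-off to disk.
    """
    fields = line.replace(",", " ").split()
    count = int(fields[0])
    if not 0 <= count <= max_cutoffs:
        raise ValueError("number of cut-offs must be between 0 and %d" % max_cutoffs)
    values = [float(x) for x in fields[1:1 + count]]
    if len(values) != count:
        raise ValueError("fewer cut-off values than announced")
    return [(abs(v), v < 0.0) for v in values]
```

For example, the line "3 1.0 -2.0 3.0" requests clustering at 1.0, 2.0 and 3.0,
with output files written only for the 2.0 cut-off.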
-------------------------------------------------------------------------------
4. The $SUPAT data group (required for RUNTYP=CLUSTER or RMS calculations).
1st line: number of atom types to be superposed (free format).
2nd line: the ECEPP-style names of these atoms (15(a4,1x)).
Two additional "wildcards" are allowed:
ALL - all the atoms;
HEAV - all non-hydrogen atoms.
-------------------------------------------------------------------------------
5. The $RMSCALC data group (required for RMS calculations).
REF_FILE= - the name of PDB file containing the reference structure.
NRSUP= - the number of the first residue to superpose (#1 is the
N-terminal group, #2 is the first full residue).
NRSUP2= - the number of the last residue to superpose.
PRINT= - indicates whether to print superposed coordinates:
NONE - do not print superposed coordinates (default);
SUPERPOSED - print only the coordinates of the atoms that belong to the
"superposable" list, but corresponding to all residues and
blocking group (useful for drawing);
ALL - print coordinates of all non-hydrogen atoms from the reference and
current structure at optimal superposition of the chosen atoms.
The superposed coordinates are printed to the main output file. For
PRINT=SUPERPOSED, the pairs of PDB structures are produced, containing the
reference (chain A) and current (chain B) structure. For PRINT=ALL the
reference structure (chain A) is printed only in the beginning of the output
file, when reading the reference PDB file.
The superposable atoms are defined in the $SUPAT group (see above).
-------------------------------------------------------------------------------
6. The $CHROMO data group.
IDON= - the number of the donor side chain (default: the first
tryptophan residue in the chain).
IACC= - the number of the acceptor side chain (default: the first
dansyl or tyrosine residue in the chain).
-------------------------------------------------------------------------------
7. The $BOUNDS data group.
NMR_FILE= - the name of the file with interproton distances; KEYBOARD
assumes that the name will be typed in from the keyboard.
-------------------------------------------------------------------------------
8. The $NOES data group.
MODE= - indicates the purpose of NOE calculation:
SIMUL - just calculate NOE spectra for the supplied conformations;
FITTING - fit the statistical weights of the conformations so as to best
reproduce the experimental NOE spectrum.
CONF= - specifies which conformations will be considered in the
calculations:
CLUSTERED - a cluster analysis will precede NOE calculation (see below);
ALL - the NOE spectra of all conformations will be calculated.
AVERAGE - says whether the spectra will be Boltzmann averaged over the whole
ensemble (for CONF=ALL) or over families (CONF=CLUSTERED).
BYSTROV= (NO)
YES - calculate the coupling constants,
NO - do not calculate coupling constants.
The next keywords are relevant for MODE=FITTING.
GEMINAL= (NO)
YES - fit the NOEs from geminal protons;
NO - do not fit the NOEs from geminal protons.
VICINAL= (YES)
YES - fit the NOEs from vicinal protons;
NO - do not fit the NOEs from vicinal protons.
RIGID= (NO)
YES - fit the NOEs from "rigid" protons (i.e. those with fixed distance,
other than geminal);
NO - do not fit the NOEs from "rigid" protons.
NOE= (INTER)
ALL - fit the NOEs from all pairs of protons,
INTER - fit the NOEs from protons belonging to different residues only,
LONG - fit the NOEs from protons belonging to non-sequential residues only.
Important! For N-C-cyclic peptides (but NOT for pairs of residues linked by
side chains), the first and the last residues are considered sequential.
ANTINOE= (NONE)
ALL - as for NOE
INTER - as for NOE
LONG - as for NOE
NONE - do not fit anti-NOEs.
The routine checks for compatibility between the values of the NOE and ANTINOE
keys. The rule is that the exclusion criterion for anti-NOEs must not be weaker
than that of NOEs. For example, if the user specified NOE=LONG ANTINOE=INTER,
the second criterion being weaker, the routine will automatically set
ANTINOE=LONG.
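
The compatibility rule between NOE and ANTINOE can be sketched in Python as
follows. This only illustrates the rule as stated above, not the actual
routine; the function name and the numeric strictness ranking are
hypothetical.

```python
# Larger number = stricter exclusion criterion (fewer proton pairs kept).
STRICTNESS = {"ALL": 0, "INTER": 1, "LONG": 2}


def reconcile_antinoe(noe, antinoe):
    """Return the ANTINOE value after the compatibility check."""
    if antinoe == "NONE":
        return "NONE"  # anti-NOEs are not fitted, nothing to reconcile
    if STRICTNESS[antinoe] < STRICTNESS[noe]:
        return noe  # anti-NOE criterion was weaker: raise it to the NOE level
    return antinoe
```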
WEI_COUPL= (0.1)
The weight of the coupling-constant term in the minimized sum.
A0,A1,A2= (1.9, -1.4, 6.4)
The constants in the Bystrov equation: J = A0 + A1*cos(t) + A2*cos^2(t)
Note! At present only the coefficients of type 1 angles can be input.
The coefficients of -CH2- protons are automatically computed from those by
averaging over the two methylene protons.
SA0,SA1,SA2= (3*2.0) - the a priori standard deviations of the above
constants.
WBASE= (1.0)
The base in weight calculation of the NOE intensities;
weight(i)=1/(wbase+Vexp(i)).
ALPHA_ENT (0.0) - the weight of the entropy factor. The complete minimized
function has the form:
F(W) = FI(W) + ALPHA_ENT * SUM(i) W(i)*ALOG(W(i))
where W is the vector of the statistical weights of the conformations.
The entropy term forces the weights to be equal to each other, while
the "sum of error" term FI picks up the conformations that best fit to the
experimental observables; the latter usually results in the selection of only
a few out of several hundred, which is regarded rather strange by the authors
of the program. Just a little admixture of the "disorder" term gives
more reasonable results with more conformations with significant weights.
Truly, "maximum entropy is not a technique it is THE technique" - as a
famous statistician said. But no one can offer any convincing way of choosing
ALPHA_ENT. So, you guys have to play with different values and use your
common sense. It is often hard work, but when you're through with it,
the results are worth something. Good luck!
The experimental NOE spectrum is read from the ${prefix}.noe file,
where ${prefix} denotes the input-file prefix.
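
The role of ALPHA_ENT in the minimized function can be illustrated with the
Python sketch below. The form of the error term FI is not given in this
manual, so it is passed in as a precomputed number; the function names are
hypothetical, and treating terms with W(i)=0 as zero is an assumption based on
the limit of w*ln(w).

```python
import math


def entropy_term(weights):
    """SUM(i) W(i)*ln(W(i)); zero weights contribute nothing (limit value)."""
    return sum(w * math.log(w) for w in weights if w > 0.0)


def objective(fit_error, weights, alpha_ent):
    """F(W) = FI(W) + ALPHA_ENT * SUM(i) W(i)*ln(W(i)).

    fit_error stands for FI(W), evaluated elsewhere.
    """
    return fit_error + alpha_ent * entropy_term(weights)
```

For normalized weights the sum is lowest when all weights are equal, so a
positive ALPHA_ENT penalizes solutions in which only a few conformations carry
all the weight.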
-------------------------------------------------------------------------------
9. The $MORASS data group.
-------------------------------------------------------------------------------
TAUC= (0.1)
The correlation time (ms).
TAUM= (4.5)
The correlation time of methyl protons (ms).
TIME= (0.2)
The mixing time (ms).
VOL0= (100.0)
SFRQ= (500.0)
Spectrometer frequency (MHz)
CUTT= (6.0)
Cut-off distance for NOE printing (A).
-------------------------------------------------------------------------------
10. The $COUPL data group (formatted)
For consecutive residues i = 1,inumrs (note that number 1 corresponds to
the N-terminal end group and inumrs to the C-terminal end group) the following
records are read:
Card 1 (free format):
ii,nang(i),(iang(j,i),j=1,nang(i))
ii - number of that residue (for control only)
nang(i) - number of dihedral angles pertaining to that residue for which
the coupling constants are calculated
iang(j,i) - ECEPP residue-relative numbers of those angles (the order is:
phi,psi,omega,chi1,...)
Card 2 (free format):
(ityp_coupl(icoupl(j,i)),coupl(icoupl(j,i)),phase(j,i),j=1,nang(i))
ityp_coupl - the type of Bystrov equation pertaining to a given angle
coupl - the measured coupling constant
phase - the phase angle to be subtracted in order to obtain Bystrov's
theta angle. Typically, the phase is 60 deg for L-residues
and -60 deg for D-residues.
At present type 1 corresponds to non-glycine angles and type 2 to glycine
angles.
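
The evaluation of a coupling constant from a dihedral angle, using the Bystrov
equation J = A0 + A1*cos(theta) + A2*cos^2(theta) with theta obtained by
subtracting the phase, can be sketched in Python as follows. The function name
is hypothetical; the default coefficients are the default A0, A1, A2 values
listed in the $NOES group, which apply to type 1 angles only.

```python
import math


def bystrov_coupling(angle_deg, phase_deg, a0=1.9, a1=-1.4, a2=6.4):
    """Coupling constant from a dihedral angle (degrees) and a phase (degrees)."""
    theta = math.radians(angle_deg - phase_deg)  # Bystrov's theta angle
    c = math.cos(theta)
    return a0 + a1 * c + a2 * c * c
```

For an L-residue the call would typically use phase_deg=60.0, and for a
D-residue phase_deg=-60.0, as described above.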
-------------------------------------------------------------------------------
11. The $MARQUARDT data group
This data group contains parameters for the Marquardt and SUMSL minimizers,
which are used in fitting the computed NMR characteristics to the experimental
data.
MINIMIZER
MARQ - Marquardt's method is used (no maximum entropy fitting); this
is the default for minimizing the sum of the squares only, without
the entropy term.
SUMSL - the SUMSL algorithm is used; this is the only option when
maximum entropy fitting is requested.
MAXIT (1000) - maximum number of iterations
MAXFUN (2000) - maximum number of function evaluations
MAXMAR (10) - maximum number of inner iterations in Marquardt's method
LAMBDA (1.0D2) - initial value of the Marquardt scaling parameter lambda
VMARQ (1.0D1) - factor to shrink or expand lambda
TOLX (1.0D-3) - tolerance on average change in parameters
TOLF (1.0D-3) - absolute tolerance on function changes to achieve convergence
RTOLF (1.0D-5) - relative tolerance on function changes
TOLLAM (1.0D0) - maximum value of lambda to stop iteration
RLMIN (1.0D-20) - minimum allowed value of lambda
----------------------------------------------------------------------------
For the description of the $SEQ and the $BRIDGE data groups see the ECEPPAK
manual.
------------------------------------------------
II. The input dihedral-angle file (outo.angfile)
------------------------------------------------
This file is produced by ECEPPAK and by ANALYZE when using the clustering
option. The file contains collated formatted data corresponding to subsequent
conformations. Each entry has the following structure:
Card 1: NR, ETOT, EVDW, EEL, ESOLV (I10, 4E15.6)
NR - the number of the conformation (ignored)
ETOT - the total energy
EVDW - van der Waals energy
EEL - electrostatic energy
ESOLV - solvation energy
Card group 2: (LIST(I), I=1,INUMRS) (16I5)
LIST(I) - numeric ECEPP code of the Ith residue in the sequence (including
the end groups).
Card group 3: Dihedral angles of the subsequent residues; one card per residue
or end group:
(ANGULO(J,I),J=1,NANG) (15F8.3)
ANGULO(J,I) - Jth dihedral angle of residue or end group I.
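
Reading Card 1 and card group 2 of an entry outside of ANALYZE can be sketched
in Python as below. The field widths follow the formats (I10, 4E15.6) and
(16I5) quoted above; the function names are hypothetical, and the sketch
assumes that the number of residues INUMRS is already known.

```python
def split_fixed(line, widths):
    """Cut a line into consecutive fixed-width fields."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


def parse_energy_card(line):
    """Card 1: NR, ETOT, EVDW, EEL, ESOLV in (I10, 4E15.6)."""
    raw = split_fixed(line.rstrip("\n"), [10, 15, 15, 15, 15])
    nr = int(raw[0])
    etot, evdw, eel, esolv = (float(x) for x in raw[1:])
    return nr, etot, evdw, eel, esolv


def parse_residue_codes(lines, inumrs):
    """Card group 2: INUMRS residue codes, 16 per line, in (16I5)."""
    codes = []
    for line in lines:
        line = line.rstrip("\n")
        for start in range(0, len(line), 5):
            field = line[start:start + 5].strip()
            if field:
                codes.append(int(field))
    return codes[:inumrs]
```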
-------------------------------------------------
III. The experimental NOE data file (noefile.noe)
-------------------------------------------------
This file contains the experimental data in MORASS-like format (formatted
input). Each line corresponds to a NOE from a single proton pair and contains
the following data:
NUMPROT1,NUMPROT2,VOL,RES1,NUMRES1,ATOM1,NUMAT1,RES2,NUMRES2,ATOM2,NUMAT2,ICONT
(2I5,F10.5,5X,A3,I4,1X,A3,I4,3X,A3,I4,1X,A3,2I4)
NUMPROT1,NUMPROT2 - the numbers of protons of the pair; the protons must be
numbered consecutively, omitting the other atoms of the chain;
VOL - the NOE intensity pertaining to this pair;
RES1,NUMRES1,ATOM1,NUMAT1,RES2,NUMRES2,ATOM2,NUMAT2 - residue names, numbers,
atom names, and absolute atom numbers (according to the ECEPP numbering of
the chain) respectively, corresponding to these protons;
Only NUMPROT1, NUMPROT2, and VOL are actual data; the remaining fields
described above are read for verification purposes only. If the residue
number, residue name, or atom name does not match the corresponding values
found for the proton with a given NUMPROT based on the ECEPP scheme, a warning
message is printed.
ICONT - a flag indicating whether the volume of the next NOE should be merged
with the current one; merging takes place for ICONT <> 0. This applies to
equivalent protons, such as the CD and CE protons of PHE that are attached
to either side of the ring. Such NOEs form "equivalence groups" and the
total volume of the whole group is fitted, rather than each individual NOE
(note, however, that except for the methyl equivalence groups the
averaging is NOT carried out already at the stage of the calculation of
the relaxation matrix). The program automatically detects some equivalence
groups, such as methyl or methylene groups. The latter case is disputable,
because methylene protons can often be distinguished; however fitting
the total volume of the groups does not violate the theory, as the
relaxation-matrix elements are not averaged and the errors arising from
permutations of the methylene protons (we don't know which one is which
for a particular conformation) would probably be more harmful.
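
The record layout and the ICONT merging described above can be sketched in
Python as follows. The column widths are taken from the format
(2I5,F10.5,5X,A3,I4,1X,A3,I4,3X,A3,I4,1X,A3,2I4); the function names are
hypothetical, reading a blank integer field as zero is an assumption, and only
the explicit ICONT flag is handled (not the automatic detection of methyl or
methylene groups).

```python
# Positive widths are data fields; negative widths are skipped (X) columns.
NOE_WIDTHS = [5, 5, 10, -5, 3, 4, -1, 3, 4, -3, 3, 4, -1, 3, 4, 4]
NOE_NAMES = ["numprot1", "numprot2", "vol", "res1", "numres1", "atom1",
             "numat1", "res2", "numres2", "atom2", "numat2", "icont"]
INT_FIELDS = ("numprot1", "numprot2", "numres1", "numat1",
              "numres2", "numat2", "icont")


def parse_noe_line(line):
    """Split one fixed-format NOE record into a dictionary of fields."""
    fields, pos = [], 0
    for width in NOE_WIDTHS:
        if width > 0:
            fields.append(line[pos:pos + width].strip())
        pos += abs(width)
    rec = dict(zip(NOE_NAMES, fields))
    for key in INT_FIELDS:
        rec[key] = int(rec[key]) if rec[key] else 0  # blank field read as 0
    rec["vol"] = float(rec["vol"])
    return rec


def group_volumes(records):
    """Sum the volumes of records chained by ICONT <> 0 into group totals."""
    totals, current = [], 0.0
    open_group = False
    for rec in records:
        current += rec["vol"]
        open_group = rec["icont"] != 0  # next record joins this group
        if not open_group:
            totals.append(current)
            current = 0.0
    if open_group:
        totals.append(current)  # file ended inside a group
    return totals
```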
-------------------------------------------------
IV. The NMR distance constraint file
-------------------------------------------------
The filename is given in the $BOUNDS data group or input manually, when
KEYBOARD has been specified. Each line corresponds to one constraint and
contains the following data:
ICOR,RSNAM1,IR1,RSNAM2,IR2,ATNAME1,ATNAME2,BL,BU,NA(1),(IA(I,1),I=1,6),
NA(2),(IA(I,2),I=1,6)
(I3,1X,A3,I3,1X,A3,I3,2(1X,A3),2F8.1/2(I3,2X,6I5,5X))
ICOR - the number of the constraint (ignored)
RSNAM1,IR1,RSNAM2,IR2,ATNAME1,ATNAME2 - residue names, numbers, and atom names
corresponding to the first and the second atom of the constraint;
BL,BU - lower and upper boundary of the constraint;
NA(1),(IA(I,1),I=1,6) - number of atoms in the first group and ECEPP numbers
of these atoms;
NA(2),(IA(I,2),I=1,6) - number of atoms in the second group and ECEPP numbers
of these atoms;
The distances are calculated between the centers of the atoms of the first and
of the second group.
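
The comparison of a group-to-group distance with the bounds BL and BU can be
sketched in Python as below. The function names are hypothetical, and taking
the "center" of a group as the unweighted mean of its atomic coordinates is an
assumption; the sign convention of the returned violation is likewise chosen
for illustration only.

```python
import math


def centroid(points):
    """Unweighted mean position of a group of (x, y, z) atoms."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def bound_violation(group1, group2, lower, upper):
    """Signed violation of the [lower, upper] bounds; 0.0 if satisfied.

    Negative: the groups are closer than BL. Positive: farther than BU.
    """
    c1, c2 = centroid(group1), centroid(group2)
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    if d < lower:
        return d - lower
    if d > upper:
        return d - upper
    return 0.0
```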
A. Liwo, 2/21/97. Revised 4/28/97, 6/22/98, 11/29/98, 6/23/99