ANALYZE User Manual

                     @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
                     @        A N A L Y Z E       @
                     @         USER MANUAL        @
                     @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

                   ----------------------------------
                   I. The main data file (infile.inp)
                   ----------------------------------

The input is organized in the following fourteen ECEPPAK-style data groups:

1. $TITLE    - the title of the run (a single line);
2. $CNTRL    - main control variables;
3. $SEQ      - sequence data;
4. $BRIDGE   - covalent bridge data;
5. $PROPERTY - control data for conformation-dependent property evaluation;
6. $RMSCALC  - control data for RMS calculation;
7. $BOUNDS   - reading the NMR-derived distance constraints (interactive at
               the moment);
8. $CHROMO   - control data for calculating inter-chromophore distance;
9. $CLUSTER  - control data for cluster analysis;
10.$SUPAT    - specifies the atoms being superposed;
11.$NOES     - specifies the options in the calculations of NOE spectra and
               coupling constants;
12.$MORASS   - parameters for theoretical evaluation of the NOE spectra;
13.$COUPLING - specification of the calculation of coupling constants;
14.$MARQUARDT- options in least-square or maximum-entropy fitting of the 
               theoretical to the experimental NOE spectra.
-------------------------------------------------------------------------------
1. The $CNTRL data group.

RUNTYP=name where name is one of the following:

       PROPERTY - calculate conformation-dependent properties (e.g. the RMS
                  deviation from a reference structure).
  
       AVE_COORD - Boltzmann-average the Cartesian coordinates of the 
                   supplied conformations.

       AVE_DIST  - Boltzmann-average the inverse sixth powers of interproton
                   distances (can be used to evaluate the "average" 2D-NMR
                   spectrum).

       CLUSTER   - Do a cluster analysis of the input set of conformations.

       MORASS    - Calculate NMR spectra and optionally do NMR fitting. This
                   option is NOT available with the version of the program
                   that does clustering of large conformation sets.

       ANGLES    - just calculate the dihedral angles from a supplied pdb file.

MINVAR    -   use the minimum-variance method (programmed by Dr. Mark D.
              Shenderovich of the Latvian Academy of Sciences) instead of the
              minimal spanning tree method for clustering. Recommended when
              the set is to be divided in order to calculate subset averages
              or to pre-select representative conformations, e.g. for NMR
              fitting. This option is NOT available with the NMR-fitting
              program. The minimal spanning tree method, on the other hand,
              is recommended for "taxonomy" of the set.

VERBOSE - print information about the progress of the run on screen.

NRES=number - the number of FULL residues in the chain. This does not need to
              be given, because the number of residues is calculated when 
              reading the $SEQ data group.

RES_CODE=name where name is one of ECEPP, THREE_LETTER, ONE_LETTER; see 
          the ECEPPAK manual for details.

FULLPRINT - use if you want messages such as "no space left on device".

RES_DBASE=name - replace the residue data base defined in the analyze script
            with name. If name contains at least one slash, it is assumed to
            contain the absolute search path; otherwise it is appended to the
            ${DBASDIR} environment variable to define the full search path.

T=number - the absolute temperature used in conformational averaging.
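
The role of T in Boltzmann averaging (RUNTYP=AVE_COORD and AVE_DIST) can be
illustrated with a short Python sketch. It is not taken from the ANALYZE
source: the function names, the value of the gas constant, and the assumption
that energies are given in kcal/mol are illustrative only.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K); assumes energies in kcal/mol


def boltzmann_weights(energies, temperature):
    """Return normalized Boltzmann weights exp(-E/RT) / Z."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


def average_inverse_sixth(distances, weights):
    """Weighted average of r**-6, converted back to an effective distance."""
    mean_r6 = sum(w * r ** -6 for r, w in zip(distances, weights))
    return mean_r6 ** (-1.0 / 6.0)


if __name__ == "__main__":
    w = boltzmann_weights([-120.0, -119.5, -118.0], 300.0)
    print(w)
    print(average_inverse_sixth([2.5, 3.0, 4.5], w))
```

Raising T flattens the weights, so higher-energy conformations contribute
more to the averages.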

The next few keywords apply to RUNTYP=CLUSTER.
 
PRINT_PDB=number - when doing the cluster analysis the program will print the
                   Cartesian coordinates (in pdb format) of at most number
                   lowest-energy conformations in a family that are within
                   an energy cut-off specified in the $CLUSTER data group.
                   number=0 means that no pdb files will be created and number=1
                   that just the leading members of the subsequent families 
                   will be written to pdb files.

INCLUDE_TERM - when present causes all amino (NH2, NH3+, dummy, and cyclizing),
               carboxy (COOH, COO-, dummy and cyclizing), as well as the CONH2
               group to be merged with the first (last) residue, rather than
               counted as separate "residues" in the generated *.pdb files.
               This is very useful for compatibility with pdb files read by
               AMBER. Note that otherwise the first full residue ALWAYS has
               number 2.

FILE=name - the prefix of the pdb files; default is the first argument of the
            analyze script.

VIRT_CHAIN - virtual chains' internal coordinates will be printed for the 
             leading members of the families.

BETA_TURNS - turn analysis will be carried out.

CHIRALITY - covalent bridges' dihedral angles will be calculated.

H_BONDS - H-bond analysis will be carried out.

NRCLUS1=number, NRCLUS2=number - the first and last residue to be superposed
            in clustering; default 1 and INUMRS, respectively.

TREE - construct the minimal tree and then partition it (this usually takes
       much longer than partitioning without explicit construction of the
       minimal tree, but gives some idea of how best to partition the set
       of conformations).

DISTMAT - write the upper triangle of the matrix of RMS deviations between
          conformations to a disk file. 
-------------------------------------------------------------------------------
2. The $PROPERTY data group

ELLIPSE= - calculate the characteristics of the moment of inertia:
 
        AXES   - calculate main axes of the moment of inertia;
        VOLUME - calculate the volume of the ellipsoid;
        RGYR   - calculate the radius of gyration;
        ALL    - calculate all above.
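
As an illustration of the RGYR quantity, the following Python sketch computes
the radius of gyration of a set of points. It assumes equal (unit) masses for
all atoms; whether ANALYZE mass-weights the atoms is not stated here, and the
function name is hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Radius of gyration of points (x, y, z) with equal (unit) masses."""
    n = len(coords)
    # Geometric center of the points.
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    # Sum of squared distances from the center.
    sq = sum(sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(sq / n)


if __name__ == "__main__":
    print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```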

RMS=  - calculate the RMS deviation from a reference structure:

        OVERALL - superpose all selected atoms of the current structure on
                  all selected atoms of the reference structure;
        ARRAY   - superpose the structures fragment by fragment, with fragments
                  of increasing length; this gives a 2D array of values whose
                  ij-th element is the RMS deviation, at optimal
                  superposition, of the fragment spanning residues i through j.

NMR - calculate the deviation from the interproton distances retrieved from the
      NOE data.

CONTACTS= - calculate the number of contacts between the following 
                     types of side chains:

         HYDROPHOBIC - hydrophobic side chains;
         HYDROPHILIC - hydrophilic side chains;
         ALL         - all side chains. 

The distance between side-chain centroids is taken as the criterion of contact;
for details see A. Liwo, M.R. Pincus, R.J. Wawak, S. Rackovsky, H.A. Scheraga,
Protein Science, 1993, 2, 1715-1731.
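
A centroid-distance contact count can be sketched in Python as follows. The
cut-off value is a free parameter of this sketch (the actual criterion is
defined in the paper cited above), and the function name is hypothetical.

```python
import math
from itertools import combinations


def count_contacts(centroids, cutoff):
    """Count pairs of side-chain centroids closer than `cutoff`.

    `centroids` is a list of (x, y, z) tuples, one per side chain;
    `cutoff` is an assumed distance threshold in the same units.
    """
    return sum(
        1
        for a, b in combinations(centroids, 2)
        if math.dist(a, b) <= cutoff
    )


if __name__ == "__main__":
    print(count_contacts([(0, 0, 0), (5, 0, 0), (20, 0, 0)], 6.5))
```

Restricting the input list to hydrophobic or hydrophilic side chains gives
the counts for the corresponding CONTACTS= options.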

CHROM_DIST - calculate inter-chromophore distances. Requires the $CHROMO data
             group.

DIST_PROP - estimate the distribution of the property(ies) specified above,
            based on the Boltzmann distribution.

PROP_FILE= - the prefix for the property files. They will have 
           extensions corresponding to property names:

'ax1','ax2','ax3' - the lengths of the axes of the moment of inertia;
'vol' - the volume of the ellipsoid;
'rms' - the RMS deviation from the reference PDB structure or the RMS 
        deviation from the NOE distances;
'eng' - energy;
'rgy' - the radius of gyration;
'hpb' - number of hydrophobic/hydrophilic contacts;
'ntc' - total number of contacts;
'cmp' - the fraction of hydrophobic contacts;
'nat' - the fraction of native contacts (a reference contact file is required)
'chr' - inter-chromophore distance.
-------------------------------------------------------------------------------
3. The $CLUSTER data group (required for RUNTYP=CLUSTER).

These data are read in free format.

1st line: Number of RMS cut-off values and the RMS cut-off values.
If a value is negative, it means that clustering will be carried out at
abs(cutoff) cut-off value and for this cut-off the dihedral angles of the
leading members of the families and, optionally, their Cartesian coordinates
will be written to disk files. Up to 10 cut-off values are allowed. 

2nd line: Energy cut-off (kcal/mol). At a given RMS cut-off the dihedral
angles of the leading members of the families that are within Ecut above
the lowest-energy family will be written and the pdb files created. 
If PRINT_PDB>1 in $CNTRL, the same cut-off will be applied to writing the pdb
files of the conformations within a family.
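
The meaning of the two lines can be sketched in Python. This is an
illustration under stated assumptions, not the program's algorithm: families
are formed by single linkage (which gives the same partition as cutting the
minimal-spanning-tree edges longer than the cut-off), the leading member of a
family is taken to be its lowest-energy conformation, and all function names
are hypothetical.

```python
def parse_cutoffs(line):
    """Parse 'N c1 ... cN'; a negative value requests output at abs(value)."""
    fields = line.split()
    n = int(fields[0])
    values = [float(x) for x in fields[1:1 + n]]
    return [(abs(v), v < 0) for v in values]


def families(rms, energies, cutoff):
    """Group conformations linked by RMS <= cutoff (single linkage)."""
    n = len(energies)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):  # only the upper triangle is used
            if rms[i][j] <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Lowest-energy member first in each family; families sorted by leader.
    result = [sorted(g, key=lambda i: energies[i]) for g in groups.values()]
    return sorted(result, key=lambda g: energies[g[0]])


def leaders_within(fams, energies, ecut):
    """Leading members whose energy is within ecut of the lowest family."""
    e0 = energies[fams[0][0]]
    return [g[0] for g in fams if energies[g[0]] - e0 <= ecut]
```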
-------------------------------------------------------------------------------
4. The $SUPAT data group (required for RUNTYP=CLUSTER or RMS calculations).

1st line: number of atom types to be superposed (free format).
2nd line: the ECEPP-style names of these atoms (15(a4,1x)).
Two additional "wildcards" are allowed:

ALL - all the atoms;
HEAV - all non-hydrogen atoms.
-------------------------------------------------------------------------------
5. The $RMSCALC data group (required for RMS calculations).

REF_FILE= - the name of PDB file containing the reference structure.

NRSUP= - the number of the first residue to superpose (#1 is the 
                 N-terminal group, #2 is the first full residue).

NRSUP2= - the number of the last residue to superpose.

PRINT= - indicates whether to print superposed coordinates:

      NONE - do not print superposed coordinates (default);
      SUPERPOSED - print only the coordinates of the atoms that belong to the
                   "superposable" list, but corresponding to all residues and
                   blocking group (useful for drawing);
      ALL - print coordinates of all non-hydrogen atoms from the reference and
            current structure at optimal superposition of the chosen atoms.

The superposed coordinates are printed to the main output file. For 
PRINT=SUPERPOSED, the pairs of PDB structures are produced, containing the
reference (chain A) and current (chain B) structure. For PRINT=ALL the 
reference structure (chain A) is printed only in the beginning of the output
file, when reading the reference PDB file.

The superposable atoms are defined in the $SUPAT group (see above).
-------------------------------------------------------------------------------
6. The $CHROMO data group.

IDON= - the number of the donor side chain (default: the first 
                tryptophan residue in the chain).

IACC= - the number of the acceptor side chain (default: the first
                dansyl or tyrosine residue in the chain).
-------------------------------------------------------------------------------
7. The $BOUNDS data group.

NMR_FILE= - the name of the file with interproton distances; KEYBOARD
      assumes that the name will be typed in from the keyboard.
-------------------------------------------------------------------------------
8. The $NOES data group.

MODE= - indicates the purpose of NOE calculation:
    SIMUL - just calculate NOE spectra for the supplied conformations;
    FITTING - fit the statistical weights of the conformations so as to best
                reproduce the experimental NOE spectrum.

CONF= - specifies which conformations will be considered in the 
                   calculations:
    CLUSTERED - a cluster analysis will precede NOE calculation (see below);
    ALL - the NOE spectra of all conformations will be calculated.

AVERAGE - indicates whether the spectra will be Boltzmann-averaged over the
           whole ensemble (for CONF=ALL) or over families (CONF=CLUSTERED).

BYSTROV= (NO)
    YES - calculate the coupling constants,
    NO  - do not calculate coupling constants.
 
The next keywords are relevant for MODE=FITTING.

GEMINAL= (NO)
    YES - fit the NOEs from geminal protons;
    NO  - do not fit the NOEs from geminal protons.

VICINAL= (YES)
    YES - fit the NOEs from vicinal protons;
    NO  - do not fit the NOEs from vicinal protons.

RIGID= (NO)
    YES - fit the NOEs from "rigid" protons (i.e. those with fixed distance,
          other than geminal);
    NO  - do not fit the NOEs from "rigid" protons.

NOE= (INTER)
    ALL - fit the NOEs from all pairs of protons,
    INTER - fit the NOEs from protons belonging to different residues only,
    LONG - fit the NOEs from protons belonging to non-sequential residues only.
Important! For N-C-cyclic peptides (but NOT for pairs of residues linked by
    side chains), the first and the last residues are considered sequential.

ANTINOE= (NONE)
    ALL - as for NOE
    INTER - as for NOE
    LONG - as for NOE
    NONE - do not fit anti-NOEs.

The routine checks for compatibility between the values of the NOE and ANTINOE
keys. The rule is that the exclusion criterion for anti-NOEs must not be weaker
than that of NOEs. For example, if the user specified NOE=LONG ANTINOE=INTER,
the second criterion being weaker, the routine will automatically set 
ANTINOE=LONG.
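
The compatibility rule can be written down as a small Python sketch. The
ordering ALL < INTER < LONG follows from the keyword descriptions above; the
function and table names are hypothetical, and leaving ANTINOE=NONE unchanged
is an assumption.

```python
# Larger number = stricter exclusion criterion.
STRICTNESS = {"ALL": 0, "INTER": 1, "LONG": 2}


def reconcile(noe, antinoe):
    """Return the ANTINOE value after the compatibility check."""
    if antinoe == "NONE":
        return antinoe  # no anti-NOEs are fitted, nothing to reconcile
    if STRICTNESS[antinoe] < STRICTNESS[noe]:
        return noe  # anti-NOE criterion was weaker: tighten it
    return antinoe


if __name__ == "__main__":
    print(reconcile("LONG", "INTER"))
```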

WEI_COUPL= (0.1)
    The weight of the coupling-constant term in the minimized sum.

A0,A1,A2= (1.9, -1.4, 6.4)
    The constants in the Bystrov equation: J = A0 + A1*cos(t) + A2*cos(t)**2

Note! At present only the coefficients of type 1 angles can be input. 
The coefficients of -CH2- protons are automatically computed from those by 
averaging over the two methylene protons.
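
For orientation, the Bystrov equation with the default constants can be
evaluated with the following Python sketch. The function name is
hypothetical, and the angle is assumed to be supplied in degrees.

```python
import math


def bystrov_j(theta_deg, a0=1.9, a1=-1.4, a2=6.4):
    """Coupling constant J = A0 + A1*cos(t) + A2*cos(t)**2, t in degrees."""
    c = math.cos(math.radians(theta_deg))
    return a0 + a1 * c + a2 * c * c


if __name__ == "__main__":
    for theta in (0.0, 90.0, 180.0):
        print(theta, bystrov_j(theta))
```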

SA0,SA1,SA2= (3*2.0) - the a priori standard deviations of the above
    constants.

WBASE= (1.0)
    The base in weight calculation of the NOE intensities; 
    weight(i)=1/(wbase+Vexp(i)).

ALPHA_ENT (0.0) - the weight of the entropy factor. The complete minimized
                  function has the form:

          F(W) = FI(W) + ALPHA_ENT * SUM(i) W(i)*ALOG(W(i))

                  where W denotes the vector of the weights W(i).

The entropy term forces the weights to be equal to each other, while
the "sum of errors" term FI picks out the conformations that best fit the
experimental observables; the latter usually results in the selection of only
a few out of several hundred, which the authors of the program regard as
rather strange. Just a little admixture of the "disorder" term gives
more reasonable results, with more conformations having significant weights.
Truly, "maximum entropy is not a technique, it is THE technique", as a
famous statistician said. But no one can offer any convincing way of choosing
ALPHA_ENT. So, you guys have to play with different values and use your 
common sense. It is often hard work, but when you're through with it, 
the results are worth something. Good luck!
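
The effect of the entropy term can be seen in a small Python sketch. The
fit-error value is passed in as a plain number because the exact form of FI
is not reproduced here; the function names are hypothetical, and 0*log(0) is
taken as 0.

```python
import math


def entropy_term(weights):
    """SUM(i) W(i)*log(W(i)), with zero weights contributing nothing."""
    return sum(w * math.log(w) for w in weights if w > 0.0)


def penalized_objective(fit_error, weights, alpha_ent):
    """F = FI + ALPHA_ENT * SUM(i) W(i)*log(W(i))."""
    return fit_error + alpha_ent * entropy_term(weights)


if __name__ == "__main__":
    uniform = [0.25] * 4
    peaked = [1.0, 0.0, 0.0, 0.0]
    # For the same fit error, the uniform weights give the lower F.
    print(penalized_objective(1.0, uniform, 0.1))
    print(penalized_objective(1.0, peaked, 0.1))
```

With ALPHA_ENT=0 the two weight sets are indistinguishable for equal fit
error; any positive value favors the more evenly spread weights.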

The experimental NOE spectrum is read from the ${prefix}.noe file, 
where ${prefix} denotes the input-file prefix.
-------------------------------------------------------------------------------
9. The $MORASS data group.

TAUC= (0.1)
    The correlation time (ms).

TAUM= (4.5)
    The correlation time of methyl protons (ms).

TIME= (0.2)
    The mixing time (ms).

VOL0= (100.0)

SFRQ= (500.0)
    Spectrometer frequency (MHz).

CUTT= (6.0)
    Cut-off distance for NOE printing (A).

-------------------------------------------------------------------------------
10. The $COUPL data group (formatted)

For consecutive residues i = 1,inumrs (note that number 1 corresponds to
the N-terminal end group and inumrs to the C-terminal end group) the following 
records are read:

Card 1 (free format):
ii,nang(i),(iang(j,i),j=1,nang(i)) 

ii      - number of that residue (for control only)
nang(i) - number of dihedral angles pertaining to that residue for which 
          the coupling constants are calculated
iang(j,i) - ECEPP residue-relative numbers of those angles (the order is:
            phi,psi,omega,chi1,...)

Card 2 (free format):
(ityp_coupl(icoupl(j,i)),coupl(icoupl(j,i)),phase(j,i),j=1,nang(i))

ityp_coupl - the type of Bystrov equation pertaining to a given angle
coupl      - the measured coupling constant
phase      - the phase angle to be subtracted in order to obtain Bystrov's 
             theta angle. Typically, the phase is 60 deg for L-residues 
             and -60 deg for D-residues. 

At present type 1 corresponds to non-glycine angles and type 2 to glycine 
angles.

-------------------------------------------------------------------------------
11. The $MARQUARDT data group

This data group contains parameters for the Marquardt and SUMSL minimizers,
which are used in fitting the computed NMR characteristics to the experimental
data.

MINIMIZER 

  MARQ - Marquardt's method is used (no maximum entropy fitting); this
         is the default for minimizing the sum of the squares only, without
         the entropy term.

  SUMSL - the SUMSL algorithm is used; this is the only option when 
          maximum entropy fitting is requested.


MAXIT (1000) - maximum number of iterations

MAXFUN (2000) - maximum number of function evaluations

MAXMAR (10) - maximum number of inner iterations in Marquardt's method 

LAMBDA (1.0D2) - initial value of the Marquardt scaling parameter lambda

VMARQ (1.0D1) - factor to shrink or expand lambda

TOLX (1.0D-3) - tolerance on average change in parameters

TOLF (1.0D-3) - absolute tolerance on function changes to achieve convergence

RTOLF (1.0D-5) - relative tolerance on function changes 

TOLLAM (1.0D0) - maximum value of lambda to stop iteration 

RLMIN (1.0D-20) - minimum allowed value of lambda

----------------------------------------------------------------------------

For the description of the $SEQ and the $BRIDGE data groups see the ECEPPAK 
manual.

              ------------------------------------------------
              II. The input dihedral-angle file (outo.angfile)
              ------------------------------------------------

This file is produced by ECEPPAK and by ANALYZE when using the clustering
option. The file contains collated formatted data corresponding to subsequent
conformations. Each entry has the following structure:

Card 1: NR, ETOT, EVDW, EEL, ESOLV (I10, 4E15.6)

NR - the number of the conformation (ignored)
ETOT - the total energy
EVDW - van der Waals energy
EEL - electrostatic energy
ESOLV - solvation energy

Card group 2: (LIST(I), I=1,INUMRS) (16I5)

LIST(I) - numeric ECEPP code of the Ith residue in the sequence (including
       the end groups).

Card group 3: Dihedral angles of the subsequent residues; one card per residue
       or end group:

(ANGULO(J,I),J=1,NANG) (15F8.3) 

ANGULO(J,I) - Jth dihedral angle of residue or end group I.
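
Reading the fixed-width cards from a script can be sketched in Python as
shown below. Only Card 1 and card group 2 are handled, since the number of
angles per residue is not known without the residue data base; the function
names are hypothetical.

```python
def parse_card1(line):
    """Card 1: NR, ETOT, EVDW, EEL, ESOLV laid out as (I10, 4E15.6)."""
    nr = int(line[0:10])
    # Four 15-character energy fields follow the 10-character integer.
    energies = [float(line[10 + 15 * k:25 + 15 * k]) for k in range(4)]
    return nr, energies


def parse_residue_codes(lines, inumrs):
    """Card group 2: INUMRS integer codes, 16 per line, each in I5."""
    codes = []
    for line in lines:
        line = line.rstrip("\n")
        for start in range(0, len(line), 5):
            field = line[start:start + 5]
            if field.strip():
                codes.append(int(field))
    return codes[:inumrs]
```

Slicing by column, rather than splitting on blanks, keeps the parser correct
when adjacent fields fill their whole width and run together.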

  
              -------------------------------------------------
              III. The experimental NOE data file (noefile.noe)
              -------------------------------------------------

This file contains the experimental data in MORASS-like format (formatted 
input). Each line corresponds to a NOE from a single proton pair and contains
the following data:

NUMPROT1,NUMPROT2,VOL,RES1,NUMRES1,ATOM1,NUMAT1,RES2,NUMRES2,ATOM2,NUMAT2,ICONT
(2I5,F10.5,5X,A3,I4,1X,A3,I4,3X,A3,I4,1X,A3,2I4)

NUMPROT1,NUMPROT2 - the numbers of protons of the pair; the protons must be
     numbered consecutively, omitting the other atoms of the chain;

VOL - the NOE intensity pertaining to this pair;

RES1,NUMRES1,ATOM1,NUMAT1,RES2,NUMRES2,ATOM2,NUMAT2 - residue names, numbers, 
     atom names, and absolute atom numbers (according to the ECEPP numbering of
     the chain) respectively, corresponding to these protons;

Only NUMPROT1, NUMPROT2, and VOL are actual data; the other fields described
above are read for verification purposes only. If the residue number, residue
name, or atom name does not match the corresponding values found for the
proton with a given NUMPROT according to the ECEPP scheme, a warning message
is printed.

ICONT - a flag indicating whether the volume of the next NOE should be merged 
     with the current one; merging is done for ICONT <> 0. This is used for 
     equivalent protons, such as the CD and CE protons of PHE that are attached
     to either side of the ring. Such NOEs form "equivalence groups" and the 
     total volume of the whole group is fitted, rather than each individual NOE
     (note, however, that except for the methyl equivalence groups the 
     averaging is NOT carried out already at the stage of the calculation of 
     the relaxation matrix). The program automatically detects some equivalence
     groups, such as methyl or methylene groups. The latter case is disputable, 
     because methylene protons can often be distinguished; however fitting
     the total volume of the groups does not violate the theory, as the
     relaxation-matrix elements are not averaged and the errors arising from
     permutations of the methylene protons (we don't know which one is which
     for a particular conformation) would probably be more harmful.
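
The record layout and the ICONT chaining can be sketched in Python. Column
positions follow the format given above; the function names and the
dictionary keys are hypothetical, and a blank ICONT field is assumed to mean
zero.

```python
def parse_noe_line(line):
    """Split one record laid out as
    (2I5,F10.5,5X,A3,I4,1X,A3,I4,3X,A3,I4,1X,A3,2I4)."""
    line = line.rstrip("\n").ljust(62)
    return {
        "numprot1": int(line[0:5]),
        "numprot2": int(line[5:10]),
        "vol": float(line[10:20]),
        "res1": line[25:28].strip(),
        "numres1": int(line[28:32]),
        "atom1": line[33:36].strip(),
        "numat1": int(line[36:40]),
        "res2": line[43:46].strip(),
        "numres2": int(line[46:50]),
        "atom2": line[51:54].strip(),
        "numat2": int(line[54:58]),
        "icont": int(line[58:62].strip() or 0),
    }


def equivalence_groups(records):
    """Chain consecutive records while ICONT is nonzero."""
    groups, current = [], []
    for rec in records:
        current.append(rec)
        if rec["icont"] == 0:  # the chain ends with this record
            groups.append(current)
            current = []
    if current:  # input ended while a merge was still pending
        groups.append(current)
    return groups


def group_volumes(records):
    """Total volume of each equivalence group."""
    return [sum(r["vol"] for r in g) for g in equivalence_groups(records)]
```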


              -------------------------------------------------
                   IV. The NMR distance constraint file
              -------------------------------------------------

The filename is given in the $BOUNDS data group or input manually, when
KEYBOARD has been specified. Each line corresponds to one constraint and
contains the following data:

ICOR,RSNAM1,IR1,RSNAM2,IR2,ATNAME1,ATNAME2,BL,BU,NA(1),(IA(I,1),I=1,6),
NA(2),(IA(I,2),I=1,6)
(I3,1X,A3,I3,1X,A3,I3,2(1X,A3),2F8.1/2(I3,2X,6I5,5X))

ICOR - the number of the constraint (ignored)

RSNAM1,IR1,RSNAM2,IR2,ATNAME1,ATNAME2 - residue names, numbers, and atom names
   corresponding to the first and second atom of the constraint;

BL,BU - lower and upper boundary of the constraint;

NA(1),(IA(I,1),I=1,6) - number of atoms in the first group and ECEPP numbers
   of these atoms;

NA(2),(IA(I,2),I=1,6) - number of atoms in the second group and ECEPP numbers
   of these atoms;

The distances are calculated between the centers of the atoms of the first and
of the second group.
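
How a single constraint might be checked against a structure is sketched
below in Python. The "center" of a group is assumed to be the unweighted
geometric center of its atoms, and the violation measure (distance to the
nearer bound) is an illustrative choice, not necessarily the one used by the
program. The function names are hypothetical.

```python
import math


def centroid(points):
    """Unweighted geometric center of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def constraint_violation(coords, group1, group2, lower, upper):
    """Return 0.0 inside [lower, upper], else the distance to the nearer bound.

    `coords` maps ECEPP atom numbers to (x, y, z); `group1` and `group2`
    list the atom numbers IA(I,1) and IA(I,2) of the two groups.
    """
    d = math.dist(centroid([coords[i] for i in group1]),
                  centroid([coords[i] for i in group2]))
    if d < lower:
        return lower - d
    if d > upper:
        return d - upper
    return 0.0
```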


A. Liwo, 2/21/97.  Revised 4/28/97, 6/22/98, 11/29/98, 6/23/99