@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       A N A L Y Z E        @
@        USER MANUAL         @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

----------------------------------
I. The main data file (infile.inp)
----------------------------------

The input is organized in the following fourteen ECEPPAK-style data groups:

 1. $TITLE     - the title of the run (a single line);
 2. $CNTRL     - main control variables;
 3. $SEQ       - sequence data;
 4. $BRIDGE    - covalent bridge data;
 5. $PROPERTY  - control data for conformation-dependent property evaluation;
 6. $RMSCALC   - control data for RMS calculation;
 7. $BOUNDS    - reading the NMR-derived distance constraints (interactive at
                 the moment);
 8. $CHROMO    - control data for calculating inter-chromophore distance;
 9. $CLUSTER   - control data for cluster analysis;
10. $SUPAT     - specifies the atoms being superposed;
11. $NOES      - specifies the options in the calculations of NOE spectra and
                 coupling constants;
12. $MORASS    - parameters for theoretical evaluation of the NOE spectra;
13. $COUPLING  - specification of the calculation of coupling constants;
14. $MARQUARDT - options in least-square or maximum-entropy fitting of the
                 theoretical to the experimental NOE spectra.

-------------------------------------------------------------------------------
1. The $CNTRL data group.

RUNTYP=name where name is one of the following:

   PROPERTY  - calculate conformation-dependent properties (e.g. the RMS
               deviation from a reference structure).
   AVE_COORD - Boltzmann-average the Cartesian coordinates of the supplied
               conformations.
   AVE_DIST  - Boltzmann-average the inverse sixth powers of interproton
               distances (can be used to evaluate the "average" 2D-NMR
               spectrum).
   CLUSTER   - do a cluster analysis of the input set of conformations.
   MORASS    - calculate NMR spectra and optionally do NMR fitting. This
               option is NOT available with the version of the program that
               does clustering of large conformation sets.
   ANGLES    - just calculate the dihedral angles from a supplied pdb file.

MINVAR - use the minimum-variance method (programmed by Dr. Mark D.
   Shenderovich of the Latvian Academy of Sciences) instead of the minimal
   spanning tree method for clustering. Recommended when the set is to be
   divided in order to calculate subset averages or to pre-select
   representative conformations, e.g. for NMR fitting. This option is NOT
   available with the NMR-fitting program. The minimal spanning tree method,
   on the other hand, is recommended for "taxonomy" of the set.

VERBOSE - print information about the progress of the run on screen.

NRES=number - the number of FULL residues in the chain. This does not need to
   be given, because the number of residues is calculated when reading the
   $SEQ data group.

RES_CODE=name where name is one of ECEPP, THREE_LETTER, ONE_LETTER; see the
   ECEPPAK manual for details.

FULLPRINT - use if you want messages such as "no space left on device".

RES_DBASE=name - replace the residue data base defined in the analyze script
   with name. If name contains at least one slash, it is assumed to contain
   the absolute search path; otherwise it is appended to the ${DBASDIR}
   environment variable to define the full search path.

T=number - the absolute temperature used in conformational averaging.

The next few keywords apply to RUNTYP=CLUSTER.

PRINT_PDB=number - when doing the cluster analysis, the program will print the
   Cartesian coordinates (in pdb format) of at most number lowest-energy
   conformations in a family that are within an energy cut-off specified in
   the $CLUSTER data group. number=0 means that no pdb files will be created
   and number=1 that just the leading members of the successive families will
   be written to pdb files.

INCLUDE_TERM - when present, causes all amino (NH2, NH3+, dummy, and
   cyclizing) and carboxy (COOH, COO-, dummy, and cyclizing) end groups, as
   well as the CONH2 group, to be merged with the first (last) residue, rather
   than counted as separate "residues" in the generated *.pdb files.
   This is very useful for compatibility with AMBER-read pdb files. Note that
   otherwise the first full residue ALWAYS has number 2.

FILE=name - the prefix of the pdb files; default is the first argument of the
   analyze script.

VIRT_CHAIN - virtual chains' internal coordinates will be printed for the
   leading members of the families.

BETA_TURNS - turn analysis will be carried out.

CHIRALITY - covalent bridges' dihedral angles will be calculated.

H_BONDS - H-bond analysis will be carried out.

NRCLUS1=number, NRCLUS2=number - the first and last residue to be superposed
   in clustering; default 1 and INUMRS, respectively.

TREE - construct the minimal tree and then partition it (usually takes much,
   much longer than partitioning without explicit construction of the minimal
   tree, but gives some idea about how best to partition the set of
   conformations).

DISTMAT - write the upper triangle of the matrix of RMS deviations between
   conformations to a disk file.

-------------------------------------------------------------------------------
2. The $PROPERTY data group

ELLIPSE= - calculate the characteristics of the moment of inertia:
   AXES   - calculate main axes of the moment of inertia;
   VOLUME - calculate the volume of the ellipsoid;
   RGYR   - calculate the radius of gyration;
   ALL    - calculate all of the above.

RMS= - calculate the RMS deviation from a reference structure:
   OVERALL - superpose all selected atoms of the current structure on all
             selected atoms of the reference structure;
   ARRAY   - superpose the structures fragment by fragment, the fragments
             having increasing length; this gives a 2D array of values whose
             ij-th element is the RMS deviation of the fragment comprising
             residues i through j of the structures at optimal superposition;
   NMR     - calculate the deviation from the interproton distances retrieved
             from the NOE data.

CONTACTS= - calculate the number of contacts between the following types of
   side chains:
   HYDROPHOBIC - hydrophobic side chains;
   HYDROPHILIC - hydrophilic side chains;
   ALL         - all side chains.
   The distance between side-chain centroids is taken as the criterion of
   contact; for details see A. Liwo, M.R. Pincus, R.J. Wawak, S. Rackovsky,
   H.A. Scheraga, Protein Science, 1993, 2, 1715-1731.

CHROM_DIST - calculate inter-chromophore distances. Requires the $CHROMO data
   group.

DIST_PROP - estimate the distribution of the property(ies) specified above,
   based on the Boltzmann distribution.

PROP_FILE= - the prefix for the property files. They will have extensions
   corresponding to property names:
   'ax1','ax2','ax3' - the lengths of the axes of the moment of inertia;
   'vol' - the volume of the ellipsoid;
   'rms' - the RMS deviation from the reference PDB structure or the RMS
           deviation from the NOE distances;
   'eng' - energy;
   'rgy' - the radius of gyration;
   'hpb' - number of hydrophobic/hydrophilic contacts;
   'ntc' - total number of contacts;
   'cmp' - the fraction of hydrophobic contacts;
   'nat' - the fraction of native contacts (a reference contact file is
           required);
   'chr' - inter-chromophore distance.

-------------------------------------------------------------------------------
3. The $CLUSTER data group (required for RUNTYP=CLUSTER).

These data are read in free format.

1st line: Number of RMS cut-off values and the RMS cut-off values. If a value
   is negative, clustering is carried out at the abs(cutoff) cut-off value
   and, for this cut-off, the dihedral angles of the leading members of the
   families and, optionally, their Cartesian coordinates are written to disk
   files. Up to 10 cut-off values are allowed.

2nd line: Energy cut-off (kcal/mol). At a given RMS cut-off the dihedral
   angles of the leading members of the families that are within Ecut above
   the lowest-energy family will be written and the pdb files created.
   If PRINT_PDB is greater than 1 in $CNTRL, the same cut-off will be applied
   to writing the pdb files of the conformations within a family.

-------------------------------------------------------------------------------
4. The $SUPAT data group (required for RUNTYP=CLUSTER or RMS calculations).

1st line: number of atom types to be superposed (free format).

2nd line: the ECEPP-style names of these atoms (15(a4,1x)). Two additional
   "wildcards" are allowed:
   ALL  - all the atoms;
   HEAV - all non-hydrogen atoms.

-------------------------------------------------------------------------------
5. The $RMSCALC data group (required for RMS calculations).

REF_FILE= - the name of the PDB file containing the reference structure.

NRSUP= - the number of the first residue to superpose (#1 is the N-terminal
   group, #2 is the first full residue).

NRSUP2= - the number of the last residue to superpose.

PRINT= - indicates whether to print superposed coordinates:
   NONE       - do not print superposed coordinates (default);
   SUPERPOSED - print only the coordinates of the atoms that belong to the
                "superposable" list, but corresponding to all residues and
                blocking groups (useful for drawing);
   ALL        - print coordinates of all non-hydrogen atoms from the reference
                and current structure at optimal superposition of the chosen
                atoms.

   The superposed coordinates are printed to the main output file. For
   PRINT=SUPERPOSED, pairs of PDB structures are produced, containing the
   reference (chain A) and current (chain B) structure. For PRINT=ALL the
   reference structure (chain A) is printed only at the beginning of the
   output file, when reading the reference PDB file. The superposable atoms
   are defined in the $SUPAT group (see above).

-------------------------------------------------------------------------------
6. The $CHROMO data group.

IDON= - the number of the donor side chain (default: the first tryptophan
   residue in the chain).

IACC= - the number of the acceptor side chain (default: the first dansyl or
   tyrosine residue in the chain).
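
The fixed-width 15(a4,1x) record of the $SUPAT data group (section 4) can be
easy to get wrong by hand. The following Python sketch shows one way to
generate the two $SUPAT records. It is only an illustration and not part of
ANALYZE: the function name supat_records is hypothetical, the atom names in
the usage lines are placeholders, and it is an assumption that names should be
left-justified within their four-column fields. The sketch conservatively
accepts at most 15 names, i.e. what fits in a single record.

```python
def supat_records(atom_names):
    """Return the two $SUPAT records: (count line, names line).

    Hypothetical helper, not part of ANALYZE. The names line follows the
    15(a4,1x) layout: each name occupies 4 columns followed by 1 blank.
    Assumption: names are left-justified inside the 4-column field.
    """
    names = list(atom_names)
    if not 1 <= len(names) <= 15:
        raise ValueError("expected between 1 and 15 atom names")
    for name in names:
        if not 1 <= len(name) <= 4:
            raise ValueError("atom name %r does not fit an a4 field" % name)
    count_line = str(len(names))              # 1st line is free format
    names_line = "".join(name.ljust(4) + " " for name in names)
    return count_line, names_line


# Placeholder atom names; use the ECEPP-style names of your own system.
count_line, names_line = supat_records(["N", "CA", "C"])
print(count_line)
print(repr(names_line))
```

The wildcards ALL and HEAV fit the same four-column field, so they can be
passed to the helper like any other name.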

-------------------------------------------------------------------------------
7. The $BOUNDS data group.

NMR_FILE= - the name of the file with interproton distances; KEYBOARD means
   that the name will be typed in from the keyboard.

-------------------------------------------------------------------------------
8. The $NOES data group.

MODE= - indicates the purpose of NOE calculation:
   SIMUL   - just calculate NOE spectra for the supplied conformations;
   FITTING - fit the statistical weights of the conformations so as to best
             reproduce the experimental NOE spectrum.

CONF= - specifies which conformations will be considered in the calculations:
   CLUSTERED - a cluster analysis will precede NOE calculation (see below);
   ALL       - the NOE spectra of all conformations will be calculated.

AVERAGE - says whether the spectra will be Boltzmann averaged over the whole
   ensemble (for CONF=ALL) or over families (CONF=CLUSTERED).

BYSTROV= (NO)
   YES - calculate the coupling constants;
   NO  - do not calculate coupling constants.

The next keywords are relevant for MODE=FITTING.

GEMINAL= (NO)
   YES - fit the NOEs from geminal protons;
   NO  - do not fit the NOEs from geminal protons.

VICINAL= (YES)
   YES - fit the NOEs from vicinal protons;
   NO  - do not fit the NOEs from vicinal protons.

RIGID= (NO)
   YES - fit the NOEs from "rigid" protons (i.e. those with fixed distance,
         other than geminal);
   NO  - do not fit the NOEs from "rigid" protons.

NOE= (INTER)
   ALL   - fit the NOEs from all pairs of protons;
   INTER - fit the NOEs from protons belonging to different residues only;
   LONG  - fit the NOEs from protons belonging to non-sequential residues
           only.
   Important! For N-C-cyclic peptides (but NOT for pairs of residues linked by
   side chains), the first and the last residues are considered sequential.

ANTINOE= (NONE)
   ALL   - as for NOE;
   INTER - as for NOE;
   LONG  - as for NOE;
   NONE  - do not fit anti-NOEs.
   The routine checks for compatibility between the values of the NOE and
   ANTINOE keys.
   The rule is that the exclusion criterion for anti-NOEs must not be weaker
   than that for NOEs. For example, if the user specifies NOE=LONG
   ANTINOE=INTER, the second criterion being weaker, the routine will
   automatically set ANTINOE=LONG.

WEI_COUPL= (0.1) - the weight of the coupling-constant term in the minimized
   sum.

A0,A1,A2= (1.9, -1.4, 6.4) - the constants in the Bystrov equation:

      J = A0 + A1*cos(t) + A2*cos(t)**2

   Note! At present only the coefficients of type 1 angles can be input. The
   coefficients of -CH2- protons are automatically computed from those by
   averaging over the two methylene protons.

SA0,SA1,SA2= (3*2.0) - the a priori standard deviations of the above
   constants.

WBASE= (1.0) - the base in weight calculation of the NOE intensities;
   weight(i)=1/(wbase+Vexp(i)).

ALPHA_ENT (0.0) - the weight of the entropy factor. The complete minimized
   function has the form:

      F(W) = FI(W) + ALPHA_ENT * SUM W(i)*ALOG(W(i))
                                  i

   The entropy term forces the weights to be equal to each other, while the
   "sum of error" term FI picks up the conformations that best fit the
   experimental observables; the latter usually results in the selection of
   only a few out of several hundred, which the authors of the program regard
   as rather strange. Just a little admixture of the "disorder" term gives
   more reasonable results, with more conformations having significant
   weights. Truly, "maximum entropy is not a technique it is THE technique",
   as a famous statistician said. But no one can offer any convincing way of
   choosing ALPHA_ENT. So, you guys have to play with different values and use
   your common sense. It is often hard work, but when you're through with it,
   the results are worth something. Good luck!

The experimental NOE spectrum is read from the ${prefix}.noe file, where
${prefix} denotes the input-file prefix.

-------------------------------------------------------------------------------
9. The $MORASS data group.
-------------------------------------------------------------------------------

TAUC= (0.1)   - the correlation time (ms).
TAUM= (4.5)   - the correlation time of methyl protons (ms).
TIME= (0.2)   - the mixing time (ms).
VOL0= (100.0)
SFRQ= (500.0) - spectrometer frequency (MHz).
CUTT= (6.0)   - cut-off distance for NOE printing (A).

-------------------------------------------------------------------------------
10. The $COUPL data group (formatted)

For consecutive residues i = 1,inumrs (note that number 1 corresponds to the
N-terminal end group and inumrs to the C-terminal end group) the following
records are read:

Card 1 (free format): ii,nang(i),(iang(j,i),j=1,nang(i))
   ii        - number of that residue (for control only)
   nang(i)   - number of dihedral angles pertaining to that residue for which
               the coupling constants are calculated
   iang(j,i) - ECEPP residue-relative numbers of those angles (the order is:
               phi,psi,omega,chi1,...)

Card 2 (free format):
   (ityp_coupl(icoupl(j,i)),coupl(icoupl(j,i)),phase(j,i),j=1,nang(i))
   ityp_coupl - the type of Bystrov equation pertaining to a given angle
   coupl      - the measured coupling constant
   phase      - the phase angle to be subtracted in order to obtain Bystrov's
                theta angle. Typically, the phase is 60 deg for L-residues and
                -60 deg for D-residues.

   At present type 1 corresponds to non-glycine angles and type 2 to glycine
   angles.

-------------------------------------------------------------------------------
11. The $MARQUARDT data group

This data group contains parameters for the Marquardt and SUMSL minimizers,
which are used in fitting the computed NMR characteristics to the experimental
data.

MINIMIZER
   MARQ  - Marquardt's method is used (no maximum entropy fitting); this is
           the default for minimizing the sum of the squares only, without the
           entropy term.
   SUMSL - the SUMSL algorithm is used; this is the only option when maximum
           entropy fitting is requested.

MAXIT  (1000)    - maximum number of iterations
MAXFUN (2000)    - maximum number of function evaluations
MAXMAR (10)      - maximum number of inner iterations in Marquardt's method
LAMBDA (1.0D2)   - initial value of the Marquardt scaling parameter lambda
VMARQ  (1.0D1)   - factor to shrink or expand lambda
TOLX   (1.0D-3)  - tolerance on average change in parameters
TOLF   (1.0D-3)  - absolute tolerance on function changes to achieve
                   convergence
RTOLF  (1.0D-5)  - relative tolerance on function changes
TOLLAM (1.0D0)   - maximum value of lambda to stop iteration
RLMIN  (1.0D-20) - minimum allowed value of lambda

----------------------------------------------------------------------------
For the description of the $SEQ and the $BRIDGE data groups see the ECEPPAK
manual.

------------------------------------------------
II. The input dihedral-angle file (outo.angfile)
------------------------------------------------

This file is produced by ECEPPAK and by ANALYZE when using the clustering
option. The file contains collated formatted data corresponding to successive
conformations. Each entry has the following structure:

Card 1: NR, ETOT, EVDW, EEL, ESOLV (I10, 4E15.6)
   NR    - the number of the conformation (ignored)
   ETOT  - the total energy
   EVDW  - van der Waals energy
   EEL   - electrostatic energy
   ESOLV - solvation energy

Card group 2: (LIST(I), I=1,INUMRS) (16I5)
   LIST(I) - numeric ECEPP code of the Ith residue in the sequence (including
             the end groups).

Card group 3: Dihedral angles of the successive residues; one card per residue
   or end group: (ANGULO(J,I),J=1,NANG) (15F8.3)
   ANGULO(J,I) - Jth dihedral angle of residue or end group I.

-------------------------------------------------
III. The experimental NOE data file (noefile.noe)
-------------------------------------------------

This file contains the experimental data in MORASS-like format (formatted
input).
Each line corresponds to a NOE from a single proton pair and contains the
following data:

NUMPROT1,NUMPROT2,VOL,RES1,NUMRES1,ATOM1,NUMAT1,RES2,NUMRES2,ATOM2,NUMAT2,ICONT
(2I5,F10.5,5X,A3,I4,1X,A3,I4,3X,A3,I4,1X,A3,2I4)

NUMPROT1,NUMPROT2 - the numbers of the protons of the pair; the protons must
   be numbered consecutively, omitting the other atoms of the chain;

VOL - the NOE intensity pertaining to this pair;

RES1,NUMRES1,ATOM1,NUMAT1,RES2,NUMRES2,ATOM2,NUMAT2 - residue names, residue
   numbers, atom names, and absolute atom numbers (according to the ECEPP
   numbering of the chain), respectively, corresponding to these protons.
   Only NUMPROT1, NUMPROT2, and VOL are actual data; the remaining items are
   read for verification purposes only. If the residue number, residue name,
   or atom name does not match the values found for the proton with a given
   NUMPROT under the ECEPP scheme, a warning message is printed.

ICONT - a flag indicating whether the volume of the next NOE should be merged
   with the current one; this will happen for nonzero ICONT. This is the case
   for equivalent protons, such as the CD and CE protons of PHE that are
   attached to either side of the ring. Such NOEs form "equivalence groups"
   and the total volume of the whole group is fitted, rather than each
   individual NOE (note, however, that except for the methyl equivalence
   groups the averaging is NOT carried out at the stage of the calculation of
   the relaxation matrix). The program automatically detects some equivalence
   groups, such as methyl or methylene groups. The latter case is disputable,
   because methylene protons can often be distinguished; however, fitting the
   total volume of the groups does not violate the theory, as the
   relaxation-matrix elements are not averaged, and the errors arising from
   permutations of the methylene protons (we don't know which one is which for
   a particular conformation) would probably be more harmful.

-------------------------------------------------
IV. The NMR distance constraint file
-------------------------------------------------

The filename is given in the $BOUNDS data group or input manually, when
KEYBOARD has been specified. Each entry corresponds to one constraint and
contains the following data:

ICOR,RSNAM1,IR1,RSNAM2,IR2,ATNAME1,ATNAME2,BL,BU,NA(1),(IA(I,1),I=1,6),
NA(2),(IA(I,2),I=1,6)
(I3,1X,A3,I3,1X,A3,I3,2(1X,A3),2F8.1/2(I3,2X,6I5,5X))

ICOR - the number of the constraint (ignored)

RSNAM1,IR1,RSNAM2,IR2,ATNAME1,ATNAME2 - residue names, residue numbers, and
   atom names corresponding to the two atoms of the constraint;

BL,BU - lower and upper boundary of the constraint;

NA(1),(IA(I,1),I=1,6) - number of atoms in the first group and ECEPP numbers
   of these atoms;

NA(2),(IA(I,2),I=1,6) - number of atoms in the second group and ECEPP numbers
   of these atoms.

The distances are calculated between the centers of the atoms of the first
group and of the second group.

A. Liwo, 2/21/97. Revised 4/28/97, 6/22/98, 11/29/98, 6/23/99
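
As an illustration of the distance-constraint record layout described in the
last section, the Python sketch below splits one entry into its fields by
column position. It is not part of ANALYZE. In a Fortran format the "/" starts
a new record, so the sketch assumes that each entry occupies two physical
lines. The names parse_constraint and format_groups, as well as the residue
names, atom names, and atom numbers in the example, are hypothetical. The
sketch also assumes that BL and BU are written with an explicit decimal point
and that ICOR and the atom-count fields are not left blank.

```python
def parse_constraint(line1, line2):
    """Parse one constraint entry laid out as
    (I3,1X,A3,I3,1X,A3,I3,2(1X,A3),2F8.1/2(I3,2X,6I5,5X)).

    Hypothetical helper: line1 holds the part before the "/", line2 the
    part after it. Short lines are padded with blanks.
    """
    line1 = line1.ljust(41)
    line2 = line2.ljust(80)
    entry = {
        "icor": int(line1[0:3]),           # I3 (ignored by the program)
        "rsnam1": line1[4:7].strip(),      # 1X,A3
        "ir1": int(line1[7:10]),           # I3
        "rsnam2": line1[11:14].strip(),    # 1X,A3
        "ir2": int(line1[14:17]),          # I3
        "atname1": line1[18:21].strip(),   # 1X,A3
        "atname2": line1[22:25].strip(),   # 1X,A3
        "bl": float(line1[25:33]),         # F8.1, explicit decimal point assumed
        "bu": float(line1[33:41]),         # F8.1
        "groups": [],
    }
    for g in range(2):                     # two blocks of I3,2X,6I5,5X
        base = 40 * g
        na = int(line2[base:base + 3])
        start = base + 5
        fields = [line2[start + 5 * k:start + 5 * k + 5] for k in range(6)]
        # Only the first NA of the six I5 fields carry atom numbers.
        entry["groups"].append([int(f) for f in fields[:na]])
    return entry


def format_groups(groups):
    """Build the second record from two lists of ECEPP atom numbers."""
    parts = []
    for atoms in groups:
        numbers = "".join(f"{a:5d}" for a in atoms).ljust(30)
        parts.append(f"{len(atoms):3d}  {numbers}     ")
    return "".join(parts)


# Made-up example entry; all names and numbers are placeholders.
example_line1 = (f"{1:3d} {'ALA':3}{2:3d} {'LEU':3}{5:3d}"
                 f" {'HA':3} {'HN':3}{1.8:8.1f}{5.0:8.1f}")
example_line2 = format_groups([[12], [40, 41]])
print(parse_constraint(example_line1, example_line2))
```

Slicing by column rather than splitting on whitespace keeps the sketch faithful
to the fixed-width format, in which adjacent fields need not be separated by
blanks.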