% FEATURES7.TXT

                      Guide for PIR Features Annotations
                                in NBRF Format
                                  Version  7
                                 16-JUN-1997 
                   [This file includes only Parts 1 and 2]

                             Preliminary Comments

This documents the standardization of features records in NBRF format achieved
through release 53.

We have received quite a few comments on the improvement in the appearance and
consistency of our database, and a recent research paper specifically mentioned
employing the PIR features annotations in sequence analysis.  The following
projects have been very successful.
(1) Programs to check new or updated entries and changefiles (written in 
FORTRAN by Chris Marzec) and to check existing database entries (written in C
by Steve Garavelli) have been considerably improved and extended.  These are
the programs that do the rules "enforcement" that will be discussed.
(2) All ambiguous "Binding site:" features have been resolved to appropriate
covalent or noncovalent features.
(3) All explicit disulfide bond and most other site information in free-text
comments has been converted to features.
(4) The experimental status of all site, bond and product features has been
assigned.  A status is now required on these features in all new PIR1 and PIR2
entries.  Four status types are used in features: "experimental", "absent",
"atypical" and "predicted".
(5) The comment "in mature form" for amino-terminal features that are not at
the first position of the entry and for carboxyl-terminal features that are 
not at the last position of the entry is required, and the comment should not
be used except in that context.
(6) Many new features have been added and the covalent binding sites, modified
sites, cross-links and active sites have been documented in the RESID database.
Version 10.00 of that database will be distributed with 236 entries.

How is an annotator supposed to know how to make a good feature record?
The answer to this frequently asked question is:
(1) READ THIS DOCUMENT!
(2) Use the residues database to search for residues with covalently bound
groups, for modified residues, and for cross-links.  Use the ATLAS FEATURE 
command to see the records in the RESID database to form the correct features
for protein entries.
(3) Use the ATLAS FEATURE command to look at features in related entries or at
the variety in the whole database.  The currently correct format is presented
in the database, except possibly for entries that were entered in the previous
update.  Be especially wary of features or feature formats that are present
only in "Region" features of new entries.

Please send your comments to Steve Garavelli (Part 1 - Sites, Part 2 - Bonds)
or Winona Barker (Part 3 - Regions, Domains and Products).  Particularly, let
us know your thoughts about the rules marked "[GRAY]" that are either
questionable or provisional.

We will appreciate your input toward improving this document: changes that
improve clarity, additional examples, etc.

                                 Introduction

The following feature records may appear in PIR1 and PIR2 annotations.
  Active site:
  Binding site:
  Cleavage site:
  Cross-link:
  Disulfide bonds:
  Domain:
  Inhibitory site:
  Modified site:
  Product:
  Region:

In the following descriptions
 " "     enclose explicit typographic characters
 [ ]     enclose optional elements
 res     means a 3-letter amino acid residue code
 |       separates alternative elements
 form    "by" mechanism | "in" protein name | "in mature form"
 extent  "partial"
 status  "experimental" | "predicted" | "absent" | "atypical"
 ...     means indefinite repetition of the preceding optional elements

When you are preparing new features annotations, you should try to conform to
these guidelines as closely as possible.  These guidelines have three degrees
of applicability:
(1) features that are currently accepted and being used ("white rules"),
(2) features that may have been used in the past but are now undesirable,
    that have been or are being removed from entries that contain them
    and that should not be used in new entries ("black rules"),
(3) features that occur in some entries but are of uncertain value, that have
    been proposed but not yet accepted, or that are otherwise under review
    ("gray rules"); these are marked "[GRAY]".
If an annotator thinks that a gray rule feature or some new feature is required
for an entry, the annotator should check first with either Winona Barker,
Friedhelm Pfeiffer or Steve Garavelli. 

                        Discussion of Status Indicators

A status indicator, either "experimental", "absent", "atypical" or "predicted",
is required for all features except Domain and Region.  Generally, it should
not be used with "Region:" features.  In the "Domain:" feature it should be
used except for homology domains, for self-evident features like
"amino-terminal" or "serine-rich", and for features with arbitrary designations
like "first", "1" or "A".

The "experimental" status means that the feature has been experimentally
observed in the indicated way at the indicated location.  Any indication of
alternative forms means that all the alternatives have been observed.  For
example
  Modified site: N6-methyllysine or N6,N6-dimethyllysine (Lys) (experimental)
means that both forms have been observed at the indicated location.  On the
other hand, an indication of an alternate location means that the feature is
known to occur in one or the other position, but which could not be resolved
experimentally, for example
  Modified site: (or 81) N6,N6,N6-trimethyllysine (Lys) (experimental)

The "predicted" status means that either the nature, the location, or both, of
the feature has been predicted by some means.  The experimental observation of
a feature under unnatural conditions should be carefully considered and if the
conditions seem sufficiently different from the natural case, the feature
should be marked as a prediction.  With the present system, a distinct problem
occurs when either the nature or the location of a feature, but not both, has
been experimentally determined.  Generally, the most definite form of evidence
should be presented and appropriate comments should be provided, either in the
feature or in comments with the entry.  For example, if a protein is known to
be blocked and a translated sequence is presented, but the nature and location
of the blocking group are unknown, then only a note or comment is appropriate. 
We would welcome any suggestions on how a feature with both experimental and
predicted aspects can best be presented.

The status "absent" is used to indicate a feature that, although it would be
otherwise predicted by some means, has been experimentally determined not to
occur at the indicated position.  It is intended to be used in the very limited
cases when an investigation of the specific feature produced the experimental
result.  Currently, this status is mainly used for the
  "Binding site: carbohydrate (Asn) (covalent)"
feature.  Presently, the PIR produces the only sequence databases where it is
possible to distinguish the cases where there is negative experimental evidence
from the cases where there is merely insufficient annotation. 

The status "atypical" is used to indicate a feature that does not follow the
"normal" pattern, that would otherwise be predicted not to occur, but that has
been experimentally determined to occur at the indicated location.  Again, it
is intended to be used in the very limited cases when an investigation of the
specific feature produced this result.  Examples of its use are
1. homology domains that have unusually large insertions or deletions,
2. carbohydrate binding site with the pattern N-X-C,
3. metal binding sites with an apparently missing ligand residue.

                         Discussion of "Combinability"

With the adoption of an "object-oriented" approach, it became important to
distinguish the case of groups of residues constituting in the aggregate one
feature from the case of residues in individual features sharing the same
description and grouped for convenience.  An example of a group of residues
constituting in the aggregate one feature is
  192,226,231/Binding site: copper (His, Cys, His)
Together this particular group of residues forms one unique binding site for
copper. In this case the group forms a single "object" and other groups of the 
same kind could not be combined in the same record without creating ambiguity
in the identity of the objects being represented.  Such features are not
"combinable".
An example of residues in individual features sharing the same description and
grouped for convenience is
  192,226,231/Binding site: carbohydrate (Asn) (covalent)
In this case each residue individually is an "object" and they can be combined
without introducing ambiguity.  Such features are "combinable". 

Features should not be combined unless in the discussion of a particular type
of feature it is explicitly stated that it is combinable.  Features with
different "#status" or "#link" descriptors should not be combined.
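
The rule on descriptors can be illustrated with a short Python sketch.  It is
only an illustration under stated assumptions and is not one of the PIR
checking programs: the function names are hypothetical, the record text is
assumed to end with the "#link" and "#status" descriptors in that order, and
whether a feature type is combinable is passed in by the caller, since that is
stated separately for each feature type.

```python
def split_descriptors(record):
    """Split a record into (description, link, status).

    Absent descriptors are returned as None.
    """
    description, _, status = record.partition(" #status ")
    description, _, link = description.partition(" #link ")
    return description.strip(), (link.strip() or None), (status.strip() or None)


def may_combine(record_a, record_b, combinable):
    """True only if the feature type is declared combinable and both
    records share the same description, #link and #status."""
    same = split_descriptors(record_a) == split_descriptors(record_b)
    return bool(combinable) and same
```

For example, two "Binding site: carbohydrate (Asn) (covalent)" records that
differ only in status would be kept as separate records by this check.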


                              Discussion of Tags

Tags are short labels that are attached to certain features and other records
in the database.  A particular tagged feature may have repeated examples 
within a single entry that must be uniquely distinguished by different tags.
A tag is the very last element of the record separated from the preceding
elements by a single space.  Only one tag should occur in each record.  A
tag consists of (in order):
  the character "<",
  three or four uppercase alphabetic characters or numbers,
  the character ">"
The same tag must not be applied to more than one record of any type in each
entry.  There is, as yet, no way to check for or impose standardization on the
use of tags beyond these format rules.  Some suggestions about tags will be
discussed with particular features. Tags must be used with Domain and Product
feature records.  Their use with other feature types is presently problematic
[GRAY].
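
The format rules for tags lend themselves to a mechanical check.  The
following Python sketch is an illustration only, not the PIR checking program;
the function names and the record texts used with it are hypothetical.

```python
import re

# A tag is "<", three or four uppercase letters or digits, then ">",
# placed at the very end of the record after exactly one space.
TAG_AT_END = re.compile(r"(?<! ) <([A-Z0-9]{3,4})>$")


def extract_tag(record):
    """Return the tag text without angle brackets, or None if the record
    has no well-formed tag at its end."""
    match = TAG_AT_END.search(record)
    return match.group(1) if match else None


def duplicate_tags(records):
    """Return the sorted list of tags used by more than one record."""
    seen = set()
    repeated = set()
    for record in records:
        tag = extract_tag(record)
        if tag is None:
            continue
        if tag in seen:
            repeated.add(tag)
        seen.add(tag)
    return sorted(repeated)
```

A record ending in " <MAT1>" yields the tag "MAT1", while a lowercase tag, a
five-character tag, or a tag preceded by two spaces is rejected.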

                        Discussion of Order of Features

The features records in each entry are essentially independent entities.  A
computer program reading a feature table could derive no additional information
from the order of the records in it.  However, the feature table is also looked
at by humans from time to time, and the imposition of some regularity in the
arrangement of features can be very helpful.  The preferred order of features
is as follows.

First, Product, Domain and Region records are arranged as a group in increasing
order by the first element of their range, then in decreasing order by the
second element of their range.

Second, site and bond records are arranged as a group in increasing order by
their first element.
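
The two ordering rules can be expressed as a single sort key.  The Python
sketch below assumes, purely for illustration, that each feature is held as a
(start, end, record text) tuple; for site and bond records "start" is the
first locant and "end" is ignored.  The names are hypothetical.

```python
RANGE_TYPES = ("Product", "Domain", "Region")


def order_key(feature):
    """Sort key for a (start, end, record_text) tuple."""
    start, end, text = feature
    feature_type = text.split(":", 1)[0]
    if feature_type in RANGE_TYPES:
        # First group: increasing start, then decreasing end.
        return (0, start, -end)
    # Second group (sites and bonds): increasing first element only.
    return (1, start, 0)


def order_features(features):
    """Return the features in the preferred order."""
    return sorted(features, key=order_key)
```

With this key a Product spanning 1-120 precedes a Domain spanning 1-20, and
both precede every site and bond record.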

Formerly, a certain amount of "artistic license" could be employed in arranging
a feature table to emphasize certain structural aspects of the protein or
simply to give it a greater degree of coherence.  The change mechanism does not
follow an annotator's idiosyncratic order and feature tables will be rearranged
according to the rules above.

                               PART 1 - Sites

                             "Active site" Record

The "Active site" record is applied to residues of enzymes known or thought to
function in the actual catalytic reaction of the enzyme.  It should be applied
to a single residue or a short list of residues; it should not be applied to a
range (a hyphenated pair).  If the active site residues are not specifically
known but have been localized to a segment of the sequence, the "Region" record
rather than the "Active site" record should be used.  "Active site" features in
entries without an Enzyme Commission notation in their title, "Contains" or
"Alternate names" records are suspect and will be flagged by checking programs.

The format for the "Active site" record is
  "Active site:" res ["," res...] ["(" description ")"] ["#link " link]
     "#status " status

The status is required for this feature and should always be applied for new
entries.  All the residues participating in each active site that do not
require different modifiers should be combined in the same feature.  Do not
combine residues from different active sites or that need different modifiers.
The use of description fields, discussed below, should be avoided if possible.
Examples,
  Active site: Arg #status experimental
  Active site: Asp, His, Ser #status predicted
  Active site: His, His, Asp #status experimental
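
As an illustration of this format, the following Python sketch parses an
"Active site" record with a regular expression.  It is a simplified reading of
the format above and not the PIR checking program: the names are hypothetical,
any 3-letter capitalized code is accepted as a residue, and records carrying a
trailing tag are not handled.

```python
import re

STATUSES = {"experimental", "predicted", "absent", "atypical"}
RES = r"[A-Z][a-z]{2}"  # a 3-letter residue code such as "Ser"

ACTIVE_SITE = re.compile(
    r"^Active site: (?P<residues>" + RES + r"(?:, " + RES + r")*)"
    r"(?: \((?P<description>.+)\))?"
    r"(?: #link (?P<link>\S+))?"
    r" #status (?P<status>\S+)$"
)


def parse_active_site(record):
    """Return the parts of an "Active site" record as a dict, or None if
    the record does not fit the format or has an unknown status."""
    match = ACTIVE_SITE.match(record.strip())
    if match is None or match.group("status") not in STATUSES:
        return None
    parts = match.groupdict()
    parts["residues"] = parts["residues"].split(", ")
    return parts
```

Because the status is required, a record such as "Active site: Arg" with no
"#status" descriptor is rejected by this sketch.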

A residue list may be used only for those residues which participate in the
same concerted catalytic reaction.  If all the residues participating in one
active site are the same type, then only one residue need be shown.  Enzymes
recognized to have several distinct catalytic reactions should have an "Active
site" record for each active site.  Multiple "Active site" records for what
is, in fact, a single active site should be combined into one record using a
list of residues, unless different status conditions apply.

[GRAY] Formerly, mechanisms were presented but this should no longer be done
except when the mechanism is used as a description.  Generally such a
description should be applied only when multiple active sites occur in the same
entry.  In particular, the description "charge relay system" should not be used
except in enzymes with multiple activities.  Examples are
  Active site: Cys (amide transfer)
  Active site: Cys (of 3-oxoacyl-[acyl-carrier-protein] synthase)
  Active site: Lys (of 3-oxoacyl-[acyl-carrier-protein] reductase)
  Active site: Lys (of enoyl-[acyl-carrier-protein] reductase)
  Active site: Ser (of enoyl-[acyl-carrier-protein] reductase)
  Active site: Ser (of oleoyl-[acyl-carrier-protein] hydrolase)
  Active site: Ser (of [acyl-carrier-protein] acetyl/malonyltransferase)
  Active site: Glu (alpha-reaction)
  Active site: His, Lys, Cys (beta-reaction)
Descriptors like these are now being replaced with "#link" modifiers which
point to tags in appropriate Function records, or Domain or Product features.
  Active site: Cys #link ARD #status predicted
Here the link "ARD" points to a Function record with the tag "<ARD>".  This
mechanism will also be used to link active site records with different status
conditions but which belong to the same active site object.

When a residue has a stable, covalently-bound, catalytically-active prosthetic
group, only the "Binding site: ... (covalent)" feature should be used.  An
"Active site" record should not also be used because it is the prosthetic group
which is active and not the amino acid as such.  In particular, for an active
site phosphoserine only the annotation
  Binding site: phosphate (Ser) (covalent) #status experimental
should appear.  When a residue forms a transient, covalent bond in its role as
an active site then the "Active site" record should be used and the description
field may be used.  The nature of the intermediate should be made as clear as
practical.  Annotators should consider carefully whether a covalently-bound
group is stable or transient in determining whether an annotation should be for
a modified or an active site.  The following possible features show active
sites with transient groups that could easily be confused with a binding site,
  Active site: Ser (phosphoserine intermediate)
  Active site: Tyr (phosphotyrosine intermediate)
No examples yet exist of the second feature.  Other currently acceptable
examples are
  Active site: Asp (aspartylphosphate intermediate)
  Active site: Cys (phosphocysteine intermediate)
  Active site: Cys (S-acetylcysteine intermediate)
  Active site: Cys (sulfocysteine intermediate)
  Active site: His (phosphohistidine intermediate)
  Active site: Lys (ribulose-bisphosphate-binding)
Most of these features are documented in the RESID database.  Avoid records
that are unnecessarily detailed or are synonymous with existing features, like
  Active site: His (covalent intermediate)
  Active site: Asp (phosphate-binding)

Be particularly suspicious of claims that Gly, Val, Leu, Ile, Pro, Asn, Gln,
Met or Phe residues are active site residues.  It is chemically dubious that
such residues function in the actual catalytic reaction of an enzyme.  Glycine
and a few other residues can form free radicals that participate in free
radical reactions, but for physical reasons such reactions are extremely rare
in biochemistry.  Examples are
  Active site: Cys (cysteine thiyl radical intermediate)
  Active site: Gly (stable glycyl radical)
  Active site: Trp (tryptophyl radical intermediate)
  Active site: Tyr (stable tyrosyl radical)
These features are documented in the RESID database.

Residues that are structurally located near an active site but do not
participate directly in the catalytic reaction of that active site should not
be annotated in the PIR databases.  Annotations for such residues will only be
carried from PDB entries in the NRL_3D database.  Not all reactive compounds
that block an enzymatic reaction wind up reacting with an active site residue;
they may react with a residue near the active site and block the substrate's
access to the active site.  Something may be more of a "reactive site" than
an "active site", so be cautious about accepting such evidence as experimental
for active site residues.

For cysteine residues that form catalytically active disulfide bonds only the
annotation
  Disulfide bonds: redox-active
should appear.

Even though selenocysteine may function as an active site, only the feature
  Modified site: selenocysteine
should be used.

Residues that participate in allosteric control of enzyme activity but are not
catalytically active should not be annotated as active sites but as binding
sites or as regions.

Residues that participate in different, symmetry-related active sites of
complexes should not be combined in the same feature, but an appropriate
description should be used to indicate the relationship.
  Active site: Asp (shared with dimeric partner)
  Active site: Cys (shared with dimeric partner)
These features imply that there are two symmetry-related active sites.  Each
site consists of an aspartate and a cysteine contributed by different chains
of the homodimer.

[BLACK] The annotation
  Active site: ... (inhibitory) ...
should not be used.  Instead, use the annotation
  Inhibitory site:

[BLACK] Do not use the expression "active site" in either "Domain" or "Region"
features.  Instead, use the term "catalytic".

            General Definitions for Binding Sites and Modified Sites

In the following discussion of binding sites and modified sites the following
definitions are very important.  Because they include historical accidents and
grammatical exigencies, these are operational definitions and do not
necessarily extend beyond the purposes of this document.

Generally an attachment site is an amino acid residue which has its side chain
chemically changed post-translationally in such a way that it could be restored
by physiological processes of hydrolysis, ammonolysis or simple (2H) reduction.
Such chemical changes may occur transiently, or more or less permanently, but
they must be covalent.  The idea is that attachment site residues could in
principle be recovered and detected by typical methods of sequence analysis,
whereas modified sites could not be.

The "Binding site" feature includes two classes, attachment sites and binding
sites.  A "binding site" is an amino acid residue, or a group of them, that
forms biochemically important, non-covalent bonds with ions or molecules (other
than the protein constituting the entry).  These bonds may be ionic, ligand
(dative), Van der Waals, or donative or receptive hydrogen bonds.  One
borderline case is the sulfur-metal bond which will be regarded as covalent
for cysteine and non-covalent (dative ligand) for methionine.

In anticipation of implementation of the SDDL format, attachment sites will be
distinguished by using "(covalent)" in "Binding site" records.  All new
"Binding sites" without "(covalent)" are severely reviewed and subject to
automatic conversion.  Consequently it is very important for annotators to
provide the "(covalent)" designation in every case when it should be applied. 

A "modified site" is an amino acid residue which is either
(1) chemically changed post-translationally in such a way that it could not be
    restored by physiological processes of hydrolysis, ammonolysis or simple
    (2H) reduction (that is, it is not a side-chain attachment site),
(2) chemically changed in any way involving the alpha amino group, including
    N-formylmethionine (this applies to both the amino terminus and internal
    residues),
(3) a carboxyl terminal residue with any chemical change involving the
    1-carboxyl group,
(4) a selenocysteine residue (these are translationally incorporated but for
    historical reasons are regarded as modified cysteine residues), or
(5) aspartate or glutamate esters that can arise from either the acid or the
    amide forms.

                            "Binding site" Record

Using the foregoing definitions "Binding site" records are applied in two cases:
(1) when an amino acid residue, or a group of them, forms biochemically
    important, non-covalent bonds with ions or molecules (other than the
    protein constituting the entry); or
(2) when an amino acid residue forms an attachment site in which its side
    chain is chemically changed post-translationally in such a way that it
    could in principle be restored by physiological processes.  Such cases
    must have a "(covalent)" bond description.
The format for the "Binding site" record is
  "Binding site:" ["(or" position ")"] bound-group name "(" res ["," res...] ")"
    ["(covalent)" | "(" bonding description ")"] ["(" form ")"]
    ["(partial)"] ["#link " link] "#status " status
The status is required for this feature.

Currently acceptable covalent examples are as follows.  The status, link and
partial descriptors have been removed, and a few minor variants have been
eliminated.  Most of these features are documented in the residues database.
  Binding site: 2Fe-2S cluster (Cys) (covalent)
  Binding site: 3Fe-4S cluster (Cys) (covalent)
  Binding site: 4Fe-4S cluster (Cys) (covalent)
* Binding site: iron-sulfur cluster (Cys) (covalent)
[use this only when the cluster form has not been determined and cannot be
predicted]
  Binding site: 4-hydroxycinnamyl (Cys) (covalent)
  Binding site: acetyl (Lys) (covalent)
  Binding site: AMP (Tyr) (covalent)
  Binding site: biotin (Lys) (covalent)
  Binding site: carbohydrate (Asn) (covalent)
  Binding site: carbohydrate (Cys) (covalent)
  Binding site: carbohydrate (Lys) (covalent)
  Binding site: carbohydrate (Ser) (covalent)
  Binding site: carbohydrate (Thr) (covalent)
  Binding site: carbohydrate (Trp) (covalent)
  Binding site: carbohydrate (Tyr) (covalent)
  Binding site: carbon dioxide (Lys) (covalent) (by ...)
  Binding site: chondroitin sulfate (Ser) (covalent)
  Binding site: cysteine (Cys) (covalent)
  Binding site: dermatan sulfate (Ser) (covalent)
  Binding site: farnesyl (Cys) (covalent)
  Binding site: fatty acid (Ser) (covalent)
  Binding site: fatty acid (Thr) (covalent)
  Binding site: formyl (Lys) (covalent)
  Binding site: geranyl-geranyl (Cys) (covalent)
  Binding site: glutathione (Cys) (covalent)
  Binding site: glycerylphosphorylethanolamine (Glu) (covalent)
  Binding site: heme (Cys) (covalent)
  Binding site: heme (Glu) (covalent)
  Binding site: heme, high potential (Cys) (covalent)
  Binding site: heme, low potential (Cys) (covalent)
  Binding site: heparan sulfate (Ser) (covalent)
  Binding site: homocitryl Mo-7Fe-8S cluster (Cys) (covalent)
  Binding site: keratan sulfate (Thr) (covalent)
  Binding site: lipoamide (Lys) (covalent)
  Binding site: methyl (Cys) (covalent)
  Binding site: molybdopterin (Cys) (covalent)
  Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)
  Binding site: murein (Lys) (covalent)
  Binding site: myristate (Lys) (covalent)
  Binding site: nitrosonium (Cys) (covalent)
  Binding site: palmitate (Cys) (covalent)
  Binding site: palmitate (Lys) (covalent)
  Binding site: phosphate (Arg) (covalent)
  Binding site: phosphate (Asp) (covalent)
  Binding site: phosphate (His) (covalent)
  Binding site: phosphate (His) (covalent) (by autophosphorylation)
  Binding site: phosphate (Ser) (covalent)
  Binding site: phosphate (Ser) (covalent) (by ...)
  Binding site: phosphate (Ser) (covalent) (in ...)
  Binding site: phosphate (Thr) (covalent)
  Binding site: phosphate (Thr) (covalent) (by ...)
  Binding site: phosphate (Tyr) (covalent)
  Binding site: phosphate (Tyr) (covalent) (by ...)
  Binding site: phosphopantetheine (Ser) (covalent)
  Binding site: phosphoribosyl dephospho-coenzyme A (Ser) (covalent)
  Binding site: phosphoryl-DNA (Ser) (covalent)
  Binding site: phosphoryl-RNA (Ser) (covalent)
  Binding site: phycocyanobilin (Cys) (covalent)
  Binding site: phycoerythrobilin (Cys) (covalent)
  Binding site: phytochromobilin (Cys) (covalent)
  Binding site: polyglutamate (Glu) (covalent)
  Binding site: polyglycine (Glu) (covalent)
  Binding site: pyridoxal phosphate (Lys) (covalent)
  Binding site: retinal (Lys) (covalent)
  Binding site: sn-2,3-diacylglycerol (Cys) (covalent)
  Binding site: sn-2,3-diphytanylglycerol diether (Cys) (covalent)
  Binding site: sulfate (Tyr) (covalent)
A large variety in the "(by ...)" descriptor exists.  Please consult the
database to determine currently used forms.

Examples of currently acceptable "Binding site" features not labeled "covalent"
are as follows.  The residue lists (in all but a few cases), status, link and
partial descriptors have been removed, and a few minor variants have been
eliminated. 
  Binding site: 2,3-diphosphoglycerate
  Binding site: 2Fe-2S cluster (His) (ligands)
  Binding site: 2Fe-O cluster
  Binding site: 4Fe-4S cluster 2 iron (Ser) (ligand)
  Binding site: 4Fe-4S cluster iron (His) (ligand)
  Binding site: ADP
  Binding site: ADP/AMP, allosteric
  Binding site: AMP, allosteric
  Binding site: ATP
  Binding site: ATP/GTP
  Binding site: FAD
  Binding site: FMN
  Binding site: GTP
  Binding site: GTP/GDP/EF-Ts
  Binding site: Mg-ATP
  Binding site: N-acetylgalactosamine
  Binding site: NAD
  Binding site: NAD(P)
  Binding site: NADP
  Binding site: adenosylcobalamin
  Binding site: aminoacyl-tRNA
  Binding site: anion
  Binding site: bacteriochlorophyll b magnesium (Asn) (axial ligand)
  Binding site: bacteriochlorophyll magnesium (His) (axial ligand)
  Binding site: bacteriochlorophyll magnesium, accessory (His) (axial ligand)
  Binding site: bacteriochlorophyll magnesium, special pair (His) (axial ligand)
  Binding site: bacteriopheophytin
  Binding site: beta-D-ribopyranose
  Binding site: bilirubin
  Binding site: cAMP
  Binding site: calcium
  Binding site: calcium, high affinity
  Binding site: calcium, low affinity
  Binding site: carbonate
  Binding site: cardiac glycoside
  Binding site: chloride
  Binding site: chlorophyll a magnesium (His) (axial ligand)
  Binding site: citrate, allosteric
  Binding site: cobalt 1
  Binding site: cobalt 2
  Binding site: copper
  Binding site: copper (His) (type 2)
  Binding site: copper (His) (type 3)
  Binding site: copper (His, Cys, His, Met) (type 1)
  Binding site: copper 1
  Binding site: copper 2
  Binding site: cyclosporin
  Binding site: divalent metal ions
  Binding site: fatty acid
  Binding site: fibrin
  Binding site: fructose 2,6-bisphosphate
  Binding site: fructose-1,6-bisphosphate
  Binding site: fructose-6-phosphate
  Binding site: galactose
  Binding site: glutamate
  Binding site: heme O iron (His) (axial ligand)
  Binding site: heme a iron (His) (axial ligands)
  Binding site: heme a3 iron (His) (axial ligand)
  Binding site: heme iron (Cys) (axial ligand)
[the following with one locant]
  Binding site: heme iron (His) (axial ligand)
  Binding site: heme iron (His) (axial ligand) (shared with alpha chain)
  Binding site: heme iron (His) (axial ligand) (shared with beta chain)
[the following with two locants]
  Binding site: heme iron (His) (axial ligands)
  Binding site: heme iron (His) (proximal axial ligand)
  Binding site: heme iron (His, Met) (axial ligands)
  Binding site: heme iron (Met, His) (axial ligands)
  Binding site: heme iron (Tyr) (axial ligand)
  Binding site: heme iron, high potential (His) (axial ligand)
  Binding site: heme iron, high potential (His) (axial ligands)
  Binding site: heme iron, high potential (His, Met) (axial ligands)
  Binding site: heme iron, high potential (His, Tyr) (axial ligands)
  Binding site: heme iron, low potential (His) (axial ligand)
  Binding site: heme iron, low potential (His) (axial ligands)
  Binding site: heme iron, low potential (His, Tyr) (axial ligands)
  Binding site: heparin
  Binding site: histamine
  Binding site: homocitryl Mo-7Fe-8S cluster molybdenum (His) (ligand)
  Binding site: iron
  Binding site: iron (Asp) (shared with tetrameric partners)
  Binding site: iron (His) (shared with chain M)
  Binding site: iron (His, Glu, His) (shared with chain L)
  Binding site: iron (Lys) (shared with tetrameric partners)
  Binding site: magnesium
  Binding site: magnesium (Glu) (shared with chain I)
  Binding site: magnesium (His) (shared with chain II)
  Binding site: manganese
  Binding site: mercury
  Binding site: metal
  Binding site: methylcobalamin cobalt
  Binding site: micellar substrate
  Binding site: molybdopterin (Arg)
  Binding site: molybdopterin cytosine dinucleotide (Arg)
  Binding site: nickel
  Binding site: nickel 1
  Binding site: nickel 2
  Binding site: omega-aminocarboxylic acids
  Binding site: oxygen (His) (distal axial ligand)
  Binding site: oxygen (Tyr) (distal axial ligand)
  Binding site: phospholipid
  Binding site: plastoquinone
  Binding site: potassium
  Binding site: pyrophosphate
  Binding site: retinoic acid
  Binding site: siroheme iron (Cys) (axial ligand)
  Binding site: substrate
  Binding site: substrate phosphate
  Binding site: thyroxine
  Binding site: transition metal ions
  Binding site: ubiquinone
  Binding site: zinc
  Binding site: zinc, catalytic
[see note below on the next two]
  Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
  Binding site: zinc, catalytic (His) (active)
  Binding site: zinc, high affinity
  Binding site: zinc, noncatalytic
All these have been reviewed.  If a reference is encountered that discusses the
covalent nature of one of these binding sites, please bring it to the attention
of Steve Garavelli.  Be careful when you encounter a binding site established
by a reactive analogue --- these are designed to form covalent bonds when the
actual compound may be bound noncovalently.  None of the former features
"Binding site: ATP (Lys) (covalent)" were ever actually covalent!

An alternate locant may be placed after the "Binding site" and before
the bound group name.
  Binding site: (or 150) phosphate (Ser) (covalent) #status experimental
but this form should be avoided if at all possible.

The bound-group name must always be followed by a set of parentheses enclosing
a residue or a list of residues that matches sequence residues corresponding to
the preceding numbers.  Strict parsing is enforced for this rule.  If all the
residues participating in one binding site are the same type, then only one
residue need be shown, for example
  Binding site: calcium (Asp)
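
The residue-matching rule can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the PIR parser: the sequence is
assumed to be a string of one-letter codes, locants are 1-based positions, only
the twenty standard residues are covered, and the function name is
hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def residues_match(sequence, locants, residues):
    """Check that the residue list agrees with the sequence at the locants.

    A single residue may stand for every locant when all are the same type.
    """
    if not all(1 <= p <= len(sequence) for p in locants):
        return False
    if not all(r in THREE_TO_ONE for r in residues):
        return False
    observed = [sequence[p - 1] for p in locants]
    expected = [THREE_TO_ONE[r] for r in residues]
    if len(expected) == 1:
        # One residue shown: it must match every locant.
        expected = expected * len(locants)
    return observed == expected
```

A list of three residues given with only two locants fails this check, as does
a single residue when the locants point at different residue types.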

The only bonding descriptions presently used are "covalent", "axial ligand",
"axial ligands", "proximal axial ligand" and "distal axial ligand".  For these
ligand cases, care must be taken in specifying the bound entity: "heme iron"
rather than simply "heme".
  Binding site: heme iron (His, Met) (axial ligands)
Covalent bonds to heme and similar prosthetic groups are to the group and not
to the metal.
  Binding site: heme (Cys) (covalent)
Also, use "ligand" if there is only one locant in the feature, and "ligands"
if there are two or more locants even though they are all the same type of
residue and one residue is shown.  Thus,
  44/Binding site: heme iron (His) (axial ligand)
  44,68/Binding site: heme iron (His) (axial ligands)
The second feature has two locants, "44,68", but only one residue, "His",
and "ligands" is used.
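
The singular/plural choice depends on the number of locants and not on the
number of residues shown.  A minimal sketch in Python, with hypothetical
function names, might build the feature text like this:

```python
# Hypothetical helpers; the names are illustrative and not part of any PIR tool.
def axial_ligand_term(locants):
    """Choose the bonding description from the number of locants."""
    return "axial ligand" if len(locants) == 1 else "axial ligands"


def heme_iron_feature(locants, residues):
    """Format a heme iron binding site feature with its locants."""
    positions = ",".join(str(pos) for pos in locants)
    shown = ", ".join(residues)
    term = axial_ligand_term(locants)
    return f"{positions}/Binding site: heme iron ({shown}) ({term})"
```

Calling heme_iron_feature([44, 68], ["His"]) reproduces the second feature
shown above.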

When a particular binding site occurs in both an active and an inhibited
form, binding site records should appear for both forms.
  Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
  Binding site: zinc, catalytic (His) (active)
In this pair of records, the first denotes the inhibited binding site
with a Cys ligand from a propeptide, and the second denotes the active
binding site with only the three His ligands of the enzyme.

A single substrate may be listed simply as "substrate".  For multiple
substrates, other than water, in the same entry the substrate may be named.
  Binding site: substrate (Arg)
  Binding site: fructose-1,6-bisphosphate (Lys) (covalent)

When it is experimentally observed that a group is covalently bound at less
than 95 mole percent, the "(partial)" annotation should be used.  [BLACK] A
numeric percentage or some other fractional indication should not be used.  If
the covalent binding is 95 mole percent or greater, do not use the "(partial)"
annotation.  If the "(partial)" annotation is used, it will almost always be
based on an experimental observation, so the "#status experimental" status
should also appear; [BLACK] do not use "(partial) #status predicted".
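
The 95 mole percent threshold and the status restriction can be expressed as
two small checks.  This is a sketch in Python under the assumption that the
observed occupancy is available as a number; both function names are
hypothetical.

```python
# Hypothetical sketch of the "(partial)" rule; names are illustrative only.
def covalent_extent(mole_percent):
    """Return the extent annotation for an observed covalent occupancy."""
    # Below 95 mole percent the group is annotated "(partial)"; the numeric
    # value itself is never written into the feature.
    return "(partial)" if mole_percent < 95.0 else ""


def partial_status_ok(extent, status):
    """Reject the combination "(partial) #status predicted"."""
    return not (extent == "(partial)" and status == "predicted")
```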

The "in" form should be used VERY SPARINGLY when the covalent bond is known
to occur only in the mature form or in one of several alternative polypeptide
products and the entry presents an immature sequence.
  Binding site: carbohydrate (Asp) (covalent) (in mature form)
  Binding site: phosphopantetheine (Ser) (covalent) (in acyl carrier protein)
These may be replaced by appropriate "#link" descriptors.

The "by" form is used to distinguish among different binding sites of the same
group, for example
  Binding site: phosphate (Ser) (covalent) (by autophosphorylation)
  Binding site: phosphate (Ser) (covalent) (by Ca/calmodulin-dependent kinase)
  Binding site: phosphate (Ser) (covalent) (by cAMP-dependent protein kinase)
  Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vivo)
  Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vitro)
[GRAY] The use of the terms "in vivo" and "in vitro" is questionable.  If a
feature is known to occur "in vivo", it is what would otherwise be regarded as
an experimentally determined feature and so the term is superfluous. If a
feature is known to occur "in vitro", then even if it is experimentally
determined it only amounts to a prediction that the natural modification might
occur at that location and just the "#status predicted" status is warranted. 
Alternatively, if an "in vitro" feature marks something that occurs under
unnatural conditions and the descriptor would only distinguish it from the
natural occurrences, then a comment is warranted and not a feature (as with the
former "Binding site: carbohydrate (Gln)" features determined to be unnatural). 
A feature marked both "in vitro" and "#status predicted" would seem to have
very little value under any circumstance.

Some covalent binding sites can occur only as a consequence of a prior
modification.  These are nonetheless biochemically separate and distinct
features.  For such cases we use two features, one to indicate the nature of
the modification and the other to indicate the secondary change.  For example,
  42/Modified site: 5-hydroxylysine (Lys)
  42/Binding site: carbohydrate (Lys) (covalent)
In the first step, a lysine is hydroxylated.  It may (or possibly may not) then
be subsequently glycosylated.  If they were combined in a single feature, there
would be a problem using the "partial" modifier.  Would it mean the lysines at
that position were partially hydroxylated but all the hydroxylysines were
glycosylated, or would it mean that the lysines were all hydroxylated but that
hydroxylysines were partially glycosylated?  In the RESID database such cases
are indicated by the records
  Conditions: secondary to ...
if a prior modification is required or
  Conditions: incidental to ...
if it is not.

N6-acetylated lysine will be annotated as
  Binding site: acetyl (Lys) (covalent)
[BLACK] Do not annotate it as
  Modified site: N6-acetyllysine (Lys)

When there are biochemically significantly different binding sites for the same
compound in the same entry (rare), the bound-group name may include modifiers
that distinguish between the functional differences of the bound-group or of
the binding sites.  These modifiers should be placed after the bound-group,
without parentheses and separated from it by a comma.  For example,
  Binding site: calcium, high affinity
  Binding site: calcium, low affinity
  Binding site: heme, high-potential (Cys) (covalent)
  Binding site: heme, low-potential (Cys) (covalent)
  Binding site: heme iron, high-potential (His)
  Binding site: heme iron, low-potential (His)
  Binding site: zinc, catalytic
  Binding site: zinc, noncatalytic
Otherwise, different binding sites are only distinguished by being grouped in
separate "Binding site" records and those binding sites should not be labeled.
[GRAY] Do not use such features as
  Binding site: calcium 1
  Binding site: calcium 2
except to distinguish structurally distinct features, and not otherwise
chemically indistinguishable sites.

Where the sequence was determined by protein sequencing and the nature of the
covalently attached group precludes assignment of a residue as either an acid
or an amide, and unless there is unequivocal evidence to the contrary (for
example, the nucleotide sequence), there is a reasonable biochemical 
presumption that the residue should be the amide.  The reported sequence should
be presented with the ambiguity explicit in the "Residues:" record, the amide
presented in the sequence and feature records and an appropriate note like
 Note: we have shown the unidentified residue(s) as ... forming ... 
    (or bound to ...) based on ....

[GRAY] Concerted non-covalent binding of macromolecules by a set of residues
would probably best be annotated through a "Region" record rather than through
a "Binding site" record.  Something like
  42-60/Region: DNA-binding
should be used instead of
  42,45,48,50,53,56,60/Binding site: DNA (Leu)

                          "Inhibitory site" Record

The format for the "Inhibitory site" record is
  "Inhibitory site:" res ["," res...] "(" activity ["," activity ...] ")"
    "#status " status
An inhibitory site is to an inhibitor what an active site is to an enzyme. It
is the residue, or small set of residues, that is responsible for blocking the
activity of an enzyme or set of enzymes.  It should be applied to single
residues, and to a small list of residues only sparingly.  The status is
required for this feature.  Without a crystallographic structure it is very
difficult to obtain experimental evidence that a particular residue is an
inhibitory site, so most will have predicted status.  Some examples, with
status omitted:
  Inhibitory site: Arg (acrosin)
  Inhibitory site: Arg (thrombin, coagulation factor Xa)
  Inhibitory site: Arg (trypsin)
  Inhibitory site: Arg (unknown proteinase)
  Inhibitory site: Cys (thermolysin)
  Inhibitory site: Leu (chymotrypsin)
  Inhibitory site: Leu (chymotrypsin, elastase)
  Inhibitory site: Lys (trypsin)
  Inhibitory site: Met (chymotrypsin, subtilisin)
  Inhibitory site: Tyr (chymotrypsin)
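
As an illustration of how this format might be parsed, here is a minimal
sketch in Python.  It is not the PIR checking program; the pattern name, the
function name, and the decision to treat the status as optional (so that the
examples above, shown with status omitted, still parse) are assumptions.  The
"or" form is deliberately not handled.

```python
import re

# Hypothetical pattern for the comma-list form of the "Inhibitory site" record.
INHIBITORY_SITE = re.compile(
    r"^Inhibitory site: "
    r"(?P<residues>[A-Z][a-z]{2}(?:, ?[A-Z][a-z]{2})*) "
    r"\((?P<activities>[^()]+)\)"
    r"(?: #status (?P<status>experimental|predicted|absent|atypical))?$"
)


def parse_inhibitory_site(record):
    """Split a record into residues, activities and status, or return None."""
    match = INHIBITORY_SITE.match(record.strip())
    if match is None:
        return None
    return {
        "residues": [r.strip() for r in match.group("residues").split(",")],
        "activities": [a.strip() for a in match.group("activities").split(",")],
        # None here means the status was omitted; real entries require one.
        "status": match.group("status"),
    }
```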

[GRAY] In the case that one of two residues is thought to be responsible for
the inhibitory action, the record may be applied to a list and this format is
used
  "Inhibitory site:" res "or" res "(" activity ["," activity ...] ")"
    "#status " status
For example,
  Inhibitory site: Leu or Met (elastase, chymotrypsin) #status predicted
The "or" form should be avoided whenever possible.

[BLACK] The "Inhibitory site" record is not used for allosteric inhibitor sites;
those may be annotated as binding sites.

                            "Modified site" Record

The format for the "Modified site" record is
  "Modified site:" ["(or" position ")"] chemical name "(" res ")"
                         ["(" form ")"] ["(" extent ")"] "#status " status
"res" is the three-letter code for the original encoded residue (with the
exception of selenocysteine and N-formylmethionine where no three-letter code
is used).  The "or" form should be avoided whenever possible.  Different
residues with the same feature can be combined.  When an annotator wishes to
distinguish more clearly the features belonging to different domains or
products, the separate modified sites for the different domains need not be
combined, as with blocked amino- or carboxyl-terminals.
The status is required for this feature.

ALL THESE FEATURES SHOULD BE DOCUMENTED IN THE RESID DATABASE.  Bring any
new examples to the attention of Steve Garavelli.

A. Modified side chains

In the most general case the side chain is chemically modified in such a way
that the original residue could not (in principle) be detected by normal
sequencing methods.  The following is a list of such modified residues.
  Modified site: (Z)-dehydrobutyrine (Thr)
  Modified site: 2'-bromophenylalanine (Phe)
  Modified site: 2'-glucosyl-tryptophan (Trp)
  Modified site: 2'-[3-carboxamido-3-(trimethylammonio)propyl]histidine (His)
  Modified site: 3',4'-dihydroxyphenylalanine (Tyr)
  Modified site: 3'-bromophenylalanine (Phe)
  Modified site: 3'-FAD-histidine (His)
  Modified site: 3'-methylhistidine (His)
  Modified site: 3-hydroxyphenylalanine (Phe)
  Modified site: 3-hydroxyproline (Pro)
  Modified site: 3-oxoalanine (Cys)
  Modified site: 4'-bromophenylalanine (Phe)
  Modified site: 4-hydroxyarginine (Arg)
  Modified site: 4-hydroxylysine (Lys)
  Modified site: 4-hydroxyproline (Pro)
  Modified site: 5-hydroxylysine (Lys)
  Modified site: 6-bromotryptophan (Trp)
  Modified site: ADP-ribosylarginine (Arg) (by ...)
  Modified site: ADP-ribosylasparagine (Asn) (by ...)
  Modified site: ADP-ribosylcysteine (Cys) (by ...)
  Modified site: ADP-ribosylserine (Ser) (by ...)
  Modified site: allysine (Lys)
  Modified site: arginine derivative (Arg)
  Modified site: asparagine derivative (Asn)
  Modified site: beta-methylthioaspartic acid (Asp)
  Modified site: bromohistidine (His)
  Modified site: citrulline (Arg)
  Modified site: cysteine derivative (Cys)
  Modified site: cysteine sulfenic acid (Cys)
  Modified site: D-alanine (Ala)
  Modified site: D-alanine (Ser)
  Modified site: D-allo-isoleucine (Ile)
  Modified site: D-asparagine (Asn)
  Modified site: D-leucine (Leu)
  Modified site: D-methionine (Met)
  Modified site: D-phenylalanine (Phe)
  Modified site: D-serine (Ser)
  Modified site: D-tryptophan (Trp)
  Modified site: dehydroalanine (Ser)
  Modified site: dehydroalanine (Tyr)
  Modified site: dehydrobutyrine (Thr)
  Modified site: dehydrotyrosine (Tyr)
  Modified site: erythro-beta-hydroxyasparagine (Asn)
  Modified site: erythro-beta-hydroxyaspartic acid (Asp)
  Modified site: gamma-carboxyglutamic acid (Glu)
  Modified site: glutamate methyl ester (Gln)
  Modified site: glutamate methyl ester (Glu)
  Modified site: glutamine derivative (Gln)
[the following two ambiguous features should be avoided if possible]
  Modified site: hydroxylysine (Lys)
  Modified site: hydroxyproline (Pro)
  Modified site: isoleucine derivative (Ile)
  Modified site: lysine derivative (Lys)
  Modified site: N4-methylasparagine (Asn)
  Modified site: N5-methylglutamine (Gln)
  Modified site: N6,N6,N6-trimethyllysine (Lys)
  Modified site: N6,N6-dimethyllysine (Lys)
  Modified site: N6-(4-amino-2-hydroxybutyl)lysine (Lys)
  Modified site: N6-methyllysine (Lys)
  Modified site: omega-N,omega-N-dimethylarginine (Arg)
  Modified site: omega-N,omega-N'-dimethylarginine (Arg)
  Modified site: omega-N-methylarginine (Arg)
  Modified site: S-(6-FMN)-cysteine (Cys)
  Modified site: S-(8alpha-FAD)-cysteine (Cys)
  Modified site: selenocysteine
  Modified site: thyroxine (Tyr)
  Modified site: topaquinone (Tyr)
  Modified site: triiodothyronine (Tyr)
  Modified site: tryptophyl quinone (Trp)
Whenever possible, new modified residues should be added with substitution
positions and stereo-isomer indicators provided in accordance with appropriate
IUPAC and IUB rules.  Steve Garavelli is the curator for these modified residue
names.  Please bring any additional or new modified residues to his attention.

[BLACK] Ambiguous notations such as
  Modified site: methylation #status predicted
should never be used.

We have chosen to use the unambiguous IUPAC numbered position forms, in
preference to the IUB Greek letter designations, when such usage allows us to
avoid inconsistencies between common usage ("epsilon-aminomethyl") and IUB
recommended usage ("zeta-amino-methyl").

Note that standard abbreviations for the modified residues are not used, so
the correct feature is
  Modified site: gamma-carboxyglutamic acid (Glu)
and not
  Modified site: gamma-carboxyglutamic acid (Gla)
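
A check for this rule only needs the twenty standard three-letter codes.  The
following Python sketch is illustrative; the names are hypothetical and it
does not cover placeholder or ambiguity codes.

```python
# Hypothetical sketch: accept only codes for the twenty standard residues.
STANDARD_CODES = frozenset(
    "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
    "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val".split()
)


def is_original_residue_code(code):
    """Return True if the code names an encoded residue, not a modified one."""
    return code in STANDARD_CODES
```

With this check "Glu" is accepted, while abbreviations for modified residues
such as "Gla" are rejected.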

B. Modified Amino Terminus
The format for this form of the "Modified site" record is
  "Modified site:" chemical_name "(" res ")"
    ["(" form ")"] ["(" extent ")"] "#status " status

The chemical name should be as specific as possible and should usually include
the term "amino end" at the end.  When an unblocked or longer precursor form
is presented in the entry and the modified site is not position 1, the "in
mature form" modifier should be used, for example,
 Modified site: acetylated amino end (Ala) (in mature form) #status experimental
[GRAY] Because not all processed forms requiring this modifier are the final
"mature" form, it may become necessary to replace this modifier with something
like "(in processed form) #link ...".  Annotators are invited to comment on
this proposal.

Current acceptable examples are:
  Modified site: 2-oxobutanoic acid (Thr)
  Modified site: L-3-phenyllactic acid (Phe)
  Modified site: N-formylmethionine
  Modified site: acetylated amino end (xxx)
* Modified site: blocked amino end 
[* this form is used only when the presented sequence is completely ambiguous
at the amino terminus]
  Modified site: blocked amino end (xxx)
  Modified site: dimethylated amino end (Pro)
  Modified site: fatty acylated amino end (Cys)
  Modified site: formylated amino end (Gly)
  Modified site: glucuronylated amino end (Gly)
  Modified site: methylated amino end (Ala)
  Modified site: myristylated amino end (Gly)
  Modified site: succinylated amino end (Trp)
  Modified site: pyrrolidone carboxylic acid (Gln)
  Modified site: pyruvic acid (Ser)
  Modified site: trimethylated amino end (Ala)

The form descriptor "(probably ...)" should be used with "blocked amino end"
whenever an appropriate prediction can be made for an otherwise experimentally
determined ambiguous feature.
  Modified site: blocked amino end (Ala) (probably acetylated)
                                                         #status experimental
The "blocked amino end" is usually only appropriate with experimental status,
because otherwise the specific modification would be used with a predicted
status.  With increasing degrees of certainty,
  Modified site: acetylated amino end (Ala) #status predicted
says you are guessing both whether and by what,
  Modified site: blocked amino end (Ala) (probably acetylated)
                                                         #status experimental
says you know whether but are guessing by what,
  Modified site: acetylated amino end (Ala) #status experimental
says you know both whether and by what.

Formylated amino terminal methionine is coded for and, like selenocysteine, is
not really a modified site.  However, it should be annotated as a modified site
when it is experimentally observed in a protein.  Making the residue explicit
is not required in this case.  No occurrence has yet been noted of this
modified residue in other than the first position.

For amino terminal glutamine undergoing cyclization the format is
  "Modified site: pyrrolidone carboxylic acid (Gln)" ["(in mature form)"]
                                             ["#link " link] "#status " status
When the amino terminus is known to be glutamine and blocked, pyrrolidone
carboxylic acid can be assumed unless a reason to believe otherwise is
explicitly provided, in which case
  Modified site: blocked amino end (Gln) (in mature form) #status experimental
should be used. The form
  Modified site: pyrrolidone carboxylic acid (Glx)
should be avoided. The ambiguity should be explicitly noted in the "Residues"
record, an appropriate comment made, and the sequence and feature presented as
Gln.  People entering sequences should be explicitly warned about the notation
"<E" appearing in some articles; such sequences should be entered with a "Q"
and an appropriate feature prepared.

[BLACK] Doubly annotated forms like
  Modified site: acetylated and phosphorylated amino end (Ser)
should not be used.  These should appear in two records.
  Modified site: acetylated amino end (Ser)
  Binding site: phosphate (Ser) (covalent)
See also the discussion of incidental and secondary modifications under the
covalent type "Binding site" section above.

In the case where a residue is enzymatically cleaved at the bond between the
alpha carbon and the alpha amino-nitrogen to produce a new amino terminus
blocked with a 2-oxo or a 2-hydroxy acid, the residue giving rise to the
blocking group is entered in the sequence and one of these annotations is used
  Modified site: 2-oxobutanoic acid (Thr)
  Modified site: L-3-phenyllactic acid (Phe)
  Modified site: pyruvic acid (Ser)
These features do not have "amino end" in the chemical name.  However, if the
preceding sequence is shown, these features should have the "(in mature form)"
modifier.

C. Modified Carboxyl Terminus

This form of the "Modified site" record has the same format as the one
for the modified amino terminus
  "Modified site:" chemical_name "(" res ")" ["(" form ")"] ["(" extent ")"]
                                                            "#status " status
Current examples are:
  Modified site: amidated carboxyl end (xxx)
  Modified site: amidated carboxyl end (xxx) (in mature form)
  Modified site: amidated carboxyl end (xxx) (amide in mature form ...
    from following glycine)
  Modified site: amidated carboxyl end (Ala) (amide in mature form ... 
    from following serine)
  Modified site: amidated carboxyl end (Tyr) (amide in mature form ...
    from following leucine)
  Modified site: blocked carboxyl end (xxx)
  Modified site: chondroitin sulfate ester carboxyl end (Asp) (in mature form)
  Modified site: GPI-anchor ethanolamine amidated carboxyl end (xxx) (in
    mature form)
  Modified site: GSI-anchor ethanolamine amidated carboxyl end (Ser) (in
    mature form)
  Modified site: methyl ester carboxyl end (Cys) (in mature form)

The chemical name should be as specific as possible and should include the term
"carboxyl end" at the end.  The "in" form should be used when a longer immature
sequence is presented in the entry and the modified site is not at the final
position.
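
The position test behind the "in mature form" modifier is the same for both
termini and can be sketched in a few lines of Python.  The function name and
argument conventions here are hypothetical; positions are 1-based as in the
feature locants.

```python
# Hypothetical sketch of when a terminal feature needs "(in mature form)".
def needs_in_mature_form(position, sequence_length, terminus):
    """Return True if a terminal modified site is not at the entry's own end."""
    if terminus == "amino":
        # An amino-terminal feature away from position 1 implies a precursor.
        return position != 1
    if terminus == "carboxyl":
        # A carboxyl-terminal feature before the last residue implies one too.
        return position != sequence_length
    raise ValueError("terminus must be 'amino' or 'carboxyl'")
```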

In the case where the carboxyl amide arises from enzymatic cleavage of the
bond between the alpha-carbon and amino nitrogen of the following glycine
residue, a special form of the "in mature form" annotation is used
  Modified site: amidated carboxyl end (Ile) (amide in mature form
    from following glycine)
All but a very small number of amidations arise from this mechanism.  The cases
where leucine and serine are used are documented but not well-understood.

The GSI-anchor is a chemically distinct modification that must be carefully
distinguished from the better-known GPI-anchor.

Connections through the amino- or carboxyl-ends to other encoded peptide chains
are now all treated uniformly as Cross-link features.

[GRAY] We are considering the proper annotation of tubulins where a carboxy-
terminal tyrosine can be removed, and later a tyrosine is reattached by a
distinct biochemical mechanism.  Features like,
  m-n/Product: tubulin, processed form <MAT2>
  m-(n+1)/Product: tubulin, unprocessed form <MAT1>
  n/Modified site: tyrosine amidated carboxyl end (...) #link MAT2
may be introduced.  Your comments on this proposal would be appreciated.

D. Selenocysteine

The format for this form of the "Modified site:" record is
  "Modified site: selenocysteine" "#status " status

It had formerly been thought that selenocysteine arose from post-translational
modification of cysteine residues and no single-letter code was assigned.  When
it was discovered to be encoded, the assignment of a special single-letter code
presented an insurmountable software implementation problem.  Instead this
feature record is applied to those residues, or list of residues.  Although it
usually serves as an active site, a second feature for that annotation is
superfluous.  However, when it also serves as a covalent binding site for a
prosthetic group, it is considered a secondary modification and two feature
records are used.
  Modified site: selenocysteine
  Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)
Two different things are going on here.  The first feature indicates the true
coding identity of the residue.   The second indicates the true prosthetic
group covalently bound to the sequence-presented residue.  [This all arises
because of the terrible historical accident that no one knew selenocysteine was
encoded until it was too late.  Every computer database uses "C" and everyone's
computer program will break if a new letter is introduced for it.]
Do not use the 1-letter code "X" in the sequence or the 3-letter code "Sec"
in a feature for selenocysteine.  "X" may, of course, be used in "Residues"
records for encoded selenocysteine.

E. Acetyllysine, Carbamyllysine, and Acylcysteine

Amino terminal lysine acetylated on the alpha-amino group should be annotated
  Modified site: acetylated amino end (Lys)
When a lysine in any position is acetylated or carbamylated on the N6-amino
group, it should be annotated like
  Binding site: acetyl (Lys) (covalent)
  Binding site: carbon dioxide (Lys) (covalent)
Likewise, be careful to distinguish amino terminal cysteine acylated on the
alpha-amino group from S-acylated cysteine.  The amino-acylated form is like
  Modified site: acetylated amino end (Cys)
  Modified site: fatty acylated amino end (Cys)
while the S-acylated form is like
  Binding site: palmitate (Cys) (covalent)
  Binding site: sn-2,3-diacylglycerol (Cys) (covalent)
Other sequence databases [SWISS-PROT] are not careful in making this important
distinction and contain errors on this point.

F. Aspartate and Glutamate esters

Because it has been experimentally observed that both glutamic acid and
glutamine give rise to glutamate methyl ester in the same protein and these
rules would otherwise require that they be annotated differently, esters
of the acids will be annotated with "Modified site" records.  Current
acceptable examples are:
  Modified site: glutamate methyl ester (Gln) (by cheB-dependent deamidation
    and methylation)
  Modified site: glutamate methyl ester (Glu)

                                PART 2 - Bonds

                           "Cleavage site" Record

Where a protein sequence has a cleavage site for activation or preactivation
processing, the appropriate "Product" feature should be used.  Where a protein
sequence has a cleavage site for proteolytic enzymes in the normal process of
digestion, no annotation should be used.  The only appropriate use of a
"Cleavage site" record would be in the case of a specific, biologically
significant, proteolytic inactivation.  This feature should only be applied to
a hyphenated pair (range) of adjacent residues.  The format is
  "Cleavage site:" res "-" res "(" activity ")" "#status " status
Some acceptable examples are
  Cleavage site: Arg-Ser (thrombin)
  Cleavage site: Gly-Ile (collagenase)
  Cleavage site: His-Ser (plasmin)
  Cleavage site: Phe-Leu (chymosin)
  Cleavage site: Phe-Met (rennin)
  Cleavage site: Pro-Ile (autolytic)
A Comment is usually appropriate to explain the biological significance of
these features.
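
The requirement that a "Cleavage site" applies only to a hyphenated pair of
adjacent residues can be checked while parsing.  The Python sketch below is
illustrative only; it assumes the locant is written before a slash, as in
other examples in this guide, and the names and the positions in the usage
note are hypothetical.

```python
import re

# Hypothetical pattern for a "Cleavage site" record with its locant range.
CLEAVAGE_SITE = re.compile(
    r"^(?P<start>\d+)-(?P<end>\d+)/Cleavage site: "
    r"(?P<res1>[A-Z][a-z]{2})-(?P<res2>[A-Z][a-z]{2}) "
    r"\((?P<activity>[^()]+)\)"
    r"(?: #status (?P<status>experimental|predicted|absent|atypical))?$"
)


def parse_cleavage_site(record):
    """Parse a record; return None if malformed or the pair is not adjacent."""
    match = CLEAVAGE_SITE.match(record.strip())
    if match is None:
        return None
    start, end = int(match.group("start")), int(match.group("end"))
    if end != start + 1:
        # The feature must mark the bond between two neighboring residues.
        return None
    return {
        "range": (start, end),
        "residues": (match.group("res1"), match.group("res2")),
        "activity": match.group("activity"),
        "status": match.group("status"),
    }
```

A record such as "15-16/Cleavage site: Arg-Ser (thrombin) #status
experimental" parses, while the same record with the range 15-18 is rejected.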

Where a sequence is cleaved by an enzyme that is thereby inhibited by the
product (a "suicide inhibitor"), the cleavage site of the inhibitor should
also be annotated as an inhibitory site.

[GRAY] The annotation of intein and extein features is under review.  The use
of features like
  Cleavage site: xxx-yyy (autolytic) #link PRE
for the precursor forms, and
  Cross-link: peptide (xxx-zzz) #link MAT
[see the following section] for the spliced forms may be introduced.  Your
comments on this proposal are appreciated.  When protein splicing occurs, two
entries are used, one for the precursor form and a second for the spliced form, 
ONLY when the splicing rearranges the order of the peptide segments. (See CVJB
and CVJBP)

                             "Cross-link" Record

The "Cross-link" record should be used when two or more residues form a
covalent bond through their side chains, other than cysteine disulfides,
or through the amino- or carboxyl-terminal.
The format for an intramolecular "Cross-link" record is
  "Cross-link:" cross-link name "(" res "-" res ")" ["(" extent ")"]
                                                         "#status " status
This should be applied only to hyphenated pairs of residues.  Some current
examples are:
  Cross-link: (2S,3S,6R)-3-methyl-lanthionine (Cys-Thr)
  Cross-link: 5-imidazolinone (Ser-Gly)
  Cross-link: cysteinylhistidine (Cys-His)
  Cross-link: cysteinyltyrosine (Cys-Tyr)
  Cross-link: isopeptide amino end (Cys-Asn)
  Cross-link: isopeptide amino end (Gly-Asn)
  Cross-link: lysinoalanine (Ser-Lys)
  Cross-link: lysine-topaquinone (Lys-Tyr)
  Cross-link: oxazole (Cys-Ser)
  Cross-link: peptide (Asn-Ser)
  Cross-link: sn-(2S,6R)-lanthionine (Ser-Cys)
  Cross-link: thiazole (Gly-Cys)
  Cross-link: thiolester (Cys-Gln)
  Cross-link: tryptophan-tryptophyl quinone (Trp-Trp)

The format for the intermolecular "Cross-link" record is
  "Cross-link:" cross-link name "(" res ") (interchain" ["to" partner ] ")"
                                           ["(" extent ")"] "#status " status
This should be applied only to individual residues.  Some current examples are:
  Cross-link: desmosine (Lys) (interchain)
  Cross-link: isopeptide (Gln) (interchain to ... -Lys)
  Cross-link: isopeptide (Lys) (interchain to ... -Gln)
  Cross-link: isopeptide carboxyl end (Gly) (interchain to ...)
  Cross-link: thiolester (Cys) (interchain to ...)
  Cross-link: thiolester carboxyl end (Gly) (interchain to ... -Cys)
[GRAY] The use of numbered partners here has the same difficulties as with
the "Disulfide bonds: (interchain)" and extreme caution is urged.
This record should be applied when only the side chains of two or more
identified residues are directly involved in the cross-link.  If an amino- or
carboxyl-terminal group is involved, both are annotated as Cross-links and
the terminal features carry the "amino end" or "carboxyl end" in their name.

The "Cross-link: isopeptide" is used for a side chain linked to either an
amino- or carboxyl-terminal group.  The "Cross-link: peptide" is used when
an amino-terminal and a carboxyl-terminal group from different chain
segments are linked.  See the discussion on protein splicing in the "Cleavage
site" section for use of the "Cross-link: peptide" feature.  [GRAY] A
"Cross-link: cyclopeptide" may be used when the amino-terminal and
carboxyl-terminal groups of the same chain segment are linked.  The use of the
cyclopeptide feature is extremely dangerous and it may be limited to entries in
PIR4 and NRL_3D.

If the cross-link is secondary to the chemical modification of one or both
residues (in the sense of a modified site as defined above), the participating
residue may also be marked as a modified site.  For example,
  112-163   Cross-link: tryptophan-tryptophyl quinone (Trp-Trp)
  112       Modified site: tryptophyl quinone (Trp)
in which the chemically distinct nature of the two tryptophan residues is
obvious.  If non-proteinaceous compounds are involved, this latter case will
generally apply.  If the partner of a cross-linked residue is not identified,
the residue may be annotated as a covalent binding site.

[BLACK] The old "Thiolester bond" record should not be used.  Instead, use the
form
  Cross-link: thiolester
Thiol ethers should be denoted by appropriate compound names, like lanthionine
or cysteinylhistidine.

                           "Disulfide bonds" Record

The format for the intramolecular "Disulfide bonds:" record is
  "Disulfide bonds:" ["(in" form ")"] "#status " status
This record should be applied to hyphenated pairs (ranges) of residues, and
pairs with the same experimental status should be grouped into lists.
  Disulfide bonds:
  Disulfide bonds: (in conotoxin GI)
These are in the process of being converted to "#link" forms.

Alternative bonds may be indicated within the same record using this format.
  "Disulfide bonds:" ["(or " hyphenated pairs ")"] "#status " status
For example,
  Disulfide bonds: (or 106-121)
  Disulfide bonds: (or 20-42, 41-99)
The "or" form should be avoided whenever possible.

[BLACK] Disulfide bonds that would have different annotations must be placed in
separate records.  For example, instead of
 28-44,43-95,49-122,50-88/Disulfide bonds: #status predicted (except for 49-122)
two records should appear
 28-44,43-95,50-88/Disulfide bonds: #status predicted
 49-122/Disulfide bonds: #status experimental
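
Splitting a set of bonds into one record per status is a simple grouping
step.  The following Python sketch shows one way to do it; the function name
and the input layout (a list of pairs with their status) are hypothetical.

```python
# Hypothetical sketch: emit one "Disulfide bonds" record per distinct status.
def disulfide_records(bonds):
    """Group ((start, end), status) items into one record per status."""
    grouped = {}
    for (start, end), status in bonds:
        # Dictionaries keep insertion order, so statuses appear as first seen.
        grouped.setdefault(status, []).append(f"{start}-{end}")
    return [
        f"{','.join(pairs)}/Disulfide bonds: #status {status}"
        for status, pairs in grouped.items()
    ]
```

Given the four bonds of the example above, with 49-122 marked experimental
and the rest predicted, this returns the two records shown.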

The format for the interchain "Disulfide bonds:" record is
  "Disulfide bonds:" interchain  ["(to " partner ")"] "#status " status
Generally, this record will be applied to individual residues.  Without "(to ...)"
the interchain bond is assumed to be to the same residue in a dimeric partner,
for example:
  56/Disulfide bonds: interchain
It may be applied to lists of residues when it is thought that all the residues
participate in intermolecular bonds to partners of the same sequence but the
pattern of bonding is not known.  Such cases will usually be status predicted.
  56,72,98/Disulfide bonds: interchain
Where the bond is between partners of the same sequence (homopolymeric),
records should be applied to both residues individually.
  136/Disulfide bonds: interchain (to 133)
  133/Disulfide bonds: interchain (to 136)
Examples of intermolecular bonds to partners with different sequences
(heteropolymeric):
  Disulfide bonds: interchain (to heavy chain)
  Disulfide bonds: interchain (to beta chain)
  Disulfide bonds: interchain (to chain B1)
  Disulfide bonds: interchain (to alpha-180)
  Disulfide bonds: interchain (to gamma-34 or gamma-35)
The special case of intermolecular bonds to different partners with the same
sequence may be distinguished:
  Disulfide bonds: interchain (to mu chain in another subunit)
The partner should be indicated as clearly as necessary. (In NRL_3D the
partner's code is used.)
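
For homopolymeric partners, where a numbered "(to ...)" record is applied to
both residues (136 to 133 and 133 to 136 in the example above), the pairing
can be checked mechanically.  This Python sketch is only an illustration; the
names are hypothetical and it looks only at records whose partner is a plain
residue number.

```python
import re

# Hypothetical pattern for numbered interchain partners within one entry.
INTERCHAIN_TO = re.compile(r"^(\d+)/Disulfide bonds: interchain \(to (\d+)\)")


def unreciprocated(records):
    """Return (residue, partner) links that lack the matching reverse record."""
    links = set()
    for record in records:
        match = INTERCHAIN_TO.match(record.strip())
        if match:
            links.add((int(match.group(1)), int(match.group(2))))
    # A link is complete only when the partner points back at the residue.
    return sorted(link for link in links if (link[1], link[0]) not in links)
```

An empty result means every numbered interchain record has its counterpart.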

[GRAY] The problems of checking and maintaining correct codes and numbers in
references to other entries cannot be dealt with in the current database form
and will still be difficult after SDDL.  Any comments on alternative mechanisms
for conveying this information in a manner which will allow easy checking and
maintenance would be appreciated.  The same problem applies to the interchain
"Cross-link" records.

In some cases, alternative monomeric, dimeric or multimeric forms are known to
exist.  Each form should have an appropriate record with an "(in " modifier.
For example,
  Disulfide bonds: (in monomeric form)
  Disulfide bonds: interchain (in polymeric form)
These will be replaced by appropriate "#link" records later.

[GRAY] Disulfide bonds can also form with "free" cysteine or with the small
peptide glutathione.  These are now treated as covalent "Binding site" 
features.  Disulfide bond features should be regarded as a special case of a
Cross-link.  They should only be used between encoded polypeptide sequences.
Disulfide bonds to 

The format for a transient or active site "Disulfide bonds" record is
  "Disulfide bonds: redox-active (" status ")"
This record should be applied to hyphenated pairs (ranges) of residues.