% FEATURES7.TXT
Guide for PIR Features Annotations in NBRF Format
Version 7, 16-JUN-1997
[This file includes only Parts 1 and 2]

Preliminary Comments

This document describes the standardization of features records in NBRF format achieved through release 53. We have received quite a few comments on the improvement in the appearance and consistency of our database, and a recent research paper specifically mentioned employing the PIR features annotations in its sequence analysis. The following projects have been very successful.

(1) Programs to check new or updated entries and changefiles (written in FORTRAN by Chris Marzec) and to check existing database entries (written in C by Steve Garavelli) have been considerably improved and extended. These are the programs that do the rules "enforcement" that will be discussed.
(2) All ambiguous "Binding site:" features have been resolved to appropriate covalent or noncovalent features.
(3) All explicit disulfide bond information and most other site information in free-text comments have been converted to features.
(4) The experimental status of all site, bond and product features has been assigned. A status is now required on these features in all new PIR1 and PIR2 entries. Four status types are used in features: "experimental", "absent", "atypical" and "predicted".
(5) The comment "in mature form" is required for amino-terminal features that are not at the first position of the entry and for carboxyl-terminal features that are not at the last position of the entry; the comment should not be used except in that context.
(6) Many new features have been added, and the covalent binding sites, modified sites, cross-links and active sites have been documented in the RESID database. Version 10.00 of that database will be distributed with 236 entries.

How is an annotator supposed to know how to make a good feature record? The answer to this frequently asked question is:

(1) READ THIS DOCUMENT!
(2) Use the residues database to search for residues with covalently bound groups, for modified residues, and for cross-links. Use the ATLAS FEATURE command to see the records in the RESID database and to form the correct features for protein entries.
(3) Use the ATLAS FEATURE command to look at features in related entries or at the variety in the whole database. The currently correct format is presented in the database, except possibly for entries that were entered in the previous update. Be especially wary of features or feature formats that are present only in "Region" features of new entries.

Please send your comments to Steve Garavelli (Part 1 - Sites, Part 2 - Bonds) or Winona Barker (Part 3 - Regions, Domains and Products). In particular, let us know your thoughts about the rules marked "[GRAY]", which are either questionable or provisional. We would appreciate your input toward improving this document: changes that improve clarity, additional examples, etc.

Introduction

The following feature records may appear in PIR1 and PIR2 annotations.

    Active site:
    Binding site:
    Cleavage site:
    Cross-link:
    Disulfide bonds:
    Domain:
    Inhibitory site:
    Modified site:
    Product:
    Region:

In the following descriptions

    " "      enclose explicit typographic characters
    [ ]      enclose optional elements
    res      means a 3-letter amino acid residue code
    |        separates alternative elements
    form     "by" mechanism | "in" protein name | "in mature form"
    extent   "partial"
    status   "experimental" | "predicted" | "absent" | "atypical"
    ...      means indefinite repetition of the preceding optional elements

When you are preparing new feature annotations, you should try to conform to these guidelines as closely as possible.
These guidelines have three degrees of applicability:

(1) features that are currently accepted and being used ("white rules");
(2) features that may have been used in the past but are now undesirable, that have been or are being removed from entries that contain them, and that should not be used in new entries ("black rules");
(3) features that occur in some entries but are of uncertain value, that have been proposed but not yet accepted, or that are otherwise under review ("gray rules"); these are marked "[GRAY]".

If an annotator thinks that a gray rule feature or some new feature is required for an entry, the annotator should first check with Winona Barker, Friedhelm Pfeiffer or Steve Garavelli.

Discussion of Status Indicators

A status indicator, either "experimental", "absent", "atypical" or "predicted", is required for all features except Domain and Region. Generally, it should not be used with "Region:" features. In the "Domain:" feature it should be used, except for homology domains, for self-evident features like "amino-terminal" or "serine-rich", and for features with arbitrary designations like "first", "1" or "A".

The "experimental" status means that the feature has been experimentally observed in the indicated way at the indicated location. Any indication of alternative forms means that all the alternatives have been observed. For example

    Modified site: N6-methyllysine or N6,N6-dimethyllysine (Lys) (experimental)

means that both forms have been observed at the indicated location. On the other hand, an indication of an alternate location means that the feature is known to occur at one or the other position, but which one could not be resolved experimentally, for example

    Modified site: (or 81) N6,N6,N6-trimethyllysine (Lys) (experimental)

The "predicted" status means that the nature of the feature, its location, or both, have been predicted by some means.
The experimental observation of a feature under unnatural conditions should be carefully considered; if the conditions seem sufficiently different from the natural case, the feature should be marked as a prediction.

With the present system, a distinct problem occurs when either the nature or the location of a feature, but not both, has been experimentally determined. Generally, the most definite form of evidence should be presented and appropriate comments should be provided, either in the feature or in comments within the entry. For example, if a protein is known to be blocked and a translated sequence is presented, but the nature and location of the blocking group are unknown, then only a note or comment is appropriate. We would welcome any suggestions on how a feature with both experimental and predicted aspects can best be presented.

The status "absent" is used to indicate a feature that, although it would otherwise be predicted by some means, has been experimentally determined not to occur at the indicated position. It is intended to be used only in the very limited cases where an investigation of the specific feature produced the experimental result. Currently, this status is mainly used for the "Binding site: carbohydrate (Asn) (covalent)" feature. Presently, the PIR produces the only sequence databases in which it is possible to distinguish the cases where there is negative experimental evidence from the cases where there is merely insufficient annotation.

The status "atypical" is used to indicate a feature that does not follow the "normal" pattern, that would otherwise be predicted not to occur, but that has been experimentally determined to occur at the indicated location. Again, it is intended to be used only in the very limited cases where an investigation of the specific feature produced this result. Examples of its use are

    1. homology domains that have unusually large insertions or deletions,
    2. carbohydrate binding sites with the pattern N-X-C,
    3. metal binding sites with an apparently missing ligand residue.

Discussion of "Combinability"

With the adoption of an "object-oriented" approach, it became important to distinguish the case of groups of residues constituting in the aggregate one feature from the case of residues in individual features sharing the same description and grouped for convenience.

An example of a group of residues constituting in the aggregate one feature is

    192,226,231/Binding site: copper (His, Cys, His)

Together this particular group of residues forms one unique binding site for copper. In this case the group forms a single "object", and other groups of the same kind could not be combined in the same record without creating ambiguity in the identity of the objects being represented. Such features are not "combinable".

An example of residues in individual features sharing the same description and grouped for convenience is

    192,226,231/Binding site: carbohydrate (Asn) (covalent)

In this case each residue individually is an "object", and they can be combined without introducing ambiguity. Such features are "combinable".

Features should not be combined unless the discussion of that particular type of feature explicitly states that it is combinable. Features with different "#status" or "#link" descriptors should not be combined.

Discussion of Tags

Tags are short labels that are attached to certain features and other records in the database. A particular tagged feature may have repeated examples within a single entry that must be uniquely distinguished by different tags. A tag is the very last element of the record, separated from the preceding elements by a single space. Only one tag should occur in each record. A tag consists of (in order):

    the character "<",
    three or four uppercase alphabetic characters or numbers,
    the character ">".

The same tag must not be applied to more than one record of any type in each entry.
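The tag format rules and the combinability rule lend themselves to mechanical checking. The following is a minimal Python sketch, not the PIR checking programs mentioned in the Preliminary Comments. It assumes each record is a plain string, optionally prefixed by its locants and a slash as in "192,226,231/Binding site: ...", and the names `record_problems`, `entry_problems`, `Feature` and `may_combine` are hypothetical. Whether a feature type is combinable is supplied by the caller, because this guide grants combinability type by type.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical helpers for illustration; not part of the PIR software.
TAG_LIKE = re.compile(r"<[^<>]*>")       # anything that looks like a tag
TAG_BODY = re.compile(r"[A-Z0-9]{3,4}")  # three or four uppercase letters or digits
LOCANTS = re.compile(r"^[0-9,\-]+/")     # optional "192,226,231/" prefix (assumed layout)


def record_problems(record: str) -> List[str]:
    """Check the per-record tag rules; return a list of violations."""
    problems = []
    tags = TAG_LIKE.findall(record)
    feature = LOCANTS.sub("", record)
    if feature.startswith(("Domain:", "Product:")) and not tags:
        problems.append("Domain and Product records require a tag")
    if len(tags) > 1:
        problems.append("only one tag is allowed per record")
    if tags:
        tag = tags[-1]
        if not TAG_BODY.fullmatch(tag[1:-1]):
            problems.append(f"bad tag format: {tag}")
        # The tag must be the last element, preceded by exactly one space.
        if not record.endswith(" " + tag) or record.endswith("  " + tag):
            problems.append("tag must come last, after a single space")
    return problems


def entry_problems(records: List[str]) -> List[str]:
    """Apply record_problems to every record, then check tag uniqueness."""
    problems = [p for r in records for p in record_problems(r)]
    counts = Counter(t for r in records for t in TAG_LIKE.findall(r))
    problems += [f"tag {t} used more than once" for t, n in counts.items() if n > 1]
    return problems


@dataclass(frozen=True)
class Feature:
    description: str          # e.g. "Binding site: carbohydrate (Asn) (covalent)"
    status: str               # "experimental", "predicted", "absent" or "atypical"
    link: Optional[str] = None
    combinable: bool = False  # True only where this guide explicitly allows it


def may_combine(a: Feature, b: Feature) -> bool:
    """Features may share one record only if both are combinable and
    agree in description, #status and #link."""
    return (a.combinable and b.combinable
            and a.description == b.description
            and a.status == b.status and a.link == b.link)
```

For example, two hypothetical Domain records that both carry "<KR1>" produce the single complaint "tag <KR1> used more than once". Returning lists of problems, rather than raising on the first one, lets a checker report every violation in an entry at once.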
There is, as yet, no way to check for or impose standardization on the use of tags beyond these format rules. Some suggestions about tags will be discussed with particular features. Tags must be used with Domain and Product feature records. Their use with other feature types is presently problematic [GRAY].

Discussion of Order of Features

The feature records in each entry are essentially independent entities. A computer program reading a feature table could derive no additional information from the order of the records in it. However, the feature table is also looked at by humans from time to time, and the imposition of some regularity in the arrangement of features can be very helpful. The preferred order of features is as follows. First, Product, Domain and Region records are arranged as a group in increasing order by the first element of their range, then in decreasing order by the second element of their range. Second, site and bond records are arranged as a group in increasing order by their first element.

Formerly, a certain amount of "artistic license" could be employed in arranging a feature table to emphasize certain structural aspects of the protein or simply to give it a greater degree of coherence. The change mechanism does not preserve an annotator's idiosyncratic order, and feature tables will be rearranged according to the rules above.

PART 1 - Sites

"Active site" Record

The "Active site" record is applied to residues of enzymes known or thought to function in the actual catalytic reaction of the enzyme. It should be applied to a single residue or a short list of residues; it should not be applied to a range (a hyphenated pair). If the active site residues are not specifically known but have been localized to a segment of the sequence, the "Region" record rather than the "Active site" record should be used.

"Active site" features in entries without an Enzyme Commission notation in their title, "Contains" or "Alternate names" records are suspect and will be flagged by checking programs.

The format for the "Active site" record is

    "Active site:" res ["," res...] ["(" description ")"] ["#link " link] "#status " status

The status is required for this feature and should always be applied for new entries. All the residues participating in each active site that do not require different modifiers should be combined in the same feature. Do not combine residues from different active sites or residues that need different modifiers. The use of description fields, discussed below, should be avoided if possible. Examples:

    Active site: Arg #status experimental
    Active site: Asp, His, Ser #status predicted
    Active site: His, His, Asp #status experimental

A residue list may be used only for those residues which participate in the same concerted catalytic reaction. If all the residues participating in one active site are the same type, then only one residue need be shown. Enzymes recognized to have several distinct catalytic reactions should have an "Active site" record for each active site. Multiple "Active site" records for what is, in fact, a single active site should be combined into one record using a list of residues, unless different status conditions apply.

[GRAY] Formerly, mechanisms were presented, but this should no longer be done except when the mechanism is used as a description. Generally, such a description should be applied only when multiple active sites occur in the same entry. In particular, the description "charge relay system" should not be used except in enzymes with multiple activities.
Examples are

    Active site: Cys (amide transfer)
    Active site: Cys (of 3-oxoacyl-[acyl-carrier-protein] synthase)
    Active site: Lys (of 3-oxoacyl-[acyl-carrier-protein] reductase)
    Active site: Lys (of enoyl-[acyl-carrier-protein] reductase)
    Active site: Ser (of enoyl-[acyl-carrier-protein] reductase)
    Active site: Ser (of oleoyl-[acyl-carrier-protein] hydrolase)
    Active site: Ser (of [acyl-carrier-protein] acetyl/malonyltransferase)
    Active site: Glu (alpha-reaction)
    Active site: His, Lys, Cys (beta-reaction)

Descriptors like these are now being replaced with "#link" modifiers, which point to tags in appropriate Function records, or in Domain or Product features.

    Active site: Cys #link ARD #status predicted

Here the link "ARD" points to a Function record with the tag "<ARD>". This mechanism will also be used to link active site records that have different status conditions but belong to the same active site object.

When a residue has a stable, covalently-bound, catalytically-active prosthetic group, only the "Binding site: ... (covalent)" feature should be used. An "Active site" record should not also be used, because it is the prosthetic group which is active and not the amino acid as such. In particular, for an active site phosphoserine only the annotation

    Binding site: phosphate (Ser) (covalent) #status experimental

should appear. When a residue forms a transient covalent bond in its role as an active site, the "Active site" record should be used and the description field may be used. The nature of the intermediate should be made as clear as practical. Annotators should consider carefully whether a covalently-bound group is stable or transient in determining whether an annotation should be for a modified or an active site.
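The "Active site" record format, together with a few of the rules in this section, can be expressed as a single regular expression. The Python sketch below is an illustration under stated assumptions, not the PIR checking program: it assumes an optional "locants/" prefix, assumes a #link value follows the tag format (three or four uppercase letters or digits, like "ARD"), and its `parse_active_site` name and warning texts are hypothetical. Warnings only flag records for human review; "Active site: Gly (stable glycyl radical)", for instance, is acceptable even though Gly triggers the suspicious-residue warning.

```python
import re
from typing import Dict, List

# Hypothetical sketch; not part of the PIR software.
THREE_LETTER = {"Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
                "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"}
# Residues this guide says to be particularly suspicious of as catalytic residues.
SUSPECT = {"Gly", "Val", "Leu", "Ile", "Pro", "Asn", "Gln", "Met", "Phe"}

# "Active site:" res ["," res...] ["(" description ")"] ["#link " link] "#status " status
# Assumption: a link value has the same shape as a tag body.
ACTIVE_SITE = re.compile(
    r"^(?:(?P<locants>[0-9,\-]+)/)?"
    r"Active site: (?P<residues>[A-Z][a-z]{2}(?:, [A-Z][a-z]{2})*)"
    r"(?: \((?P<description>[^()]*)\))?"
    r"(?: #link (?P<link>[A-Z0-9]{3,4}))?"
    r" #status (?P<status>experimental|predicted|absent|atypical)$"
)


def parse_active_site(record: str) -> Dict[str, object]:
    """Split an Active site record into its fields and collect review warnings.

    Raises ValueError if the record does not follow the format at all
    (for instance, if the required #status is missing).
    """
    m = ACTIVE_SITE.match(record)
    if m is None:
        raise ValueError(f"not a well-formed Active site record: {record!r}")
    residues = m.group("residues").split(", ")
    warnings: List[str] = []
    unknown = [r for r in residues if r not in THREE_LETTER]
    if unknown:
        warnings.append(f"unknown residue codes: {unknown}")
    locants = m.group("locants")
    if locants:
        if "-" in locants:
            warnings.append("an Active site must not be a range")
        # One residue per locant, or a single residue when all are the same type.
        if len(residues) not in (1, len(locants.split(","))):
            warnings.append("residue list does not match the locants")
    if SUSPECT & set(residues):
        warnings.append("suspicious catalytic residue; check the evidence")
    return {"locants": locants, "residues": residues,
            "description": m.group("description"), "link": m.group("link"),
            "status": m.group("status"), "warnings": warnings}
```

Parsing "Active site: Cys #link ARD #status predicted" yields the residue list ["Cys"], the link "ARD" and the status "predicted", with no warnings. A record without "#status" is rejected outright rather than warned about, since the status is required for this feature.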
The following possible features show active sites with transient groups that could easily be confused with a binding site:

    Active site: Ser (phosphoserine intermediate)
    Active site: Tyr (phosphotyrosine intermediate)

No examples yet exist of the second feature. Other currently acceptable examples are

    Active site: Asp (aspartylphosphate intermediate)
    Active site: Cys (phosphocysteine intermediate)
    Active site: Cys (S-acetylcysteine intermediate)
    Active site: Cys (sulfocysteine intermediate)
    Active site: His (phosphohistidine intermediate)
    Active site: Lys (ribulose-bisphosphate-binding)

Most of these features are documented in the RESID database. Avoid records that are unnecessarily detailed or are synonymous with existing features, like

    Active site: His (covalent intermediate)
    Active site: Asp (phosphate-binding)

Be particularly suspicious of claims that Gly, Val, Leu, Ile, Pro, Asn, Gln, Met or Phe residues are active site residues. It is chemically dubious that such residues function in the actual catalytic reaction of an enzyme. Glycine and a few other residues can form free radicals that participate in free radical reactions, but for physical reasons such reactions are extremely rare in biochemistry. Examples are

    Active site: Cys (cysteine thiyl radical intermediate)
    Active site: Gly (stable glycyl radical)
    Active site: Trp (tryptophyl radical intermediate)
    Active site: Tyr (stable tyrosyl radical)

These features are documented in the RESID database.

Residues that are structurally located near an active site but do not participate directly in the catalytic reaction of that active site should not be annotated in the PIR databases. Annotations for such residues will only be carried from PDB entries in the NRL_3D database. Not all reactive compounds that block an enzymatic reaction wind up reacting with an active site residue; they may react with a residue near the active site and block the substrate's access to the active site.
Something may be more of a "reactive site" than an "active site", so be cautious about accepting such evidence as experimental for active site residues.

For cysteine residues that form catalytically active disulfide bonds, only the annotation

    Disulfide bonds: redox-active

should appear. Even though selenocysteine may function as an active site, only the feature

    Modified site: selenocysteine

should be used. Residues that participate in allosteric control of enzyme activity but are not catalytically active should not be annotated as active sites but as binding sites or as regions.

Residues that participate in different, symmetry-related active sites of complexes should not be combined in the same feature; instead, an appropriate description should be used to indicate the relationship.

    Active site: Asp (shared with dimeric partner)
    Active site: Cys (shared with dimeric partner)

These features imply that there are two symmetry-related active sites. Each site consists of an aspartate and a cysteine contributed by different chains of the homodimer.

[BLACK] The annotation

    Active site: ... (inhibitory) ...

should not be used. Instead, use the annotation

    Inhibitory site:

[BLACK] Do not use the expression "active site" in either "Domain" or "Region" features. Instead, use the term "catalytic".

General Definitions for Binding Sites and Modified Sites

In the following discussion of binding sites and modified sites, these definitions are very important. Because they include historical accidents and grammatical exigencies, they are operational definitions and do not necessarily extend beyond the purposes of this document.

Generally, an attachment site is an amino acid residue which has its side chain chemically changed post-translationally in such a way that it could be restored by physiological processes of hydrolysis, ammonolysis or simple (2H) reduction. Such chemical changes may occur transiently, or more or less permanently, but they must be covalent.
The rationale is that attachment site residues could in principle be recovered and detected by typical methods of sequence analysis, whereas modified sites could not be. The "Binding site" feature includes two classes, attachment sites and binding sites.

A "binding site" is an amino acid residue, or a group of them, that forms biochemically important, non-covalent bonds with ions or molecules (other than the protein constituting the entry). These bonds may be ionic, ligand (dative), Van der Waals, or donative or receptive hydrogen bonds. One borderline case is the sulfur-metal bond, which will be regarded as covalent for cysteine and non-covalent (dative ligand) for methionine.

In anticipation of implementation of the SDDL format, attachment sites will be distinguished by using "(covalent)" in "Binding site" records. All new "Binding site" records without "(covalent)" are rigorously reviewed and subject to automatic conversion. Consequently, it is very important for annotators to provide the "(covalent)" designation in every case where it should be applied.

A "modified site" is an amino acid residue which is either

(1) chemically changed post-translationally in such a way that it could not be restored by physiological processes of hydrolysis, ammonolysis or simple (2H) reduction (that is, it is not a side-chain attachment site),
(2) chemically changed in any way involving the alpha amino group, including N-formylmethionine (this applies to both the amino terminus and internal residues),
(3) a carboxyl-terminal residue with any chemical change involving the 1-carboxyl group,
(4) a selenocysteine residue (these are translationally incorporated but for historical reasons are regarded as modified cysteine residues), or
(5) an aspartate or glutamate ester, which can arise from either the acid or the amide form.
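The definitions above amount to a small decision procedure for choosing between a non-covalent "Binding site", a "Binding site ... (covalent)" attachment site and a "Modified site". The Python sketch below encodes one reading of them; the `Change` flags and the `classify` function are hypothetical names introduced for illustration, and deciding each flag (for instance, whether a change is restorable by hydrolysis, ammonolysis or simple (2H) reduction, or that a cysteine sulfur-metal bond counts as covalent) remains the annotator's judgement.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of what happens at one residue."""
    covalent: bool                     # False for ionic, dative, H-bond, van der Waals
    restorable: bool = False           # by hydrolysis, ammonolysis or (2H) reduction
    alpha_amino: bool = False          # involves the alpha amino group, rule (2)
    c_terminal_carboxyl: bool = False  # involves a C-terminal 1-carboxyl, rule (3)
    selenocysteine: bool = False       # rule (4)
    asp_glu_ester: bool = False        # rule (5)


def classify(change: Change) -> str:
    """Return the record type the definitions above call for (one reading)."""
    if change.selenocysteine:
        return "Modified site"
    if not change.covalent:
        return "Binding site"  # non-covalent binding site
    if change.alpha_amino or change.c_terminal_carboxyl or change.asp_glu_ester:
        return "Modified site"
    # A restorable side-chain change is an attachment site; otherwise rule (1) applies.
    return "Binding site (covalent)" if change.restorable else "Modified site"
```

Under this reading, phosphoserine (restorable by hydrolysis) comes out as "Binding site (covalent)", matching "Binding site: phosphate (Ser) (covalent)", whereas 5-hydroxylysine, which cannot be restored, comes out as "Modified site".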
"Binding site" Record

Using the foregoing definitions, "Binding site" records are applied in two cases:

(1) when an amino acid residue, or a group of them, forms biochemically important, non-covalent bonds with ions or molecules (other than the protein constituting the entry); or
(2) when an amino acid residue forms an attachment site in which its side chain is chemically changed post-translationally in such a way that it could in principle be restored by physiological processes. Such cases must have a "(covalent)" bond description.

The format for the "Binding site" record is

    "Binding site:" ["(or" position ")"] bound-group name "(" res ["," res...] ")" ["(covalent)" | "(" bonding description ")"] ["(" form ")"] ["(partial)"] ["#link " link] "#status " status

The status is required for this feature.

Currently acceptable covalent examples are as follows. The status, link and partial descriptors have been removed, and a few minor variants have been eliminated. Most of these features are documented in the residues database.

    Binding site: 2Fe-2S cluster (Cys) (covalent)
    Binding site: 3Fe-4S cluster (Cys) (covalent)
    Binding site: 4Fe-4S cluster (Cys) (covalent)
  * Binding site: iron-sulfur cluster (Cys) (covalent)
      [use this only when the cluster form has not been determined and cannot be predicted]
    Binding site: 4-hydroxycinnamyl (Cys) (covalent)
    Binding site: acetyl (Lys) (covalent)
    Binding site: AMP (Tyr) (covalent)
    Binding site: biotin (Lys) (covalent)
    Binding site: carbohydrate (Asn) (covalent)
    Binding site: carbohydrate (Cys) (covalent)
    Binding site: carbohydrate (Lys) (covalent)
    Binding site: carbohydrate (Ser) (covalent)
    Binding site: carbohydrate (Thr) (covalent)
    Binding site: carbohydrate (Trp) (covalent)
    Binding site: carbohydrate (Tyr) (covalent)
    Binding site: carbon dioxide (Lys) (covalent) (by ...)
    Binding site: chondroitin sulfate (Ser) (covalent)
    Binding site: cysteine (Cys) (covalent)
    Binding site: dermatan sulfate (Ser) (covalent)
    Binding site: farnesyl (Cys) (covalent)
    Binding site: fatty acid (Ser) (covalent)
    Binding site: fatty acid (Thr) (covalent)
    Binding site: formyl (Lys) (covalent)
    Binding site: geranyl-geranyl (Cys) (covalent)
    Binding site: glutathione (Cys) (covalent)
    Binding site: glycerylphosphorylethanolamine (Glu) (covalent)
    Binding site: heme (Cys) (covalent)
    Binding site: heme (Glu) (covalent)
    Binding site: heme, high potential (Cys) (covalent)
    Binding site: heme, low potential (Cys) (covalent)
    Binding site: heparan sulfate (Ser) (covalent)
    Binding site: homocitryl Mo-7Fe-8S cluster (Cys) (covalent)
    Binding site: keratan sulfate (Thr) (covalent)
    Binding site: lipoamide (Lys) (covalent)
    Binding site: methyl (Cys) (covalent)
    Binding site: molybdopterin (Cys) (covalent)
    Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)
    Binding site: murein (Lys) (covalent)
    Binding site: myristate (Lys) (covalent)
    Binding site: nitrosonium (Cys) (covalent)
    Binding site: palmitate (Cys) (covalent)
    Binding site: palmitate (Lys) (covalent)
    Binding site: phosphate (Arg) (covalent)
    Binding site: phosphate (Asp) (covalent)
    Binding site: phosphate (His) (covalent)
    Binding site: phosphate (His) (covalent) (by autophosphorylation)
    Binding site: phosphate (Ser) (covalent)
    Binding site: phosphate (Ser) (covalent) (by ...)
    Binding site: phosphate (Ser) (covalent) (in ...)
    Binding site: phosphate (Thr) (covalent)
    Binding site: phosphate (Thr) (covalent) (by ...)
    Binding site: phosphate (Tyr) (covalent)
    Binding site: phosphate (Tyr) (covalent) (by ...)
    Binding site: phosphopantetheine (Ser) (covalent)
    Binding site: phosphoribosyl dephospho-coenzyme A (Ser) (covalent)
    Binding site: phosphoryl-DNA (Ser) (covalent)
    Binding site: phosphoryl-RNA (Ser) (covalent)
    Binding site: phycocyanobilin (Cys) (covalent)
    Binding site: phycoerythrobilin (Cys) (covalent)
    Binding site: phytochromobilin (Cys) (covalent)
    Binding site: polyglutamate (Glu) (covalent)
    Binding site: polyglycine (Glu) (covalent)
    Binding site: pyridoxal phosphate (Lys) (covalent)
    Binding site: retinal (Lys) (covalent)
    Binding site: sn-2,3-diacylglycerol (Cys) (covalent)
    Binding site: sn-2,3-diphytanylglycerol diether (Cys) (covalent)
    Binding site: sulfate (Tyr) (covalent)

A large variety exists in the "(by ...)" descriptor. Please consult the database to determine currently used forms.

Examples of currently acceptable "Binding site" features not labeled "covalent" are as follows. The residue lists (in all but a few cases), status, link and partial descriptors have been removed, and a few minor variants have been eliminated.
    Binding site: 2,3-diphosphoglycerate
    Binding site: 2Fe-2S cluster (His) (ligands)
    Binding site: 2Fe-O cluster
    Binding site: 4Fe-4S cluster 2 iron (Ser) (ligand)
    Binding site: 4Fe-4S cluster iron (His) (ligand)
    Binding site: ADP
    Binding site: ADP/AMP, allosteric
    Binding site: AMP, allosteric
    Binding site: ATP
    Binding site: ATP/GTP
    Binding site: FAD
    Binding site: FMN
    Binding site: GTP
    Binding site: GTP/GDP/EF-Ts
    Binding site: Mg-ATP
    Binding site: N-acetylgalactosamine
    Binding site: NAD
    Binding site: NAD(P)
    Binding site: NADP
    Binding site: adenosylcobalamin
    Binding site: aminoacyl-tRNA
    Binding site: anion
    Binding site: bacteriochlorophyll b magnesium (Asn) (axial ligand)
    Binding site: bacteriochlorophyll magnesium (His) (axial ligand)
    Binding site: bacteriochlorophyll magnesium, accessory (His) (axial ligand)
    Binding site: bacteriochlorophyll magnesium, special pair (His) (axial ligand)
    Binding site: bacteriopheophytin
    Binding site: beta-D-ribopyranose
    Binding site: bilirubin
    Binding site: cAMP
    Binding site: calcium
    Binding site: calcium, high affinity
    Binding site: calcium, low affinity
    Binding site: carbonate
    Binding site: cardiac glycoside
    Binding site: chloride
    Binding site: chlorophyll a magnesium (His) (axial ligand)
    Binding site: citrate, allosteric
    Binding site: cobalt 1
    Binding site: cobalt 2
    Binding site: copper
    Binding site: copper (His) (type 2)
    Binding site: copper (His) (type 3)
    Binding site: copper (His, Cys, His, Met) (type 1)
    Binding site: copper 1
    Binding site: copper 2
    Binding site: cyclosporin
    Binding site: divalent metal ions
    Binding site: fatty acid
    Binding site: fibrin
    Binding site: fructose 2,6-bisphosphate
    Binding site: fructose-1,6-bisphosphate
    Binding site: fructose-6-phosphate
    Binding site: galactose
    Binding site: glutamate
    Binding site: heme O iron (His) (axial ligand)
    Binding site: heme a iron (His) (axial ligands)
    Binding site: heme a3 iron (His) (axial ligand)
    Binding site: heme iron (Cys) (axial ligand)
    [the following with one locant]
    Binding site: heme iron (His) (axial ligand)
    Binding site: heme iron (His) (axial ligand) (shared with alpha chain)
    Binding site: heme iron (His) (axial ligand) (shared with beta chain)
    [the following with two locants]
    Binding site: heme iron (His) (axial ligands)
    Binding site: heme iron (His) (proximal axial ligand)
    Binding site: heme iron (His, Met) (axial ligands)
    Binding site: heme iron (Met, His) (axial ligands)
    Binding site: heme iron (Tyr) (axial ligand)
    Binding site: heme iron, high potential (His) (axial ligand)
    Binding site: heme iron, high potential (His) (axial ligands)
    Binding site: heme iron, high potential (His, Met) (axial ligands)
    Binding site: heme iron, high potential (His, Tyr) (axial ligands)
    Binding site: heme iron, low potential (His) (axial ligand)
    Binding site: heme iron, low potential (His) (axial ligands)
    Binding site: heme iron, low potential (His, Tyr) (axial ligands)
    Binding site: heparin
    Binding site: histamine
    Binding site: homocitryl Mo-7Fe-8S cluster molybdenum (His) (ligand)
    Binding site: iron
    Binding site: iron (Asp) (shared with tetrameric partners)
    Binding site: iron (His) (shared with chain M)
    Binding site: iron (His, Glu, His) (shared with chain L)
    Binding site: iron (Lys) (shared with tetrameric partners)
    Binding site: magnesium
    Binding site: magnesium (Glu) (shared with chain I)
    Binding site: magnesium (His) (shared with chain II)
    Binding site: manganese
    Binding site: mercury
    Binding site: metal
    Binding site: methylcobalamin cobalt
    Binding site: micellar substrate
    Binding site: molybdopterin (Arg)
    Binding site: molybdopterin cytosine dinucleotide (Arg)
    Binding site: nickel
    Binding site: nickel 1
    Binding site: nickel 2
    Binding site: omega-aminocarboxylic acids
    Binding site: oxygen (His) (distal axial ligand)
    Binding site: oxygen (Tyr) (distal axial ligand)
    Binding site: phospholipid
    Binding site: plastoquinone
    Binding site: potassium
    Binding site: pyrophosphate
    Binding site: retinoic acid
    Binding site: siroheme iron (Cys) (axial ligand)
    Binding site: substrate
    Binding site: substrate phosphate
    Binding site: thyroxine
    Binding site: transition metal ions
    Binding site: ubiquinone
    Binding site: zinc
    Binding site: zinc, catalytic
    [see note below on the next two]
    Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
    Binding site: zinc, catalytic (His) (active)
    Binding site: zinc, high affinity
    Binding site: zinc, noncatalytic

All these have been reviewed. If a reference is encountered that discusses the covalent nature of one of these binding sites, please bring it to the attention of Steve Garavelli. Be careful when you encounter a binding site established by a reactive analogue: these are designed to form covalent bonds, whereas the actual compound may be bound noncovalently. None of the former features "Binding site: ATP (Lys) (covalent)" were ever actually covalent!

An alternate locant may be placed after "Binding site:" and before the bound-group name,

    Binding site: (or 150) phosphate (Ser) (covalent) #status experimental

but this form should be avoided if at all possible.

The bound-group name must always be followed by a set of parentheses enclosing a residue or a list of residues that matches the sequence residues at the preceding locants. Strict parsing is enforced for this rule. If all the residues participating in one binding site are the same type, then only one residue need be shown, for example

    Binding site: calcium (Asp)

The only bonding descriptions presently used are "covalent", "axial ligand", "axial ligands", "proximal axial ligand" and "distal axial ligand". For these ligand cases, care must be taken in specifying the bound entity: "heme iron" rather than simply "heme".

    Binding site: heme iron (His, Met) (axial ligands)

Covalent bonds to heme and similar prosthetic groups are to the group and not to the metal.
    Binding site: heme (Cys) (covalent)

Also, use "ligand" if there is only one locant in the feature, and "ligands" if there are two or more locants, even when they are all the same type of residue and only one residue is shown. Thus,

    44/Binding site: heme iron (His) (axial ligand)
    44,68/Binding site: heme iron (His) (axial ligands)

The second feature has two locants, "44,68", but only one residue, "His", and "ligands" is used.

When a particular binding site occurs in both an active and an inhibited form, binding site records should appear for both forms.

    Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
    Binding site: zinc, catalytic (His) (active)

In this pair of records, the first denotes the inhibited binding site with a Cys ligand from a propeptide, and the second denotes the active binding site with only the three His ligands of the enzyme.

A single substrate may be listed simply as "substrate". When an entry has multiple substrates (other than water), each substrate may be named.

    Binding site: substrate (Arg)
    Binding site: fructose-1,6-bisphosphate (Lys) (covalent)

When it is experimentally observed that a group is covalently bound at less than 95 mole percent, the "(partial)" annotation should be used. [BLACK] A numeric percentage or some other fractional indication should not be used. If the covalent binding is 95 mole percent or greater, don't use the "(partial)" annotation. If the "(partial)" annotation is used, it will almost always be based on an experimental observation, so the "#status experimental" status should also appear; [BLACK] do not use "(partial) #status predicted".

The "in" form should be used VERY SPARINGLY, when the covalent bond is known to occur only in the mature form or in one of several alternative polypeptide products and the entry presents an immature sequence.
    Binding site: carbohydrate (Asp) (covalent) (in mature form)
    Binding site: phosphopantetheine (Ser) (covalent) (in acyl carrier protein)

These may be replaced by appropriate "#link" descriptors.

The "by" form is used to distinguish among different binding sites of the same group, for example

    Binding site: phosphate (Ser) (covalent) (by autophosphorylation)
    Binding site: phosphate (Ser) (covalent) (by Ca/calmodulin-dependent kinase)
    Binding site: phosphate (Ser) (covalent) (by cAMP-dependent protein kinase)
    Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vivo)
    Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vitro)

[GRAY] The use of the terms "in vivo" and "in vitro" is questionable. If a feature is known to occur "in vivo", it is what would otherwise be regarded as an experimentally determined feature, so the term is superfluous. If a feature is known to occur "in vitro", then even if it is experimentally determined it only amounts to a prediction that the natural modification might occur at that location, and just the "#status predicted" status is warranted. Alternatively, if an "in vitro" feature marks something that occurs under unnatural conditions and the descriptor would only distinguish it from the natural occurrences, then a comment is warranted and not a feature (as with the former "Binding site: carbohydrate (Gln)" features determined to be unnatural). A feature marked both "in vitro" and "#status predicted" would seem to have very little value under any circumstance.

Some covalent binding sites can occur only as a consequence of a prior modification. These are nonetheless biochemically separate and distinct features. For such cases we use two features, one to indicate the nature of the modification and the other to indicate the secondary change. For example,

    42/Modified site: 5-hydroxylysine (Lys)
    42/Binding site: carbohydrate (Lys) (covalent)

In the first step, a lysine is hydroxylated.
It may (or may not) then be glycosylated. If the two steps were combined in a single feature, there would be a problem using the "partial" modifier: would it mean that the lysines at that position were partially hydroxylated but all the hydroxylysines were glycosylated, or that the lysines were all hydroxylated but the hydroxylysines were partially glycosylated? In the RESID database such cases are indicated by the records

    Conditions: secondary to ...

if a prior modification is required, or

    Conditions: incidental to ...

if it is not.

N6-acetylated lysine will be annotated as

    Binding site: acetyl (Lys) (covalent)

[BLACK] Do not annotate it as

    Modified site: N6-acetyllysine (Lys)

When there are biochemically significantly different binding sites for the same compound in the same entry (rare), the bound-group name may include modifiers that distinguish the functional differences of the bound group or of the binding sites. These modifiers should be placed after the bound-group name, without parentheses, and separated from it by a comma. For example,

    Binding site: calcium, high affinity
    Binding site: calcium, low affinity
    Binding site: heme, high-potential (Cys) (covalent)
    Binding site: heme, low-potential (Cys) (covalent)
    Binding site: heme iron, high-potential (His)
    Binding site: heme iron, low-potential (His)
    Binding site: zinc, catalytic
    Binding site: zinc, noncatalytic

Otherwise, different binding sites are distinguished only by being grouped in separate "Binding site" records, and those binding sites should not be labeled. [GRAY] Do not use such features as

    Binding site: calcium 1
    Binding site: calcium 2

except to distinguish structurally distinct features; they should not be used to label otherwise chemically indistinguishable sites.
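Several of the "Binding site" rules above are mechanical: the residue list must match the sequence at the locants (one residue may stand for several locants of the same type), "ligand" versus "ligands" follows the locant count, a modifier follows the bound-group name after a comma, and "(partial)" goes with experimental status. The Python sketch below checks those rules for one record. The regular expression, the `BindingSite` fields and the function names are our own assumptions for illustration, not the PIR checking programs; the "(or N)" alternate-locant form is not handled, and the sequence is assumed to be a list of 3-letter codes with 1-based locants.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative pattern (an assumption, not PIR software): bound-group name,
# then any number of parenthesized groups, then an optional #status.
RECORD_RE = re.compile(
    r"^Binding site:\s*(?P<group>[^()#]+?)\s*"
    r"(?P<parens>(?:\([^)]*\)\s*)*)"
    r"(?:#status\s+(?P<status>\w+))?\s*$"
)


@dataclass
class BindingSite:
    group: str                    # bound-group name, e.g. "heme iron"
    modifier: Optional[str]       # text after ", ", e.g. "high affinity"
    residues: List[str]           # first parenthesized list, e.g. ["His", "Met"]
    descriptors: List[str] = field(default_factory=list)  # "covalent", "partial", ...
    status: Optional[str] = None


def parse_binding_site(text: str) -> BindingSite:
    m = RECORD_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unparsable Binding site record: {text!r}")
    parens = re.findall(r"\(([^)]*)\)", m.group("parens"))
    # Split on ", " so that names like "fructose-1,6-bisphosphate" stay whole.
    group, _, modifier = m.group("group").partition(", ")
    residues = [r.strip() for r in parens[0].split(",")] if parens else []
    return BindingSite(group, modifier or None, residues, parens[1:], m.group("status"))


def check_binding_site(locants: List[int], sequence: List[str], site: BindingSite) -> List[str]:
    """Return rule violations; `sequence` holds 3-letter codes, locants are 1-based."""
    problems: List[str] = []
    if not site.residues:
        problems.append("bound-group name must be followed by a residue list")
    elif len(site.residues) in (1, len(locants)):
        # A single residue may stand for several locants of the same type.
        expected = site.residues * len(locants) if len(site.residues) == 1 else site.residues
        for loc, res in zip(locants, expected):
            if not 1 <= loc <= len(sequence):
                problems.append(f"position {loc} is outside the sequence")
            elif sequence[loc - 1] != res:
                problems.append(f"position {loc} is {sequence[loc - 1]}, not {res}")
    else:
        problems.append("residue list does not match the number of locants")
    for d in site.descriptors:
        if d.endswith("axial ligand") and len(locants) > 1:
            problems.append("use 'ligands' for two or more locants")
        if d.endswith("axial ligands") and len(locants) == 1:
            problems.append("use 'ligand' for a single locant")
    if "partial" in site.descriptors and site.status != "experimental":
        problems.append("'(partial)' should carry '#status experimental'")
    return problems
```

For example, parsing "Binding site: heme iron (His) (axial ligands)" and checking it against locants [44, 68] of a sequence with His at both positions returns no problems, while the same record on locant [44] alone is flagged for "ligands". Splitting the modifier on comma-plus-space rather than on a bare comma is a design choice that keeps locant-style chemical names intact.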
Where the sequence was determined by protein sequencing and the nature of the covalently attached group precludes assignment of a residue as either an acid or an amide, and unless there is unequivocal evidence to the contrary (for example, the nucleotide sequence), there is a reasonable biochemical presumption that the residue is the amide. The reported sequence should be presented with the ambiguity explicit in the "Residues:" record, with the amide presented in the sequence and feature records, and with an appropriate note like

    Note: we have shown the unidentified residue(s) as ... forming ... (or bound to ...) based on ....

[GRAY] Concerted noncovalent binding of macromolecules by a set of residues would probably best be annotated through a "Region" record rather than through a "Binding site" record. Something like

    42-60/Region: DNA-binding

should be used instead of

    42,45,48,50,53,56,60/Binding site: DNA (Leu)

"Inhibitory site" Record

The format for the "Inhibitory site" record is

    "Inhibitory site:" res ["," res...] "(" activity ["," activity ...] ")" "#status " status

An inhibitory site is to an inhibitor what an active site is to an enzyme. It is the residue, or small set of residues, responsible for blocking the activity of an enzyme or set of enzymes. It should be applied to single residues, and only sparingly to a small list of residues. The status is required for this feature. Without a crystallographic structure it is very difficult to obtain experimental evidence that a particular residue is an inhibitory site, so most will have predicted status.
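The "Inhibitory site" grammar above is regular enough to express as a single pattern. The Python sketch below parses the basic form into residues, activities and status, and rejects records without a status or with a status outside the four types used in features ("experimental", "predicted", "absent", "atypical"). The pattern, the `InhibitorySite` tuple and the function names are illustrative assumptions; the "or" variant is not handled, and residue codes are assumed to be standard 3-letter codes.

```python
import re
from typing import List, NamedTuple

# Illustrative pattern for: "Inhibitory site:" res ["," res...]
#   "(" activity ["," activity ...] ")" "#status " status
RES = r"[A-Z][a-z]{2}"
INHIBITORY_RE = re.compile(
    rf"^Inhibitory site:\s*(?P<res>{RES}(?:\s*,\s*{RES})*)\s*"
    r"\((?P<acts>[^)]+)\)\s*#status\s+(?P<status>\w+)\s*$"
)
STATUSES = {"experimental", "predicted", "absent", "atypical"}


class InhibitorySite(NamedTuple):
    residues: List[str]
    activities: List[str]
    status: str


def _split(value: str) -> List[str]:
    # Comma-separated list -> trimmed items.
    return [part.strip() for part in value.split(",")]


def parse_inhibitory_site(text: str) -> InhibitorySite:
    m = INHIBITORY_RE.match(text.strip())
    if m is None:
        # Also catches a missing "#status", which is required for this feature.
        raise ValueError(f"not a valid Inhibitory site record: {text!r}")
    if m.group("status") not in STATUSES:
        raise ValueError(f"unknown status: {m.group('status')!r}")
    return InhibitorySite(_split(m.group("res")), _split(m.group("acts")), m.group("status"))
```

Raising an error rather than returning a warning list is a choice made here to mirror "strict parsing"; a batch checker might prefer to collect messages instead.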
Some examples, with status omitted:

    Inhibitory site: Arg (acrosin)
    Inhibitory site: Arg (thrombin, coagulation factor Xa)
    Inhibitory site: Arg (trypsin)
    Inhibitory site: Arg (unknown proteinase)
    Inhibitory site: Cys (thermolysin)
    Inhibitory site: Leu (chymotrypsin)
    Inhibitory site: Leu (chymotrypsin, elastase)
    Inhibitory site: Lys (trypsin)
    Inhibitory site: Met (chymotrypsin, subtilisin)
    Inhibitory site: Tyr (chymotrypsin)

[GRAY] In the case that one of two residues is thought to be responsible for the inhibitory action, the record may be applied to a list and this format is used

    "Inhibitory site:" res "or" res "(" activity ["," activity ...] ")" "#status " status

For example,

    Inhibitory site: Leu or Met (elastase, chymotrypsin) #status predicted

The "or" form should be avoided whenever possible.

[BLACK] The "Inhibitory site" record is not used for allosteric inhibitor sites; those may be annotated as binding sites.

"Modified site" Record

The format for the "Modified site" record is

    "Modified site:" ["(or" position ")"] chemical name "(" res ")" ["(" form ")"] ["(" extent ")"] "#status " status

"res" is the three-letter code for the original encoded residue (with the exception of selenocysteine and N-formylmethionine, for which no three-letter code is used). The "or" form should be avoided whenever possible. Different residues with the same feature can be combined. When an annotator wishes to distinguish the features belonging to different domain or product features more clearly, the separate modified sites for the different domains need not be combined, as with blocked amino or carboxyl termini. The status is required for this feature.

ALL THESE FEATURES SHOULD BE DOCUMENTED IN THE RESID DATABASE. Bring any new examples to the attention of Steve Garavelli.

A. Modified side chains

In the most general case the side chain is chemically modified in such a way that the original residue could not (in principle) be detected by normal sequencing methods.
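The rule that "res" in a "Modified site" record names the original encoded residue, with selenocysteine and N-formylmethionine as the only exceptions named above, can be checked mechanically. The Python sketch below takes the residue code to be the first parenthesized group preceded by a space, so chemical names with internal parentheses such as "S-(6-FMN)-cysteine" survive, and accepts only the 20 standard codes. The function and set names are hypothetical; leaving out Asx and Glx is our assumption that ambiguity belongs in the "Residues" record rather than in a feature, and the "(or position)" form is not specially handled.

```python
from typing import List, Optional

# The 20 standard 3-letter codes; Asx/Glx deliberately omitted (assumption).
STANDARD_CODES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}
# Names written without a residue code, per the exception in the format rule.
NO_CODE_NAMES = {"selenocysteine", "N-formylmethionine"}


def check_modified_site(text: str) -> List[str]:
    prefix = "Modified site:"
    record = text.strip()
    if not record.startswith(prefix):
        return ["not a Modified site record"]
    body = record[len(prefix):].split("#status")[0].strip()
    # First " (" marks the residue field; parentheses inside the chemical
    # name are not preceded by a space, e.g. "S-(6-FMN)-cysteine".
    name, sep, rest = body.partition(" (")
    res: Optional[str] = rest.split(")", 1)[0] if sep else None
    if name in NO_CODE_NAMES:
        return [] if res is None else [f"{name} is written without a residue code"]
    if res is None:
        return ["missing residue code"]
    if res not in STANDARD_CODES:
        return [f"{res!r} is not the 3-letter code of an encoded residue"]
    return []
```

An abbreviation for the modified residue itself, such as "(Gla)" in place of "(Glu)", is rejected by this check because it is not one of the encoded-residue codes.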
The following is a list of such modified residues.

    Modified site: (Z)-dehydrobutyrine (Thr)
    Modified site: 2'-bromophenylalanine (Phe)
    Modified site: 2'-glucosyl-tryptophan (Trp)
    Modified site: 2'-[3-carboxamido-3-(trimethylammonio)propyl]histidine (His)
    Modified site: 3',4'-dihydroxyphenylalanine (Tyr)
    Modified site: 3'-bromophenylalanine (Phe)
    Modified site: 3'-FAD-histidine (His)
    Modified site: 3'-methylhistidine (His)
    Modified site: 3-hydroxyphenylalanine (Phe)
    Modified site: 3-hydroxyproline (Pro)
    Modified site: 3-oxoalanine (Cys)
    Modified site: 4'-bromophenylalanine (Phe)
    Modified site: 4-hydroxyarginine (Arg)
    Modified site: 4-hydroxylysine (Lys)
    Modified site: 4-hydroxyproline (Pro)
    Modified site: 5-hydroxylysine (Lys)
    Modified site: 6-bromotryptophan (Trp)
    Modified site: ADP-ribosylarginine (Arg) (by ...)
    Modified site: ADP-ribosylasparagine (Asn) (by ...)
    Modified site: ADP-ribosylcysteine (Cys) (by ...)
    Modified site: ADP-ribosylserine (Ser) (by ...)
    Modified site: allysine (Lys)
    Modified site: arginine derivative (Arg)
    Modified site: asparagine derivative (Asn)
    Modified site: beta-methylthioaspartic acid (Asp)
    Modified site: bromohistidine (His)
    Modified site: citrulline (Arg)
    Modified site: cysteine derivative (Cys)
    Modified site: cysteine sulfenic acid (Cys)
    Modified site: D-alanine (Ala)
    Modified site: D-alanine (Ser)
    Modified site: D-allo-isoleucine (Ile)
    Modified site: D-asparagine (Asn)
    Modified site: D-leucine (Leu)
    Modified site: D-methionine (Met)
    Modified site: D-phenylalanine (Phe)
    Modified site: D-serine (Ser)
    Modified site: D-tryptophan (Trp)
    Modified site: dehydroalanine (Ser)
    Modified site: dehydroalanine (Tyr)
    Modified site: dehydrobutyrine (Thr)
    Modified site: dehydrotyrosine (Tyr)
    Modified site: erythro-beta-hydroxyasparagine (Asn)
    Modified site: erythro-beta-hydroxyaspartic acid (Asp)
    Modified site: gamma-carboxyglutamic acid (Glu)
    Modified site: glutamate methyl ester (Gln)
    Modified site: glutamate methyl ester (Glu)
    Modified site: glutamine derivative (Gln)
    [the following two ambiguous features should be avoided if possible]
    Modified site: hydroxylysine (Lys)
    Modified site: hydroxyproline (Pro)
    Modified site: isoleucine derivative (Ile)
    Modified site: lysine derivative (Lys)
    Modified site: N4-methylasparagine (Asn)
    Modified site: N5-methylglutamine (Gln)
    Modified site: N6,N6,N6-trimethyllysine (Lys)
    Modified site: N6,N6-dimethyllysine (Lys)
    Modified site: N6-(4-amino-2-hydroxybutyl)lysine (Lys)
    Modified site: N6-methyllysine (Lys)
    Modified site: omega-N,omega-N-dimethylarginine (Arg)
    Modified site: omega-N,omega-N'-dimethylarginine (Arg)
    Modified site: omega-N-methylarginine (Arg)
    Modified site: S-(6-FMN)-cysteine (Cys)
    Modified site: S-(8alpha-FAD)-cysteine (Cys)
    Modified site: selenocysteine
    Modified site: thyroxine (Tyr)
    Modified site: topaquinone (Tyr)
    Modified site: triiodothyronine (Tyr)
    Modified site: tryptophyl quinone (Trp)

Whenever possible, new modified residues should be added with substitution positions and stereoisomer indicators provided in accordance with the appropriate IUPAC and IUB rules. Steve Garavelli is the curator for these modified residue names. Please bring any additional or new modified residues to his attention.

[BLACK] Ambiguous notations such as

    Modified site: methylation #status predicted

should never be used.

We have chosen to use the unambiguous IUPAC numbered-position forms, in preference to the IUB Greek-letter designations, when such usage allows us to avoid inconsistencies between common usage ("epsilon-aminomethyl") and IUB recommended usage ("zeta-amino-methyl"). Note that standard abbreviations for the modified residues are not used, so the correct feature is

    Modified site: gamma-carboxyglutamic acid (Glu)

and not

    Modified site: gamma-carboxyglutamic acid (Gla)

B. Modified Amino Terminus

The format for this form of the "Modified site" record is

    "Modified site:" chemical_name "(" res ")" ["(" form ")"] ["(" extent ")"] "#status " status

The chemical name should be as specific as possible and should usually end with the term "amino end". When an unblocked or longer precursor form is presented in the entry and the modified site is not at position 1, the "in mature form" modifier should be used, for example:

    Modified site: acetylated amino end (Ala) (in mature form) #status experimental

[GRAY] Because not all processed forms requiring this modifier are the final "mature" form, it may become necessary to replace this modifier with something like "(in processed form) #link ...". Annotators are invited to comment on this proposal.

Current acceptable examples are:

    Modified site: 2-oxobutanoic acid (Thr)
    Modified site: L-3-phenyllactic acid (Phe)
    Modified site: N-formylmethionine
    Modified site: acetylated amino end (xxx)
  * Modified site: blocked amino end
    [* this form is used only when the presented sequence is completely ambiguous at the amino terminus]
    Modified site: blocked amino end (xxx)
    Modified site: dimethylated amino end (Pro)
    Modified site: fatty acylated amino end (Cys)
    Modified site: formylated amino end (Gly)
    Modified site: glucuronylated amino end (Gly)
    Modified site: methylated amino end (Ala)
    Modified site: myristylated amino end (Gly)
    Modified site: succinylated amino end (Trp)
    Modified site: pyrrolidone carboxylic acid (Gln)
    Modified site: pyruvic acid (Ser)
    Modified site: trimethylated amino end (Ala)

The form descriptor "(probably ...)" should be used with "blocked amino end" whenever an appropriate prediction can be made for an otherwise experimentally determined ambiguous feature.
    Modified site: blocked amino end (Ala) (probably acetylated) #status experimental

The "blocked amino end" form is usually appropriate only with experimental status, because otherwise the specific modification would be used with a predicted status. In order of increasing certainty,

    Modified site: acetylated amino end (Ala) #status predicted

says you are guessing both whether and by what,

    Modified site: blocked amino end (Ala) (probably acetylated) #status experimental

says you know whether but are guessing by what, and

    Modified site: acetylated amino end (Ala) #status experimental

says you know both whether and by what.

Formylated amino-terminal methionine is encoded and, like selenocysteine, is not really a modified site. However, it should be annotated as a modified site when it is experimentally observed in a protein. Making the residue explicit is not required in this case. No occurrence of this modified residue has yet been noted in any position other than the first.

For amino-terminal glutamine undergoing cyclization the format is

    "Modified site: pyrrolidone carboxylic acid (Gln)" ["(in mature form)"] ["#link " link] "#status " status

When the amino terminus is known to be glutamine and blocked, pyrrolidone carboxylic acid can be assumed unless a reason to believe otherwise is explicitly provided, in which case

    Modified site: blocked amino end (Gln) (in mature form) #status experimental

should be used. The form

    Modified site: pyrrolidone carboxylic acid (Glx)

should be avoided. The ambiguity should be explicitly noted in the "Residues" record, an appropriate comment made, and the sequence and feature presented as Gln.

People entering sequences should be explicitly warned about the notation " m-(n+1)/Product: tubulin, unprocessed form n/Modified site: tyrosine amidated carboxyl end (...) #link MAT2 may be introduced. Your comments on this proposal would be appreciated.

D. Selenocysteine

The format for this form of the "Modified site:" record is

    "Modified site: selenocysteine" "#status " status

It had formerly been thought that selenocysteine arose from post-translational modification of cysteine residues, so no single-letter code was assigned. When it was discovered to be encoded, the assignment of a special single-letter code presented an insurmountable software implementation problem. Instead, this feature record is applied to those residues or lists of residues. Although selenocysteine usually serves as an active site, a second feature for that annotation is superfluous. However, when it also serves as a covalent binding site for a prosthetic group, that is considered a secondary modification and two feature records are used.

    Modified site: selenocysteine
    Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)

Two different things are going on here. The first feature indicates the true coding identity of the residue. The second indicates the true prosthetic group covalently bound to the sequence-presented residue. [This all arises from the terrible historical accident that no one knew selenocysteine was encoded until it was too late. Every computer database uses "C", and everyone's computer programs would break if a new letter were introduced for it.]

Do not use the 1-letter code "X" in the sequence or the 3-letter code "Sec" in a feature for selenocysteine. "X" may, of course, be used in "Residues" records for encoded selenocysteine.

E. Acetyllysine, Carbamyllysine, and Acylcysteine

Amino-terminal lysine acetylated on the alpha-amino group should be annotated

    Modified site: acetylated amino end (Lys)

When a lysine in any position is acetylated or carbamylated on the N6-amino group, it should be annotated like

    Binding site: acetyl (Lys) (covalent)
    Binding site: carbon dioxide (Lys) (covalent)

Likewise, be careful to distinguish amino-terminal cysteine acylated on the alpha-amino group from S-acylated cysteine.
The amino-acylated form is like

    Modified site: acetylated amino end (Cys)
    Modified site: fatty acylated amino end (Cys)

while the S-acylated form is like

    Binding site: palmitate (Cys) (covalent)
    Binding site: sn-2,3-diacylglycerol (Cys) (covalent)

Other sequence databases [SWISS-PROT] are not careful in making this important distinction and contain errors on this point.

F. Aspartate and Glutamate esters

Because it has been experimentally observed that both glutamic acid and glutamine give rise to glutamate methyl ester in the same protein, and these rules would otherwise require that they be annotated differently, esters of the acids will be annotated with "Modified site" records. Current acceptable examples are:

    Modified site: glutamate methyl ester (Gln) (by cheB-dependent deamidation and methylation)
    Modified site: glutamate methyl ester (Glu)

PART 2 - Bonds

"Cleavage site" Record

Where a protein sequence has a cleavage site for activation or preactivation processing, the appropriate "Product" feature should be used. Where a protein sequence has a cleavage site for proteolytic enzymes in the normal process of digestion, no annotation should be used. The only appropriate use of a "Cleavage site" record is the case of a specific, biologically significant, proteolytic inactivation. This feature should be applied only to a hyphenated pair (range) of adjacent residues. The format is

    "Cleavage site:" res "-" res "(" activity ")" "#status " status

Some acceptable examples are

    Cleavage site: Arg-Ser (thrombin)
    Cleavage site: Gly-Ile (collagenase)
    Cleavage site: His-Ser (plasmin)
    Cleavage site: Phe-Leu (chymosin)
    Cleavage site: Phe-Met (rennin)
    Cleavage site: Pro-Ile (autolytic)

A Comment is usually appropriate to explain the biological significance of these features. Where a sequence is cleaved by an enzyme that is thereby inhibited by the product (a "suicide inhibitor"), the cleavage site of the inhibitor should also be annotated as an inhibitory site.
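Because a "Cleavage site" must be a hyphenated pair of adjacent residues, both the locant range and the residue pair can be checked against the sequence. The Python sketch below is one way to do this, assuming locants are written as an "m-n" string, the sequence is a list of 3-letter codes indexed from 1, and residue codes are standard 3-letter codes. The pattern and function name are hypothetical and not part of any PIR program.

```python
import re
from typing import List

# Illustrative pattern for: "Cleavage site:" res "-" res "(" activity ")" "#status " status
CLEAVAGE_RE = re.compile(
    r"^Cleavage site:\s*(?P<first>[A-Z][a-z]{2})-(?P<second>[A-Z][a-z]{2})\s*"
    r"\((?P<activity>[^)]+)\)\s*#status\s+(?P<status>\w+)\s*$"
)


def check_cleavage_site(locants: str, text: str, sequence: List[str]) -> List[str]:
    """Return rule violations for one "Cleavage site" record."""
    m = CLEAVAGE_RE.match(text.strip())
    if m is None:
        return ["record does not match the Cleavage site format (is #status missing?)"]
    span = re.fullmatch(r"(\d+)-(\d+)", locants)
    if span is None:
        return ["a Cleavage site must be a hyphenated pair (range) of residues"]
    start, end = int(span.group(1)), int(span.group(2))
    problems = []
    if end != start + 1:
        problems.append("the two residues must be adjacent")
    # Each named residue must match the sequence at its locant.
    for loc, res in ((start, m.group("first")), (end, m.group("second"))):
        if not 1 <= loc <= len(sequence):
            problems.append(f"position {loc} is outside the sequence")
        elif sequence[loc - 1] != res:
            problems.append(f"position {loc} is {sequence[loc - 1]}, not {res}")
    return problems
```

With the sequence ["Gly", "Arg", "Ser", "Ala"], the record "Cleavage site: Arg-Ser (thrombin) #status experimental" at "2-3" passes, while the same record at "2-4" is reported as non-adjacent and mismatched at position 4.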
[GRAY] The annotation of intein and extein features is under review. The use of features like

    Cleavage site: xxx-yyy (autolytic) #link PRE

for the precursor forms, and

    Cross-link: peptide (xxx-zzz) #link MAT

[see the following section] for the spliced forms may be introduced. Your comments on this proposal are appreciated. When protein splicing occurs, two entries are used, one for the precursor form and a second for the spliced form, ONLY when the splicing rearranges the order of the peptide segments. (See CVJB and CVJBP.)

"Cross-link" Record

The "Cross-link" record should be used when two or more residues form a covalent bond through their side chains (other than cysteine disulfides) or through the amino- or carboxyl-terminal groups. The format for an intramolecular "Cross-link" record is

    "Cross-link:" cross-link name "(" res "-" res ")" ["(" extent ")"] "#status " status

This should be applied only to hyphenated pairs of residues. Some current examples are:

    Cross-link: (2S,3S,6R)-3-methyl-lanthionine (Cys-Thr)
    Cross-link: 5-imidazolinone (Ser-Gly)
    Cross-link: cysteinylhistidine (Cys-His)
    Cross-link: cysteinyltyrosine (Cys-Tyr)
    Cross-link: isopeptide amino end (Cys-Asn)
    Cross-link: isopeptide amino end (Gly-Asn)
    Cross-link: lysinoalanine (Ser-Lys)
    Cross-link: lysine-topaquinone (Lys-Tyr)
    Cross-link: oxazole (Cys-Ser)
    Cross-link: peptide (Asn-Ser)
    Cross-link: sn-(2S,6R)-lanthionine (Ser-Cys)
    Cross-link: thiazole (Gly-Cys)
    Cross-link: thiolester (Cys-Gln)
    Cross-link: tryptophan-tryptophyl quinone (Trp-Trp)

The format for the intermolecular "Cross-link" record is

    "Cross-link:" cross-link name "(" res ") (interchain" ["to" partner ] ")" ["(" extent ")"] "#status " status

This should be applied only to individual residues. Some current examples are:

    Cross-link: desmosine (Lys) (interchain)
    Cross-link: isopeptide (Gln) (interchain to ... -Lys)
    Cross-link: isopeptide (Lys) (interchain to ... -Gln)
    Cross-link: isopeptide carboxyl end (Gly) (interchain to ...)
    Cross-link: thiolester (Cys) (interchain to ...)
    Cross-link: thiolester carboxyl end (Gly) (interchain to ... -Cys)

[GRAY] The use of numbered partners here has the same difficulties as with the interchain "Disulfide bonds" records, and extreme caution is urged.

This record should be applied when only the side chains of two or more identified residues are directly involved in the cross-link. If an amino- or carboxyl-terminal group is involved, the feature is likewise annotated as a Cross-link, and its name carries "amino end" or "carboxyl end". The "Cross-link: isopeptide" is used for a side chain linked to either an amino- or a carboxyl-terminal group. The "Cross-link: peptide" is used when an amino-terminal group and a carboxyl-terminal group from different chain segments are linked. See the discussion of protein splicing in the "Cleavage site" section for use of the "Cross-link: peptide" feature.

[GRAY] A "Cross-link: cyclopeptide" may be used when the amino- and carboxyl-terminal groups of the same chain segment are linked. The use of the cyclopeptide feature is extremely dangerous, and it may be limited to entries in PIR4 and NRL_3D.

If the cross-link is secondary to the chemical modification of one or both residues (in the sense of a modified site as defined above), the participating residue may also be marked as a modified site. For example,

    112-163/Cross-link: tryptophan-tryptophyl quinone (Trp-Trp)
    112/Modified site: tryptophyl quinone (Trp)

in which the chemically distinct nature of the two tryptophan residues is obvious. If non-proteinaceous compounds are involved, this latter case will generally apply. If the partner of a cross-linked residue is not identified, the residue may be annotated as a covalent binding site.

[BLACK] The old "Thiolester bond" record should not be used. Instead, use the form

    Cross-link: thiolester

Thiol ethers should be denoted by appropriate compound names, like lanthionine or cysteinylhistidine.
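The two "Cross-link" formats differ in the shape of their locants: the intramolecular form takes a hyphenated pair and a "res-res" field, while the "(interchain ...)" form takes a single residue. The Python sketch below checks that the locants agree with the form used. It assumes the residue field is the first "(Xxx)" or "(Xxx-Yyy)" group, which lets cross-link names with their own parentheses, such as "(2S,3S,6R)-3-methyl-lanthionine", pass through; the pattern and function name are hypothetical and only a sketch of the rule.

```python
import re
from typing import List

RES = r"[A-Z][a-z]{2}"
# Illustrative pattern: name, residue field, optional "(interchain ...)", required #status.
XLINK_RE = re.compile(
    rf"^Cross-link:\s*(?P<name>.+?)\s*\((?P<res>{RES}(?:-{RES})?)\)"
    r"(?P<inter>\s*\(interchain[^)]*\))?.*#status\s+\w+\s*$"
)


def check_cross_link(locants: str, text: str) -> List[str]:
    """Check that the locant shape matches the intra- or intermolecular form."""
    m = XLINK_RE.match(text.strip())
    if m is None:
        return ["record does not match the Cross-link format (is #status missing?)"]
    n_res = len(m.group("res").split("-"))
    if m.group("inter"):
        # Intermolecular: one residue, one locant.
        if n_res != 1 or not locants.isdigit():
            return ["interchain cross-links apply only to individual residues"]
    elif n_res != 2 or re.fullmatch(r"\d+-\d+", locants) is None:
        # Intramolecular: a hyphenated pair of residues.
        return ["intramolecular cross-links apply only to hyphenated pairs"]
    return []
```

This sketch does not validate the partner named after "interchain to"; as noted above, numbered partners cannot yet be checked reliably.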
"Disulfide bonds" Record The format for the intramolecular "Disulfide bonds:" record is "Disulfide bonds:" ["(in" form ")"] "#status " status This record should be applied to hyphenated pairs (ranges) of residues, and pairs with the same experimental status should be grouped into lists. Disulfide bonds: Disulfide bonds: (in conotoxin GI) These are in the process of being converted to "#link" forms. Alternative bonds may be indicated within the same record using this format. "Disulfide bonds:" "(or " hyphenated pairs ")"] "#status " status For example, Disulfide bonds: (or 106-121) Disulfide bonds: (or 20-42, 41-99) The "or" form should be avoided whenever possible. [BLACK] Disulfide bonds that would have different annotations must be placed in separate records. For example, instead of 28-44,43-95,49-122,50-88/Disulfide bonds: #status predicted (except for 49-122) two records should appear 28-44,43-95,50-88/Disulfide bonds: #status predicted 49-122/Disulfide bonds: #status experimental The format for the interchain "Disulfide bonds:" record is "Disulfide bonds:" interchain ["(to " partner ")"] "#status " status Generally, this record will applied to individual residues. Without "(to ...)" the interchain bond is assumed to be to the same residue in a dimeric partner, for example: 56/Disulfide bonds: interchain It may be applied to lists of residues when it is thought that all the residues participate in intermolecular bonds to partners of the same sequence but the pattern of bonding is not known. Such cases will usually be status pedicted. 56,72,98/Disulfide bonds: interchain Where the bond is between partners of the same sequence (homopolymeric), records should be applied to both residues individually. 
    136/Disulfide bonds: interchain (to 133)
    133/Disulfide bonds: interchain (to 136)

Examples of intermolecular bonds to partners with different sequences (heteropolymeric):

    Disulfide bonds: interchain (to heavy chain)
    Disulfide bonds: interchain (to beta chain)
    Disulfide bonds: interchain (to chain B1)
    Disulfide bonds: interchain (to alpha-180)
    Disulfide bonds: interchain (to gamma-34 or gamma-35)

The special case of intermolecular bonds to different partners with the same sequence may be distinguished:

    Disulfide bonds: interchain (to mu chain in another subunit)

The partner should be indicated as clearly as necessary. (In NRL_3D the partner's code is used.)

[GRAY] The problems of checking and maintaining correct codes and numbers in references to other entries cannot be dealt with in the current database form and will still be difficult after SDDL. Any comments on alternative mechanisms for conveying this information in a manner that allows easy checking and maintenance would be appreciated. The same problem applies to the interchain "Cross-link" records.

In some cases, alternative monomeric, dimeric or multimeric forms are known to exist. Each form should have an appropriate record with an "(in " modifier. For example,

    Disulfide bonds: (in monomeric form)
    Disulfide bonds: interchain (in polymeric form)

These will be replaced by appropriate "#link" records later.

[GRAY] Disulfide bonds can also form with "free" cysteine or with the small peptide glutathione. These are now treated as covalent "Binding site" features. Disulfide bond features should be regarded as a special case of a Cross-link. They should be used only between encoded polypeptide sequences. Disulfide bonds to

The format for a transient or active-site "Disulfide bonds" record is

    "Disulfide bonds: redox-active (" status ")"

This record should be applied to hyphenated pairs (ranges) of residues.
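Two of the "Disulfide bonds" rules lend themselves to small helpers: pairs sharing a status are grouped into one record while pairs with different statuses go into separate records, and each homopolymeric "(to N)" record should have its reciprocal. The Python sketch below shows both, assuming bonds are supplied as a mapping from residue pair to status and interchain partners as a mapping from residue to the residue named in its "(to ...)" clause. The function names are hypothetical, and ordering the output records by status name is an arbitrary choice made here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def group_disulfides(bonds: Dict[Pair, str]) -> List[str]:
    """Emit one intramolecular "Disulfide bonds" record per status.

    Pairs sharing a status are listed together; pairs with different
    statuses are placed in separate records.
    """
    by_status: Dict[str, List[Pair]] = defaultdict(list)
    for (a, b), status in bonds.items():
        by_status[status].append((min(a, b), max(a, b)))
    records = []
    for status in sorted(by_status):
        pairs = ",".join(f"{a}-{b}" for a, b in sorted(by_status[status]))
        records.append(f"{pairs}/Disulfide bonds: #status {status}")
    return records


def unmatched_interchain(partners: Dict[int, int]) -> List[int]:
    """Residues whose homopolymeric "(to N)" record lacks a reciprocal record.

    A residue mapped to itself (bond to the same residue in a dimeric
    partner) is its own reciprocal.
    """
    return sorted(r for r, p in partners.items() if partners.get(p) != r)
```

Applied to the example above, the pairs 28-44, 43-95 and 50-88 (predicted) and 49-122 (experimental) produce the two separate records shown there, and {136: 133, 133: 136} yields no unmatched residues.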