RESID Database of Protein Modifications Release Notes
Release 76.00 31-May-2018

1  Release 76.00 contains 621 entries.  The distribution has a minimum of four
   files.
   File                bytes
   RESIDUES.XML      2075527
   RESIDUES.DTD        10676 (version 01.28)
   images.zip        6479920 (621 files unpacked)
   models.zip         747222 (589 files unpacked)
   Index files for essentially all records, and Perl scripts for producing them,
   are available upon request.  In particular, there is an index of the entries
   by the standard amino acids they are considered to be derived from.

2  When the UniProt Knowledgebase does not have a feature annotation for a
   RESID entry, the feature appears as "Not available" and there is a standard
   note.  Except for the entries for 3 standard amino acids with no feature
   annotations, 4 sequence ambiguities, 4 artifactual, and 23 deprecated entries,
   497 entries are annotated in UniProt,
    70 entries are not annotated in UniProt.

3  The database files are available by ftp services at
   ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
   and at
   ftp://ftp.ebi.ac.uk/pub/databases/RESID

4  Residue masses and mass differences are calculated using atomic mass averages
   in the file AtomTabl.XML and the accompanying AtomTabl.DTD that is available
   at ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/psi/
   and will be deposited at
   http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/mod/data/
   The atomic mass averages and isotope masses in version 04.00 of AtomTabl.XML
   are based on the IUPAC Standard Atomic Weights Revised v2 2013,
   http://www.iupac.org/news/news-detail/article/standard-atomic-weights-revised-v2.html,
   the IUPAC Isotopic Composition of the Elements, 2001, and the AME2003 Atomic
   Mass Evaluation. AtomTabl.XML is used to calculate mass values for the
   PSI-MOD Ontology of Protein Modifications in the files PSI-MOD.obo and
   PSI-MOD.obo_xml.

   Please note that the IUPAC Standard Atomic Weights Revised 2013, available at
   http://iupac.org/publications/pac/asap/PAC-REP-10-09-14/ , and the IUPAC
   Standard Atomic Weights Revised v2 2013 do not define a single standard
   average atomic mass for 12 elements whose values depend strongly on
   environmental sampling; instead they define a range of values for the
   average atomic mass. Those elements include hydrogen, carbon, nitrogen,
   oxygen, sulfur, and chlorine. The report does provide "conventional atomic
   weights" with 4 significant digits for hydrogen, sulfur, and chlorine, and 5
   significant digits for carbon, nitrogen, and oxygen. In the IUPAC Standard
   Atomic Weights Revised v2 2013, changes in the atomic mass averages of
   selenium and molybdenum resulted in changes for the chemical or average
   isotope mass of entries containing those elements.
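
   As an illustration of the average-mass calculation described in this item,
   the following is a minimal sketch and not the RESID implementation. It uses
   the IUPAC conventional atomic weights for the six elements named above
   rather than the values in AtomTabl.XML, which may differ. The function
   names and the space-separated "symbol count" formula layout are assumptions
   made for this example.

```python
# Conventional atomic weights from the IUPAC 2013 report (4 or 5 significant
# digits). AtomTabl.XML may carry different or more precise values.
CONVENTIONAL_WEIGHTS = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "S": 32.06,
    "Cl": 35.45,
}


def parse_formula(formula):
    """Parse a string such as 'C 2 H 3 N 1 O 1' into {'C': 2, 'H': 3, ...}.

    The space-separated symbol/count layout is an assumption of this sketch.
    """
    tokens = formula.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"unpaired token in formula: {formula!r}")
    counts = {}
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        counts[symbol] = counts.get(symbol, 0) + int(count)
    return counts


def average_mass(formula, weights=CONVENTIONAL_WEIGHTS):
    """Sum count * average atomic mass over the elements of the formula."""
    return sum(n * weights[sym] for sym, n in parse_formula(formula).items())


# Glycine residue, C2H3NO (glycine less one water).
print(round(average_mass("C 2 H 3 N 1 O 1"), 3))
```

   With only 4 or 5 significant digits per element, results from this sketch
   will not reproduce the database Weight values to full precision.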

5  This release of the RESID Database requires DTD file version 01.28.
   Restricted vocabulary terms of the RESID Database are enumerated in the DTD
   file, which may be replaced or augmented by an XML-Schema.

6  UTF-8 extended characters currently occur in the following records: 
   SystematicName, Author, Title.
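
   A reader who needs to locate such characters might scan those three
   records as sketched below. Only the element names SystematicName, Author
   and Title come from these notes; the Entry wrapper, the sample values and
   the function name are hypothetical and do not reflect the actual layout of
   RESIDUES.XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the wrapper element and all values are made up.
SAMPLE = """<Entry id="EXAMPLE">
  <SystematicName>(2S)-2-amino-3-hydroxypropanoic acid</SystematicName>
  <Author>Müller A</Author>
  <Title>An example title with a β-strand</Title>
</Entry>"""

UTF8_RECORDS = ("SystematicName", "Author", "Title")


def non_ascii_records(xml_text):
    """Return (tag, characters) pairs for records holding non-ASCII text."""
    root = ET.fromstring(xml_text)
    hits = []
    for tag in UTF8_RECORDS:
        for element in root.iter(tag):
            text = element.text or ""
            extended = sorted({ch for ch in text if ord(ch) > 127})
            if extended:
                hits.append((tag, "".join(extended)))
    return hits


print(non_ascii_records(SAMPLE))
```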

7  The FormalCharge records were implemented beginning in release 60.00.  The
   FormalCharge record is an element of the FormulaBlock and CorrectionBlock
   records. It consists of an integer immediately followed by a plus or minus
   sign.  It optionally follows the Formula record and precedes the Weight
   record.  It indicates the formal charge of the modified amino acid residue,
   when the chemical structure as a molecular entity is inherently charged,
   other than simply through the addition or loss of a proton. In the PSI-MOD
   ontology it was necessary to introduce this property in order to calculate
   correctly to six decimal places a residue mass as measured by mass
   spectrometry. Typically, it is necessary to use this record for quaternary
   amines and well-characterized metal clusters.
  
   To avoid confusion, the plus sign indicator for indefinite and varying
   composition in the Formula record will be replaced by a Note record in the
   FormulaBlock and CorrectionBlock records to indicate minimal, varying or
   repeated components of a composition. If your work may be affected by these
   changes, you are invited to send me your comments and suggestions.
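
   The FormalCharge format described in this item, an integer immediately
   followed by a plus or minus sign, can be read as sketched below. The
   function name is an assumption for this example, and the sketch covers
   only the value of the record, not its place in the FormulaBlock.

```python
import re

# An integer immediately followed by a plus or minus sign, e.g. "1+" or "2-".
FORMAL_CHARGE = re.compile(r"(\d+)([+-])")


def parse_formal_charge(text):
    """Convert a FormalCharge value such as '1+' to a signed integer."""
    match = FORMAL_CHARGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid FormalCharge value: {text!r}")
    magnitude, sign = match.groups()
    return int(magnitude) if sign == "+" else -int(magnitude)


print(parse_formal_charge("1+"))
print(parse_formal_charge("2-"))
```

   A value written sign-first, such as "+1", is rejected, because the sign
   follows the integer in this record.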

8  The special encoded amino acids selenocysteine and pyrrolysine are translated
   in the nucleotide sequence databases by the IUBMB single letter codes "U" and
   "O". The UniProt Knowledgebase formerly used "C" and "K" respectively for
   those amino acids, differentiating them from the standard amino acids through
   features. The UniProt Knowledgebase now represents encoded selenocysteine and
   pyrrolysine residues by the IUBMB single letter codes "U" and "O",
   accompanied by the features
   FT   NON_STD     ...    ...       Selenocysteine.
   and
   FT   NON_STD     ...    ...       Pyrrolysine.
   as appropriate. This annotation is documented at
   http://web.expasy.org/docs/userman.html#FT_NON_STD
   Both encodings are represented in the RESID Database, and differences for
   the special encoded amino acids and their secondary modifications are shown
   based on both the special encoded amino acids and on the hypothetical
   modification of the corresponding standard amino acids.  A table of the amino
   acid one-letter and three-letter codes is available at
   http://pir.georgetown.edu/resid/faq.shtml#q01
   The other special encoded amino acid, N-formyl-L-methionine, does not have
   standard one-letter or three-letter codes.  Differences are also provided for
   it and its secondary modifications based on both the encoded form, and on the
   hypothetical modification of methionine.
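
   The relation between the two encodings of selenocysteine and pyrrolysine
   can be sketched as follows. The function name and the example sequence
   are made up for illustration, and the returned pairs are plain tuples
   rather than UniProt feature lines.

```python
# One-letter codes for the special encoded amino acids and the standard
# amino acids formerly used in their place.
SPECIAL_TO_STANDARD = {"U": "C", "O": "K"}
SPECIAL_NAMES = {"U": "Selenocysteine", "O": "Pyrrolysine"}


def to_legacy_encoding(sequence):
    """Replace U/O with C/K and report what was replaced.

    Returns the rewritten sequence and a list of (position, name) pairs,
    with 1-based positions.
    """
    residues = []
    features = []
    for position, code in enumerate(sequence, start=1):
        if code in SPECIAL_TO_STANDARD:
            features.append((position, SPECIAL_NAMES[code]))
            residues.append(SPECIAL_TO_STANDARD[code])
        else:
            residues.append(code)
    return "".join(residues), features


# Made-up sequence, not taken from any database entry.
legacy, features = to_legacy_encoding("MAUGKO")
print(legacy)
print(features)
```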

   A compilation of frequently asked questions is available at
   http://pir.georgetown.edu/resid/faq.shtml
   A table of glycosylation modifications is available at
   http://pir.georgetown.edu/resid/togm.shtml
   A table of flavin modifications is available at
   http://pir.georgetown.edu/resid/aof.shtml

9  Names based on "azane" are being introduced as alternate names. Names based
   on "amino" have been designated as "preselected" by the IUPAC Nomenclature 
   Committee, and will continue to be used in preference to the corresponding
   "azane" names.

10 The word "autocatalytic" is used in the EnzymeName record when the
   modification is produced exclusively by an autocatalytic process, and not by
   the action of an enzyme acting on the protein as a substrate. This annotation
   will usually be used for modifications that occur uniquely in one protein.

11 Remediation of the Protein Data Bank has necessitated many changes in the
   cross-references from modifications in the RESID Database to the PDB HET
   codes.  The PDB is retaining the single alpha-amino acetylation code ACE,
   the alpha-carboxyl amidation code NH2, and other molecular fragment codes
   considered to be bonded to amino acid side chains, so it will not be possible
   to implement links to a PDB server that will resolve the many-to-one mapping
   of those codes.  There are also unresolved inconsistencies, with the same
   structure represented differently in different PDB entries, and multiple
   HET codes are provided for these one-to-many mappings.

12 When the UniProt Knowledgebase feature annotations for modifications are
   revised and enhanced, some feature annotations in the new, restricted
   vocabulary may appear in the RESID Database before they appear in the public
   release of UniProt.  The feature annotations scheduled to be replaced in
   UniProt will continue to appear in RESID until they have been converted.
   The CROSSLNK and NON_STD classes were introduced, and the THIOLEST and
   THIOETH classes were removed.  The BINDING, CARBOHYD, LIPID and MOD_RES
   class features have been standardized. The METAL class features are
   subject to further revision.

13 A minor systematic error (see H. Maehr, "Graphic representation of 
   configuration in two-dimensional space. Current conventions, clarifications,
   and proposed extensions", J. Chem. Inf. Comput. Sci. vol. 42, pp. 894-902, 
   2002, PMID:12132891) occurs in the structure GIF images of many entries.
   This error is being corrected as the images are gradually replaced.
   The following models contain phantom atoms: AA0184, AA0187, AA0188, AA0189,
   AA0377, AA0378, AA0379, AA0380, AA0381, AA0382, AA0410.
   The following entries contain multiple systematic names separated by
   semicolons within the "SystematicName" record: AA0208.
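
   Software reading such a record may want to separate the names. A minimal
   sketch follows, assuming that a semicolon never occurs inside an
   individual systematic name; the function name and the sample value are
   made up and are not the content of AA0208.

```python
def split_systematic_names(record_text):
    """Split a SystematicName value on semicolons, dropping empty pieces."""
    return [name.strip() for name in record_text.split(";") if name.strip()]


# Made-up value for illustration only.
print(split_systematic_names("first systematic name; second systematic name"))
```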

14 The standardization of UniProt Knowledgebase feature annotations and the
   RESID Database have been discussed in two articles in the journal Proteomics.

   Farriol-Mathis, N., Garavelli, J.S., Boeckmann, B., Duvaud, S., Gasteiger,
   E., Gateau, A., Veuthey, A., Bairoch, A. (2004)
   Annotation of post-translational modifications in the Swiss-Prot knowledge
   base. 
   Proteomics 4, 1537-1550.

   Garavelli, J.S. (2004)
   The RESID Database of Protein Modifications as a resource and annotation
   tool.
   Proteomics 4, 1527-1533.

   The PSI-MOD Ontology of Protein Modifications was announced in a 
   communication published in Nature Biotechnology.

   Montecchi-Palazzi, L., Beavis, R., Binz, P.A., Chalkley, R.J., Cottrell, J.,
   Creasy, D., Shofstahl, J., Seymour, S.L., Garavelli, J.S. (2008)
   The PSI-MOD community standard for representation of protein modification 
   data.
   Nature Biotechnol. 26(8), 864-866.

15 The RESID Database is released at the Protein Information Resource at the
   Center for Bioinformatics & Computational Biology, Delaware Biotechnology
   Institute, University of Delaware, and at the Georgetown University Medical
   Center, Georgetown University. My thanks to DBI and Cathy Wu for making this
   possible.

   In this release a new, more accurate model structure for RESID:AA0266, heme
   P460-bis-L-cysteine-L-tyrosine, has been prepared with the expert assistance
   of Dr. Miri Hirshberg at the European Bioinformatics Institute.  I want to 
   express my heartfelt appreciation for all of her assistance over the years.

   The RESID Database of Protein Modifications is a service mark of John S. 
   Garavelli.

16 A site for searching the RESID Database is being tested at
   http://pir0.georgetown.edu/cgi-bin/resid
   An announcement will be made when it is in full service.

FTP sites:
ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
ftp://ftp.ebi.ac.uk/pub/databases/RESID/

Web sites:
http://pir.georgetown.edu/resid/
http://home.earthlink.net/~jsgaravelli/RESIDInfo.HTML

Citation:
To cite the RESID Database, please refer to
Proteomics Vol. 4, pp. 1527-1533, 2004
http://www3.interscience.wiley.com/cgi-bin/abstract/108061125/ABSTRACT
DOI:10.1002/pmic.200300777

Contact:
John S. Garavelli
Center for Bioinformatics & Computational Biology
Delaware Biotechnology Institute
University of Delaware
15 Innovation Way, Suite 205
Newark, DE  19711-5449
USA
302-831-6922
jsgarave@udel.edu