RESID Database Release Notes
Release 71.01 01-Mar-2013

1  Release 71.01 contains 598 entries.  The distribution has a minimum of four
   files:
   File                bytes
   RESIDUES.XML      1943826
   RESIDUES.DTD        10676 (version 01.28)
   images.zip        5914911 (598 files unpacked)
   models.zip         713109 (565 files unpacked)
   Index files for essentially all records, and Perl scripts for producing them,
   are available upon request.  In particular, there is an index of the entries
   by the standard amino acids they are considered to be derived from.

2  When the UniProt Knowledgebase does not have a feature annotation for a
   RESID entry, the feature appears as "Not available" and there is a standard
   note.  Excluding the entries for 3 standard amino acids with no feature
   annotations, 4 sequence ambiguities, 4 artifactual entries, and 22
   deprecated entries,
   498 entries are annotated in UniProt,
    72 entries are not annotated in UniProt.

3  The ftp server at the National Cancer Institute NCIFCRF is no longer 
   available.  That ftp service has been replaced by service at
   ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
   The ftp service at the EBI will continue to be available at
   ftp://ftp.ebi.ac.uk/pub/databases/RESID.

4  A Web Server for the RESID Database is available at
   http://www.ebi.ac.uk/ontology-lookup/RESIDService?resid_id=
   Provide a valid RESID Database identifier at the end of this address, and the
   Web Server will return the entry in XML format.  Security restrictions on
   transfer procedures have unfortunately blocked updates to this server.  We
   hope to overcome this problem before the next release.
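
   The address construction described above can be sketched as follows.  This
   is only an illustration: the function name resid_url is hypothetical, and
   the identifier pattern ("AA" followed by four digits, as in AA0184) is an
   assumption inferred from the identifiers quoted in these notes.  The sketch
   builds the address and does not contact the server.

```python
import re
from urllib.parse import quote

# Base address quoted in these release notes; the identifier is appended to it.
RESID_SERVICE = "http://www.ebi.ac.uk/ontology-lookup/RESIDService?resid_id="

# Assumption: identifiers are "AA" followed by four digits (e.g. AA0184).
_RESID_ID = re.compile(r"AA[0-9]{4}")


def resid_url(resid_id: str) -> str:
    """Return the Web Server address for one RESID identifier (hypothetical helper)."""
    candidate = resid_id.strip().upper()
    if not _RESID_ID.fullmatch(candidate):
        raise ValueError(f"not a RESID identifier: {resid_id!r}")
    return RESID_SERVICE + quote(candidate)
```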

5  Residue masses and mass differences are calculated using atomic mass averages
   in the file AtomTabl.XML and the accompanying AtomTabl.DTD deposited at
   http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/mod/data/
   The atomic mass averages and isotope masses in version 02.00 of AtomTabl.XML
   are based on the IUPAC Standard Atom Weights Revised 2009, the IUPAC Isotopic
   Composition of the Elements, 2001, and the AME2003 Atomic Mass Evaluation.
   AtomTabl.XML is used to calculate mass values for the PSI-MOD Ontology of 
   Protein Modifications in the files PSI-MOD.obo and PSI-MOD.obo_xml.

   Please note that the IUPAC Standard Atom Weights Revised 2009, available at
   http://iupac.org/publications/pac/asap/PAC-REP-10-09-14/ , does not define
   a single standard average atomic mass for 10 elements whose values depend
   strongly on environmental sampling.  Instead, it defines a range of values
   for the average atomic mass of each.  Those elements include hydrogen,
   carbon, nitrogen, oxygen, sulfur, and chlorine.  The report does provide
   "conventional atomic weights" with 4 significant digits for hydrogen,
   sulfur, and chlorine, and 5 significant digits for carbon, nitrogen, and
   oxygen.
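
   As a rough illustration of how an average residue mass follows from atomic
   mass averages, the sketch below sums the IUPAC 2009 conventional atomic
   weights for the six elements named above.  It does not read AtomTabl.XML,
   whose values carry more digits, so its results will not reproduce the
   weights published in RESID exactly.  The function name average_mass and
   the accepted formula layout (element symbols followed by optional counts,
   with optional spaces) are assumptions made for this example.

```python
import re

# IUPAC 2009 conventional atomic weights (4 or 5 significant digits).
CONVENTIONAL_WEIGHT = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "S": 32.06,
    "Cl": 35.45,
}


def average_mass(formula: str) -> float:
    """Sum conventional atomic weights over a formula such as 'C 2 H 3 N 1 O 1'."""
    compact = "".join(formula.split())  # drop all whitespace
    if not re.fullmatch(r"(?:[A-Z][a-z]?[0-9]*)+", compact):
        raise ValueError(f"unrecognized formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)([0-9]*)", compact):
        if symbol not in CONVENTIONAL_WEIGHT:
            raise ValueError(f"no conventional weight tabulated here for {symbol}")
        total += CONVENTIONAL_WEIGHT[symbol] * (int(count) if count else 1)
    return total
```

   For a glycine residue, C 2 H 3 N 1 O 1, this gives approximately 57.052.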

6  This release of the RESID Database requires DTD file version 01.28.
   Restricted vocabulary terms of the RESID Database are enumerated in the DTD
   file, which may be replaced or augmented by an XML-Schema.

7  UTF-8 extended characters currently occur in the following records: 
   SystematicName, Author, Title.
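
   A consumer of the XML file may want to check which of these records
   actually carry characters outside ASCII.  The sketch below uses the Python
   standard library for that.  The sample document and its Entry wrapper are
   invented for the example and do not reflect the real layout of
   RESIDUES.XML; only the three record names are taken from these notes.

```python
import xml.etree.ElementTree as ET

# Record names listed in these notes as carrying UTF-8 extended characters.
UTF8_RECORDS = ("SystematicName", "Author", "Title")


def non_ascii_records(xml_bytes: bytes):
    """Return (record name, text) pairs whose text contains non-ASCII characters."""
    root = ET.fromstring(xml_bytes)  # bytes without a declaration are read as UTF-8
    hits = []
    for tag in UTF8_RECORDS:
        for element in root.iter(tag):
            text = element.text or ""
            if not text.isascii():
                hits.append((tag, text))
    return hits


# Hypothetical sample, not an excerpt from RESIDUES.XML.
SAMPLE = (
    "<Entry><Author>Example, A.</Author>"
    "<Title>Example with a Greek letter: α</Title></Entry>"
).encode("utf-8")
```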

8  The FormalCharge records were implemented beginning in release 60.00.  The
   FormalCharge record is an element of the FormulaBlock and CorrectionBlock
   records. It consists of an integer immediately followed by a plus or minus
   sign.  It optionally follows the Formula record and precedes the Weight
   record.  It indicates the formal charge of the modified amino acid residue,
   when the chemical structure as a molecular entity is inherently charged,
   other than simply through the addition or loss of a proton. In the PSI-MOD
   ontology it was necessary to introduce this property in order to calculate
   correctly to six decimal places a residue mass as measured by mass
   spectrometry. Typically, it is necessary to use this record for quaternary
   amines and well-characterized metal clusters.
  
   To avoid confusion, the plus sign indicator for indefinite and varying
   composition in the Formula record will be replaced by a Note record in the
   FormulaBlock and CorrectionBlock records to indicate minimal, varying or
   repeated components of a composition. If your work may be affected by these
   changes, you are invited to send me your comments and suggestions.
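
   The FormalCharge syntax described above, an integer immediately followed by
   a plus or minus sign, can be read with a few lines of code.  The following
   is a minimal sketch; the function name parse_formal_charge is hypothetical,
   and the sketch assumes that the record text contains nothing but the charge.

```python
import re

# An unsigned integer immediately followed by "+" or "-", e.g. "1+" or "2-".
_FORMAL_CHARGE = re.compile(r"([0-9]+)([+-])")


def parse_formal_charge(text: str) -> int:
    """Convert FormalCharge text such as '1+' or '2-' into a signed integer."""
    match = _FORMAL_CHARGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a FormalCharge value: {text!r}")
    magnitude = int(match.group(1))
    return magnitude if match.group(2) == "+" else -magnitude
```

   Values written with a leading sign, such as "+1", are rejected because the
   sign must follow the integer.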

9  The special encoded amino acids selenocysteine and pyrrolysine are translated
   in the nucleotide sequence databases by the IUBMB single letter codes "U" and
   "O". The UniProt Knowledgebase formerly used "C" and "K" respectively for
   those amino acids, differentiating them from the standard amino acids through
   features. The UniProt Knowledgebase now represents encoded selenocysteine and
   pyrrolysine residues by the IUBMB single letter codes "U" and "O",
   accompanied by the features
   FT   NON_STD     ...    ...       Selenocysteine.
   and
   FT   NON_STD     ...    ...       Pyrrolysine.
   as appropriate. This annotation is documented at
   http://web.expasy.org/docs/userman.html#FT_NON_STD
   Both encodings are represented in the RESID Database, and differences for
   the special encoded amino acids and their secondary modifications are shown
   based on both the special encoded amino acids and on the hypothetical
   modification of the corresponding standard amino acids.  A table of the amino
   acid one-letter and three-letter codes is available at
   http://www.ebi.ac.uk/RESID/faq.html#q01
   The other special encoded amino acid, N-formyl-L-methionine, does not have
   standard one-letter or three-letter codes.  Differences are also provided for
   it and its secondary modifications based on both the encoded form and on the
   hypothetical modification of methionine.
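
   Software that reads protein sequences using the "U" and "O" codes may need
   to locate those residues.  The sketch below shows one way to do so; the
   function name find_special_encoded is hypothetical, and the sketch works on
   a bare one-letter sequence string rather than on UniProt feature lines.

```python
from typing import List, Tuple

# One-letter code -> (three-letter code, name) for the special encoded
# amino acids that have standard codes.
SPECIAL_ENCODED = {
    "U": ("Sec", "Selenocysteine"),
    "O": ("Pyl", "Pyrrolysine"),
}


def find_special_encoded(sequence: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, one-letter code, name) for each U or O."""
    return [
        (position, code, SPECIAL_ENCODED[code][1])
        for position, code in enumerate(sequence.upper(), start=1)
        if code in SPECIAL_ENCODED
    ]
```

   N-formyl-L-methionine is not covered because, as noted above, it has no
   standard one-letter code.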

   A compilation of frequently asked questions is available at
   http://www.ebi.ac.uk/RESID/faq.htm
   A table of glycosylation modifications is available at
   http://www.ebi.ac.uk/RESID/togm.html
   A table of flavin modifications is available at
   http://www.ebi.ac.uk/RESID/aof.html

10 Names based on "azane" are being introduced as alternate names. Names based
   on "amino" have been designated as "preselected" by the IUPAC Nomenclature 
   Committee, and will continue to be used in preference to the corresponding
   "azane" names.

11 The word "autocatalytic" is used in the EnzymeName record when the
   modification is produced exclusively by an autocatalytic process, and not by
   the action of an enzyme acting on the protein as a substrate. This annotation
   will usually be used for modifications that occur uniquely in one protein.

12 Remediation of the Protein DataBank has necessitated many changes in the
   cross-references from modifications in the RESID Database to the PDB HET
   codes.  The PDB is retaining the single alpha-amino acetylation code ACE,
   the alpha-carboxyl amidation code NH2, and other molecular fragment codes
   considered to be bonded to amino acid side chains, so it will not be possible
   to implement links to a PDB server that will resolve the many-to-one mapping
   of those codes.  There are also unresolved inconsistencies, with the same
   structure represented differently in different PDB entries, and multiple
   HET codes are provided for these one-to-many mappings.

13 When the UniProt Knowledgebase feature annotations for modifications are
   revised and enhanced, some feature annotations in the new, restricted
   vocabulary may appear in the RESID Database before they appear in the public
   release of UniProt.  The feature annotations scheduled to be replaced in
   UniProt will continue to appear in RESID until they have been converted.
   The CROSSLNK and NON_STD classes were introduced, and the THIOLEST and
   THIOETH classes were removed.  The BINDING, CARBOHYD, LIPID and MOD_RES
   class features have been standardized. The METAL class features are
   subject to further revision.

14 A minor systematic error (see H. Maehr, "Graphic representation of 
   configuration in two-dimensional space. Current conventions, clarifications,
   and proposed extensions", J. Chem. Inf. Comput. Sci. vol. 42, pp. 894-902, 
   2002, PMID:12132891) occurs in the structure GIF images of many entries.
   This error is being corrected as the images are gradually replaced.
   The following models contain phantom atoms: AA0184, AA0187, AA0188, AA0189,
   AA0377, AA0378, AA0379, AA0380, AA0381, AA0382, AA0410.
   The following entries contain multiple systematic names separated by
   semicolons within the "SystematicName" record: AA0208.

15 The standardization of UniProt Knowledgebase feature annotations and the
   RESID Database have been discussed in two articles in the journal Proteomics.

   Farriol-Mathis, N., Garavelli, J.S., Boeckmann, B., Duvaud, S., Gasteiger,
   E., Gateau, A., Veuthey, A., Bairoch, A. (2004)
   Annotation of post-translational modifications in the Swiss-Prot knowledge
   base. 
   Proteomics 4, 1537-1550.

   Garavelli, J.S. (2004)
   The RESID Database of Protein Modifications as a resource and annotation
   tool.
   Proteomics 4, 1527–1533.

   The PSI-MOD Ontology of Protein Modifications was announced in a 
   communication published in Nature Biotechnology.

   Montecchi-Palazzi, L., Beavis, R., Binz, P.A., Chalkley, R.J., Cottrell, J.,
   Creasy, D., Shofstahl, J., Seymour, S.L., Garavelli, J.S. (2008)
   The PSI-MOD community standard for representation of protein modification 
   data.
   Nature Biotechnol. 26(8), 864-866.

16 The RESID Database is released at the Protein Information Resource at the
   Center for Bioinformatics & Computational Biology, Delaware Biotechnology
   Institute, University of Delaware, and at the Georgetown University Medical
   Center, Georgetown University. My thanks to DBI and Cathy Wu for making this
   possible.

   The RESID Database is hosted by the EMBL-EBI at http://www.ebi.ac.uk/RESID
   providing information, public access to search the RESID Database, and an
   ftp directory.

   The RESID Database of Protein Modifications is a service mark of John S. 
   Garavelli.

FTP sites:
ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
ftp://ftp.ebi.ac.uk/pub/databases/RESID/

Web sites:
http://www.ebi.ac.uk/RESID
http://home.earthlink.net/~jsgaravelli/RESIDInfo.HTML

Web Servers:
http://www.ebi.ac.uk/ontology-lookup/RESIDService?resid_id=
http://pir.georgetown.edu/cgi-bin/resid

Citation:
To cite the RESID Database, please refer to
Proteomics Vol. 4, pp. 1527–1533, 2004
http://www3.interscience.wiley.com/cgi-bin/abstract/108061125/ABSTRACT
DOI:10.1002/pmic.200300777

Contact:
John S. Garavelli
Center for Bioinformatics & Computational Biology
Delaware Biotechnology Institute
University of Delaware
15 Innovation Way, Suite 205
Newark, DE  19711-5449
USA
302-831-6922
jsgarave@udel.edu