RESID Database of Protein Modifications
Release Notes
Release 76.00  31-May-2018

1  Release 76.00 contains 621 entries. The distribution has a minimum of four files.

                      bytes
   RESIDUES.XML     2075527
   RESIDUES.DTD       10676  (version 01.28)
   images.zip       6479920  (621 files unpacked)
   models.zip        747222  (589 files unpacked)

   Index files for essentially all records, and Perl scripts for producing them, are available upon request. In particular, there is an index of the entries by the standard amino acids they are considered to be derived from.

2  When the UniProt Knowledgebase does not have a feature annotation for a RESID entry, the feature appears as "Not available" and there is a standard note. Except for the entries for 3 standard amino acids with no feature annotations, 4 sequence ambiguities, 4 artifactual, and 23 deprecated entries, 497 entries are annotated in UniProt and 70 entries are not annotated in UniProt.

3  The database files are available by FTP at
   ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
   and at
   ftp://ftp.ebi.ac.uk/pub/databases/RESID

4  Residue masses and mass differences are calculated using atomic mass averages in the file AtomTabl.XML and the accompanying AtomTabl.DTD, which are available at
   ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/psi/
   and will be deposited at
   http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/mod/data/

   The atomic mass averages and isotope masses in version 04.00 of AtomTabl.XML are based on the IUPAC Standard Atomic Weights Revised v2 2013, http://www.iupac.org/news/news-detail/article/standard-atomic-weights-revised-v2.html, the IUPAC Isotopic Composition of the Elements, 2001, and the AME2003 Atomic Mass Evaluation. AtomTabl.XML is used to calculate mass values for the PSI-MOD Ontology of Protein Modifications in the files PSI-MOD.obo and PSI-MOD.obo_xml.
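   As a rough illustration of how an average residue mass follows from a formula and a table of atomic mass averages, the sketch below sums element counts multiplied by atomic weights. The weights used are the IUPAC conventional atomic weights for six elements, not the values stored in AtomTabl.XML, and the function names and the whitespace-separated "symbol count" formula syntax are assumptions made for this example only.

```python
# Conventional atomic weights (IUPAC, abridged); illustrative stand-ins,
# NOT the atomic mass averages stored in AtomTabl.XML.
CONVENTIONAL_WEIGHTS = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "S": 32.06,
    "Cl": 35.45,
}


def parse_formula(formula):
    """Parse a formula written as alternating symbols and counts.

    The "C 2 H 3 N 1 O 1" layout is an assumption for this sketch.
    """
    tokens = formula.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected symbol/count pairs: {formula!r}")
    counts = {}
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        counts[symbol] = counts.get(symbol, 0) + int(count)
    return counts


def average_mass(formula, weights=CONVENTIONAL_WEIGHTS):
    """Return the sum of count * atomic weight over all elements."""
    return sum(weights[symbol] * n for symbol, n in parse_formula(formula).items())


if __name__ == "__main__":
    # Glycine residue, C2H3NO, as an example input.
    print(f"{average_mass('C 2 H 3 N 1 O 1'):.3f}")
```

   Because these weights carry only 4 or 5 significant digits, a result computed this way is less precise than one computed from the full atomic mass averages.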
   Please note that the IUPAC Standard Atomic Weights Revised 2013, available at http://iupac.org/publications/pac/asap/PAC-REP-10-09-14/ , and the IUPAC Standard Atomic Weights Revised v2 2013 do not define a single standard average atomic mass for 12 elements whose values depend strongly on environmental sampling. Instead, they define a range of values for the average atomic mass of each. Those elements include hydrogen, carbon, nitrogen, oxygen, sulfur, and chlorine. The reports do provide "conventional atomic weights" with 4 significant digits for hydrogen, sulfur, and chlorine, and 5 significant digits for carbon, nitrogen, and oxygen. In the IUPAC Standard Atomic Weights Revised v2 2013, changes in the atomic mass averages of selenium and molybdenum resulted in changes for the chemical or average isotope mass of entries containing those elements.

5  This release of the RESID Database requires DTD file version 01.28. Restricted vocabulary terms of the RESID Database are enumerated in the DTD file, which may be replaced or augmented by an XML-Schema.

6  UTF-8 extended characters currently occur in the following records: SystematicName, Author, Title.

7  The FormalCharge records were implemented beginning in release 60.00. The FormalCharge record is an element of the FormulaBlock and CorrectionBlock records. It consists of an integer immediately followed by a plus or minus sign. It optionally follows the Formula record and precedes the Weight record. It indicates the formal charge of the modified amino acid residue when the chemical structure as a molecular entity is inherently charged, other than simply through the addition or loss of a proton. In the PSI-MOD ontology it was necessary to introduce this property in order to calculate a residue mass, as measured by mass spectrometry, correctly to six decimal places. Typically, it is necessary to use this record for quaternary amines and well-characterized metal clusters.
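   The FormalCharge format described above (an integer immediately followed by a plus or minus sign) can be read with a short pattern match. This is a minimal sketch: the function name is hypothetical, and the choice to tolerate surrounding whitespace is an assumption rather than something the DTD is known to specify.

```python
import re

# An integer immediately followed by "+" or "-", e.g. "1+" or "2-".
_FORMAL_CHARGE = re.compile(r"(\d+)([+-])")


def parse_formal_charge(text):
    """Convert FormalCharge text such as "1+" or "2-" to a signed integer."""
    match = _FORMAL_CHARGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a FormalCharge value: {text!r}")
    magnitude = int(match.group(1))
    return magnitude if match.group(2) == "+" else -magnitude


if __name__ == "__main__":
    for value in ("1+", "2-"):
        print(value, parse_formal_charge(value))
```

   Values with the sign first, or with a space between the integer and the sign, are rejected, which matches the "immediately followed" wording.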
   To avoid confusion, the plus sign indicator for indefinite and varying composition in the Formula record will be replaced by a Note record in the FormulaBlock and CorrectionBlock records to indicate minimal, varying or repeated components of a composition. If your work may be affected by these changes, you are invited to send me your comments and suggestions.

8  The special encoded amino acids selenocysteine and pyrrolysine are translated in the nucleotide sequence databases by the IUBMB single letter codes "U" and "O". The UniProt Knowledgebase formerly used "C" and "K" respectively for those amino acids, differentiating them from the standard amino acids through features. The UniProt Knowledgebase now represents encoded selenocysteine and pyrrolysine residues by the IUBMB single letter codes "U" and "O", accompanied by the features

   FT NON_STD ... ... Selenocysteine.

   and

   FT NON_STD ... ... Pyrrolysine.

   as appropriate. This annotation is documented at
   http://web.expasy.org/docs/userman.html#FT_NON_STD

   Both encodings are represented in the RESID Database, and differences for the special encoded amino acids and their secondary modifications are shown based on both the special encoded amino acids and on the hypothetical modification of the corresponding standard amino acids. A table of the amino acid one-letter and three-letter codes is available at
   http://pir.georgetown.edu/resid/faq.shtml#q01

   The other special encoded amino acid, N-formyl-L-methionine, does not have standard one-letter or three-letter codes. Differences are also provided for it and its secondary modifications based on both the encoded form, and on the hypothetical modification of methionine.
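   The relationship between the former "C"/"K" encoding and the current "U"/"O" encoding can be sketched as a small conversion. The data layout here is hypothetical: a plain sequence string plus a mapping from 1-based positions to the residue names "Selenocysteine" or "Pyrrolysine", standing in for feature annotations. It is not a UniProt or RESID API.

```python
# Residue name -> (former letter, IUBMB letter), as described in the note above.
SPECIAL_CODES = {
    "Selenocysteine": ("C", "U"),
    "Pyrrolysine": ("K", "O"),
}


def to_iubmb(sequence, special_positions):
    """Rewrite a former-style sequence using the IUBMB codes "U" and "O".

    special_positions maps 1-based positions to "Selenocysteine" or
    "Pyrrolysine" (a hypothetical stand-in for feature annotations).
    """
    residues = list(sequence)
    for position, name in special_positions.items():
        former, current = SPECIAL_CODES[name]
        index = position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[index] != former:
            raise ValueError(
                f"expected {former!r} at position {position}, "
                f"found {residues[index]!r}"
            )
        residues[index] = current
    return "".join(residues)


if __name__ == "__main__":
    print(to_iubmb("MCKA", {2: "Selenocysteine", 3: "Pyrrolysine"}))
```

   Checking the former letter before replacing it guards against annotations whose positions do not line up with the sequence.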
   A compilation of frequently asked questions is available at
   http://pir.georgetown.edu/resid/faq.shtml
   A table of glycosylation modifications is available at
   http://pir.georgetown.edu/resid/togm.shtml
   A table of flavin modifications is available at
   http://pir.georgetown.edu/resid/aof.shtml

9  Names based on "azane" are being introduced as alternate names. Names based on "amino" have been designated as "preselected" by the IUPAC Nomenclature Committee, and will continue to be used in preference to the corresponding "azane" names.

10 The word "autocatalytic" is used in the EnzymeName record when the modification is produced exclusively by an autocatalytic process, and not by the action of an enzyme acting on the protein as a substrate. This annotation will usually be used for modifications that occur uniquely in one protein.

11 Remediation of the Protein Data Bank has necessitated many changes in the cross-references from modifications in the RESID Database to the PDB HET codes. The PDB is retaining the single alpha-amino acetylation code ACE, the alpha-carboxyl amidation code NH2, and other molecular fragment codes considered to be bonded to amino acid side chains, so it will not be possible to implement links to a PDB server that will resolve the many-to-one mapping of those codes. There are also unresolved inconsistencies, with the same structure represented differently in different PDB entries, and multiple HET codes are provided for these one-to-many mappings.

12 When the UniProt Knowledgebase feature annotations for modifications are revised and enhanced, some feature annotations in the new, restricted vocabulary may appear in the RESID Database before they appear in the public release of UniProt. The feature annotations scheduled to be replaced in UniProt will continue to appear in RESID until they have been converted. The CROSSLNK and NON_STD classes were introduced, and the THIOLEST and THIOETH classes were removed.
   The BINDING, CARBOHYD, LIPID and MOD_RES class features have been standardized. The METAL class features are subject to further revision.

13 A minor systematic error (see H. Maehr, "Graphic representation of configuration in two-dimensional space. Current conventions, clarifications, and proposed extensions", J. Chem. Inf. Comput. Sci. vol. 42, pp. 894-902, 2002, PMID:12132891) occurs in the structure GIF images of many entries. This error is being corrected as the images are gradually replaced.

   The following models contain phantom atoms: AA0184, AA0187, AA0188, AA0189, AA0377, AA0378, AA0379, AA0380, AA0381, AA0382, AA0410.

   The following entries contain multiple systematic names separated by semicolons within the "SystematicName" record: AA0208.

14 The standardization of UniProt Knowledgebase feature annotations and the RESID Database have been discussed in two articles in the journal Proteomics.

   Farriol-Mathis, N., Garavelli, J.S., Boeckmann, B., Duvaud, S., Gasteiger, E., Gateau, A., Veuthey, A., Bairoch, A. (2004) Annotation of post-translational modifications in the Swiss-Prot knowledge base. Proteomics 4, 1537-1550.

   Garavelli, J.S. (2004) The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 4, 1527-1533.

   The PSI-MOD Ontology of Protein Modifications was announced in a communication published in Nature Biotechnology.

   Montecchi-Palazzi, L., Beavis, R., Binz, P.A., Chalkley, R.J., Cottrell, J., Creasy, D., Shofstahl, J., Seymour, S.L., Garavelli, J.S. (2008) The PSI-MOD community standard for representation of protein modification data. Nature Biotechnol. 26(8), 864-866.

15 The RESID Database is released at the Protein Information Resource at the Center for Bioinformatics & Computational Biology, Delaware Biotechnology Institute, University of Delaware, and at the Georgetown University Medical Center, Georgetown University. My thanks to DBI and Cathy Wu for making this possible.
   In this release a new, more accurate model structure for RESID:AA0266, heme P460-bis-L-cysteine-L-tyrosine has been prepared with the expert assistance of Dr. Miri Hirshberg at the European Bioinformatics Institute. I want to express my heartfelt appreciation for all of her assistance over the years.

   The RESID Database of Protein Modifications is a service mark of John S. Garavelli.

16 A site for searching the RESID Database is being tested at
   http://pir0.georgetown.edu/cgi-bin/resid
   An announcement will be made when it is in full service.

FTP sites:
   ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
   ftp://ftp.ebi.ac.uk/pub/databases/RESID/

Web sites:
   http://pir.georgetown.edu/resid/
   http://home.earthlink.net/~jsgaravelli/RESIDInfo.HTML

Citation:
   To cite the RESID Database, please refer to
   Proteomics Vol. 4, pp. 1527–1533, 2004
   http://www3.interscience.wiley.com/cgi-bin/abstract/108061125/ABSTRACT
   DOI:10.1002/pmic.200300777

Contact:
   John S. Garavelli
   Center for Bioinformatics & Computational Biology
   Delaware Biotechnology Institute
   University of Delaware
   15 Innovation Way, Suite 205
   Newark, DE 19711-5449
   USA
   302-831-6922
   jsgarave@udel.edu