RESID Database Release Notes
Release 71.00  31-Dec-2012

1 Release 71.00 contains 595 entries.
The distribution has a minimum of four files.

                   bytes
RESIDUES.XML     1922859
RESIDUES.DTD       10676  (version 01.28)
images.zip       5797735  (595 files unpacked)
models.zip        708235  (562 files unpacked)

Index files for essentially all records, and Perl scripts for producing them, are available upon request. In particular, there is an index of the entries by the standard amino acids they are considered to be derived from.

There was no quarterly release on 30 September 2012.

2 When the UniProt Knowledgebase does not have a feature annotation for a RESID entry, the feature appears as "Not available" and there is a standard note. Except for the entries for 3 standard amino acids with no feature annotations, 4 sequence ambiguities, 4 artifactual, and 21 deprecated entries, 494 entries are annotated in UniProt and 74 entries are not annotated in UniProt.

3 The ftp server at the National Cancer Institute NCIFCRF is no longer available. That ftp service has been replaced by the service at
ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
The ftp service at the EBI will continue to be available at
ftp://ftp.ebi.ac.uk/pub/databases/RESID

4 A Web Server for the RESID Database is available at
http://www.ebi.ac.uk/ontology-lookup/RESIDService?resid_id=
Provide a valid RESID Database identifier at the end of this address, and the Web Server will return the entry in XML format. Security for transfer procedures may delay updates of this server.

5 Residue masses and mass differences are calculated using atomic mass averages in the file AtomTabl.XML and the accompanying AtomTabl.DTD deposited at
http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/mod/data/
The atomic mass averages and isotope masses in version 02.00 of AtomTabl.XML are based on the IUPAC Standard Atom Weights Revised 2009, the IUPAC Isotopic Composition of the Elements, 2001, and the AME2003 Atomic Mass Evaluation.
AtomTabl.XML is used to calculate mass values for the PSI-MOD Ontology of Protein Modifications in the files PSI-MOD.obo and PSI-MOD.obo_xml.

Please note that the IUPAC Standard Atom Weights Revised 2009, available at
http://iupac.org/publications/pac/asap/PAC-REP-10-09-14/
defines a range of values for the average atomic mass of 10 elements whose values are strongly dependent on environmental sampling, rather than a single standard average atomic mass. Those elements include hydrogen, carbon, nitrogen, oxygen, sulfur, and chlorine. The report does provide "conventional atomic weights" with 4 significant digits for hydrogen, sulfur, and chlorine, and 5 significant digits for carbon, nitrogen, and oxygen.

6 This release of the RESID Database requires DTD file version 01.28. Restricted vocabulary terms of the RESID Database are enumerated in the DTD file, which may be replaced or augmented by an XML-Schema.

7 The FormalCharge records were implemented beginning in release 60.00. The FormalCharge record is an element of the FormulaBlock and CorrectionBlock records. It consists of an integer immediately followed by a plus or minus sign. It optionally follows the Formula record and precedes the Weight record. It indicates the formal charge of the modified amino acid residue when the chemical structure as a molecular entity is inherently charged, other than simply through the addition or loss of a proton. In the PSI-MOD ontology it was necessary to introduce this property in order to calculate correctly, to six decimal places, a residue mass as measured by mass spectrometry. Typically, it is necessary to use this record for quaternary amines and well-characterized metal clusters.

To avoid confusion, the plus sign indicator for indefinite and varying composition in the Formula record will be replaced by a Note record in the FormulaBlock and CorrectionBlock records to indicate minimal, varying or repeated components of a composition.
If your work may be affected by these changes, you are invited to send me your comments and suggestions.

8 The special encoded amino acids selenocysteine and pyrrolysine are translated in the nucleotide sequence databases by the IUBMB single letter codes "U" and "O". The UniProt Knowledgebase formerly used "C" and "K" respectively for those amino acids, differentiating them from the standard amino acids through features. The UniProt Knowledgebase now represents encoded selenocysteine and pyrrolysine residues by the IUBMB single letter codes "U" and "O", accompanied by the features
FT   NON_STD   ...   ...   Selenocysteine.
and
FT   NON_STD   ...   ...   Pyrrolysine.
as appropriate. This annotation is documented at
http://web.expasy.org/docs/userman.html#FT_NON_STD

Both encodings are represented in the RESID Database, and differences for the special encoded amino acids and their secondary modifications are shown based on both the special encoded amino acids and on the hypothetical modification of the corresponding standard amino acids. A table of the amino acid one-letter and three-letter codes is available at
http://www.ebi.ac.uk/RESID/faq.html#q01

The other special encoded amino acid, N-formyl-L-methionine, does not have standard one-letter or three-letter codes. Differences are also provided for it and its secondary modifications based on both the encoded form, and on the hypothetical modification of methionine.

A compilation of frequently asked questions is available at
http://www.ebi.ac.uk/RESID/faq.htm
A table of glycosylation modifications is available at
http://www.ebi.ac.uk/RESID/togm.html
A table of flavin modifications is available at
http://www.ebi.ac.uk/RESID/aof.html

9 Names based on "azane" are being introduced as alternate names. Names based on "amino" have been designated as "preselected" by the IUPAC Nomenclature Committee, and will continue to be used in preference to the corresponding "azane" names.

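As an illustration of the FormalCharge record described in item 7, the following Python sketch reads a value consisting of an integer immediately followed by a plus or minus sign and returns it as a signed integer. The function name parse_formal_charge and the sample values are hypothetical and are not part of the RESID distribution; the handling of surrounding whitespace is likewise an assumption.

```python
import re

# Item 7 describes the FormalCharge value as "an integer immediately
# followed by a plus or minus sign". Rejecting every other form, and
# tolerating surrounding whitespace, are assumptions of this sketch.
_FORMAL_CHARGE = re.compile(r"(\d+)([+-])")


def parse_formal_charge(text: str) -> int:
    """Return the signed charge for a value such as '1+' or '2-'."""
    match = _FORMAL_CHARGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a FormalCharge value: {text!r}")
    magnitude = int(match.group(1))
    return magnitude if match.group(2) == "+" else -magnitude
```

For example, parse_formal_charge("1+") returns 1 and parse_formal_charge("2-") returns -2, while a value written with a leading sign, such as "+1", raises ValueError.
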
10 The word "autocatalytic" is used in the EnzymeName record when the modification is produced exclusively by an autocatalytic process, and not by the action of an enzyme acting on the protein as a substrate. This annotation will usually be used for modifications that occur uniquely in one protein.

11 Remediation of the Protein DataBank has necessitated many changes in the cross-references from modifications in the RESID Database to the PDB HET codes. The PDB is retaining the single alpha-amino acetylation code ACE, the alpha-carboxyl amidation code NH2, and other molecular fragment codes considered to be bonded to amino acid side chains, so it will not be possible to implement links to a PDB server that will resolve the many-to-one mapping of those codes. There are also unresolved inconsistencies, with the same structure represented differently in different PDB entries, and multiple HET codes are provided for these one-to-many mappings.

12 When the UniProt Knowledgebase feature annotations for modifications are revised and enhanced, some feature annotations in the new, restricted vocabulary may appear in the RESID Database before they appear in the public release of UniProt. The feature annotations scheduled to be replaced in UniProt will continue to appear in RESID until they have been converted. The CROSSLNK and NON_STD classes were introduced, and the THIOLEST and THIOETH classes were removed. The BINDING, CARBOHYD, LIPID and MOD_RES class features have been standardized. The METAL class features are undergoing revision.

13 A minor systematic error (see H. Maehr, "Graphic representation of configuration in two-dimensional space. Current conventions, clarifications, and proposed extensions", J. Chem. Inf. Comput. Sci. vol. 42, pp. 894-902, 2002, PMID:12132891) occurs in the structure GIF images of many entries. This error is being corrected as the images are gradually replaced.
The following models contain phantom atoms:
AA0184, AA0187, AA0188, AA0189, AA0377, AA0378, AA0379, AA0380, AA0381, AA0382, AA0410.
The following entries contain multiple systematic names separated by semicolons within the "Systematicname" record:
AA0208.

14 The standardization of UniProt Knowledgebase feature annotations and the RESID Database have been discussed in two articles in the journal Proteomics.

Farriol-Mathis, N., Garavelli, J.S., Boeckmann, B., Duvaud, S., Gasteiger, E., Gateau, A., Veuthey, A., Bairoch, A. (2004) Annotation of post-translational modifications in the Swiss-Prot knowledge base. Proteomics 4, 1537-1550.

Garavelli, J.S. (2004) The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 4, 1527–1533.

The PSI-MOD Ontology of Protein Modifications was announced in a communication published in Nature Biotechnology.

Montecchi-Palazzi, L., Beavis, R., Binz, P.A., Chalkley, R.J., Cottrell, J., Creasy, D., Shofstahl, J., Seymour, S.L., Garavelli, J.S. (2008) The PSI-MOD community standard for representation of protein modification data. Nature Biotechnol. 26(8), 864-866.

15 The RESID Database is released at the Protein Information Resource at the Center for Bioinformatics & Computational Biology, Delaware Biotechnology Institute, University of Delaware, and at the Georgetown University Medical Center, Georgetown University. My thanks to DBI and Cathy Wu for making this possible.

The RESID Database is hosted by the EMBL-EBI at
http://www.ebi.ac.uk/RESID
providing information, public access to search the RESID Database, and an ftp directory.

The RESID Database of Protein Modifications is a service mark of John S. Garavelli.

FTP sites:
  ftp://ftp.pir.georgetown.edu/pir_databases/other_databases/resid/
  ftp://ftp.ebi.ac.uk/pub/databases/RESID/

Web sites:
  http://www.ebi.ac.uk/RESID
  http://home.earthlink.net/~jsgaravelli/RESIDInfo.HTML

Web Servers:
  http://www.ebi.ac.uk/ontology-lookup/RESIDService?resid_id=
  http://pir.georgetown.edu/cgi-bin/resid

Citation:
  To cite the RESID Database, please refer to
  Proteomics Vol. 4, pp. 1527–1533, 2004
  http://www3.interscience.wiley.com/cgi-bin/abstract/108061125/ABSTRACT
  DOI:10.1002/pmic.200300777

Contact:
  John S. Garavelli
  Center for Bioinformatics & Computational Biology
  Delaware Biotechnology Institute
  University of Delaware
  15 Innovation Way, Suite 205
  Newark, DE 19711-5449
  USA
  302-831-6922
  jsgarave@udel.edu
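
The first Web Server address listed above takes a RESID Database identifier at the end of the query string, as item 4 explains. A minimal Python sketch of assembling such an address follows. The function name resid_entry_url is hypothetical, and the check that an identifier is "AA" followed by four digits is only an assumption drawn from the identifiers quoted in item 13 (for example AA0184), not a documented rule. The sketch builds the address only and does not contact the server.

```python
import re
from urllib.parse import quote

# Base address as given in item 4 of these notes.
RESID_SERVICE = "http://www.ebi.ac.uk/ontology-lookup/RESIDService?resid_id="


def resid_entry_url(resid_id: str) -> str:
    """Append a RESID identifier to the Web Server address."""
    ident = resid_id.strip().upper()
    # Assumed identifier shape: "AA" plus four digits, e.g. AA0184.
    if not re.fullmatch(r"AA\d{4}", ident):
        raise ValueError(f"unexpected RESID identifier: {resid_id!r}")
    return RESID_SERVICE + quote(ident)
```

Calling resid_entry_url("AA0184") yields the service address with AA0184 appended; the entry itself would then be returned by the server in XML format, subject to the update delays noted in item 4.
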