Document PSD-CODATA-0703

PIR Installation Document
For the CODATA Format Release of

P R O T E I N   S E Q U E N C E   D A T A B A S E
of PIR-International

Release 80.00, December 31, 2004
283416 sequences, 96216763 residues

Protein Information Resource (PIR)*
National Biomedical Research Foundation
3900 Reservoir Road, N.W., Box 571414
Washington, DC 20057-1414, USA

Japan International Protein Information Database (JIPID)
Amakubo 1-16-1
Tsukuba 305-0005, Japan

Munich Information Center for Protein Sequences (MIPS)
GSF-Forschungszentrum f. Umwelt und Gesundheit
am Max-Planck-Institut f. Biochemie
Am Klopferspitz 18, D-82152 Martinsried, FRG

This database may be redistributed, provided that this notice be given to each user and that the words "Derived from" shall precede this notice if the database has been altered by the redistributor.

We have made every effort to ensure proper functioning of the programs and cannot be held responsible for the consequences to users of any problems encountered during their operation.

*PIR is a registered mark of NBRF

PIR is partially supported by National Library of Medicine grant LM05798

*************************
* PIR-PSD Final Release *
=========================

Release 80.00 is the final release of the PIR-International Protein Sequence Database (PIR-PSD). In 2002, PIR joined EBI and SIB to form the UniProt consortium. PIR-PSD sequences and annotations have been integrated into the UniProt Knowledgebase. Bi-directional cross-references between UniProt (UniProt Knowledgebase and/or UniParc) and PIR-PSD have been established to allow easy tracking of former PIR-PSD entries. PIR-PSD unique sequences, reference citations, and experimentally verified data can now be found in the relevant UniProt records. This final version of the database will be accessible from the PIR web site and downloadable from the FTP site.
1.0 CODATA Format
=================

This document describes the quarterly release of the PIR-International Protein Sequence Database in CODATA format, formerly distributed on magnetic media for non-VAX/VMS systems in fixed-length 80-byte records.

2.0 In this Release
===================

Release 80.00 of the Protein Sequence Database contains 283,416 entries and 96,216,763 residues. The Release is separated into four datasets. Section 1, Fully Classified Entries, contains 20,685 entries and 8,103,841 residues. Section 2, Verified and Classified Entries, contains 262,300 entries and 88,045,621 residues. Section 3, Unverified Entries, contains 24 entries and 74 residues. Section 4, Unencoded or Untranslated Entries, contains 407 entries and 67,227 residues. A total of 36,403 superfamilies, including 5,700 fully curated ones, are represented in sections 1 and 2.

3.0 Features in this Release
============================

Starting with Release 64.00 of the Protein Sequence Database, PIR-International is including status information in protein titles, function and complex records. These status identifiers are as follows.

[validated]  = in a title or function block means that one of the references in the entry contains some experimental evidence for the protein's function.

[similarity] = in a title or function block means that the name and/or function has been assigned by end-to-end sequence similarity with other entries that have that same name or function.

[imported]   = in a title means that the name was imported with the sequence from GenBank, EMBL, DDBJ, or another source and has not been verified by PIR.

Complete coverage of the entire database will not be obtained for several releases. The absence of a status identifier at this time should NOT be taken as an indication that the information in the title or function blocks is not correct or has not been evaluated by PIR staff.
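
As an illustration of how the status identifiers described in section 3.0 might be handled by a program reading the database, the following Python sketch pulls the bracketed identifiers out of a title or function string. It is not part of the PIR distribution. The function names (extract_status, strip_status) are hypothetical, and the sketch assumes only that an identifier appears literally as [validated], [similarity], or [imported] somewhere in the text; the exact placement of identifiers within a CODATA record is not specified here.

```python
import re

# Status identifiers listed in section 3.0. Their bracketed spelling comes from
# that section; their position inside a record is an assumption of this sketch.
STATUS_TAGS = ("validated", "similarity", "imported")
_STATUS_RE = re.compile(r"\[(%s)\]" % "|".join(STATUS_TAGS))


def extract_status(text):
    """Return the status identifiers found in a title or function string, in order."""
    return _STATUS_RE.findall(text)


def strip_status(text):
    """Return the text with status identifiers removed and whitespace normalized."""
    return " ".join(_STATUS_RE.sub(" ", text).split())
```

An empty list from extract_status means only that no identifier is present. As noted above, that should not be read as a sign that the title or function is incorrect or unevaluated.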