Announcements of the Protein Information Resource
PIR-International
6 August 1997

Highlights

1. Availability of PIR-International Release 53.00 and Associated Data Sets
2. Unique Features of the PIR-International Protein Sequence Database
3. The NRL_3D Database Updated
4. Format changes anticipated for Release 54.00
5. Ordering the Atlas of Protein and Genomic Sequences CD-ROM


1. Availability of PIR-International Release 53.00 and Associated Data Sets

The quarterly (June 30) releases of the PIR-International Protein Sequence Database, the NRL_3D database (corresponding to Brookhaven Protein Data Bank Release 79), and the ALN Database of Protein Sequence Alignments are now available.

Release Information for PIR-International Data Sets
==============================================================================
Data Set  Release  Entries  Residues   Description
PIR1      53.00    13706    5125282    Section 1. Classified and Annotated Entries
PIR2      53.00    77275    24432750   Section 2. Annotated Entries
PIR3      53.00    3832     868136     Section 3. Unverified Entries
PIR4      53.00    238      43412      Section 4. Unencoded or Untranslated Entries
NRL_3D    21.00    10717    1897913    NRL Protein Sequences in Brookhaven PDB
ALN       16.00    2896                Database of Protein Sequence Alignments
PATCHX    53.00    96102    27691941   Available protein sequences not in PIR
ECOLI     4.10     592      3996197    Escherichia coli DNA Database
RESID     10.00    236                 Residues annotated as features in PIR
------------------------------------------------------------------------------

Availability Information for the Data Sets
=============================================================================
Data Set | ATLAS CD-ROM | Magnetic Media | PIR WWW | PIR FTP Site | Online |
PIR1     |      X       |       X        |    X    |      X       |   X    |
PIR2     |      X       |       X        |    X    |      X       |   X    |
PIR3     |      X       |       X        |    X    |      X       |   X    |
PIR4     |      X       |       X        |    X    |      X       |   X    |
NRL_3D   |      X       |       X        |    X    |      X       |   X    |
ALN      |      X       |                |    X    |              |        |
PATCHX   |      X       |                |         |              |   X    |
ECOLI    |      X       |                |         |              |        |
RESID    |      X       |                |         |              |        |
-----------------------------------------------------------------------------

The Complex Carbohydrate Structure Database (CCSD) and the associated CarbBank program for Windows95/NT are also distributed on the Atlas of Protein and Genomic Sequences CD-ROM.

The PIR URL: http://www-nbrf.georgetown.edu/pir/
The PIR anonymous ftp site: nbrf.georgetown.edu


2. Unique Features of the PIR-International Protein Sequence Database

The PIR-International Protein Sequence Database is unique among comprehensive public domain protein sequence databases in the following respects:

* Beginning with Release 53.00, essentially all sequence entries are classified into families (see below).
* It contains more citations and more up-to-date data than other comprehensive public domain protein sequence databases.
* Full citations, including the titles of the papers cited, are given.
* The sequence reported in each citation is represented in a manner that clearly shows any differences from the sequence shown in the entry and allows the reported sequence to be reconstructed automatically.
* Cross-references to the nucleotide sequence databases are directly associated with the citation on which they are based.
* The most complete and current genetic information is provided, including map position, intron positions, and start codon (if different from ATG), along with pointers to genome databases.
* Feature annotations are represented with greater accuracy and consistency because of format and terminology restrictions. The current Guide for PIR Features Annotations is publicly available.
* The database has consistently adhered to its announced update schedule. It has been updated and publicly released four times per year for the past 13 years.
* Interim updates, normally prepared weekly (except during holiday periods and during preparation of quarterly releases), are publicly accessible through our Web site and online system.

Dr. Friedhelm Pfeiffer at MIPS has clustered 93% of the sequences in the PIR database into families whose members share about 50% or more sequence identity. Less than 5% of the entries are considered unclassifiable, usually because they are too fragmentary, and only about 2% have not yet been fully analyzed. Over 10,000 alignments of the families that contain at least two sequences are available at the MIPS Web site:
http://www.mips.biochem.mpg.de/

Every family classified in this way has been assigned a permanent ID. About half of the sequences have been further clustered into superfamilies, which have also been assigned permanent IDs so that users can keep track of superfamilies more easily. The permanent family and superfamily numbers will shortly be available on the PIR Web site by clicking on "Associated information" when viewing an entry. This information will also be available to VAX-VMS users of the Atlas CD-ROM and magnetic tapes.


3. The NRL_3D Database Updated

The NRL_3D Database, produced by the PIR since 1989, is a database of protein sequences with determined structures, extracted from the Brookhaven Protein Data Bank (PDB) coordinate data files. It provides an interface between the Protein Sequence Database and the PDB, and gives access to the PDB data through computerized sequence searching and comparison methods.

This release, corresponding to the January 1997 release of the Brookhaven Protein Data Bank, was produced using the facilities of the National Cancer Institute Biomedical Supercomputer Center, Frederick, MD. We gratefully acknowledge the cooperation of J.V. Maizel, Jr., S.K. Burt, and G.W. Smythers.


4. Format changes anticipated for Release 54.00

We anticipate some format changes in Release 54.00. If you want to receive advance notice of the changes, please e-mail a request for the PIR Developer's Bulletin to PIRMAIL@NBRF.GEORGETOWN.EDU.


5. Ordering the Atlas of Protein and Genomic Sequences CD-ROM

In addition to the databases listed previously, the ATLAS CD-ROM includes an installation guide, an ATLAS User's Guide, the FASTA package, and an Installation Manual and Tutorial for CarbBank. The ATLAS program, which accesses all of the other data sets on the CD-ROM, does not access the Complex Carbohydrate Structure Database.

Orders for the ATLAS CD-ROM are accepted by FAX or E-mail, WITHOUT PREPAYMENT, on institutional purchase orders.

For further information in the US and the Americas, please contact:

  Kathryn Sidman, Technical Services Coordinator
  Protein Information Resource (PIR)
  National Biomedical Research Foundation (NBRF)
  3900 Reservoir Rd., NW
  Washington DC 20007
  FAX: (202) 687-1662
  phone: (202) 687-2121
  E-mail: PIRMAIL@nbrf.georgetown.edu

In Europe contact:

  Martinsried Institute for Protein Sequences (MIPS)
  Max-Planck-Institute for Biochemistry
  8033 Martinsried, Germany
  FAX: 49 89 8578 2655
  phone: 49 89 8578 2657
  E-mail: mewes@ehpmic.mips.biochem.mpg.de

In Asia and Oceania contact:

  Japan International Protein Information Database (JIPID)
  Science University of Tokyo
  2669 Yamazaki, Noda 278 Japan
  FAX: 81 47 122 1544
  phone: 81 48 124 1501
  E-mail: Tsugita@JPNSUT31.BITNET

For information about CarbBank contact:

  CarbBank/CCSD
  114 W. Magnolia St. Suite 305
  Bellingham, WA 98225
  Phone: (360) 671-8134
  E-mail: CarbBank@PacificRim.net

PIR is a registered mark of NBRF
------------------------------------------------------------------------
Dr. John S. Garavelli
Associate Director
Protein Information Resource
National Biomedical Research Foundation
Washington, DC 20007
PIRMAIL@NBRF.GEORGETOWN.EDU