NRL_3D PROTEIN SEQUENCE--STRUCTURE DATABASE

distributed by PIR-International

produced by
Protein Information Resource (PIR)*
Supported by NIH grant LM-05798

Developed in collaboration with
The Naval Research Laboratory (NRL)
Partially supported by the Office of Naval Research
and U.S. Army Medical Research and Development Command

Document NRL_3D-2000

This database may be copied and redistributed freely, without prior consent, provided that the PIR-International is acknowledged as the source. Vendors who redistribute the database are requested to identify it prominently. They should also indicate if the database has been reformatted, modified, or enhanced. We would appreciate receiving typical advertising copy for each release and an annual statement of (1) from whom and in what form you obtain the database and (2) to how many end-users you estimate it is distributed.

We have made every effort to present the data accurately and to ensure the proper functioning of the programs. We cannot be responsible for the consequences to users of any errors in the data or programs.

Protein Information Resource (PIR)
National Biomedical Research Foundation
Georgetown University Medical Center
3900 Reservoir Road, N.W.
Washington, D.C. 20007 USA

*PIR is a registered mark of NBRF.

NRL_3D is a sequence--structure database derived from the 3-dimensional structures of proteins deposited in the Protein Data Bank (PDB) [1] as of March 2000. NRL_3D was conceived, developed, and tested by K. Namboodiri, N. Pattabiraman, A. Lowrey, and B. Gaber [2,4] at the Naval Research Laboratory, Washington, DC. It was incorporated into the Protein Information Resource (PIR) [3] by D. George, W.C. Barker, and J.S. Garavelli at the National Biomedical Research Foundation, Washington, DC, and M. Kusunoki of the Institute for Protein Research, Osaka University, Osaka, Japan. When this database is used in published research please cite references 2, 3, and 4.

Since 1999, the PDB has been produced and distributed by the Research Collaboratory for Structural Bioinformatics (RCSB). The PDB contains atomic coordinates for the 3-dimensional structures of biomolecules obtained using X-ray, electron or neutron diffraction, nuclear magnetic resonance or molecular modeling methods. One common research task is selecting sets of coordinate data according to sequence properties. However, the primary sequence information in the PDB is not presented in a format compatible with the sequence manipulation programs of the PIR or with other sequence analysis software packages. The NRL_3D database provides a link between this software and the PDB and allows amino acid sequences in the PDB to be searched and analyzed.

Sequence, reference and annotation data from the coordinate sets in the PDB are extracted and reformatted in NBRF-format (see the PIR Document Database File Structure and Format Specification for details). These data are set up as a separate database fully accessible to all PIR programs. The program that performs the extraction and reformatting, PDB2PIR, can select coordinate sets on the criteria that (1) they correspond to well-defined protein sequences, (2) they are determined with resolutions below a specified limit, (3) they were not determined by molecular mechanics or other theoretical techniques or (4) they contain data added or revised after a specified date.

When a selected PDB entry contains more than one polypeptide chain, each chain is represented as a separate entry in NRL_3D. In addition, if more than 3.8 Angstroms separates the alpha-carbons of adjacent amino acids, the sequence is divided into separate fragments. Large separations between adjacent amino acids typically occur when an intervening sequence fragment has not been resolved in the electron density map. The existence of a large separation is taken to indicate that the sequence is not complete or that the chains are not covalently connected.
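
The fragment rule above can be illustrated with a short Python sketch. It is not the PDB2PIR code: the function name split_into_fragments, the input layout (a list of one-letter codes paired with alpha-carbon coordinates) and the sample coordinates are assumptions made for illustration. Only the 3.8 Angstrom limit comes from the text.

```python
import math

# Limit on the alpha-carbon separation of adjacent residues, in Angstroms
# (the value stated in the text).
CA_GAP_LIMIT = 3.8


def split_into_fragments(residues, gap_limit=CA_GAP_LIMIT):
    """Split one chain into sequence fragments at large CA-CA separations.

    residues: list of (one_letter_code, (x, y, z)) tuples in chain order,
    where (x, y, z) is the alpha-carbon position. This input layout is a
    hypothetical one chosen for the sketch.
    """
    fragments = []
    current = []
    previous_ca = None
    for code, ca in residues:
        # Start a new fragment when the gap to the previous residue is too large.
        if previous_ca is not None and math.dist(previous_ca, ca) > gap_limit:
            fragments.append("".join(current))
            current = []
        current.append(code)
        previous_ca = ca
    if current:
        fragments.append("".join(current))
    return fragments


# Made-up chain: three residues, an unresolved stretch, then two more residues.
chain = [
    ("A", (0.0, 0.0, 0.0)),
    ("C", (3.5, 0.0, 0.0)),
    ("D", (7.0, 0.0, 0.0)),
    ("E", (17.0, 0.0, 0.0)),  # 10 Angstroms from the previous alpha-carbon
    ("F", (20.5, 0.0, 0.0)),
]
fragments = split_into_fragments(chain)
print(fragments)
```

For this made-up chain the sketch yields two fragments, split at the 10 Angstrom gap.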

The entry identification codes in NRL_3D are derived from the PDB codes. The first four characters of the code are the PDB code. That four-character code is followed by an optional letter distinguishing one of the several polypeptide chains indicated in the same PDB entry. When the chain identifier of a PDB entry is a number rather than a letter, the letter corresponding to the number is used whenever possible. At the end of some of the codes is an optional number distinguishing one of the several fragments detected in the same polypeptide chain. The fragment numbers are assigned in order from the amino terminal end, but fragments that do not have more than three recognizable amino acids are subsequently eliminated.

The titles for NRL_3D entries are constructed by combining the COMPND and SOURCE records of the PDB entry. The PDB resolution and the R-values are included when they can be recognized. The list of contributing authors is included as a reference with the citation 'Coordinates deposited in the Protein Data Bank.' The PDB HELIX, SHEET, TURN, SSBOND and SITE records as well as some special ATOM and HETATM records are carried into appropriate PIR features.

In the PIR-International Protein Sequence Database, sequences are numbered sequentially, beginning with 1 at the amino end and continuing to the carboxyl end of the sequence. This numbering system is used by all the PIR software to specify subsequences or specific amino acids within the sequence. In the PDB, however, the numbering does not necessarily begin with 1, may contain negative values and non-numerical insertions (e.g., 23A, 23B, etc.), and may not always run consecutively. The numbering schemes in PDB entries are chosen by the depositors of the coordinate sets to highlight the correspondence between residues in homologous structures.
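
The difference between the two numbering schemes can be made concrete with a small Python sketch that relates PDB residue labels to sequential PIR-style positions. It does not reproduce the layout of the NRL_3D transformation file, which this document does not describe. The function name build_number_map and the sample labels are hypothetical.

```python
def build_number_map(pdb_labels):
    """Map PDB residue labels to sequential positions starting at 1.

    pdb_labels: residue labels of one sequence in chain order, kept as
    strings so that negative values ("-1") and insertion codes ("23A")
    survive unchanged. This is an illustrative data layout, not the
    format of the distributed transformation file.
    """
    if len(set(pdb_labels)) != len(pdb_labels):
        raise ValueError("PDB residue labels must be unique within one sequence")
    return {label: position for position, label in enumerate(pdb_labels, start=1)}


# Made-up labels showing a negative start, insertion codes and a skipped number.
labels = ["-2", "-1", "1", "2", "23", "23A", "23B", "25"]
pdb_to_pir = build_number_map(labels)

# Reverse lookup: sequential position back to the PDB label.
pir_to_pdb = {position: label for label, position in pdb_to_pir.items()}

print(pdb_to_pir["23A"])
print(pir_to_pdb[1])
```

Here the inserted residue 23A sits at sequential position 6, and sequential position 1 corresponds to PDB label -2.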

To reduce the problems caused by these differing numbering systems, a transformation table has been constructed for each entry in the NRL_3D database. These tables are available in a separate accompanying file.

The PATTERN, MATCH and SCAN commands of the ATLAS program can be used to search the sequences in the NRL_3D Database in the same way as the other PIR-International Protein Sequence Databases. The PIR Web site has similar searching capabilities.

The PDB files are not distributed by the PIR. The PIR Web site does provide links to the PDB and to molecular graphics and modeling programs, obtainable from other sources, that can read and display both PDB files and the PIR RESID Database files in PDB format.

References
----------

1. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235-242.

2. K. Namboodiri, N. Pattabiraman, A. Lowrey, and B.P. Gaber. (1988) Automated Protein Structure Data Bank Similarity Searches and Their Use in Molecular Modeling with MIDAS. J. Mol. Graphics 6, 211-212.

3. W.C. Barker, J.S. Garavelli, H. Huang, P.B. McGarvey, B.C. Orcutt, G.Y. Srinivasarao, C. Xiao, L.S. Yeh, R.S. Ledley, J.F. Janda, F. Pfeiffer, H.W. Mewes, A. Tsugita, and C. Wu. (2000) The Protein Information Resource (PIR). Nucleic Acids Res. 28, 41-44.

4. N. Pattabiraman, K. Namboodiri, A. Lowrey, and B.P. Gaber. (1990) NRL_3D: a sequence-structure database derived from the Protein Data Bank (PDB) and searchable within the PIR environment. Protein Sequences & Data Analysis 3, 387-405.