          PIR Non-redundant REFerence protein database (PIR-NREF)
          ******************************************************

                    Protein Information Resource (PIR)
                 National Biomedical Research Foundation
                  Georgetown University Medical Center         
                         3900 Reservoir Road, N.W.
                        Washington, DC  20057, USA
                          Phone: (202) 687-2121
                          Fax  : (202) 687-1662
                   E-mail: pirmail@nbrf.georgetown.edu


1. Introduction

One of our primary aims, as a major resource of protein information, is to provide a timely and comprehensive
collection of all protein sequence data that keeps pace with the genome sequencing projects and contains
source attribution and minimal redundancy.  The PIR-NREF (Non-redundant REFerence) protein database includes
all sequences from PIR-PSD, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, RefSeq, GenPept, and PDB.  Each NREF entry
represents an identical amino acid sequence from the same source organism that is redundantly presented in one
or more underlying protein databases, and can serve as the basic unit for protein annotation.  The NCBI taxonomy
is used as the ontology for matching source organism names at the species level or, if known, the strain level.
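
The grouping rule described above can be sketched in a few lines of Python.  This is only an illustration
under stated assumptions, not the PIR production pipeline: the record fields (db, acc, taxon_id, seq) and the
sample accessions and sequences are hypothetical.

```python
from collections import defaultdict

def cluster_nref(records):
    """Group records that share the same sequence AND the same taxonomy ID."""
    clusters = defaultdict(list)
    for rec in records:
        # The cluster key is the pair (amino acid sequence, taxonomy ID).
        key = (rec["seq"].upper(), rec["taxon_id"])
        clusters[key].append((rec["db"], rec["acc"]))
    return dict(clusters)

# Hypothetical records: the accessions and sequences are made up.
records = [
    {"db": "PIR-PSD", "acc": "A0001", "taxon_id": 9606, "seq": "MKTAYIAK"},
    {"db": "RefSeq", "acc": "B0002", "taxon_id": 9606, "seq": "MKTAYIAK"},
    {"db": "GenPept", "acc": "C0003", "taxon_id": 10090, "seq": "MKTAYIAK"},
]
clusters = cluster_nref(records)
```

In this sketch the first two records collapse into one cluster, while the third stays separate because its
taxonomy ID differs even though the sequence is identical.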

The NREF report provides source attribution (containing protein IDs, accession numbers, taxonomy ID, and protein 
names from underlying databases), in addition to taxonomy, amino acid sequence, and composite bibliography data. 
The composite protein names, including synonyms, alternate names, and even misspellings, can be used to assist 
ontology development on protein names and the identification of mis-annotated proteins. Related sequences, 
including identical sequences from different organisms, as well as identical subsequences and highly similar 
sequences (>=95% sequence identity) are also listed. 
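
As a rough illustration of two of these relationships, the sketch below classifies a pair of entries, each
given as a hypothetical (taxonomy ID, sequence) tuple.  The >=95% identity case is left out on purpose: it
depends on an alignment method that this document does not specify.

```python
def relation(a, b):
    """Classify two (taxon_id, sequence) entries; similarity scoring is not covered."""
    (tax_a, seq_a), (tax_b, seq_b) = a, b
    if seq_a == seq_b:
        # The same sequence in the same organism would be a single entry.
        return "same entry" if tax_a == tax_b else "identical, different organism"
    if seq_a in seq_b or seq_b in seq_a:
        return "identical subsequence"
    return "unclassified"

# Hypothetical entries for illustration only.
full_human = (9606, "MKTAYIAKQR")
full_mouse = (10090, "MKTAYIAKQR")
fragment = (9606, "TAYIAK")
```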


2. Major Features

a) Comprehensiveness and Timeliness:  Contains all sequences from PIR-PSD, UniProtKB/Swiss-Prot,
UniProtKB/TrEMBL, RefSeq, GenPept, and PDB, and is updated biweekly.

b) Non-Redundancy:  Entries are clustered by sequence identity and taxonomy at the species level.

c) Source Attribution:  Contains protein IDs and names from associated databases (with hypertext links), in
addition to protein sequence, taxonomy, and bibliography.


3. New Release Features

Two New Search Options:

* Text: Retrieve a matching list of entries by searching the protein names or the species/organism name.
      
* Species-Based Browsing and Searching: Browse or search ~100 organisms, including over 70 complete
genomes.


4. Database Access and Usage

FTP Downloading:
PIR-NREF is available for free downloading and redistribution from our FTP site in XML format (data file) and
FASTA format (sequence file).
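
A downloaded sequence file can be read with a small generic FASTA parser such as the sketch below.  It
assumes only the general FASTA convention (a header line starting with ">" followed by sequence lines); the
layout of the PIR-NREF header line is not described here, so the headers in the sample are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical sample; a real file would be passed as an open file handle.
sample = [">entry1 hypothetical header", "MKTAY", "IAK", "", ">entry2 hypothetical header", "GSHM"]
entries = list(read_fasta(sample))
```

Because the function is a generator that accepts any iterable of lines, it can be given an open file handle
directly and does not need to hold the whole file in memory.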

Web Site Access:
The Web site supports both text and sequence searches for report and list retrieval.  Direct report retrieval
is based on unique sequence identifiers, including IDs and accession numbers of the source databases.  List
retrieval is supported by both text and sequence searches.  The text search matches protein and species names
using combinations of text strings (and substrings).  Sequence searches include full-scale and species-based
BLAST searches and peptide/pattern matches for functional identification of query proteins or peptides.  The
Peptide Match finds exact matches in the NREF database to a user-defined peptide sequence.  The Pattern Match
searches a user-defined pattern or ProSite pattern against all NREF sequences.
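
The two match types can be imitated offline on a local set of sequences.  The sketch below is not the Web
site's implementation: the function names and sample sequences are hypothetical, and the converter handles
only a common subset of the ProSite pattern syntax (residues, x, [..], {..}, repeat counts, and the < and >
anchors outside brackets).

```python
import re

# One pattern element: optional "<", a residue / x / [set] / {excluded set},
# an optional repeat such as (3) or (2,4), and an optional ">".
_ELEMENT = re.compile(
    r"(<?)([A-Za-z]|\[[A-Za-z]+\]|\{[A-Za-z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    """Translate a ProSite-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core in ("x", "X"):
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} means any residue except these
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)

def peptide_match(peptide, sequences):
    """Return {id: first offset} for sequences containing the exact peptide."""
    return {sid: seq.find(peptide) for sid, seq in sequences.items() if peptide in seq}

def pattern_match(pattern, sequences):
    """Return {id: [offsets]} of non-overlapping pattern hits per sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {sid: [m.start() for m in regex.finditer(seq)] for sid, seq in sequences.items()}
    return {sid: offsets for sid, offsets in hits.items() if offsets}

# Hypothetical sequences for illustration only.
sequences = {"e1": "AANGSAA", "e2": "MKTAYIAK"}
```

Offsets are zero-based.  Note that finditer reports non-overlapping hits only, so overlapping occurrences of
a pattern within one sequence would need a lookahead-based search instead.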

5. Publications

The Protein Information Resource: an integrated public resource of functional annotation of proteins. 
Wu, C., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., Hu, Z., Ledley, R.S., Lewis, K.C., Mewes, H.-W.,
Orcutt, B.C., Suzek, B.E., Tsugita, A., Vinayaka, C.R., Yeh, L.-S., Zhang, J. and Barker, W.C. (2002).
Nucleic Acids Research, 30, 35-37.


Presentations/Posters

PIR Non-Redundant Reference Protein Database (PIR-NREF)
Suzek, B.E., Huang, H., Orcutt, B., Chen, Y., Hu, Z., Zhang, J. and Wu, C.H.
Sixth Annual International Conference on Research in Computational Molecular Biology (April 2002)

6. Sponsorship

PIR is supported by NIH grant P41 LM05798.