Representative Proteomes (RP)
=============================

Representative Proteomes are proteomes that can best represent all complete proteomes in terms
of the majority of the sequence space and information. We provide four sets of Representative
Proteomes based on co-membership threshold (CMT) cut-off to allow users to decrease or increase
the granularity of the sequence space based on their requirements (PMID: 21556138,
http://pir.georgetown.edu/rps/).

This directory contains an archive sub-directory, an rg sub-directory and the following files:

1) readme.txt: This file;
2) release_note.txt: Current release note;
3) summary.html: Summary statistics for the RPs;
4) completeProteomeSet-seqs.fasta.gz: Sequence file for all complete proteomes (one protein per gene);
5) rp-seq-x.fasta.gz (x=15, 35, 55, 75): RP sequence files at different CMT cut-offs;
6) rpg-x.txt (x=15, 35, 55, 75): Text files of Representative Proteomes Group (RPG) at different CMT cut-offs.

Note: rpg-x.txt file format:

>rp_taxon_id	organism_code	name	taxon_group_id	score(PPS:IsRefP,#PMID,#PDB,#SP,#Entry)	C(CUTOFF)	RefP
tax_id	organism_code	name	taxon_group_id	score(PPS:IsRefP,#PMID,#PDB,#SP,#Entry)	X_to_rp(X)	RefP
...
(terms are separated by a tab)

Example:

>246196	MYCS2	Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)	Bac/ActnBac	11112.56779(PPS:1,36,101,396,6601)	35(CUTOFF)	RefP
216594	MYCMM	Mycobacterium marinum (strain ATCC BAA-535 / M)	Bac/ActnBac	1111.38755(PPS:0,3,12,316,5418)	36.51304(X)
1770	MYCPA	Mycobacterium paratuberculosis	Bac/ActnBac	1111.81459(PPS: 0,22,17,445,4316)	36.96261(X)
189918	MYCSK	Mycobacterium sp (strain KMS).	Bac/ActnBac	1111.28800(PPS: 0,1,0,307,5893)	50.23351(X)
350058	MYCVP	Mycobacterium vanbaalenii (strain DSM 7251 / PYR-1)	Bac/ActnBac	1111.32097(PPS:0,2,1,324,5902)	47.95203(X)

Home page: http://pir.georgetown.edu/rps/
Browse: http://pir.georgetown.edu/rps/browse.html
BLAST search: http://pir.georgetown.edu/rps/blast_rp.shtml
Make your own RP sequence file: http://pir.georgetown.edu/rps/mk_rp.shtml

FTP download:

ftp://ftp.pir.georgetown.edu/databases/rps/
    Sequence files at different cut-offs and for all complete proteomes (one protein per gene).
    Text files of proteome clusters at different co-membership threshold cut-offs.

ftp://ftp.pir.georgetown.edu/databases/rps/rg
    Text files of genome clusters at different co-membership threshold cut-offs.
    Representative genomes (RGs) are constructed based on the corresponding RPs.

------------------------------------
Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven Street, NW, Suite 1200
Washington, DC 20007, USA
Email: pirmail@georgetown.edu
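
The rpg-x.txt format described in the note above can be read with a few lines of Python. The following is a minimal sketch, not an official PIR tool. It assumes that fields are tab-separated as stated, that a line starting with ">" opens a new group with the representative proteome, and that every following line is a member of that group. The names Proteome, parse_record and parse_rpg are hypothetical and exist only for this illustration. The optional trailing RefP column is ignored here; the IsRefP flag inside the PPS field is used instead.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# e.g. "11112.56779(PPS:1,36,101,396,6601)"; a space after "PPS:" is tolerated
SCORE_RE = re.compile(r"^([\d.]+)\(PPS:\s*(\d+),(\d+),(\d+),(\d+),(\d+)\)$")
# e.g. "35(CUTOFF)" on a representative line, "36.51304(X)" on a member line
VALUE_RE = re.compile(r"^([\d.]+)\((CUTOFF|X)\)$")


@dataclass
class Proteome:  # hypothetical container, field names follow the readme
    tax_id: str
    organism_code: str
    name: str
    taxon_group_id: str
    score: float
    is_refp: bool
    n_pmid: int
    n_pdb: int
    n_sp: int
    n_entry: int
    value: float  # CMT cut-off for a representative, X_to_rp for a member
    members: List["Proteome"] = field(default_factory=list)


def parse_record(line: str) -> Proteome:
    """Parse one tab-separated record (without the leading '>')."""
    fields = [f.strip() for f in line.split("\t")]
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 tab-separated fields: {line!r}")
    tax_id, code, name, group, score_field, value_field = fields[:6]
    score = SCORE_RE.match(score_field)
    value = VALUE_RE.match(value_field)
    if score is None or value is None:
        raise ValueError(f"unrecognised score or cut-off field: {line!r}")
    return Proteome(
        tax_id=tax_id,
        organism_code=code,
        name=name,
        taxon_group_id=group,
        score=float(score.group(1)),
        is_refp=score.group(2) == "1",
        n_pmid=int(score.group(3)),
        n_pdb=int(score.group(4)),
        n_sp=int(score.group(5)),
        n_entry=int(score.group(6)),
        value=float(value.group(1)),
    )


def parse_rpg(lines: Iterable[str]) -> List[Proteome]:
    """Return one Proteome per group; members hang off the representative."""
    groups: List[Proteome] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            groups.append(parse_record(line[1:]))
        elif groups:
            groups[-1].members.append(parse_record(line))
        else:
            raise ValueError("member line found before any '>' line")
    return groups


# Two records taken from the example in this readme, with tabs restored.
sample = (
    ">246196\tMYCS2\tMycobacterium smegmatis (strain ATCC 700084 / mc(2)155)"
    "\tBac/ActnBac\t11112.56779(PPS:1,36,101,396,6601)\t35(CUTOFF)\tRefP\n"
    "1770\tMYCPA\tMycobacterium paratuberculosis"
    "\tBac/ActnBac\t1111.81459(PPS: 0,22,17,445,4316)\t36.96261(X)\n"
)
groups = parse_rpg(sample.splitlines())
for rp in groups:
    print(rp.organism_code, rp.value, [m.organism_code for m in rp.members])
```

To read a downloaded file instead of the inline sample, pass an open file handle, for example parse_rpg(open("rpg-35.txt")). Real files may contain variations that this sketch does not handle.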