Viral Representative Proteomes (Viral RPs)

Viral Representative Proteomes: Computational Clustering of UniProtKB Virus Proteomes



Release 2024_02, March 27, 2024

Viral Representative Proteomes (Viral RPs) are computed from UniProtKB virus complete proteomes. For each pair of proteomes, we calculate their co-membership in UniRef50 clusters. We then hierarchically cluster similar proteomes into a set of Representative Proteome Groups (RPGs) based on their co-membership at cut-off levels of 95%, 75%, 55%, 35% and 15%. The proteomes in each RPG are ranked by a Proteome Priority Score to facilitate the selection of the top-ranked proteome as the group's representative. We also annotate the viral proteomes in each RPG with taxonomic group and host information. Viral RPs can be used to improve proteome annotation, protein classification, and the detection of taxonomic nomenclature bias in the viral proteome community.
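
The grouping idea described above can be illustrated with a minimal Python sketch. This page does not give the co-membership formula, so the sketch assumes it is the percentage of one proteome's UniRef50 clusters that also contain a protein from the other proteome. It also uses simple single-linkage grouping at one cut-off as a stand-in for the hierarchical clustering, which is not necessarily how the actual pipeline works. The function names `co_membership` and `group_proteomes` and the toy cluster IDs are hypothetical.

```python
from itertools import combinations


def co_membership(clusters_a, clusters_b):
    """Percent of proteome A's UniRef50 clusters shared with proteome B.

    ASSUMPTION: this definition is illustrative; the page does not state
    the exact formula used to compute co-membership.
    """
    if not clusters_a:
        return 0.0
    return 100.0 * len(clusters_a & clusters_b) / len(clusters_a)


def group_proteomes(proteomes, cutoff):
    """Group proteomes by single linkage (an assumed, simplified stand-in).

    Two proteomes are joined when co-membership in both directions is at
    least `cutoff` percent. `proteomes` maps a name to a set of cluster IDs.
    """
    parent = {name: name for name in proteomes}

    def find(name):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(proteomes, 2):
        forward = co_membership(proteomes[a], proteomes[b])
        backward = co_membership(proteomes[b], proteomes[a])
        if min(forward, backward) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in proteomes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy data: proteome name -> set of UniRef50 cluster IDs (all hypothetical).
toy = {
    "P1": {"c1", "c2", "c3", "c4"},
    "P2": {"c1", "c2", "c3", "c5"},
    "P3": {"c6", "c7"},
}

for cutoff in (95, 75, 55, 35, 15):
    print(cutoff, group_proteomes(toy, cutoff))
```

In this toy data, P1 and P2 share three of their four clusters, so they fall into one group at the 75% cut-off and below but stay separate at 95%. This mirrors how the number of RPGs shrinks as the cut-off is lowered.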

Release Statistics

BLAST sequence search

Browse Viral RPs

Download Viral RPs files

(download the complete proteome set, #Proteomes: 86962)
Cut-off level   RPG file      Seq file*              #RPGs
95% cut-off     rpg-95.txt    rp-seqs-95.fasta.gz    15848
75% cut-off     rpg-75.txt    rp-seqs-75.fasta.gz    10498
55% cut-off     rpg-55.txt    rp-seqs-55.fasta.gz    8170
35% cut-off     rpg-35.txt    rp-seqs-35.fasta.gz    6466
15% cut-off     rpg-15.txt    rp-seqs-15.fasta.gz    4800

* All sequence files have been filtered to contain one-protein-per-gene.
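
Once a sequence file such as rp-seqs-95.fasta.gz has been downloaded, it can be read without unpacking it first. The following is a minimal sketch using only the Python standard library. It assumes the file is ordinary gzip-compressed FASTA, and the function names `read_fasta` and `count_sequences` are hypothetical helpers, not part of any PIR tool.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file.

    ASSUMPTION: standard FASTA layout, i.e. a '>' header line followed by
    one or more sequence lines.
    """
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record, if the file contained any.
    if header is not None:
        yield header, "".join(chunks)


def count_sequences(path):
    """Count the FASTA records in a gzip-compressed file."""
    return sum(1 for _ in read_fasta(path))
```

For example, `count_sequences("rp-seqs-95.fasta.gz")` would return the number of protein sequences in the 95% cut-off file, assuming it is in the current directory.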

Make your own RP sequence file

There are two ways to make your own RP sequence file for chosen taxonomic groups and cut-off levels:

Using a script:
Click here to get a Perl script and click here to get the configuration file. Modify the configuration file to suit your needs, then run the script on your own machine.

Download the RP files by virus taxonomic group and co-membership cut-off below:
Get one file from each row, then combine them to form your RP sequence file.

Virus Taxonomic Group             95%   75%   55%   35%   15%
unclassified DNA viruses          x     x     x     x     x
unclassified archaeal viruses     x     x     x     x     x
unclassified bacterial viruses    x     x     x     x     x
unclassified viruses              x     x     x     x     x
Satellites                        x     x     x     x     x
Other viruses                     x     x     x     x     x
environmental samples             x     x     x     x     x
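
Combining the per-group files chosen from the table above can be done with a few lines of Python. This is a minimal sketch that assumes the per-group files are gzip-compressed FASTA like the full-set sequence files, which this page does not state. The function name `combine_fasta_gz` and any file names passed to it are hypothetical.

```python
import gzip


def combine_fasta_gz(input_paths, output_path):
    """Concatenate gzip-compressed FASTA files into one gzip-compressed file.

    ASSUMPTION: every input is gzip-compressed text. Each input is made to
    end with a newline so that the last sequence line of one file never
    runs into the first header of the next.
    """
    with gzip.open(output_path, "wt") as out:
        for path in input_paths:
            with gzip.open(path, "rt") as src:
                for line in src:
                    out.write(line if line.endswith("\n") else line + "\n")
```

The function takes the list of downloaded files, one per taxonomic group at whichever cut-off was chosen for that group, and writes a single combined RP sequence file.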

Note: The Viral RPGs do not include the polyproteomes. To download the RPGs that include polyproteomes, please click here.

Publication
Chen C, Huang H, Mazumder R, Natale DA, McGarvey PB, Zhang J, Polson SW, Wang Y, Wu CH; UniProt Consortium. Computational clustering for viral reference proteomes. Bioinformatics. 2016 Jul 1;32(13):2041-3. doi: 10.1093/bioinformatics/btw110. Epub 2016 Feb 26.



©2018 Protein Information Resource