Viral Reference Proteomes (Viral RPs)

Viral Reference Proteomes Help



What are Viral Reference Proteomes?

Viral Reference Proteomes (Viral RPs) are computed from UniProtKB virus complete proteomes. For each pair of proteomes, we calculate their co-membership in UniRef50 clusters. We then hierarchically cluster the similar proteomes into a set of Representative Proteome Groups (RPGs) based on their co-memberships at the cutoff levels of 95%, 75%, 55%, 35% and 15%. The proteomes in each RPG are ranked using a Proteome Priority Score to facilitate the selection of a top ranked proteome as the representative from the group. We also use taxonomic group and host information to annotate the viral proteomes in each RPG. Viral RPs can be used to improve proteome annotation, protein classification, and taxonomic nomenclature bias detection in the viral proteome community.
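As a rough illustration of the pairwise co-membership step, the sketch below computes the percentage of proteins in one proteome whose UniRef50 cluster also contains a protein from another proteome. This exact definition is an assumption made for illustration, since the text above does not spell out the formula. The helper name co_membership, the accessions, and the cluster IDs are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the PIR pipeline): percentage of proteins in
# the first proteome whose UniRef50 cluster also holds a protein of the second.
# Each argument is a hash ref mapping protein accession => UniRef50 cluster ID.
# ASSUMPTION: this definition of co-membership is illustrative only.
sub co_membership {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } values %$second;
    my $total = scalar keys %$first;
    return 0 if $total == 0;
    # grep in scalar context returns the number of matching proteins
    my $shared = grep { $in_second{ $first->{$_} } } keys %$first;
    return 100 * $shared / $total;
}

# Toy data, made up for illustration only.
my %proteome_a = (P1 => 'UniRef50_X', P2 => 'UniRef50_Y',
                  P3 => 'UniRef50_Z', P4 => 'UniRef50_W');
my %proteome_b = (Q1 => 'UniRef50_X', Q2 => 'UniRef50_Y', Q3 => 'UniRef50_V');

printf "A vs B: %.2f%%\n", co_membership(\%proteome_a, \%proteome_b);
printf "B vs A: %.2f%%\n", co_membership(\%proteome_b, \%proteome_a);
```

Under this assumed definition the measure is not symmetric, so the two directions can give different percentages for the same pair of proteomes.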



Viral Reference Proteomes BLAST search

A BLAST search can be performed against RP95, RP75, RP55, RP35 or RP15. Lower RPs (such as RP15) have fewer proteomes than higher RPs (such as RP75), so choose RP15 to get the fewest BLAST results. For additional help on BLAST search, please see here.



Browse Viral Reference Proteomes

The taxonomic group view of Viral RPs at the five co-membership cutoff levels can be viewed from here. The topmost nodes are Deltavirus, dsDNA viruses, dsRNA viruses, environmental samples, Retro-transcribing viruses, Satellites, ssDNA viruses, ssRNA viruses, and Other viruses. The fully expanded view shows all the proteomes that have been analyzed to identify the RPs. Browsing the RPs at different thresholds for different taxonomy nodes can provide clues as to which co-membership threshold (CMT) is best for a particular branch and how the RPs are distributed in their taxonomic group tree. Once a desired set of RPs is displayed on the screen, it can be printed for future reference.



Download RPGs and RP Sequences

For each of the co-membership cutoff levels of 95%, 75%, 55%, 35% and 15%, a corresponding Viral Representative Proteome Group file is provided via links from the Viral RP home page, in the format below:

>rp_UPId	rp_taxon_id	rp_oscode	rp_name	rp_taxon	rp_PPS(details)	C (CUTOFF)	X_to_seed(X-seed)	
 mp_UPId	mp_taxon_id	mp_oscode	mp_name	mp_taxon	mp_PPS(details)	X_to_rp (X-RP)	X_to_seed(X-seed)	

Here rp is the Reference Proteome of the RPG and mp is a Member Proteome in the RPG.

An example of a Representative Proteome Group is shown below:

>UP000000252	374526	9CAUD	Lactococcus phage ul36k1t1.	dsDNA viruses, no RNA stage	27104.09017(PPS:0,1,1,1.82,53)	75(CUTOFF)		100.00000(X-seed)
 UP000000251	374529	9CAUD	Lactococcus phage ul36t1k1.	dsDNA viruses, no RNA stage	27104.08554(PPS:0,1,1,1.78,51)	86.27451(X-RP)		86.27451(X-seed)
 UP000001579	374527	9CAUD	Lactococcus phage ul36t1.	dsDNA viruses, no RNA stage	27104.08430(PPS:0,1,1,1.76,52)	80.76923(X-RP)		80.76923(X-seed)
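A file in this format can be read line by line. The following is a minimal sketch, assuming the fields are tab-delimited and that empty fields (such as the doubled tab before X-seed in the example) can be ignored. The function names parse_rpg_line and split_value_label and the hash keys are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split "27104.09017(PPS:0,1,1,1.82,53)" into the number and the label in
# parentheses. Returns the whole field and '' if the pattern does not match.
sub split_value_label {
    my ($field) = @_;
    return $field =~ /^([\d.]+)\((.*)\)$/ ? ($1, $2) : ($field, '');
}

# Parse one RPG line into a hash ref. Lines starting with ">" are treated as
# the representative proteome; other lines are treated as member proteomes.
# ASSUMPTION: fields are tab-delimited and empty fields carry no information.
sub parse_rpg_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;
    my $is_rp = $line =~ /^>/ ? 1 : 0;
    $line =~ s/^[>\s]+//;
    my @f = grep { length } split /\t/, $line;
    return if @f < 8;    # not a complete record

    my %rec = (is_rp => $is_rp);
    @rec{qw(up_id taxon_id oscode name taxon)} = @f[0 .. 4];

    my ($pps, $details) = split_value_label($f[5]);
    $rec{pps} = $pps;
    ($rec{pps_details} = $details) =~ s/^PPS://;

    # Seventh field: cutoff on the ">" line, X-RP on member lines
    my ($value, $label) = split_value_label($f[6]);
    $rec{ $label eq 'CUTOFF' ? 'cutoff' : 'x_to_rp' } = $value;

    ($rec{x_to_seed}) = split_value_label($f[7]);
    return \%rec;
}

# Two lines rebuilt from the example above.
my @lines = (
    join("\t", '>UP000000252', 374526, '9CAUD', 'Lactococcus phage ul36k1t1.',
        'dsDNA viruses, no RNA stage', '27104.09017(PPS:0,1,1,1.82,53)',
        '75(CUTOFF)', '', '100.00000(X-seed)'),
    join("\t", ' UP000000251', 374529, '9CAUD', 'Lactococcus phage ul36t1k1.',
        'dsDNA viruses, no RNA stage', '27104.08554(PPS:0,1,1,1.78,51)',
        '86.27451(X-RP)', '', '86.27451(X-seed)'),
);

for my $line (@lines) {
    my $rec = parse_rpg_line($line) or next;
    printf "%s\t%s\t%s\n", $rec->{is_rp} ? 'RP' : 'member',
        $rec->{up_id}, $rec->{x_to_seed};
}
```

Keeping the values as strings avoids any rounding of the scores as they appear in the file; convert them to numbers only where a comparison is needed.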

We also provide the sequences file in FASTA format for the RP95, RP75, RP55, RP35, and RP15 sets. In addition, users can make their own customized RP set by using the taxonomic group based table or the Perl script available via a link from the Viral RP home page.



Customizable RP Sequences

We provide two ways for users to create a customized RP sequences file for their chosen virus taxonomic groups and co-membership cutoff levels.
  • Run the following Perl script with a configuration file customized to your needs to download the RP sequences.
    Perl Script
    #!/usr/bin/perl
    #
    # Make your own RP sequence file
    #
    # This script will create an RP sequence file using the user's choices of
    # Taxonomy Groups and RPG cutoffs.
    #
    # The Perl module "LWP::Simple" is required
    #
    
    use LWP::Simple;
    if(@ARGV != 2) {
    	print "Usage: perl getMyRPSeq.pl mypick.txt output.seq\n";
    	exit 1; 
    }
    
    # configuration file of tab-delimited txt file with three columns:
    # TaxonGroup      TaxonId RPCutoff
    # Note: for each row in the file, only the third column should be changed. 
    # 	The possible values are 15, 35, 55, 75, 95
    my $mypick = $ARGV[0];
    
    # path to the output sequence file
    my $output = $ARGV[1];
    
    my %taxonId = ();
    my $url = 'http://pir.georgetown.edu/rps/viruses/data/rp_seq_tax_group/current/';
    open(PICK, $mypick) or die "Can't open $mypick: $!\n";
    while(my $line = <PICK>) {
    	chomp($line);
    	if(!($line =~ /^Virus Taxonomic Group/ || $line =~ /^#/ || $line =~ /^$/)) {
    		my ($taxId, $cutoff) = (split(/\t/, $line))[1, 2];
    		# $url already ends with "/", so no extra separator is needed here
    		my $seqUrl = $url.$cutoff."/".$taxId.".seq";
    		my $file = $taxId.".seq";
    		$taxonId{$taxId} = 1;
    		my $status = getstore($seqUrl, $file);
    		die "Error $status on $seqUrl" unless is_success($status);
    	}
    }
    close(PICK);
    open(OUT, ">", $output) or die "Can't open $!\n";
    for my $key (sort keys %taxonId) {
    	my $seqFile = $key.".seq";
    	if(-e $key.".seq") {
    		open(FH, $seqFile) or die "Can't open $seqFile: $!\n";
    		while(my $line = <FH>) {
    			print OUT $line;
    		}	
    		close(FH);
    		unlink($seqFile);
    	}
    }
    close(OUT);
    
    Configuration File
    # configuration file of tab-delimited txt file with three columns:
    # VirusGroup      TaxonId RPCutoff
    # Note: for each row in the file, only the third column should be changed.
    #       The possible values are 15, 35, 55, 75, 95
    #
    Virus Taxonomic Group   Id      RPCutoff
    Deltavirus      39759   75
    Retro-transcribing viruses      35268   75
    Satellites      12877   75
    dsDNA viruses, no RNA stage     35237   75
    dsRNA viruses   35325   75
    ssDNA viruses   29258   75
    ssRNA viruses   439488  75
    Other viruses   10239   75
    environmental samples   186616  75
    

  • Download the RP sequences manually: use the Virus taxonomic group and co-membership cutoff level information to get one file for each row, then put the files together to form the customized RP sequences file.
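For either approach it can help to check the configuration rows and see which file each row points to before downloading anything. The sketch below is one possible way to do that. It assumes the URL layout used by the script above (base URL, then cutoff, then TaxonId with a .seq extension), and the function name row_to_url is hypothetical. It performs no network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL as it appears in the script above.
my $BASE_URL = 'http://pir.georgetown.edu/rps/viruses/data/rp_seq_tax_group/current/';

# Cutoff values listed in the configuration file comments.
my %VALID_CUTOFF = map { $_ => 1 } (15, 35, 55, 75, 95);

# Turn one tab-delimited configuration row into the URL of its sequence file.
# Returns nothing for blank lines, comments and the header row.
# Dies on a malformed row or an unsupported cutoff.
sub row_to_url {
    my ($row) = @_;
    $row =~ s/\s+$//;
    return if $row eq '' || $row =~ /^#/ || $row =~ /^Virus Taxonomic Group/;

    my ($group, $tax_id, $cutoff) = split /\t/, $row;
    die "Malformed row (expected three tab-separated columns): $row\n"
        unless defined $cutoff && $tax_id =~ /^\d+$/;
    die "Invalid RPCutoff '$cutoff' for $group\n"
        unless $VALID_CUTOFF{$cutoff};

    return "${BASE_URL}${cutoff}/${tax_id}.seq";
}

# Example rows; the cutoff in the last row was changed to 35 for illustration.
my @rows = (
    "Virus Taxonomic Group\tId\tRPCutoff",
    "Deltavirus\t39759\t75",
    "ssRNA viruses\t439488\t35",
);

for my $row (@rows) {
    my $url = row_to_url($row) or next;
    print "$url\n";
}
```

Failing early on a bad cutoff or on a row that uses spaces instead of tabs avoids a partly completed download run.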




©2018 Protein Information Resource