FAQ [PIR - Protein Information Resource]

FAQ

Why is release 80.00 the final release for PIR-PSD?

To avoid duplication of work within UniProt, PIR-PSD release 80.00 (31-Dec-2004) is the final release.
The PIR-PSD data were imported into UniParc, and bi-directional cross-references between Swiss-Prot + TrEMBL and PIR-PSD were created to allow easy tracking of former PIR-PSD entries into UniProtKB. All suitable sequences in PIR-PSD that are missing from Swiss-Prot + TrEMBL are being incorporated into the TrEMBL section of UniProt Knowledgebase. Additionally, all valid references and experimentally verified data present in PIR-PSD, but missing from Swiss-Prot + TrEMBL, are also being transferred to the relevant UniProtKB records.

What happened to NREF?

NREF is a comprehensive database for sequence searching and protein identification, containing non-redundant protein sequences from UniProtKB, RefSeq, GenPept, and PDB. Because the UniParc database covers a similar sequence space, NREF sequences that are absent from UniProtKB but present in UniParc can be retrieved through a text search in iProClass. This decision was made to provide a centralized, comprehensive database and to minimize duplication of work between UniProt and PIR.

What is UniProt?

Until recently, the TrEMBL + Swiss-Prot databases and the PIR Protein Sequence Database (PIR-PSD) coexisted with differing protein sequence coverage and annotation priorities. In 2002, the maintainers of these databases (the European Bioinformatics Institute, the Swiss Institute of Bioinformatics, and the Protein Information Resource, respectively) joined forces as the Universal Protein Resource Consortium (UniProt Consortium). The primary mission of the consortium is to support biological research by maintaining a high-quality database that serves as a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt builds upon the solid foundations laid by the consortium members over many years.

The UniProt databases consist of three database layers:

The UniProt Archive (UniParc) provides a stable, comprehensive sequence collection without redundant sequences by storing the complete body of publicly available protein sequence data.

The UniProt Knowledgebase (UniProtKB) provides the central database of protein sequences with accurate, consistent, rich sequence and functional annotation.

The UniProt Non-redundant Reference (UniRef) databases provide condensed data collections based on the UniProt Knowledgebase in order to obtain complete coverage of sequence space at several resolutions.
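As a rough illustration of what "without redundant sequences" means for an archive layer such as UniParc, the sketch below stores each distinct sequence once and records every source that supplied it. It is a toy model only: the class name `SequenceArchive`, its methods, and the identifiers are hypothetical and do not reflect how UniParc is actually implemented.

```python
from collections import defaultdict


class SequenceArchive:
    """Toy non-redundant archive: one record per distinct sequence."""

    def __init__(self):
        # normalized sequence -> set of (source database, source identifier)
        self._xrefs = defaultdict(set)

    @staticmethod
    def _normalize(sequence):
        # Assumption: sequences that differ only in case or whitespace are identical.
        return "".join(sequence.split()).upper()

    def add(self, source_db, source_id, sequence):
        """Register a sequence; identical sequences share one record."""
        self._xrefs[self._normalize(sequence)].add((source_db, source_id))

    def sources(self, sequence):
        """Return every (database, identifier) pair that supplied this sequence."""
        return sorted(self._xrefs.get(self._normalize(sequence), set()))

    def __len__(self):
        return len(self._xrefs)


archive = SequenceArchive()
archive.add("PIR-PSD", "HYPOTHETICAL_1", "MKTAYIAK")
archive.add("TrEMBL", "HYPOTHETICAL_2", "mktayiak")  # same sequence, other source
archive.add("RefSeq", "HYPOTHETICAL_3", "MSSLLV")

distinct_count = len(archive)
shared_sources = archive.sources("MKTAYIAK")
```

Here the first two submissions collapse into a single record that keeps both cross-references, which is the property that lets former PIR-PSD entries be traced after import.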

What is the difference between an accession number and an ID?

An accession number (AC) is assigned to each sequence upon inclusion into UniProtKB. Accession numbers are stable from release to release. If several UniProt Knowledgebase entries are merged into one to minimize redundancy, the accession numbers of all merged entries are kept. Each entry has one primary AC and, optionally, secondary ACs.
An ID is a unique identifier, often containing biologically relevant information. It is sometimes necessary, for reasons of consistency, to change IDs (for example to ensure that related entries have similar names). Another common cause for changing an ID is when an entry is promoted from UniProt's TrEMBL section (with computationally-annotated records) to the Swiss-Prot section (with fully curated records). However, an accession number is always conserved, and therefore allows unambiguous citation of UniProt entries.
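The behaviour described above can be modelled with a small lookup table in which every accession, primary or secondary, points to the current entry. This is a minimal sketch under stated assumptions: the `Entry` and `Knowledgebase` classes are hypothetical, and the accession and ID strings are invented placeholders, not real UniProtKB identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    primary_ac: str
    entry_id: str
    secondary_acs: list = field(default_factory=list)


class Knowledgebase:
    """Toy index: every accession (primary or secondary) resolves to one entry."""

    def __init__(self):
        self._by_ac = {}

    def add(self, entry):
        for ac in [entry.primary_ac, *entry.secondary_acs]:
            self._by_ac[ac] = entry

    def lookup(self, ac):
        return self._by_ac[ac]

    def merge(self, keep_ac, absorbed_ac):
        """Merge two entries; all accessions of the absorbed entry are kept."""
        keep, absorbed = self._by_ac[keep_ac], self._by_ac[absorbed_ac]
        if keep is absorbed:
            raise ValueError("cannot merge an entry with itself")
        for ac in [absorbed.primary_ac, *absorbed.secondary_acs]:
            keep.secondary_acs.append(ac)
            self._by_ac[ac] = keep  # old accession now resolves to the survivor

    def rename(self, ac, new_id):
        """Change the ID (e.g. after promotion); accessions are untouched."""
        self._by_ac[ac].entry_id = new_id


kb = Knowledgebase()
kb.add(Entry("AC0001", "EXAMPLE_A"))
kb.add(Entry("AC0002", "EXAMPLE_B"))
kb.merge("AC0001", "AC0002")
kb.rename("AC0001", "EXAMPLE_RENAMED")

resolved = kb.lookup("AC0002")  # old accession still resolves after merge and rename
```

Citing `AC0002` keeps working even though its entry was merged and the surviving entry's ID changed, which is why accession numbers are preferred for citation.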

How can I link to PIR from my program/database?

We welcome and encourage you to provide links back to PIR from your database or program. Please refer to Use/Link to PIR for detailed information about this procedure.

Can I save results as a table?

Yes. The "Table" save option in the right corner of the result page allows you to do this. You can later open the file in any spreadsheet program.

What does the iProClass value-added information represent?

The iProClass report provides an alternative view of UniProtKB proteins. On top of the information presented in the UniProtKB report, it contains thorough information regarding protein family classification (PIRSF), protein structure and function, and it also allows ID mapping to multiple databases.

In the past, I submitted protein sequences to PIR-PSD. Where should I submit my sequences now?

PIR has recently joined forces with the European Bioinformatics Institute and the Swiss Institute of Bioinformatics to establish the Universal Protein Resource (UniProt), the central resource of protein sequence and function. Please submit your sequences directly to UniProtKB using SPIN, the new web-based tool for submitting directly sequenced proteins.

Classification

What does the PIRSF family level mean?

The primary PIRSF classification level is the homeomorphic family (HFam), whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). At a lower level are the subfamilies (SubFam) which are clusters representing functional specialization and/or domain architecture variation within the family. Above the homeomorphic level there may be parent superfamilies (SuperFam) that connect distantly related families and orphan proteins based on common domains. They may be homeomorphic superfamilies, but are more likely to be domain superfamilies if the common domains do not extend over the full length of the proteins. Because proteins can belong to more than one domain superfamily, the PIRSF structure is formally a network.
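The statement that the PIRSF structure is formally a network, not a strict tree, can be sketched as a graph in which a node may have more than one parent. The `Node` class, the helper functions, and all names below are hypothetical and serve only to illustrate the three levels described above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity; the graph is self-referential
class Node:
    name: str
    level: str  # "SuperFam", "HFam", or "SubFam"
    parents: list = field(default_factory=list, repr=False)
    children: list = field(default_factory=list, repr=False)


def link(parent, child):
    """Record a parent/child relationship in both directions."""
    parent.children.append(child)
    child.parents.append(parent)


def ancestor_names(node):
    """Collect the names of all nodes reachable by following parent links."""
    found = set()
    stack = list(node.parents)
    while stack:
        current = stack.pop()
        if current.name not in found:
            found.add(current.name)
            stack.extend(current.parents)
    return found


# Two domain superfamilies share one homeomorphic family (hypothetical names).
superfam_a = Node("domain_superfam_a", "SuperFam")
superfam_b = Node("domain_superfam_b", "SuperFam")
hfam = Node("homeomorphic_family_x", "HFam")
subfam = Node("subfamily_x1", "SubFam")

link(superfam_a, hfam)
link(superfam_b, hfam)  # second parent: this is what makes it a network
link(hfam, subfam)

subfam_ancestors = ancestor_names(subfam)
```

Because `hfam` has two parents, a tree representation would have to duplicate it; keeping a list of parents per node avoids that.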

What does curation status mean?

Curation status reflects the extent to which a PIRSF has been manually curated.

None: Computer-generated protein clusters, no manual curation. The clusters are computationally defined using both pairwise based parameters (% sequence identity, sequence length ratio and overlap length ratio) and cluster-based parameters (% matched members, distance to neighboring clusters and overall domain arrangement).

Preliminary: Computer-generated clusters are manually curated for membership (does this protein belong to the cluster?) and domain architecture (Pfam domains listed from the N- to the C-terminus).

Full: A name is assigned to the protein family, and accompanying references are listed when available. In many cases, brief descriptions are also provided.
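The pairwise parameters mentioned under "None" (% sequence identity, sequence length ratio and overlap length ratio) can be illustrated with a short calculation over two pre-aligned sequences. The exact definitions used by PIRSF are not given here, so the formulas below are assumptions chosen for illustration: identity is counted over columns where both sequences have a residue, the length ratio is shorter over longer, and the overlap ratio is the number of such columns over the longer sequence length. The function name and example sequences are hypothetical.

```python
def pairwise_parameters(aligned_a, aligned_b, gap="-"):
    """Compute illustrative pairwise parameters from two gapped, aligned strings.

    The definitions are assumptions for this sketch, not the PIRSF definitions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    len_a = len(aligned_a) - aligned_a.count(gap)  # ungapped length of A
    len_b = len(aligned_b) - aligned_b.count(gap)  # ungapped length of B
    if len_a == 0 or len_b == 0:
        raise ValueError("sequences must contain at least one residue")

    overlap = 0    # columns where both sequences have a residue
    identical = 0  # overlapping columns with the same residue
    for x, y in zip(aligned_a, aligned_b):
        if x != gap and y != gap:
            overlap += 1
            if x == y:
                identical += 1

    longer = max(len_a, len_b)
    return {
        "percent_identity": 100.0 * identical / overlap if overlap else 0.0,
        "length_ratio": min(len_a, len_b) / longer,
        "overlap_ratio": overlap / longer,
    }


params = pairwise_parameters("MKTAYIAK--", "MKSAYIAKQR")
```

For this toy pair, 7 of the 8 overlapping columns are identical, and the shorter sequence has 8 residues against 10, so all three values can be checked by hand.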

The PIRSF scan does not return a match to any PIRSF, but I believe my protein belongs to one. Why?

A scan may return no PIRSF match because the query protein is compared against a reduced set of PIRSFs: only PIRSFs that are fully curated and have associated benchmarked HMM models are considered in the analysis.
In addition, if the query protein is related to a member that a curator placed in a PIRSF manually, the algorithm may fail to hit the relevant PIRSF.
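The restriction to a reduced set of PIRSFs can be pictured as a simple filter applied before any scanning takes place. This is only a sketch of the selection rule stated above; the `Pirsf` record, its field names, and the identifiers are hypothetical and are not the scan tool's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pirsf:
    identifier: str
    curation_status: str       # "None", "Preliminary", or "Full"
    has_benchmarked_hmm: bool


def scannable(families):
    """Keep only families that are fully curated and have a benchmarked HMM."""
    return [
        family
        for family in families
        if family.curation_status == "Full" and family.has_benchmarked_hmm
    ]


families = [
    Pirsf("EXAMPLE_SF_1", "Full", True),
    Pirsf("EXAMPLE_SF_2", "Full", False),        # curated, but no benchmarked HMM
    Pirsf("EXAMPLE_SF_3", "Preliminary", True),  # not fully curated
    Pirsf("EXAMPLE_SF_4", "None", False),
]

scan_targets = [family.identifier for family in scannable(families)]
```

A query protein that truly belongs to `EXAMPLE_SF_2` or `EXAMPLE_SF_3` would get no match in this model, simply because those families are never part of the comparison.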
