iProClass: An Integrated, Comprehensive, and Annotated Protein Classification Database

Cathy H. Wu, Chunlin Xiao, Zhenglin Hou, Hongzhan Huang, and Winona C. Barker
Protein Information Resource, National Biomedical Research Foundation,
Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007-2195

ABSTRACT

The iProClass database is an integrated resource that provides comprehensive family relationships at both global (whole protein) and local (domain and motif/site) levels, as well as structural/functional classifications and features of proteins. It is extended from ProClass, a family database that integrates PIR superfamilies and PROSITE motifs. The PIR superfamily/family organization provides complete and non-overlapping clustering of all proteins. iProClass currently consists of more than 210,000 non-redundant PIR and Swiss-Prot proteins organized into more than 29,000 PIR superfamilies, 100,000 MIPS families, 2,600 PIR homology and Pfam domains, 1,300 ProClass/PROSITE motifs, and 280 PIR post-translational modification sites. It links to over 30 databases of protein families, structures, functions, genes, genomes, literature, and taxonomy, such as Pfam, PRINTS, BLOCKS, KEGG, PDB, SCOP, and CATH. Protein and superfamily summary reports provide rich annotations, including membership information with sequence length, taxonomy, and keyword statistics; full family relationships; comprehensive enzyme (EC) and PDB cross-references; and graphical feature display. The database facilitates classification-driven annotation of protein sequences and complete genomes, and supports structural/functional genomics and proteomics research. iProClass is implemented in the Oracle 8i object-relational database system and is available for sequence/text search and report retrieval at /iproclass/. [Supported by NSF grant DBI-9974855 and NIH grant P41-LM05798.]

Presented at the 4th Annual Conference on Computational Genomics, Baltimore, MD, 2000.
