iProClass: An Integrated, Comprehensive, and Annotated Protein Classification Database
Cathy H. Wu, Chunlin Xiao, Zhenglin Hou, Hongzhan Huang, and Winona C. Barker
ABSTRACT
The iProClass database is an integrated resource that provides
comprehensive family relationships at both global (whole protein) and local
(domain and motif/site) levels, as well as structural/functional classifications
and features of proteins. It extends ProClass, a family database
that integrates PIR superfamilies and PROSITE motifs. The PIR superfamily/family
organization provides complete and non-overlapping clustering of all proteins.
iProClass currently contains more than 210,000 non-redundant PIR and
Swiss-Prot proteins organized by more than 29,000 PIR superfamilies, 100,000
MIPS families, 2600 PIR homology and Pfam domains, 1300 ProClass/PROSITE motifs,
and 280 PIR post-translational modification sites. It links to over 30 databases
of protein families, structures, functions, genes, genomes, literature, and
taxonomy, such as Pfam, PRINTS, BLOCKS, KEGG, PDB, SCOP, and CATH. Protein
and superfamily summary reports provide rich annotations, including membership
information with sequence length, taxonomy, and keyword statistics, full family
relationships, comprehensive enzyme (EC) and PDB cross-references, and graphical
feature display. The database facilitates classification-driven annotation
for protein sequences and complete genomes, and supports structural/functional
genomics and proteomics research. iProClass is implemented in the Oracle 8i
object-relational database system and is available for sequence/text search and report
retrieval at /iproclass/.
[Supported by NSF grant DBI-9974855 and NIH grant P41-LM05798.]
Presented at the 4th Annual Conference on Computational Genomics, Baltimore, MD, 2000.
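
The organization described in the abstract, with one superfamily per protein at the global level and any number of domains and motifs at the local level, can be illustrated with a minimal Python sketch. This is not the iProClass schema or its Oracle implementation: the class, field names, identifiers, and values below are all hypothetical and chosen only to show how a non-overlapping clustering and a simple superfamily summary might be modeled.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class ProteinEntry:
    # Hypothetical fields; not the actual iProClass schema.
    protein_id: str
    length: int
    taxon: str
    superfamily: str  # global level: exactly one per protein
    domains: List[str] = field(default_factory=list)  # local level: zero or more
    motifs: List[str] = field(default_factory=list)   # local level: zero or more


def group_by_superfamily(entries: List[ProteinEntry]) -> Dict[str, List[ProteinEntry]]:
    """Cluster entries by superfamily; each protein lands in exactly one cluster."""
    clusters = defaultdict(list)
    for entry in entries:
        clusters[entry.superfamily].append(entry)
    return dict(clusters)


def superfamily_summary(members: List[ProteinEntry]) -> dict:
    """Membership statistics of the kind a summary report might include."""
    lengths = [m.length for m in members]
    return {
        "members": len(members),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": mean(lengths),
        "taxa": Counter(m.taxon for m in members),
        "domains": sorted({d for m in members for d in m.domains}),
    }


# Toy data: identifiers and values are invented for illustration only.
entries = [
    ProteinEntry("P1", 120, "Eukaryota", "SF_A", domains=["DOM_1"], motifs=["MOT_1"]),
    ProteinEntry("P2", 140, "Bacteria", "SF_A", domains=["DOM_1", "DOM_2"]),
    ProteinEntry("P3", 300, "Eukaryota", "SF_B", domains=["DOM_2"]),
]

clusters = group_by_superfamily(entries)
report = superfamily_summary(clusters["SF_A"])
print(report)
```

Storing the superfamily as a single field, rather than a list, is what makes the clustering complete and non-overlapping in this sketch; domains and motifs are lists because one protein can carry several, and the same domain can appear in more than one superfamily.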