Table 1. Overview of Protein Bioinformatics Databases

| Primary Category | Secondary Category | Database Name | Database Content | URL | References |
|---|---|---|---|---|---|
| Sequence | NCBI | Reference Sequence (RefSeq) | Biologically non-redundant collection of DNA, RNA and protein sequences | | Pruitt, K. D., Tatusova, T., Maglott, D. R. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61-D65. (PMID: 17130148) |
| | | Entrez Protein Database | Collection of protein sequences from a variety of sources, including Swiss-Prot, PIR, PRF, PDB, and translations of annotated coding regions in GenBank and RefSeq | | Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Chetvernin, V., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S., Geer, L. Y., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D. J., Madden, T. L., Maglott, D. R., Ostell, J., Miller, V., Pruitt, K. D., Schuler, G. D., Sequeira, E., Sherry, S. T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R. L., Tatusova, T. A., Wagner, L., Yaschenko, E. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35, D5-D12. (PMID: 17170002) |
| | UniProt | UniProt Knowledgebase (UniProtKB) | Collection of functional information on proteins, with accurate, consistent and rich annotation | | The UniProt Consortium (2009) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. Oct 20. [Epub ahead of print]. (PMID: 19843607) |
| | | UniProt Archive (UniParc) | Comprehensive, non-redundant database containing most of the world's publicly available protein sequences | | Leinonen, R., Diez, F. G., Binns, D., Fleischmann, W., Lopez, R., Apweiler, R. (2004) UniProt Archive. Bioinformatics 20, 3236-3237. (PMID: 15044231) |
| | | UniProt Reference Clusters (UniRef) | Clustered sets of sequences from the UniProt Knowledgebase (including splice variants and isoforms) and selected UniParc records | | Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R., Wu, C. H. (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282-1288. (PMID: 17379688) |
| Family | Whole Protein | PIRSF | Comprehensive, non-overlapping clustering of UniProtKB sequences into a hierarchy that reflects their evolutionary relationships at the whole-protein level | | Nikolskaya, A. N., Arighi, C. N., Huang, H., Barker, W. C., Wu, C. H. (2006) PIRSF Family Classification System for Protein Functional and Evolutionary Analysis. Evolutionary Bioinformatics Online 2, 197-209. (PMID: 19455212) |
| | | NCBI Clusters of Orthologous Groups of proteins (COGs) | Phylogenetic classification of proteins encoded in complete genomes | | Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V., Vasudevan, S., Wolf, Y. I., Yin, J. J., Natale, D. A. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41-54. (PMID: 12969510) |
| | | PANTHER | Proteins classified by expert biologists into families and subfamilies of shared function, further categorized by Gene Ontology (GO) terms | | Mi, H., Guo, N., Kejariwal, A., Thomas, P. D. (2007) PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res. 35, D247-D252. (PMID: 17130144) |
| | | ProtoNet | Automatic hierarchical classification of protein sequences | | Kaplan, N., Sasson, O., Inbar, U., Friedlich, M., Fromer, M., Fleischer, H., Portugaly, E., Linial, N., Linial, M. (2005) ProtoNet 4.0: a hierarchical classification of one million protein sequences. Nucleic Acids Res. 33, D216-D218. (PMID: 15608180) |
| | Protein Domain | Pfam | Protein domain families, each represented by multiple sequence alignments and hidden Markov models (HMMs) | | Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R., Ceric, G., Forslund, K., Eddy, S. R., Sonnhammer, E. L., Bateman, A. (2008) The Pfam protein families database. Nucleic Acids Res. 36, D281-D288. (PMID: 18039703) |
| | | ProDom | Comprehensive set of protein domain families automatically generated from UniProtKB | | Bru, C., Courcelle, E., Carrère, S., Beausse, Y., Dalmar, S., Kahn, D. (2005) The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 33, D212-D215. (PMID: 15608179) |
| | | Conserved Domain Database (CDD) | Collection of multiple sequence alignments representing conserved domains | | Marchler-Bauer, A., Anderson, J. B., Chitsaz, F., Derbyshire, M. K., DeWeese-Scott, C., Fong, J. H., Geer, L. Y., Geer, R. C., Gonzales, N. R., Gwadz, M., He, S., Hurwitz, D. I., Jackson, J. D., Ke, Z., Lanczycki, C. J., Liebert, C. A., Liu, C., Lu, F., Lu, S., Marchler, G. H., Mullokandov, M., Song, J. S., Tasneem, A., Thanki, N., Yamashita, R. A., Zhang, D., Zhang, N., Bryant, S. H. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 37, D205-D210. (PMID: 18984618) |
| | | Simple Modular Architecture Research Tool (SMART) | Resource for identifying and annotating protein domains and analyzing domain architectures | | Letunic, I., Doerks, T., Bork, P. (2009) SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229-D232. (PMID: 18978020) |
| | Protein Motif | PRINTS | Groups of conserved motifs used to characterize protein families | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php | Attwood, T. K. (2002) The PRINTS database: a resource for identification of protein families. Brief Bioinform. 3, 252-263. (PMID: 12230034) |
| | | PROSITE | Protein domains, families and functional sites, with the associated patterns and profiles used to identify them | | Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., Cuche, B. A., de Castro, E., Lachaize, C., Langendijk-Genevaux, P. S., Sigrist, C. J. (2008) The 20 years of PROSITE. Nucleic Acids Res. 36, D245-D249. (PMID: 18003654) |
| | Integrative | InterPro | Integrated resource of protein families, domains and functional sites drawn from Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF and other member databases | | Hunter, S., Apweiler, R., Attwood, T. K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U., Daugherty, L., Duquenne, L., Finn, R. D., Gough, J., Haft, D., Hulo, N., Kahn, D., Kelly, E., Laugraud, A., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J., McAnulla, C., McDowall, J., Mistry, J., Mitchell, A., Mulder, N., Natale, D., Orengo, C., Quinn, A. F., Selengut, J. D., Sigrist, C. J., Thimma, M., Thomas, P. D., Valentin, F., Wilson, D., Wu, C. H., Yeats, C. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D224-D228. (PMID: 18940856) |
| Structure | 3D Structure | Worldwide Protein Data Bank (wwPDB) | Repository of 3D coordinates and related information for more than 38,000 macromolecular structures, including proteins, nucleic acids and large macromolecular complexes | | Berman, H., Henrick, K., Nakamura, H., Markley, J. L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 35, D301-D303. (PMID: 17142228) |
| | | Molecular Modeling Database (MMDB) | 3D macromolecular structures, including proteins and polynucleotides | | Wang, Y., Addess, K. J., Chen, J., Geer, L. Y., He, J., He, S., Lu, S., Madej, T., Marchler-Bauer, A., Thiessen, P. A., Zhang, N., Bryant, S. H. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res. 35, D298-D300. (PMID: 17135201) |
| | | ModBase | 3D protein models calculated by comparative modeling | | Pieper, U., Eswar, N., Webb, B. M., Eramian, D., Kelly, L., Barkan, D. T., Carter, H., Mankoo, P., Karchin, R., Marti-Renom, M. A., Davis, F. P., Sali, A. (2009) MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 37, D347-D354. (PMID: 18948282) |
| | | SWISS-MODEL Repository | Annotated protein 3D models | | Kiefer, F., Arnold, K., Künzli, M., Bordoli, L., Schwede, T. (2009) The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 37, D387-D392. (PMID: 18931379) |
| | Structural Classification | CATH | Hierarchical classification of protein domain structures in the Protein Data Bank | | Cuff, A. L., Sillitoe, I., Lewis, T., Redfern, O. C., Garratt, R., Thornton, J., Orengo, C. A. (2009) The CATH classification revisited--architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Res. 37, D310-D314. (PMID: 18996897) |
| | | Structural Classification of Proteins (SCOP) | Description of the evolutionary and structural relationships among proteins of known structure | | Andreeva, A., Howorth, D., Chandonia, J. M., Brenner, S. E., Hubbard, T. J., Chothia, C., Murzin, A. G. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 36, D419-D425. (PMID: 18000004) |
| | | SUPERFAMILY | Structural and functional annotation of all proteins and genomes, based on a collection of hidden Markov models that represent structural protein domains at the SCOP superfamily level | | Wilson, D., Pethica, R., Zhou, Y., Talbot, C., Vogel, C., Madera, M., Chothia, C., Gough, J. (2009) SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380-D386. (PMID: 19036790) |
| | Protein Folding | Protein Folding Database (PFD) | Repository of available experimental protein folding data | | Fulton, K. F., Bate, M. A., Faux, N. G., Mahmood, K., Betts, C., Buckle, A. M. (2007) Protein Folding Database (PFD 2.0): an online environment for the International Foldeomics Consortium. Nucleic Acids Res. 35, D304-D307. (PMID: 17170010) |
| | | KineticDB | Experimental data on protein folding kinetics | | Bogatyreva, N. S., Osypov, A. A., Ivankov, D. N. (2009) KineticDB: a database of protein folding kinetics. Nucleic Acids Res. 37, D342-D346. (PMID: 18842631) |
| | Protein Modification | RESID | Collection of annotations and structures for protein pre-, co- and post-translational modifications | | Garavelli, J. S. (2004) The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 4, 1527-1533. (PMID: 15174122) |
| | | Phospho3D | 3D structures of protein phosphorylation sites, with information retrieved from the Phospho.ELM database | | Zanzoni, A., Ausiello, G., Via, A., Gherardini, P. F., Helmer-Citterich, M. (2007) Phospho3D: a database of three-dimensional structures of protein phosphorylation sites. Nucleic Acids Res. 35, D229-D231. (PMID: 17142231) |
| Function | Intermolecular Interactions | IntAct | Protein interaction data from the literature and user submissions | | Aranda, B., Achuthan, P., Alam-Faruque, Y., Armean, I., Bridge, A., Derow, C., Feuermann, M., Ghanbarian, A. T., Kerrien, S., Khadake, J., Kerssemakers, J., Leroy, C., Menden, M., Michaut, M., Montecchi-Palazzi, L., Neuhauser, S. N., Orchard, S., Perreau, V., Roechert, B., van Eijk, K., Hermjakob, H. (2009) The IntAct molecular interaction database in 2010. Nucleic Acids Res. Oct 22. [Epub ahead of print]. (PMID: 19850723) |
| | | Database of Interacting Proteins (DIP) | Experimentally determined protein-protein interactions | | Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., Eisenberg, D. (2004) The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 32, D449-D451. (PMID: 14681454) |
| | | Reactome | Curated knowledgebase of biological pathways | | Matthews, L., Gopinath, G., Gillespie, M., Caudy, M., Croft, D., de Bono, B., Garapati, P., Hemish, J., Hermjakob, H., Jassal, B., Kanapin, A., Lewis, S., Mahajan, S., May, B., Schmidt, E., Vastrik, I., Wu, G., Birney, E., Stein, L., D'Eustachio, P. (2009) Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 37, D619-D622. (PMID: 18981052) |
| | | Biological General Repository for Interaction Datasets (BioGRID) | Collection of protein and genetic interactions from major model organism species | | Breitkreutz, B. J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D. H., Bähler, J., Wood, V., Dolinski, K., Tyers, M. (2008) The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 36, D637-D640. (PMID: 18000002) |
| | Metabolic Pathways | Kyoto Encyclopedia of Genes and Genomes (KEGG) | Pathway maps of the molecular interaction and reaction networks for metabolism | | Kanehisa, M., Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30. (PMID: 10592173) |
| | | BioCyc | Pathway/Genome Databases (PGDBs) describing the pathways and genomes of different organisms | | Caspi, R., Foerster, H., Fulcher, C. A., Kaipa, P., Krummenacker, M., Latendresse, M., Paley, S., Rhee, S. Y., Shearer, A., Tissier, C., Walk, T. C., Zhang, P., Karp, P. D. (2008) The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 36, D623-D631. (PMID: 17965431) |
| | | MetaCyc | Non-redundant, experimentally elucidated metabolic pathways | | Caspi, R., Foerster, H., Fulcher, C. A., Kaipa, P., Krummenacker, M., Latendresse, M., Paley, S., Rhee, S. Y., Shearer, A., Tissier, C., Walk, T. C., Zhang, P., Karp, P. D. (2008) The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 36, D623-D631. (PMID: 17965431) |
| | Integrative | Michigan Molecular Interactions (MiMI) | Merged view of several popular interaction databases, including BIND, HPRD, IntAct and GRID | | Tarcea, V. G., Weymouth, T., Ade, A., Bookvich, A., Gao, J., Mahavisno, V., Wright, Z., Chapman, A., Jayapandian, M., Ozgür, A., Tian, Y., Cavalcoli, J., Mirel, B., Patel, J., Radev, D., Athey, B., States, D., Jagadish, H. V. (2009) Michigan molecular interactions r2: from interacting proteins to pathways. Nucleic Acids Res. 37, D642-D646. (PMID: 18978014) |
| Proteomics | Gel Electrophoresis | WORLD-2DPAGE Constellation | List of 2-D PAGE database servers; the World-2DPAGE Portal, which simultaneously queries proteomics databases worldwide; and the World-2DPAGE Repository | | Hoogland, C., Mostaguir, K., Appel, R. D., Lisacek, F. (2008) The World-2DPAGE Constellation to promote and publish gel-based proteomics data through the ExPASy server. J Proteomics 71, 245-248. (PMID: 18617148) |
| | Mass Spectrometry | Global Proteome Machine Database (GPMDB) | Mass spectral library of data from a variety of organisms; identified peptides are matched to the Ensembl genome database | | Craig, R., Cortens, J. C., Fenyo, D., Beavis, R. C. (2006) Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 5, 1843-1849. (PMID: 16889405) |
| | | PRoteomics IDEntifications database (PRIDE) | Protein and peptide identifications described in the scientific literature, together with the evidence supporting them | | Vizcaíno, J. A., Côté, R., Reisinger, F., Barsnes, H., Foster, J. M., Rameseder, J., Hermjakob, H., Martens, L. (2009) The Proteomics Identifications database: 2010 update. Nucleic Acids Res. Nov 11. [Epub ahead of print]. (PMID: 19906717) |
| | | PeptideAtlas | Peptides identified in a large set of LC-MS/MS proteomics experiments | | Deutsch, E. W., Lam, H., Aebersold, R. (2008) PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 9, 429-434. (PMID: 18451766) |
| | | Peptidome | Tandem mass spectrometry peptide and protein identification data generated by the scientific community | http://www.ncbi.nlm.nih.gov/peptidome/ | Slotta, D. J., Barrett, T., Edgar, R. (2009) NCBI Peptidome: a new public repository for mass spectrometry peptide identifications. Nat Biotechnol. 27, 600-601. (PMID: 19587658) |
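Each entry in the References column carries a PubMed identifier (PMID). As an illustrative sketch that is not part of the original table, the snippet below assumes NCBI's E-utilities `efetch` endpoint and builds one request URL for several of the cited PMIDs. The helper name `efetch_url` is hypothetical, and actually sending the request (for example with `urllib.request.urlopen`) requires network access.

```python
from typing import Sequence
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an assumption; not listed in Table 1).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, ids: Sequence[str], rettype: str, retmode: str = "text") -> str:
    """Build a single efetch URL for several record IDs (hypothetical helper)."""
    params = {
        "db": db,             # Entrez database, e.g. "pubmed" or "protein"
        "id": ",".join(ids),  # efetch accepts a comma-separated ID list
        "rettype": rettype,   # record format, e.g. "medline" or "fasta"
        "retmode": retmode,   # "text" requests plain text rather than XML
    }
    # urlencode percent-encodes the commas in the ID list as %2C.
    return f"{EFETCH_BASE}?{urlencode(params)}"


# PMIDs of the RefSeq, UniProtKB and Pfam references in Table 1.
pmids = ["17130148", "19843607", "18039703"]
url = efetch_url("pubmed", pmids, rettype="medline")
print(url)
```

The same helper could target the Entrez Protein Database by passing `db="protein"` and `rettype="fasta"` together with protein accession numbers; batching IDs into one request keeps the number of calls to NCBI low.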