Table 1 Overview of Protein Bioinformatics Databases

Primary Category

Secondary Category

Database Name

Database Content

URL

References

Sequence

NCBI

Reference Sequence (RefSeq)

Biologically non-redundant collection of DNA, RNA, and protein sequences

http://www.ncbi.nlm.nih.gov/RefSeq/

Pruitt, K. D., Tatusova, T., Maglott, D. R. (2007). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61-D65. (PMID: 17130148)

Entrez Protein Database

Collection of protein sequences from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq

http://www.ncbi.nlm.nih.gov/sites/entrez?db=protein

Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Chetvernin, V., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S., Geer, L. Y., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D. J., Madden, T. L., Maglott, D. R., Ostell, J., Miller, V., Pruitt, K. D., Schuler, G. D., Sequeira, E., Sherry, S. T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R. L., Tatusova, T. A., Wagner, L., Yaschenko, E. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35, D5-D12. (PMID: 17170002)

UniProt

UniProt Knowledgebase (UniProtKB)

Collection of functional information on proteins, with accurate, consistent and rich annotation

http://www.uniprot.org/help/uniprotkb

The UniProt Consortium. (2009) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. Oct 20. [Epub ahead of print]. (PMID: 19843607)

UniProt Archive (UniParc)

Comprehensive and non-redundant database that contains most of the publicly available protein sequences in the world

http://www.uniprot.org/help/uniparc

Leinonen, R., Diez, F. G., Binns, D., Fleischmann, W., Lopez, R., Apweiler, R. (2004) UniProt Archive. Bioinformatics 20, 3236-3237. (PMID: 15044231)

UniProt Reference Clusters (UniRef)

Clustered sets of sequences from UniProt Knowledgebase (including splice variants and isoforms) and selected UniParc records

http://www.uniprot.org/help/uniref

Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R., Wu, C. H. (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282-1288. (PMID: 17379688)

Family

Whole Protein

PIRSF

Comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships based on whole proteins

http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml

Nikolskaya, A. N., Arighi, C. N., Huang, H., Barker, W. C., Wu, C. H. (2006) PIRSF Family Classification System for Protein Functional and Evolutionary Analysis. Evolutionary Bioinformatics Online 2, 197-209. (PMID: 19455212)

NCBI Clusters of Orthologous Groups of proteins (COGs)

Phylogenetic classification of proteins encoded in complete genomes

http://www.ncbi.nlm.nih.gov/COG/

Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V., Vasudevan, S., Wolf, Y. I., Yin, J. J., Natale, D. A. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41-54. (PMID: 12969510)

PANTHER

Classification of proteins by expert biologists into families and subfamilies of shared function, further categorized by Gene Ontology (GO) terms

http://www.pantherdb.org/

Mi, H., Guo, N., Kejariwal, A., Thomas, P. D. (2007) PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res. 35, D247-D252. (PMID: 17130144)

ProtoNet

Automatic hierarchical classification of protein sequences

http://www.protonet.cs.huji.ac.il/index.php

Kaplan, N., Sasson, O., Inbar, U., Friedlich, M., Fromer, M., Fleischer, H., Portugaly, E., Linial, N., Linial, M. (2005) ProtoNet 4.0: a hierarchical classification of one million protein sequences. Nucleic Acids Res. 33, D216-D218. (PMID: 15608180)

Protein Domain

Pfam

Collection of protein domain families, each represented by multiple sequence alignments and hidden Markov models (HMMs)

http://pfam.sanger.ac.uk/

Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R., Ceric, G., Forslund, K., Eddy, S. R., Sonnhammer, E. L., Bateman, A. (2008) The Pfam protein families database. Nucleic Acids Res. 36, D281-D288. (PMID: 18039703)

ProDom

Comprehensive set of protein domain families automatically generated from the UniProtKB

http://prodom.prabi.fr/prodom/current/html/home.php

Bru, C., Courcelle, E., Carrère, S., Beausse, Y., Dalmar, S., Kahn, D. (2005) The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 33, D212-D215. (PMID: 15608179)

Conserved Domains Database (CDD)

Collections of multiple sequence alignments representing conserved domains

http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd

Marchler-Bauer, A., Anderson, J. B., Chitsaz, F., Derbyshire, M. K., DeWeese-Scott, C., Fong, J. H., Geer, L. Y., Geer, R. C., Gonzales, N. R., Gwadz, M., He, S., Hurwitz, D. I., Jackson, J. D., Ke, Z., Lanczycki, C. J., Liebert, C. A., Liu, C., Lu, F., Lu, S., Marchler, G. H., Mullokandov, M., Song, J. S., Tasneem, A., Thanki, N., Yamashita, R. A., Zhang, D., Zhang, N., Bryant, S. H. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 37, D205-D210. (PMID: 18984618)

Simple Modular Architecture Research Tool (SMART)

Resource for identification and annotation of protein domains and the analysis of domain architectures

http://smart.embl.de/

Letunic, I., Doerks, T., Bork, P. (2009) SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229-D232. (PMID: 18978020)

Protein Motif

PRINTS

Collection of protein fingerprints, i.e., groups of conserved motifs used to characterize protein families

http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php

Attwood, T. K. (2002) The PRINTS database: a resource for identification of protein families. Brief Bioinform. 3, 252-263. (PMID: 12230034)

PROSITE

Protein domains, families and functional sites as well as associated patterns and profiles to identify them

http://ca.expasy.org/prosite/

Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., Cuche, B. A., de Castro, E., Lachaize, C., Langendijk-Genevaux, P. S., Sigrist, C. J. (2008) The 20 years of PROSITE. Nucleic Acids Res. 36, D245-D249. (PMID: 18003654)

Integrative

InterPro

Integrated resource of protein families, domains and functional sites from Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF etc.

http://www.ebi.ac.uk/interpro/

Hunter, S., Apweiler, R., Attwood, T. K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U., Daugherty, L., Duquenne, L., Finn, R. D., Gough, J., Haft, D., Hulo, N., Kahn, D., Kelly, E., Laugraud, A., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J., McAnulla, C., McDowall, J., Mistry, J., Mitchell, A., Mulder, N., Natale, D., Orengo, C., Quinn, A. F., Selengut, J. D., Sigrist, C. J., Thimma, M., Thomas, P. D., Valentin, F., Wilson, D., Wu, C. H., Yeats, C. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D224-D228. (PMID: 18940856)

Structure

3D Structure

Worldwide Protein Data Bank (wwPDB)

Repository of 3D coordinates and related information for more than 38,000 macromolecular structures, including proteins, nucleic acids and large macromolecular complexes

http://www.wwpdb.org/

Berman, H., Henrick, K., Nakamura, H., Markley, J. L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 35, D301-D303. (PMID: 17142228)

Molecular Modeling Database (MMDB)

3D macromolecular structures, including proteins and polynucleotides

http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure

Wang, Y., Addess, K. J., Chen, J., Geer, L. Y., He, J., He, S., Lu, S., Madej, T., Marchler-Bauer, A., Thiessen, P. A., Zhang, N., Bryant, S. H. (2007) MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res. 35, D298-D300. (PMID: 17135201)

ModBase

3D protein models calculated by comparative modeling

http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi

Pieper, U., Eswar, N., Webb, B. M., Eramian, D., Kelly, L., Barkan, D. T., Carter, H., Mankoo, P., Karchin, R., Marti-Renom, M. A., Davis, F. P., Sali, A. (2009) MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 37, D347-D354. (PMID: 18948282)

SWISS-MODEL Repository

Annotated protein 3D models

http://swissmodel.expasy.org/repository/

Kiefer, F., Arnold, K., Künzli, M., Bordoli, L., Schwede, T. (2009) The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 37, D387-D392. (PMID: 18931379)

Structural Classification

CATH

Hierarchical classification of protein domain structures in the Protein Data Bank

http://www.cathdb.info/

Cuff, A. L., Sillitoe, I., Lewis, T., Redfern, O. C., Garratt, R., Thornton, J., Orengo, C. A. (2009) The CATH classification revisited--architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Res. 37, D310-D314. (PMID: 18996897)

Structural Classification Of Proteins (SCOP)

Description of the evolutionary and structural relationships between proteins of known structure

http://scop.mrc-lmb.cam.ac.uk/scop/

Andreeva, A., Howorth, D., Chandonia, J. M., Brenner, S. E., Hubbard, T. J., Chothia, C., Murzin, A. G. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 36, D419-D425. (PMID: 18000004)

SUPERFAMILY

Structural and functional annotation for all proteins and genomes based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level

http://supfam.org/SUPERFAMILY/

Wilson, D., Pethica, R., Zhou, Y., Talbot, C., Vogel, C., Madera, M., Chothia, C., Gough, J. (2009) SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380-D386. (PMID: 19036790)

Protein Folding

Protein Folding Database (PFD)

Repository of available experimental protein folding data

http://pfd.med.monash.edu.au/public_html/index.php

Fulton, K. F., Bate, M. A., Faux, N. G., Mahmood, K., Betts, C., Buckle, A. M. (2007) Protein Folding Database (PFD 2.0): an online environment for the International Foldeomics Consortium. Nucleic Acids Res. 35, D304-D307. (PMID: 17170010)

KineticDB

Experimental data on protein folding kinetics

http://kineticdb.protres.ru/db/index.pl

Bogatyreva, N. S., Osypov, A. A., Ivankov, D. N. (2009) KineticDB: a database of protein folding kinetics. Nucleic Acids Res. 37, D342-D346. (PMID: 18842631)

Protein Modification

RESID

Collection of annotations and structures for protein pre-, co- and post-translational modifications

http://www.ebi.ac.uk/RESID/

Garavelli, J. S. (2004) The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 4, 1527-1533. (PMID: 15174122)

Phospho3D

3D structures of phosphorylation sites, built on information retrieved from the phospho.ELM database

http://cbm.bio.uniroma2.it/phospho3d/

Zanzoni, A., Ausiello, G., Via, A., Gherardini, P. F., Helmer-Citterich, M. (2007) Phospho3D: a database of three-dimensional structures of protein phosphorylation sites. Nucleic Acids Res. 35, D229-D231. (PMID: 17142231)

Function

Intermolecular Interactions

IntAct

Protein interaction data from the literature and from user submissions

http://www.ebi.ac.uk/intact/main.xhtml

Aranda, B., Achuthan, P., Alam-Faruque, Y., Armean, I., Bridge, A., Derow, C., Feuermann, M., Ghanbarian, A. T., Kerrien, S., Khadake, J., Kerssemakers, J., Leroy, C., Menden, M., Michaut, M., Montecchi-Palazzi, L., Neuhauser, S. N., Orchard, S., Perreau, V., Roechert, B., van Eijk, K., Hermjakob, H. (2009) The IntAct molecular interaction database in 2010. Nucleic Acids Res. Oct 22. [Epub ahead of print]. (PMID: 19850723)

Database of Interacting Proteins (DIP)

Experimentally determined protein-protein interactions

http://dip.doe-mbi.ucla.edu/dip/Main.cgi

Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., Eisenberg, D. (2004) The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 32, D449-D451. (PMID: 14681454)

Reactome

A curated knowledgebase of biological pathways

http://www.reactome.org/

Matthews, L., Gopinath, G., Gillespie, M., Caudy, M., Croft, D., de Bono, B., Garapati, P., Hemish, J., Hermjakob, H., Jassal, B., Kanapin, A., Lewis, S., Mahajan, S., May, B., Schmidt, E., Vastrik, I., Wu, G., Birney, E., Stein, L., D'Eustachio, P. (2009) Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 37, D619-D622. (PMID: 18981052)

Biological General Repository for Interaction Datasets (BioGRID)

Collections of protein and genetic interactions from major model organism species

http://www.thebiogrid.org

Breitkreutz, B. J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D. H., Bähler, J., Wood, V., Dolinski, K., Tyers, M. (2008) The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 36, D637-D640. (PMID: 18000002)

Metabolic Pathways

Kyoto Encyclopedia of Genes and Genomes (KEGG)

Pathway maps representing the molecular interaction and reaction networks of metabolism

http://www.genome.jp/kegg/pathway.html

Kanehisa, M., Goto, S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30. (PMID: 10592173)

BioCyc

Collection of Pathway/Genome Databases (PGDBs), each describing the pathways and genome of a single organism

http://biocyc.org/

Caspi, R., Foerster, H., Fulcher, C. A., Kaipa, P., Krummenacker, M., Latendresse, M., Paley, S., Rhee, S. Y., Shearer, A., Tissier, C., Walk, T. C., Zhang, P., Karp, P. D. (2008) The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 36, D623-D631. (PMID: 17965431)

MetaCyc

Non-redundant, experimentally elucidated metabolic pathways

http://metacyc.org/

Caspi, R., Foerster, H., Fulcher, C. A., Kaipa, P., Krummenacker, M., Latendresse, M., Paley, S., Rhee, S. Y., Shearer, A., Tissier, C., Walk, T. C., Zhang, P., Karp, P. D. (2008) The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 36, D623-D631. (PMID: 17965431)

Integrative

Michigan Molecular Interactions (MiMI)

Merged view of several popular interaction databases, including BIND, HPRD, IntAct, GRID and others

http://mimitest.ncibi.org/MimiWeb/main-page.jsp

Tarcea, V. G., Weymouth, T., Ade, A., Bookvich, A., Gao, J., Mahavisno, V., Wright, Z., Chapman, A., Jayapandian, M., Ozgür, A., Tian, Y., Cavalcoli, J., Mirel, B., Patel, J., Radev, D., Athey, B., States, D., Jagadish, H. V. (2009) Michigan molecular interactions r2: from interacting proteins to pathways. Nucleic Acids Res. 37, D642-D646. (PMID: 18978014)

Proteomics

Gel Electrophoresis

WORLD-2DPAGE Constellation

A list of 2-D PAGE database servers; the World-2DPAGE Portal, which simultaneously queries proteomics databases worldwide; and the World-2DPAGE Repository

http://world-2dpage.expasy.org/

Hoogland, C., Mostaguir, K., Appel, R. D., Lisacek, F. (2008) The World-2DPAGE Constellation to promote and publish gel-based proteomics data through the ExPASy server. J Proteomics 71, 245-248. (PMID: 18617148)

Mass Spectrometry

Global Proteome Machine Database (GPMDB)

Mass spectral library for data from a variety of organisms; identified peptides are matched to the Ensembl genome database

http://www.thegpm.org/GPMDB/index.html

Craig, R., Cortens, J. C., Fenyo, D., Beavis, R. C. (2006) Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 5, 1843-1849. (PMID: 16889405)

PRoteomics IDEntifications database (PRIDE)

Protein and peptide identifications that have been described in the scientific literature together with the evidence supporting these identifications

http://www.ebi.ac.uk/pride/

Vizcaíno, J. A., Côté, R., Reisinger, F., Barsnes, H., Foster, J. M., Rameseder, J., Hermjakob, H., Martens, L. (2009) The Proteomics Identifications database: 2010 update. Nucleic Acids Res. Nov 11. [Epub ahead of print] (PMID: 19906717)

PeptideAtlas

Peptides identified in a large set of LC-MS/MS proteomics experiments

http://www.peptideatlas.org/

Deutsch, E. W., Lam, H., Aebersold, R. (2008) PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 9, 429-434. (PMID: 18451766)

Peptidome

Tandem mass spectrometry peptide and protein identification data generated by the scientific community

http://www.ncbi.nlm.nih.gov/peptidome/

Slotta, D. J., Barrett, T., Edgar, R. (2009) NCBI Peptidome: a new public repository for mass spectrometry peptide identifications. Nat Biotechnol. 27, 600-601. (PMID: 19587658)
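
The PROSITE row in Table 1 refers to patterns used to identify domains and functional sites, for example the N-glycosylation site pattern N-{P}-[ST]-{P} (PS00001). The sketch below is a hypothetical helper, not part of PROSITE or ScanProsite. It translates only a basic subset of the pattern syntax (single residues, x for any residue, [..] allowed and {..} forbidden residues, (n) or (n,m) repeats, and the < and > terminal anchors) into a Python regular expression, and reports matches with 1-based positions. It assumes uppercase one-letter residue codes and does not handle a > inside brackets such as [G>].

```python
import re

# One PROSITE element: 'x', a single residue, [..] or {..}, plus an optional (n) or (n,m) repeat.
# Assumption: only this basic subset of the PROSITE pattern syntax is supported.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regex string (hypothetical helper)."""
    body = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in body.split("-"):  # pattern elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the pattern to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the pattern to the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."  # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            core = residue  # a single letter or a [..] class is already valid regex
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, overlapping hits included."""
    # Wrapping the translated pattern in a lookahead makes each match zero-width,
    # so finditer also reports hits that overlap one another.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # N-glycosylation site pattern scanned against a made-up sequence
    print(find_sites("N-{P}-[ST]-{P}.", "MNGSANPTKNVTA"))
```

Here the pattern translates to N[^P][ST][^P], and the call should return [(2, 'NGSA'), (10, 'NVTA')]: the asparagine at position 6 is skipped because it is followed by a proline. The lookahead wrapper matters for sequences such as NNSSA, where two sites overlap and a plain non-overlapping search would report only the first.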