Cathy H. Wu, Ph.D. [PIR - Protein Information Resource]


		Cathy H. Wu, Ph.D.
		Professor -	Department of Biochemistry and Molecular & Cellular Biology
			Department of Oncology
		Director -	Protein Information Resource
			Georgetown University Medical Center

			wuc@georgetown.edu

10/16/01 - Dr. Wu appears in The Scientist
Cathy Wu at the Crossroads: She saved the Protein Information Resource database and now aims to restore it to the world's best

Primary Expertise

Dr. Wu has conducted bioinformatics research since 1990 and developed several protein classification systems and databases. She has managed large software and database projects, has led the bioinformatics effort of the Protein Information Resource (PIR) since 1999, and became the PIR Director in 2001. Her research interests include protein family classification and functional annotation, biological data integration, and literature mining.

Academic Appointments

1989-1994 Assistant Professor, Department of Computer Science, University of Texas at Tyler

1990-1999 Assistant Professor (90-94); Associate Professor (94-98); Professor (98-99) of Biomathematics, University of Texas Health Center at Tyler

1999-2002 Director of Bioinformatics, PIR (99-02); Vice President (00-02), National Biomedical Research Foundation, Washington, D.C.

2001-present Professor, Department of Biochemistry & Molecular Biology; Director, PIR, Georgetown University Medical Center (GUMC)

2002-present Professor, Department of Oncology; Member, Lombardi Comprehensive Cancer Center, GUMC

Professional Activities

Member, Advisory Committee, Protein Structure Initiative, NIGMS, NIH (2002-present).

Member, Board of Directors, International Society for Computational Biology (2002-2004).

Over 15 Conference Organizing/Program Committees, including: ISMB, PSB, EITC, CBGI, BIOKDD

Over 20 Grant Review Panels/Study Sections at NIH, NSF, and DOE

Over 70 Invited Presentations/Lectures at international conferences, workshops, academia, and industry

Education

B.S., Plant Pathology, National Taiwan University, Taiwan, 1978

M.S., Plant Pathology, Purdue University, W. Lafayette, IN, 1982

Ph.D., Molecular Plant Pathology, Purdue University, W. Lafayette, IN, 1984

Post. Doc., Molecular Biology, Michigan State University, E. Lansing, MI, 1986

M.S., Computer Science, University of Texas at Tyler, Tyler, TX, 1989

Patent

United States Patent No. 5,845,049, December 1, 1998, C. H. Wu. A neural network system with n-gram term weighting method for molecular sequence classification and motif identification

Publications

BOOK: Wang, J., Wu, C. H. and Wang, P. (Editors) (2003). Computational Biology and Genome Informatics. World Scientific.

BOOK: Wu, C. H. and McLarty, J. M. (2000). Neural Networks and Genome Informatics. Methods in Computational Biology and Biochemistry, Volume 1, Series Editor A. K. Konopka, Elsevier Science. ISBN 0 08 042800 2

Mazumder R., Hu Z.Z., Vinayaka C.R., Sagripanti J.L., Frost S.D., Kosakovsky Pond S.L. and Wu C.H. (2007). Computational analysis and identification of amino acid sites in dengue E proteins relevant to development of diagnostics and vaccines. Virus Genes [EPub ahead of print].

Huang H., Hu Z.Z., Arighi C.N., Wu C.H. (2007). Integration of bioinformatics resources for functional analysis of gene expression and proteomic data. Front Biosci., 12: 5071-5088.

Suzek B.E., Huang H., McGarvey P., Mazumder R., Wu C.H. (2007). UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics, 23(10):1282-1288.

Huang H., Shukla H., Saxena S., Wu C.H. (2007). Challenges and solutions in proteomics. Current Genomics, 8 (in press).

Mulder N.J., Apweiler R., Attwood T.K., Bairoch A., et al., Wu C.H., Yates C. (2007). New developments in the InterPro database. Nucleic Acids Res., 35(Database issue):D224-228.

UniProt Consortium (2007). The Universal Protein Resource (UniProt). Nucleic Acids Res., 35(Database issue):D193-7.

Torii M., Liu H.F., Hu Z.Z. and Wu C.H. (2006). A comparison study of biomedical short form definition detection algorithms. Proceedings of ACM First International Workshop on Text Mining in Bioinformatics, TMBIO 2006.

Natale D.A., Arighi C.N., Barker W., Blake J., Chang T., Hu Z.Z., Liu H., Smith B., Wu C.H. (2006). Framework for a Protein Ontology. Proceedings of ACM First International Workshop on Text Mining in Bioinformatics, TMBIO 2006.

Qiu P., Wang J., Ray Liu K.J., Hu Z.Z., Wu C.H. (2006). Dependence network modeling for biomarker identification. Bioinformatics, 23:198-206.

Hu Z.Z., Valencia J.C., Huang H., Chi A., Shabanowitz J., Hearing V.J., Appella E., Wu C.H. (2006). Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes. Int J Mass Spec, 259:147-160.

Chi A., Valencia J.C., Hu Z.Z., Watabe H., Yamaguchi H., Mangini N.J., Huang H., Canfield V.A., Cheng K.C., Yang F., Abe R., Yamagishi S., Shabanowitz J., Hearing V.J., Wu C.H., Appella E., Hunt D.F. (2006). Proteomic and Bioinformatic Characterization of the Biogenesis and Function of Melanosomes. J Proteome Res, 5:3135-3144.

Liu H., Hu Z.Z., Torii M., Wu C.H., Friedman C. (2006). Quantitative Assessment of Dictionary-based Protein Named Entity Tagging. J Am Med Inform Assoc, 13:497-507.

Han B., Obradovic Z., Hu Z.Z., Wu C.H., Vucetic S. (2006). Substring selection for biomedical document classification. Bioinformatics, 22:2136-42.

Petrova N.V., Wu C.H. (2006). Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties. BMC Bioinformatics, 7:312.

Yuan X., Hu Z.Z., Wu H.T., Torii M., Narayanaswamy M., Ravikumar K.E., Vijay-Shanker K., Wu C.H. (2006). An online literature mining tool for protein phosphorylation. Bioinformatics, 22(13):1668-1669.

Nikolskaya A.N., Arighi C.N., Huang H., Barker W.C., Wu C.H. (2006). PIRSF Family Classification System for Protein Functional and Evolutionary Analysis. Evolutionary Bioinformatics Online, 2:209-221.

Liu, H.F., Hu, Z.Z., Zhang, J., Wu, C.H. (2006). BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics, 22, 103-105.

Wu, C.H., Apweiler, R., Bairoch, A., Natale, D.A., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Mazumder, R., O'Donovan, C., Redaschi, N., Suzek, B. (2006). The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Research, 34, D187-91.

Liu, H., Hu, Z.Z., Wu, C.H. (2005). DynGO: a tool for visualizing and mining of Gene Ontology and its associations. BMC Bioinformatics, 6, 201.

Mazumder, R., Natale, D., Murthy, S., Thiagarajan, R., Wu, C.H. (2005). Computational identification of strain-, species- and genus-specific proteins. BMC Bioinformatics, 6, 279.

Schneider, M., Bairoch, A., Wu, C.H., Apweiler, R. (2005). Plant Protein Annotation in the UniProt Knowledgebase. Plant Physiology, 138, 59-66.

Hu, Z.Z., Narayanaswamy, M., Ravikumar, K.E., Vijay-Shanker, K., Wu, C.H. (2005). Literature mining and database annotation of protein phosphorylation using a rule-based system. Bioinformatics, 21(11), 2759-2765.

Mani I., Hu Z., Jang S.B., Samuel K., Krause M., Phillips J., Wu C.H. (2005). Protein name tagging guidelines: lessons learned. Comparative and Functional Genomics, 6(1-2), 72-76.

Natale, D. A., Vinayaka, C. R. and Wu, C. H. (2005). Large-scale, classification-driven, rule-based functional annotation of proteins. Wiley, New York.

Wu, C.H., Huang, H., Nikolskaya, A., Vinayaka, C. R., Chung, S., Zhang, J. (2005). Family Classification and Integrative Associative Analysis for Protein Functional Annotation in Bioinformatics: New Research. Nova Publishers, New York.

Bairoch, A., Apweiler, R., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N., Yeh, L.S. (2005). The Universal Protein Resource (UniProt). Nucleic Acids Research, 33: D154-159.

Wu, C. H. and Nebert, D. W. (2004). Update on human genome completion and annotations: Protein Information Resource. Human Genomics, 1, 229-233.

Wu, C. H., Huang, H., Nikolskaya, A., Hu, Z. and Barker, W. C. (2004). The iProClass integrated database for protein functional analysis. Computational Biology and Chemistry, 28, 87-96.

Wu, C. H., Nikolskaya, A., Huang, H., Yeh, L.-S., Natale, D., Vinayaka, C. R., Hu, Z., Mazumder, R., Kumar, S., Kourtesis, P., Ledley, R. S., Suzek, B. E., Arminski, L., Chen, Y., Zhang, J., Cardenas, J. L., Chung, S., Castro-Alvear, J., Dinkov, G. and Barker, W. C. (2004). PIRSF family classification system at the Protein Information Resource. Nucleic Acids Research, 32, D112-114.

Apweiler, R., Bairoch, A. and Wu, C. H. (2004). Protein sequence databases. Current Opinion in Chemical Biology, 8, 76-80.

Apweiler, R., Bairoch, A., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M. J., Natale, D. A., O'Donovan, C., Redaschi, N., Yeh, L. S. (2004). UniProt: Universal Protein Knowledgebase. Nucleic Acids Research, 32, D115-119.

Hu, Z., Mani, I., Hermoso, V., Liu, H. and Wu, C. H. (2004). iProLINK: an integrated protein resource for literature mining. Computational Biology and Chemistry, 28, 409-416.

Wu, C. H., Yeh, L.-S., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., Hu, Z., Kourtesis, P., Ledley, R. S., Suzek, B.E., Vinayaka, C.R., Zhang, J. and Barker, W.C. (2003). The Protein Information Resource. Nucleic Acids Research, 31, 345-347.

Huang, H., Barker, W. C., Chen, Y. and Wu, C. H. (2003). iProClass: An Integrated Database of Protein Family, Function, and Structure Information. Nucleic Acids Research, 31, 390-392.

Wu, C. H., Huang, H., Yeh, L.-S. and Barker, W. C. (2003). Protein family classification and functional annotation. Computational Biology and Chemistry, 27, 37-47.

Wu, C. H., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., Hu, Z., Ledley, R. S., Lewis, K. C., Mewes, H. W., Orcutt, B. C., Suzek, B. E., Tsugita, A., Vinayaka, C. R., Yeh, L. S., Zhang, J. and Barker, W. C. (2002). The Protein Information Resource: an integrated public resource of functional annotation of proteins. Nucleic Acids Research, 30, 35-37.

Wu, C.H., Xiao, C., Hou, Z., Huang, H., and Barker, W. C. (2001). iProClass: An integrated and comprehensive protein classification database. Nucleic Acids Research, 29, 52-54.

McGarvey, P., Huang, H., Barker, W. C., Orcutt, B. C. and Wu, C. H. (2000). PIR Web site: New resource for bioinformatics. Bioinformatics, 16, 290-291.

Wu, C. H., Huang, H. and McLarty, J. (1999). Gene family identification network design for protein sequence analysis. International Journal of Artificial Intelligence Tools, 8, 419-432.

Wu, C. H., Shivakumar, S. and Huang, H. (1999). ProClass protein family database. Nucleic Acids Research, 27, 272-274.

Barker, W. C., Garavelli, J. S., McGarvey, P. B., Marzec, C. R., Orcutt, B. C., Srinivasarao, G. Y., Yeh, L. S., Ledley, R. S., Mewes, H. W., Pfeiffer, F., Tsugita, A. and Wu, C. H. (1999). The PIR-International Protein Sequence Database. Nucleic Acids Research, 27, 39-43.

Wu, C. H., S. Shivakumar, C. V. Shivakumar and S. Chen. (1998). GeneFIND web server for protein family identification and information retrieval. Bioinformatics, 14, 223-224.

Wu, C. H. (1997). Artificial neural networks for molecular sequence analysis. Computers & Chemistry, 21, 237-256.

Wu, C. H., Chen, H. L. and Chen, S. (1997). Counter-propagation neural networks for molecular sequence classification: Supervised LVQ and dynamic node allocation. Applied Intelligence, 7, 27-38.

Wu, C. H., Zhao, S. and Chen, H. L. (1996). A protein class database organized with ProSite protein groups and PIR superfamilies. Journal of Computational Biology, 3, 547-562.

Wu, C. H., Zhao, S., Chen, H. L., Lo, C. J. and McLarty, J. (1996). Motif identification neural design for rapid and sensitive protein family search. CABIOS, 12, 109-118.

Wu, C. H. (1996). Gene Classification Artificial Neural System. Methods in Enzymology, 266, 71-88.

Wu, C. H., Berry, M., Shivakumar, S. and McLarty, J. (1995). Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Machine Learning, 21, 177-193.

Wu, C. H. and Shivakumar, S. (1994). Back-propagation and counter-propagation neural networks for phylogenetic classification of ribosomal RNA sequences. Nucleic Acids Research, 22, 4291-4299.

Wu, C. H., Whitson, G., McLarty, J., Ermongkonchai, A. and Chang, T. (1992). Protein classification artificial neural system. Protein Science, 1, 667-677.

Wu, C. H., Caspar, T., Browse, J., Lindquist, S. and Somerville, C. (1988). Characterization of an HSP70 cognate gene family in Arabidopsis. Plant Physiology, 88, 731-740.

Wu, C. H., Warren, H. L., Sitaraman, K. and Tsai, C. Y. (1988). Translational alterations in maize leaves responding to pathogen infection, paraquat treatment or heat shock. Plant Physiology, 86, 1323-1329.

Revised 07/13/07
