Cloud-based Knowledge Environment at the Protein Information Resource
This project aims to develop a cloud-based knowledge environment for scalable semantic mining of the scientific literature and integrative knowledge discovery in precision medicine. It builds upon our novel natural language processing (NLP) technologies and bioinformatics infrastructure at the Protein Information Resource (PIR) and UniProt, as well as at the Center for Bioinformatics and Computational Biology (CBCB) at the University of Delaware. The project is funded by the NIH, and its services run on the Amazon Web Services (AWS) and IBM Cloud platforms.
iTextMine
iTextMine is an integrated text-mining system for large-scale knowledge extraction from the literature [1]. The system runs dockerized text-mining tools in parallel, each producing results in a common JSON output format, and implements a text alignment algorithm that aligns entity offsets in the text so that the results of different tools can be integrated. It currently contains four in-house relation extraction tools, covering phosphorylation, phosphorylation-dependent protein-protein interactions (PPIs), miRNA-gene regulation, and gene-disease-drug-response relations. All MEDLINE abstracts have been processed with the four tools. A website lets users browse the text evidence and view the integrated results through a network visualization for knowledge discovery.
iTextMine in AWS
iTextMine in IBM Cloud
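The offset alignment idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that one tool collapsed whitespace in an abstract, so the entity offsets it reports no longer match the original text. The sketch uses Python's standard `difflib.SequenceMatcher` to project an entity span from the tool's text back onto the original; it is not the iTextMine alignment algorithm itself, and `build_offset_map` and `project_span` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def build_offset_map(src: str, dst: str) -> List[Optional[int]]:
    """Map each character index of src to the aligned index in dst (None if unaligned)."""
    mapping: List[Optional[int]] = [None] * len(src)
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Characters inside an "equal" block correspond one-to-one.
            for k in range(i2 - i1):
                mapping[i1 + k] = j1 + k
    return mapping


def project_span(mapping: List[Optional[int]], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Project a half-open [start, end) entity span from src onto dst.

    Returns None when either boundary character has no aligned counterpart.
    """
    first, last = mapping[start], mapping[end - 1]
    if first is None or last is None:
        return None
    return first, last + 1


# Hypothetical example: the tool collapsed runs of whitespace in the abstract.
original = "EGFR  phosphorylates  STAT3."
tool_text = "EGFR phosphorylates STAT3."

offsets = build_offset_map(tool_text, original)
start = tool_text.index("STAT3")
span = project_span(offsets, start, start + len("STAT3"))
print(span, original[span[0]:span[1]])  # prints (22, 27) STAT3
```

Projecting every tool's spans onto one shared reference text in this way is what makes results from separately run tools comparable.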
iPTMnet
iPTMnet is a bioinformatics resource for the integrated understanding of protein post-translational modifications (PTMs) in a systems biology context [2]. It connects multiple disparate bioinformatics tools and systems (text mining, data mining, analysis and visualization tools, as well as databases and ontologies) into an integrated, cross-cutting research resource that addresses knowledge gaps in exploring and discovering PTM networks.
Here we provide the web service API for retrieving data from iPTMnet:
iPTMnet in AWS
iPTMnet in IBM Cloud
Protein Ontology
Protein Ontology (PRO) provides an ontological representation of protein-related entities by explicitly defining them and showing the relationships between them [3]. Each PRO term represents a distinct class of entities (including specific modified forms, orthologous isoforms, and protein complexes) ranging from the taxon-neutral to the taxon-specific (e.g. the entity representing all protein products of the human SMAD2 gene is described in PR:Q15796; one particular human SMAD2 protein form, phosphorylated on the last two serines of a conserved C-terminal SSxS motif is defined by PR:000025934).
Here we provide a SPARQL endpoint for retrieving data from the Protein Ontology database. Users can also download the PRO data files directly:
PRO SPARQL in AWS
PRO SPARQL in IBM Cloud
PRO data files in AWS
PRO data files in IBM Cloud
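A query against the SPARQL endpoint can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the endpoint URL is a placeholder (substitute the URL behind the "PRO SPARQL" links), PRO terms are assumed to be identified by OBO PURLs such as `http://purl.obolibrary.org/obo/PR_000025934`, and term names are assumed to be stored as `rdfs:label`. The request follows the generic SPARQL 1.1 Protocol (query sent as a `query` GET parameter, JSON results requested via `Accept`); the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

OBO_PURL = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie: str) -> str:
    """Expand an OBO-style CURIE (e.g. PR:000025934) to its PURL IRI."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local_id}"


def label_query(curie: str) -> str:
    """SPARQL SELECT for the label of one PRO term (assumes labels use rdfs:label)."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT ?label WHERE {{ <{curie_to_iri(curie)}> rdfs:label ?label }}"
    )


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: str, var: str) -> List[str]:
    """Extract the values bound to ?var from a SPARQL JSON results document."""
    results = json.loads(body)
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


# Placeholder endpoint (assumption): replace with the actual PRO SPARQL URL.
ENDPOINT = "https://example.org/pro/sparql"
request = build_request(ENDPOINT, label_query("PR:000025934"))
# Network call left commented out so the sketch runs offline:
# with urllib.request.urlopen(request) as resp:
#     print(parse_bindings(resp.read().decode("utf-8"), "label"))
```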
Peptide Match Server
The PIR Peptide Match service [4] is designed to quickly retrieve all occurrences of a given query peptide in the UniProt Knowledgebase (UniProtKB), including isoforms. The matched proteins are shown in summary tables with rich annotations, including the matched sequence region(s) and links to the corresponding proteins in a number of proteomic/peptide spectral databases.
Here we provide a fully functional web-based tool with a web service API and stand-alone programs:
Peptide Match Server in AWS
Peptide Match Server in IBM Cloud
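The semantics of "all occurrences" can be made concrete with a brute-force sketch. It scans a small in-memory dictionary of made-up sequences rather than UniProtKB, and it does not reflect how the production service indexes data to be fast. Positions are reported 1-based and inclusive, overlapping matches are all kept, and the optional `leu_ile_equivalent` flag (a hypothetical parameter) treats I and L as the same residue, since the two are isobaric and hard to distinguish by mass spectrometry.

```python
from typing import Dict, List, Tuple


def find_peptide(
    peptide: str, proteins: Dict[str, str], leu_ile_equivalent: bool = False
) -> List[Tuple[str, int, int]]:
    """Return (accession, start, end) for every occurrence of peptide.

    Positions are 1-based and inclusive; overlapping occurrences are all reported.
    """
    def normalize(seq: str) -> str:
        seq = seq.upper()
        # Optionally collapse I and L into one symbol before matching.
        return seq.replace("I", "L") if leu_ile_equivalent else seq

    query = normalize(peptide)
    if not query:
        raise ValueError("query peptide must not be empty")
    hits: List[Tuple[str, int, int]] = []
    for accession, sequence in proteins.items():
        target = normalize(sequence)
        pos = target.find(query)
        while pos != -1:
            hits.append((accession, pos + 1, pos + len(query)))
            pos = target.find(query, pos + 1)  # advance by one to keep overlaps
    return hits


# Toy, made-up sequences standing in for UniProtKB entries and isoforms.
toy_db = {"TOY1": "MKAAAAG", "TOY1-2": "MKVLVKQ", "TOY2": "MIVKQ"}
print(find_peptide("AAA", toy_db))  # [('TOY1', 3, 5), ('TOY1', 4, 6)]
print(find_peptide("LVK", toy_db, leu_ile_equivalent=True))  # [('TOY1-2', 4, 6), ('TOY2', 2, 4)]
```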
PIRSitePredict annotation tool
PIRSitePredict is a tool for predicting position-specific annotations based on PIR Site Rules. It takes InterProScan XML output and organism information (kingdom/sub-taxon) as input, and applies the bundled PIR Site Rules, site HMMs, and template sequences to predict functional sites in uncharacterized proteins that match InterPro and PIRSF signatures. Supported output formats are TSV (tab-separated values), XML, and GFF3. PIRSitePredict is available both as an online prediction service and as a downloadable stand-alone software package. The online service is a web application built with Spring MVC 4, Thymeleaf, Bootstrap, and jQuery; the stand-alone package is a Java command-line application.
Here we provide a fully functional website and software package for download:
PIRSitePredict in AWS
PIRSitePredict in IBM Cloud
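The core step of applying a site rule, transferring positions from a template sequence to a query and checking the required residues, can be sketched as follows. This is a simplified illustration, not PIRSitePredict's implementation: it assumes a gapped pairwise template/query alignment is already available (PIRSitePredict itself uses site HMMs), and the rule tuple format, function names, and source label are all hypothetical. Predictions are written as GFF3 lines with the nine standard columns and with reserved characters percent-encoded in column 9.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (template position, allowed residues, SO feature type, note)
SiteRule = Tuple[int, str, str, str]

_GFF3_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
                 "\t": "%09", "\n": "%0A", "\r": "%0D"}


def gff3_escape(value: str) -> str:
    """Percent-encode characters that are reserved in GFF3 column 9."""
    return "".join(_GFF3_ESCAPES.get(ch, ch) for ch in value)


def template_to_query(template_aln: str, query_aln: str) -> Dict[int, Optional[int]]:
    """Map 1-based template positions to 1-based query positions through a gapped alignment."""
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")
    mapping: Dict[int, Optional[int]] = {}
    t_pos = q_pos = 0
    for t_char, q_char in zip(template_aln, query_aln):
        if q_char != "-":
            q_pos += 1
        if t_char != "-":
            t_pos += 1
            # A template residue opposite a gap has no query counterpart.
            mapping[t_pos] = q_pos if q_char != "-" else None
    return mapping


def predict_sites_gff3(query_id: str, template_aln: str, query_aln: str,
                       rules: List[SiteRule], source: str = "site_sketch") -> List[str]:
    """Emit a GFF3 feature line for every rule whose residue condition holds in the query."""
    mapping = template_to_query(template_aln, query_aln)
    query_seq = query_aln.replace("-", "")
    lines = ["##gff-version 3"]
    for t_pos, allowed, so_type, note in rules:
        q_pos = mapping.get(t_pos)
        if q_pos is None or query_seq[q_pos - 1] not in allowed:
            continue  # condition not met: no prediction for this site
        columns = [query_id, source, so_type, str(q_pos), str(q_pos),
                   ".", ".", ".", "Note=" + gff3_escape(note)]
        lines.append("\t".join(columns))
    return lines


# Toy alignment and rules (made up): only the first rule is satisfied.
rules = [(3, "C", "binding_site", "Zinc; putative"),
         (2, "K", "binding_site", "Lys site"),
         (5, "E", "binding_site", "Glu site")]
for line in predict_sites_gff3("QUERY1", "MK-CDE", "MRQCD-", rules):
    print(line)
```

In the toy example, template position 3 (C) maps to query position 4, which is also C, so a feature is reported there; position 2 fails the residue check (R instead of K) and position 5 is aligned to a gap.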
UniProt Knowledgebase
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data [5]. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).
Here we provide a fully functional website with the ID Mapping and Peptide Match tools developed at PIR:
UniProt Website in AWS
We have deployed the website in AWS in both the US and the UK.
UniProt Reference Clusters
The UniProt Reference Clusters (UniRef) [6] provide clustered sets of sequences at three resolutions (UniRef100, UniRef90, and UniRef50) for more than 150M sequences from the UniProt Knowledgebase (including isoforms) and selected UniParc records.
We have successfully tested the clustering pipeline in the AWS environment.
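The finest resolution, UniRef100, groups identical sequences together with their sub-fragments. A minimal greedy sketch of that grouping step is shown below; it is not the UniRef pipeline. It assumes the longest sequence in a group serves as its representative (UniProt applies additional selection criteria), and it does not cover the sequence-identity clustering used for UniRef90 and UniRef50. The default 11-residue minimum fragment length reflects our reading of UniProt's UniRef100 description and is exposed as a parameter; the function name and toy identifiers are hypothetical.

```python
from typing import Dict, List


def uniref100_like(seqs: Dict[str, str], min_fragment_len: int = 11) -> Dict[str, List[str]]:
    """Greedy grouping of identical sequences and contained sub-fragments.

    Longer sequences are visited first and become cluster representatives; a
    sequence joins the first representative that is identical to it or contains
    it as a contiguous sub-fragment of at least min_fragment_len residues.
    """
    clusters: Dict[str, List[str]] = {}
    # Longest first; ties broken by identifier so the output is deterministic.
    for acc in sorted(seqs, key=lambda a: (-len(seqs[a]), a)):
        seq = seqs[acc]
        for rep, members in clusters.items():
            rep_seq = seqs[rep]
            if seq == rep_seq or (len(seq) >= min_fragment_len and seq in rep_seq):
                members.append(acc)
                break
        else:
            clusters[acc] = [acc]  # no match: start a new cluster
    return clusters


# Toy, made-up sequences: B is identical to A, C is an 11-residue fragment of A,
# D is a fragment that is too short to merge, E is unrelated.
toy = {
    "A": "MKTAYIAKQRQISFVKSHFSRQ",
    "B": "MKTAYIAKQRQISFVKSHFSRQ",
    "C": "AYIAKQRQISF",
    "D": "AKQRQ",
    "E": "G" * 13,
}
print(uniref100_like(toy))  # {'A': ['A', 'B', 'C'], 'E': ['E'], 'D': ['D']}
```

The naive pairwise containment check is quadratic in the number of sequences, which is one reason a production pipeline over 150M sequences needs a scalable environment such as AWS.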
References:
[1] Ren J, McGarvey P, Rao S, Li G, Vijay-Shanker K, Madhavan S, Wu CH. iTextMine: Integrated Text-mining System for Large-Scale Knowledge Extraction from Literature.
Database (submitted).
[2] Huang H, Arighi CN, Ross KE, Ren J, Li G, Chen SC, Wang Q, Cowart J, Vijay-Shanker K, Wu CH. iPTMnet: an integrated resource for protein post-translational modification network discovery.
Nucleic Acids Res. 2018 Jan 4;46(D1):D542-D550.
PMID:29145615
[3] Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen SC, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Duncan WD, Huang H, Ren J, Ross K, Ruttenberg A, Shamovsky V, Smith B, Wang Q, Zhang J, El-Sayed A, Wu CH. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities.
Nucleic Acids Res. 2017 Jan 4;45(D1):D339-D346.
PMID:27899649
[4] Chen C, Li Z, Huang H, Suzek BE, Wu CH; UniProt Consortium. A fast Peptide Match Service for UniProt Knowledgebase.
Bioinformatics. 2013 Nov 1;29(21):2808-9.
PMID:23958731
[5] The UniProt Consortium. UniProt: the universal protein knowledgebase.
Nucleic Acids Res. 2017;45(D1):D158-D169.
PMID:27899622
[6] Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH; UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.
Bioinformatics. 2015 Mar 15;31(6):926-932.
PMID:25398609