Object-Relational Protein Sequence Database
Chunlin Xiao, Lai-Su Yeh, Zhenling Hou, Bruce Orcutt, and Cathy Wu
ABSTRACT
For over thirty years, the Protein Information Resource (PIR) has maintained
and distributed the PIR-International Protein Sequence Database (PSD), the most
comprehensive, well-annotated, and non-redundant public-domain protein sequence
database. To facilitate annotation and assure database quality while keeping pace
with the large influx of data from genome sequencing projects, we are migrating
the PIR-PSD and other auxiliary databases from our home-grown legacy system on
VAX/VMS to the Oracle 8i object-relational database management system. We use
both relational and object models in the database design, based on
entity-relationship (ER) and UML modeling, and adopt a three-tier networking
architecture for the implementation.
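The abstract does not describe the schema itself, but the general idea of pairing an object model with relational storage can be sketched. The Python below is a hypothetical illustration only, not the PIR-PSD design: it uses SQLite instead of Oracle 8i, and the `ProteinEntry` and `Feature` classes and the `entry` and `feature` tables are invented for the example. An entry is handled in code as one object that owns a nested list of features, while storage flattens it into a parent table and a child table linked by a foreign key.

```python
# Hypothetical sketch: an object view of a protein entry mapped onto relational tables.
# SQLite stands in for Oracle 8i; all class, table, and column names are illustrative.
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Feature:
    # A hypothetical sequence feature (e.g. a domain or binding site).
    kind: str
    start: int
    end: int


@dataclass
class ProteinEntry:
    # Object view: one entry owns its nested collection of features.
    entry_id: str
    title: str
    sequence: str
    features: list[Feature] = field(default_factory=list)


def create_schema(conn: sqlite3.Connection) -> None:
    # Relational view: a parent table plus a child table keyed by entry_id.
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE entry (
            entry_id TEXT PRIMARY KEY,
            title    TEXT NOT NULL,
            sequence TEXT NOT NULL
        );
        CREATE TABLE feature (
            entry_id  TEXT NOT NULL REFERENCES entry(entry_id),
            kind      TEXT NOT NULL,
            start_pos INTEGER NOT NULL,
            end_pos   INTEGER NOT NULL,
            CHECK (start_pos <= end_pos)
        );
        """
    )


def save(conn: sqlite3.Connection, entry: ProteinEntry) -> None:
    # Flatten the object into rows inside one transaction, so the parent row
    # and its child rows are committed together or rolled back together.
    with conn:
        conn.execute(
            "INSERT INTO entry VALUES (?, ?, ?)",
            (entry.entry_id, entry.title, entry.sequence),
        )
        conn.executemany(
            "INSERT INTO feature VALUES (?, ?, ?, ?)",
            [(entry.entry_id, f.kind, f.start, f.end) for f in entry.features],
        )


def load(conn: sqlite3.Connection, entry_id: str) -> ProteinEntry:
    # Reassemble the object from its parent row and ordered child rows.
    row = conn.execute(
        "SELECT entry_id, title, sequence FROM entry WHERE entry_id = ?",
        (entry_id,),
    ).fetchone()
    if row is None:
        raise KeyError(entry_id)
    features = [
        Feature(kind, start, end)
        for kind, start, end in conn.execute(
            "SELECT kind, start_pos, end_pos FROM feature "
            "WHERE entry_id = ? ORDER BY start_pos",
            (entry_id,),
        )
    ]
    return ProteinEntry(*row, features=features)
```

In an object-relational system such as Oracle 8i, the feature list could instead be kept inside the entry row as a nested table or VARRAY of an object type; the two-table layout above is simply the plain relational equivalent.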
Flat files are generated for distribution, and a new XML format is planned for
our next quarterly release. A user-friendly, Java-based web interface has been
developed for querying the database and for updating it in both record and batch
modes. With this new object-relational database system, we have greatly improved
the data organization, consistency, and integrity of our databases, as well as
their information retrieval, scalability, maintainability, and interoperability.
This work is supported in part by NIH Grant # P41 LM05798.
Presented at the 4th Annual Conference on Computational Genomics, Baltimore, MD, 2000.