THE PROTEIN INFORMATION RESOURCE DATABASES FOR GENOMIC RESEARCH

Peter B. McGarvey, Geetha Y. Srinivasarao, John S. Garavelli, Cathy Wu and Winona C. Barker

National Biomedical Research Foundation, Georgetown University Medical Center,
3900 Reservoir Road N.W., Washington DC 20007


ABSTRACT

The Protein Information Resource (PIR) supports research on molecular evolution, functional genomics, and computational biology by maintaining a comprehensive, non-redundant, well-classified, and freely available protein sequence database. Data from whole genome sequencing projects are incorporated into the database, and sequence analysis tools are applied to classify all entries into families, superfamilies and homology domains. This comprehensive classification effort allows large-scale annotation at the family level, detection of potential sequence errors, and identification of redundant entries to be merged. PIR has been able to improve genomic sequence reports by identifying and annotating correct translation initiation sites, translational frameshifts, and translational stop codon exceptions such as selenocysteine. Sequence entries are extensively cross-referenced to major nucleic acid, literature, genome, structure, sequence alignment and family databases. PIR maintains several auxiliary databases to support annotation and integrity checking. These include: PIR-ALN, containing alignments of superfamilies, families and homology domains; FAMBASE, a searchable database of family representatives; and the RESID Database of covalent protein modifications. All the databases can be accessed on the PIR Web site (http://www-nbrf.georgetown.edu/pir/) and contain hypertext links to each other and to relevant external databases. The Web site is being redesigned to include new BLAST similarity search engines and pattern matching capabilities. The latest quarterly release of the databases can be accessed through the ATLAS multi-database retrieval software on the ATLAS CD-ROM and can be downloaded by FTP.


INTRODUCTION TO PIR

The Protein Information Resource (PIR) was established in 1984 by the National Biomedical Research Foundation (NBRF). The PIR Protein Sequence Database evolved from the original NBRF Protein Sequence Database, developed over 20 years by the late Margaret O. Dayhoff and published as the "Atlas of Protein Sequence and Structure". PIR-International is a collaboration between NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID) to collect and publish what is now the oldest and largest database of biomolecular sequence, source, bibliographic, and feature information.


THE PIR PROTEIN DATABASES

PIR-International Protein Sequence Database: an annotated, non-redundant and cross-referenced database of protein sequences.

PIR Alignment Database, PIR-ALN: contains sequence alignments of superfamilies, families and homology domains produced from information in the Protein Sequence Database.

NRL_3D Sequence--Structure Database: produced from sequences and annotation in the Protein DataBank of three-dimensional structures.

RESID Database of Amino Acid Modifications: based on feature information in the Protein Sequence Database.

PATCHX Protein Sequence Database: derived from publicly available databases and contains protein sequences and associated information not yet included in the PIR-International Protein Sequence Database.

FAMBASE Family Database: a searchable database containing a single representative sequence from each protein family.

MIPS Alignment Database, MIPSALN: contains automatically generated alignments of families having 2 or more sequences. MIPSALN is a subset of the PROT-FAM database produced by our collaborators at MIPS.


IMPROVED COMPLETE GENOMES

PIR incorporates data from genome sequencing projects following publication and submission to the major nucleotide sequence databases. Each entry is compared with all other entries in the database using FASTA to identify similar sequences, and independent reports of the same protein from the same species are merged into a single entry. During this process PIR has improved a number of genome sequence reports by identifying and annotating correct translation initiation sites, translational frameshifts, and translational stop codon exceptions such as selenocysteine.

The new Genomes page on the PIR Web site provides access to all the completed genomic sequences. We are working towards placing all genome entries into superfamilies and providing alignments for all families with 2 or more members. This will provide an excellent tool for researchers to work on comparative genomics. Statistics on the classification of the complete genomes in PIR are shown in Table 1.

Note: ORFs that require translational frameshifting or readthrough of termination codons have been identified and reported by several genome sequencing projects. However, these ORFs are not always translated in the major nucleotide sequence databases. PIR provides translations and annotates these sequences.
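As a rough illustration of the kind of translation described in the note above, the following Python sketch translates an ORF with the standard genetic code and reads designated in-frame TGA codons as selenocysteine (one-letter code U) instead of stopping. The function name, its parameter, and the toy sequence are assumptions made for this example; it is not PIR's annotation software.

```python
# Standard genetic code in compact form (bases in the order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna, selenocysteine_codons=frozenset()):
    """Translate `dna` in frame 0 until the first unrecoded stop codon.

    `selenocysteine_codons` (a hypothetical parameter) holds 0-based codon
    indices at which an in-frame TGA is read as selenocysteine (U)
    instead of terminating translation.
    """
    protein = []
    for index in range(len(dna) // 3):
        codon = dna[3 * index:3 * index + 3].upper()
        if codon == "TGA" and index in selenocysteine_codons:
            protein.append("U")
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary termination codon
        protein.append(residue)
    return "".join(protein)


# Toy ORF (not a real gene): ATG TGA GGC TAA
toy_orf = "ATGTGAGGCTAA"
print(translate_orf(toy_orf))                             # conventional translation
print(translate_orf(toy_orf, selenocysteine_codons={1}))  # TGA at codon 1 read as U
```

A conventional translation stops at the in-frame TGA, whereas the recoded translation continues to the next stop codon. Frameshifted ORFs would need a similar position-specific rule that shifts the reading frame rather than substituting a residue.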


STATISTICS OF COMPLETED GENOMES IN PIR

Species                                         # Entries  # Placed         # MIPSALN
                                                           (superfamilies)  (family alignments)
Aquifex aeolicus                                1517       681              250
Archaeoglobus fulgidus                          2398       931              386
Bacillus subtilis                               4212       1700             742
Borrelia burgdorferi (Lyme disease spirochete)  1417       445              147
Chlamydia trachomatis                           999        74               36
Escherichia coli                                5315       2705             1654
Haemophilus influenzae                          1773       983              920
Helicobacter pylori                             1583       537              212
Methanobacterium thermoautotrophicum            1963       913              491
Methanococcus jannaschii                        1736       810              445
Mycobacterium tuberculosis                      3950       38               28
Mycoplasma genitalium                           467        264              411
Mycoplasma pneumoniae                           679        299              434
Pyrococcus horikoshii                           2061       26               9
Synechocystis sp. PCC6803                       3151       1040             517
Treponema pallidum (syphilis spirochete)        1040       17               4
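
The classification coverage implied by these statistics can be worked out directly. The Python sketch below copies the numbers from the table and reports, for each genome, the percentage of entries placed in superfamilies. The variable and function names are chosen for this example only, and the common names in parentheses are dropped from the species labels.

```python
# (entries, placed in superfamilies, MIPSALN family alignments),
# copied from the statistics table for the completed genomes.
GENOME_STATS = {
    "Aquifex aeolicus": (1517, 681, 250),
    "Archaeoglobus fulgidus": (2398, 931, 386),
    "Bacillus subtilis": (4212, 1700, 742),
    "Borrelia burgdorferi": (1417, 445, 147),
    "Chlamydia trachomatis": (999, 74, 36),
    "Escherichia coli": (5315, 2705, 1654),
    "Haemophilus influenzae": (1773, 983, 920),
    "Helicobacter pylori": (1583, 537, 212),
    "Methanobacterium thermoautotrophicum": (1963, 913, 491),
    "Methanococcus jannaschii": (1736, 810, 445),
    "Mycobacterium tuberculosis": (3950, 38, 28),
    "Mycoplasma genitalium": (467, 264, 411),
    "Mycoplasma pneumoniae": (679, 299, 434),
    "Pyrococcus horikoshii": (2061, 26, 9),
    "Synechocystis sp. PCC6803": (3151, 1040, 517),
    "Treponema pallidum": (1040, 17, 4),
}


def percent_placed(entries, placed):
    """Percentage of a genome's entries assigned to a superfamily."""
    return 100.0 * placed / entries


# List genomes from most to least completely classified.
ranked = sorted(
    GENOME_STATS.items(),
    key=lambda item: percent_placed(item[1][0], item[1][1]),
    reverse=True,
)
for species, (entries, placed, _alignments) in ranked:
    print(f"{species:38s} {percent_placed(entries, placed):5.1f}%")
```

Sorting by coverage makes it easy to see which genomes still have most of their entries awaiting superfamily placement.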


THE ATLAS OF PROTEIN AND GENOMIC SEQUENCES CD-ROM

The PIR-International Protein Sequence Database, PIR-ALN, NRL_3D, RESID, and PATCHX databases are available on the "ATLAS of Protein and Genomic Sequences CD-ROM". Included on the CD-ROM is the ATLAS database retrieval software.

The ATLAS Multidatabase Information Retrieval Program is designed to provide simultaneous access to many macromolecular sequence, alignment, and auxiliary databases. Fields such as Title, Journal, Author, Species, Keyword, Superfamily, Feature, and others are indexed to allow fast retrieval. ATLAS maintains a current list of search results that can be refined by additional search commands. A pattern matching command is included to search for sequence patterns in the current list. Several display options are available through a command line interface. The User's Guide for the ATLAS program is included on the CD-ROM. The ATLAS program, written in C, currently runs on PC-DOS, VAX/VMS, OpenVMS, DEC UNIX, SunOS, SGI/IRIX, and Macintosh systems.
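
The idea of a current list that is narrowed by successive commands and then scanned for a sequence pattern can be sketched in a few lines of Python. This is only a conceptual illustration: it does not reproduce the ATLAS command syntax or its indexing, and the record layout, function names, and the two TOY entries are invented for the example. The RGECGG fragment is residues 151-180 of the sample entry in this document.

```python
import re

# Hypothetical in-memory records; field names are illustrative only.
ENTRIES = [
    {"id": "RGECGG", "species": "Escherichia coli",
     "sequence": "RIIGRLSRSSISVLINGESGTGKELVAHAL"},   # residues 151-180
    {"id": "TOY001", "species": "Escherichia coli",
     "sequence": "MKTAYIAKQR"},                        # invented sequence
    {"id": "TOY002", "species": "Bacillus subtilis",
     "sequence": "MSGAPGSGKSTL"},                      # invented sequence
]


def refine(current_list, field, value):
    """Keep only the entries whose `field` equals `value`."""
    return [entry for entry in current_list if entry[field] == value]


def match_pattern(current_list, pattern):
    """Keep only the entries whose sequence contains the regular expression."""
    compiled = re.compile(pattern)
    return [entry for entry in current_list if compiled.search(entry["sequence"])]


# Start from everything, narrow by species, then search for the core of
# nucleotide-binding motif A: G, any four residues, then G and K.
current = ENTRIES
current = refine(current, "species", "Escherichia coli")
current = match_pattern(current, r"G.{4}GK")
print([entry["id"] for entry in current])
```

Each step operates on the result of the previous one, which is the essential behavior of a refinable result list; a real retrieval program would use prebuilt field indexes instead of scanning every record.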


SAMPLE PIR-INTERNATIONAL ENTRY

PIR1 section of the Protein Sequence Database, release 58.02, 23-Oct-1998, assembled and annotated by the PIR-International.

PIR1:RGECGG
nitrogen regulation protein I ntrC - Escherichia coli

Species: Escherichia coli

Date: 31-Mar-1990 #sequence_revision 31-Mar-1990 #text_change 17-Jul-1998

Accession: B30377; S40813; G65191; Q90553

Miranda-Rios, J.; Sanchez-Pescador, R.; Urdea, M.; Covarrubias, A.A.
    Nucleic Acids Res. 15, 2757-2770, 1987
    Title: The complete nucleotide sequence of the glnALG operon of
      Escherichia coli K12.
    Reference number: A30377; MUID:87174797
    Accession: B30377
    Molecule type: DNA
    Residues: 1-468 
    Cross-references: EMBL:X05173; NID:g41562; PID:g41565
    Experimental source: strain K-12

Plunkett III, G.; Burland, V.; Daniels, D.L.; Blattner, F.R.
    Nucleic Acids Res. 21, 3391-3398, 1993
    Title: Analysis of the Escherichia coli genome. III. DNA sequence of
      the region from 87.2 to 89.2 minutes.
    Reference number: S40802
    Accession: S40813
    Status: nucleic acid sequence not shown; translation not shown
    Molecule type: DNA
    Residues: 1-141,'GEA',144-468 
    Cross-references: EMBL:L19201; NID:g304961; PID:g304973
    Experimental source: strain K-12, substrain MG1655
    Note: the nucleotide sequence was submitted to the EMBL Data Library,
      October 1993

Blattner, F.R.; Plunkett III, G.; Bloch, C.A.; Perna, N.T.; Burland, V.;
Riley, M.; Collado-Vides, J.; Glasner, J.D.; Rode, C.K.; Mayhew, G.F.;
Gregor, J.; Davis, N.W.; Kirkpatrick, H.A.; Goeden, M.A.; Rose, D.J.;
Mau, B.; Shao, Y.
    Science 277, 1453-1462, 1997
    Title: The complete genome sequence of Escherichia coli K-12.
    Reference number: A64720; MUID:97426617
    Accession: G65191
    Status: nucleic acid sequence not shown; translation not shown
    Molecule type: DNA
    Residues: 1-141,'GEA',144-468 
    Cross-references: GB:AE000462; GB:U00096; NID:g1790295;
      PID:g1790299; UWGP:b3868
    Experimental source: strain K-12, substrain MG1655

Genetics:
    Gene: glnG; ntrC; glnT
    Map position: 87 min

Function:
    Description: de-uridylylated P-II forms a complex with nitrogen
    regulation protein II (ntrB); ntrB, when complexed with
    de-uridylylated P-II, dephosphorylates nitrogen regulation protein I 
    (ntrC); the uridylylated form of P-II does not complex with ntrB;
    free ntrB phosphorylates nitrogen regulation protein I (ntrC)
    Note: phosphorylated nitrogen regulation protein I (ntrC) activates
     transcription of the glutamine synthase (glnA) gene via interaction
     with sigma-54 factor (DNA-looping) for transcription activation:
     assembly of a multimeric ntrC complex at the enhancer DNA sequence

Superfamily: nitrogen assimilation regulatory protein ntrC; response
    regulator homology; RNA polymerase sigma factor interaction domain
    homology

Keywords: ATP; DNA binding; P-loop; phosphoprotein; signal transduction;
    transcription regulation


Residues            Feature
6-115               Domain: response regulator homology 
140-361             Domain: RNA polymerase sigma factor interaction
                           domain homology 
167-174             Region: nucleotide-binding motif A (P-loop) #status
                      	atypical
234-238             Region: nucleotide-binding motif B
54                  Binding site: phosphate (Asp) (covalent) #status
                      	predicted

Summary: #length 468 #molecular_weight 52196


              5        10        15        20        25        30
    1 M Q R G I V W V V D D D S S I R W V L E R A L A G A G L T C
   31 T T F E N G A E V L E A L A S K T P D V L L S D I R M P G M
   61 D G L A L L K Q I K Q R H P M L P V I I M T A H S D L D A A
   91 V S A Y Q Q G A F D Y L P K P F D I D E A V A L V E R A I S
  121 H Y Q E Q Q Q P R N V Q L N G P T T D I I A K P A M Q D V F
  151 R I I G R L S R S S I S V L I N G E S G T G K E L V A H A L
  181 H R H S P R A K A P F I A L N M A A I P K D L I E S E L F G
  211 H E K G A F T G A N T I R Q G R F E Q A D G G T L F L D E I
  241 G D M P L D V Q T R L L R V L A D G Q F Y R V G G Y A P V K
  271 V D V R I I A A T H Q N L E Q R V Q E G K F R E D L F H R L
  301 N V I R V H L P P L R E R R E D I P R L A R H F L Q V A A R
  331 E L G V E A K L L H P E T E A A L T R L A W P G N V R Q L E
  361 N T C R W L T V M A A G Q E V L I Q D L P G E L F E S T V A
  391 E S T S Q M Q P D S W A T L L A Q W A D R A L R S G H Q N L
  421 L S E A Q P E L E R T L L T T A L R H T Q G H K Q E A A R L
  451 L G W G R N T L T R K L K E L G M E


ALIGNMENTS containing RGECGG:
SA1144  nitrogen assimilation regulatory protein ntrC superfamily 2887.0
Associated Alignments:
DA1066  response regulator homology
DA1489  RNA polymerase sigma factor interaction domain homology

Related Links (Superfamily classification and Alignment):
Protein Classification for Entry=RGECGG at MIPS, Germany.
ProClass for Entry=RGECGG at Univ. of Texas, USA.
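
The residue numbers in the feature table refer to positions in the numbered sequence listing of the entry above. As a minimal sketch of how the two fit together, the Python code below parses the listing layout into a plain string and extracts feature spans by their 1-based coordinates. Only the first 180 residues of RGECGG are included for brevity, and the function names are assumptions made for this example rather than part of any PIR tool.

```python
# First six lines of the RGECGG sequence listing, copied from the entry.
ENTRY_LISTING = """\
              5        10        15        20        25        30
    1 M Q R G I V W V V D D D S S I R W V L E R A L A G A G L T C
   31 T T F E N G A E V L E A L A S K T P D V L L S D I R M P G M
   61 D G L A L L K Q I K Q R H P M L P V I I M T A H S D L D A A
   91 V S A Y Q Q G A F D Y L P K P F D I D E A V A L V E R A I S
  121 H Y Q E Q Q Q P R N V Q L N G P T T D I I A K P A M Q D V F
  151 R I I G R L S R S S I S V L I N G E S G T G K E L V A H A L
"""


def parse_listing(text):
    """Join the residues of a numbered listing into one string."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or all(token.isdigit() for token in tokens):
            continue  # blank line or the position ruler
        start, letters = int(tokens[0]), tokens[1:]
        if start != len(residues) + 1:
            raise ValueError(f"line starting at {start} is out of sequence")
        residues.extend(letters)
    return "".join(residues)


def feature(sequence, first, last=None):
    """Return residues first..last (1-based, inclusive)."""
    last = first if last is None else last
    return sequence[first - 1:last]


sequence = parse_listing(ENTRY_LISTING)
print(len(sequence))                # residues parsed from the excerpt
print(feature(sequence, 54))        # phosphate binding site
print(feature(sequence, 167, 174))  # nucleotide-binding motif A (P-loop)
```

Residue 54 comes out as D, consistent with the "Binding site: phosphate (Asp)" annotation, and residues 167-174 read GESGTGKE, the region annotated as an atypical P-loop.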


SAMPLE PIR-ALN ENTRY

PIR-ALN section of the Protein Alignment Database, release 21.02, 23-Oct-1998, assembled and annotated by the PIR-International.


PIRALN:SA1144
nitrogen assimilation regulatory protein ntrC superfamily 2887.0  
Date: 11-Aug-1994 #sequence_revision 05-Dec-1997 #text_change 20-Jun-1998

Members: RGECGG; RGKBCP; S42745; A26934; B26499; A38449; B33862
RGECGG  nitrogen regulation protein I ntrC - Escherichia coli
RGKBCP  nitrogen regulation protein I ntrC - Klebsiella pneumoniae
S42745  nitrogen assimilation regulatory protein ntrC - Azospirillum
        brasilense
A26934  nitrogen assimilation regulatory protein ntrC - Rhizobium
        meliloti
B26499  nitrogen assimilation regulatory protein ntrC - Bradyrhizobium
        sp.
A38449  regulatory protein algB - Pseudomonas aeruginosa
B33862  transcription regulator hydG - Escherichia coli
    Cross-references: PCF:A00579


Superfamily: nitrogen assimilation regulatory protein ntrC; response
    regulator homology; RNA polymerase sigma factor interaction domain
    homology
    Placement: 2887.0
    Other members: PL0151; S23901; S53024; I39494; I39719; S18622;
      S36203; B64992; S19606; C33586; B26981; S18625; A38533; S35232;
      S32951; A41896; S26601; A65033; S04376; S49540; S71029; B70195;
      H70320; C70396; D70315; S70529
    Cross-references: MIPSALN:M03321; PIRALN:DA1066; MIPSALN:M07032;
      PIRALN:DA1489; MIPSALN:M20642

Comment: This superfamily has 16 families and 33 members.

Keywords: DNA binding; phosphoprotein; transcription regulation

Other keywords: ATP; signal transduction; two-component regulatory system; P-loop

Alignment: #sequences 7 #positions 492


                10        20        30        40        50        60
RGECGG  MQRGIV-----WVVDDDSSIRWVLERALAGAGLTCTTFENGAEVLEALASKTPDVLLSDI
RGKBCP  MQRGIA-----WIVDDDSSIRWVLERALTGAGLSCTTFESGNEVLDALTTKTPDVLLSDI
S42745  MSARTI-----LVADDDRAIRTVLTQALARLGHEVRTTGNASTLWRWVADGQGDLIITDV
A26934  MTGATI-----LVADDDAAIRTVLNQALSRAGYDVRITSNAATLWRWIAAGDGDLVVTDV
B26499  MPAGSI-----LVADDDTAIRTVLNQALSRAGYEVRLTGNAATLWRWVSQGEGDLVITDV
A38449  METTSEKQGRILLVDDESAILRTFRYCLEDEGYSVATASSAPQAEALLQRQVFDLCFLDL
B33862  MTHDNID---ILVVDDDISHCTILQALLRGWGYNVALANSGRQALEQVREQVFDLVLCDV
conser  *    .     ...**. ......  .*  .*. . .  ..            *.   *.
consen  MxxxxI     LVVDDDxAIRTVLxxALxxAGYxVxTxxNAxxxxxxxxxxxxDLxxxDV

                70        80        90       100       110       120
RGECGG  RMPGMDGLALLKQIKQRHPMLPVIIMTAHSDLDAAVSAYQQGAFDYLPKPFDIDEAVALV
RGKBCP  RMPGMDGLALLKQIKQRHPMLPVIIMTAHSDLDAAVSAYQQGAFDYLPKPFDIDEAVALV
S42745  VMPDENGLDLIPRIKKIRPDLRIIVMSAQNTLITAVKAAERGAFEYLPKPFDLKELVSVV
A26934  VMPDENAFDLLPRIKKARPDLPVLVMSAQNTFMTAIKASEKGAYDYLPKPFDLTELIGII
B26499  VMPDENAFDLLPRIKKMRPNLPVIVMSAQNTFMTAIRPSERGAYEYLPKPFDLKELITIV
A38449  RLGEDNGLDVLAQMRVQAPWMRVVIVTAHSAVDTAVDAMQAGAVDYLVKPCSPDQLRLAA
B33862  RMAEMDGIATLKEIKALNPAIPVLIMTAYSSVETAVEALKTGALDYLIKPLDFDNLQATL
conser  ...  ......  ..   * .......* .   .*. .   ** .**.**.. ...   .
consen  RMPxxNGLDLLxxIKxxxPxLPVIIMTAxSxxxTAVxAxxxGAxDYLPKPFDxDELxxxV

               130       140       150       160       170       180
RGECGG  ERAI--SHYQEQQQPRNVQLNGPTTDIIAK-PAMQDVFRIIGRLSRSSISVLINGESGTG
RGKBCP  DRAI--SHYQEQQQPRNAPINSPTADIIGEAPAMQDVFRIIGRLSRSSISVLINGESGTG
S42745  ERALNSNTPPAALPADAGEAD-EQLPLIGRSPAMQEIYRVLARLMGTDLTVTITGESGTG
A26934  GRAL--AEPKRRPSKLEDDSQ-DGMPLVGRSAAMQEIYRVLARLMQTDLTLMITGESGTG
B26499  GRAL--AEPKERVSSPADDGEFDSIPLVGRSPAMQEIYRVLARLMQTDLTVMISGESGTG
A38449  AKQLEVRQLTARLEALEDEVRRQGDGLESHSPAMAAVLETARQVAATDANILILGESGSG
B33862  EKAL---AHTHSIDAETPAVTASQFGMVGKSPAMQHLLSEIALVAPSEATVLIHGDSGTG
conser   ...                      . . ..**.   .  ...  .. ...* *.**.*
consen  xRAL   xxxxxxxxxxxxxx xxxxLxGxSPAMQxxxRxxARLxxTDxTVLIxGESGTG

               190       200       210       220       230       240
RGECGG  KELVAHALHRHSPRAKAPFIALNMAAIPKDLIESELFGHEKGAFTGANTIRQGRFEQADG
RGKBCP  KELVAHALHRHSPRAKAPFIALNMAAIPKDLIESELFGHEKGAFTGANTVRQGRFEQADG
S42745  KELVARALHDYGKRRNGPFVAINMAAIPRELIESELFGHEKGAFTGATNRSTGRFEQAQG
A26934  KELVARALHDYGKRRNGPFVAINMAAIPRDLIESELFGHEKGAFTGAQTRSTGRFEQAEG
B26499  KELVARALHDYGRRRNGPFVAVNMAAIPRDLIESELFGHERGAFTGANTRASGRFEQAEG
A38449  KGELARAIHTWSKRAKKPQVTINCPSLTAELMESELFGHSRGAFTGATESTLGRVSQADG
B33862  KELVARAIHASSARSEKPLVTLNCAALNESLLESELFGHEKGAFTGADKRREGRFVEADG
conser  *...*.*.*  . *   *... *..... .*.*******..****** ..  **...*.*
consen  KELVARALHxxSxRxxxPFVAxNMAAIPxDLIESELFGHEKGAFTGAxTRxxGRFEQADG

               250       260       270       280       290       300
RGECGG  GTLFLDEIGDMPLDVQTRLLRVLADGQFYRVGGYAPVKVDVRIIAATHQNLEQRVQEGKF
RGKBCP  GTLFLDEIGDMPLDVQTRLLRVLADGQFYRVGGYAPVKVDVRIIAATHQNLELRVQEGKF
S42745  GTLFLDEIGDMPLEAQTRLLRVLQEGEYTTVGGRTPIKTDVRIVAATHRDLRTLIRQGLF
A26934  GTLFLDEIGDMPMDAQTRLLRVLQQGEYTTVGGRTPIRSDVRIVAATNKDLKQSINQGLF
B26499  GTLFLDEIGDMPMEAQTRLLRVLQQGEYTTVGGRTPIKTDVRIVAASNKDLRILIQQGLF
A38449  GTLFLDEIGDFPLTLQPKLLRFIQDKEYERVGDPVTRRADVRILAATNRDLGAMVAQGQF
B33862  GTLFLDEIGDISPMMQVRLLRAIQEREVQRVGSNQIISVDVRLIAATHRDLAAEVNAGRF
conser  **********...  *..***... ... .**.  ... ***. **.. .*   . .* *
consen  GTLFLDEIGDMPLxxQTRLLRVLQxGEYxRVGGxxPIKxDVRIxAATHxDLxxxVxQGxF

               310       320       330       340       350       360
RGECGG  REDLFHRLNVIRVHLPPLRERREDIPRLARHFLQVAARELGVEAKLLHPETEAALTRLAW
RGKBCP  REDLFHRLNVIRVHLPPLRERREDIPRLARHFLQIAARELGVEAKQLHPETEMALTRLAW
S42745  REDLFYRLCVVPIRLPPLRERTEDVPLLVRHFLNQCSAQ-GLPVKSIDQPAMDRLKRYRW
A26934  REDLYYRLNVVPLRLPPLRDRAEDIPDLVRHFVQQAEKE-GLDVKRFDQEALELMKAHPW
B26499  REDLFFRLNVVPLRVPPLRERIEDLPDLIRHFFSLAEKD-GLPPKKLDAQALERLKQHRW
A38449  REDLLYRLNVIVLNLPPLRERAEDILGLAERFLARFVKDYGRPARGFSEAAREAMRQYPW
B33862  RQDLYYRLNVVAIEVPSLRQRREDIPLLAGHFLQRFAERNRKAVKGFTPQAMDLLIHYDW
conser  *.**..**.*.   .*.**.* **.. *...*.. .    .   .     .   .    *
consen  REDLFYRLNVVxxxLPPLRERxEDIPxLARHFLQxAxxx GxxxKxxxxxAxxxLxxxxW

               370       380       390       400       410       420
RGECGG  PGNVRQLENTCRWLTVMAAGQEVLIQDLPGELFESTVAESTSQMQPDSWATL-LAQWADR
RGKBCP  PGNVRQLENTCRWLTVMAAGQEVLTQDLPSELFETAIPDNPTQMLPDSWATL-LGQWADR
S42745  PGNVRELENLVRRLAALYS-QEVIGLDVVEAELADTTPAAQPVEEPQGEG---LSAAVER
A26934  PGNVRELENLVRRLTALYP-QDVITREIIENELRSEIPDSPIEKAAARSGSLSISQAVEE
B26499  PGNVRELENLARRLAALYP-QDVITASVIDGEL---APPAVTSGSTATVGVDNLGGAVEA
A38449  PGNVRELRNVIERASIICNQELVDVDHLGFSAA-------QSASSAPRIGE-SLS-----
B33862  PGNIRELENAVERAVVLLTGEYISERELPLAIASTPIPLGQSQDIQP-------------
conser  ***.*.*.*  ...  .   . .    .         .           .   .
consen  PGNVRELENxxRRLxxLxx QxVxxxxLxxxxx    P  xxxxxxx  G   L

               430       440       450       460       470       480
RGECGG  ALRSGHQNLLSEAQP---------ELERTLLTTALRHTQGHKQEAARLLGWGRNTLTRKL
RGKBCP  ALRSGHQNLLSEAQP---------EMERTLLTTALRHTQGHKQEAARLLGWGRNTLTRKL
S42745  HLKDYFAAHKDGMPSNGLYDRVLREVERPLISLSLSATRGNQIKAAQLLGLNRNTLRKKI
A26934  NMRQYFASFGDALPPSGLYDRVLAEMEYPLILAALTATRGNQIKAADLLGLNRNTLRKKI
B26499  YLSSHFSGFPNGVPPPGLYHRILKEIEIPLLTAALAATRGNQIRAADLLGLNRNTLRKKI
A38449  ----------------------LEDLEKAHITAVM-ASSATLDQAAKTLGIDASTLYRKR
B33862  ----------------------LVEVEKEVILAALEKTGGNKTEAARQLGITRKTLLAKL
conser   .            .       . . *  ...... .. ..   ** .**  ..**  *
consen   L            P       L ExExxLITAAL ATxGNxxxAAxLLGxxRNTLxxKx

               490
RGECGG  KELGME
RGKBCP  KELGME
S42745  RDLDIQVVRGLK
A26934  RELGVSVYRSLA
B26499  RDLDIQVYRSGG
A38449  KQYGL
B33862  SR
conser    ..
consen  xxLG


Matrix:

                  Number of differences (upper right)
                  1   2   3   4   5   6   7
1   RGECGG        .  36 269 272 274 318 278
2   RGKBCP        7   . 273 272 272 318 278
3   S42745       56  56   . 156 152 318 291
4   A26934       56  56  32   . 126 313 291
5   B26499       57  56  31  26   . 323 304
6   A38449       66  66  65  64  67   . 267
7   B33862       59  59  61  61  63  58   .
                  Percent difference (lower left)
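
The entry does not state how the values in the matrix above are counted, so the following Python sketch shows only one plausible convention for pairwise comparison of aligned rows. It assumes that columns where both sequences have a gap are ignored, that a gap opposite a residue counts as a difference, and that the percentage is taken over the compared columns. Under these assumptions the results need not reproduce the published matrix. The function names are invented for the example, and the two rows are the first 60 alignment columns of RGECGG and RGKBCP.

```python
def pairwise_differences(row_a, row_b, gap="-"):
    """Return (differences, compared_columns) for two aligned rows.

    Assumed convention: skip columns where both rows have a gap and
    count a gap opposite a residue as a difference.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    differences = compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def percent_difference(row_a, row_b):
    """Differences as a percentage of the compared columns."""
    differences, compared = pairwise_differences(row_a, row_b)
    return 100.0 * differences / compared if compared else 0.0


# First 60 alignment columns of two members of superfamily SA1144.
rgecgg = "MQRGIV-----WVVDDDSSIRWVLERALAGAGLTCTTFENGAEVLEALASKTPDVLLSDI"
rgkbcp = "MQRGIA-----WIVDDDSSIRWVLERALTGAGLSCTTFESGNEVLDALTTKTPDVLLSDI"

print(pairwise_differences(rgecgg, rgkbcp))
print(round(percent_difference(rgecgg, rgkbcp), 1))
```

Applying the same function to every pair of full-length rows would fill both triangles of a matrix like the one shown, with counts on one side and percentages on the other.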


THE PIR WEB SITE

The World Wide Web provides the primary means to access the PIR-International Protein Sequence Database. The PIR home page is found at: http://www-nbrf.georgetown.edu/pir.

The PIR Web site is undergoing a major hardware and software upgrade. The new PIR home page is shown below. The upgraded Web site will be available to the public by December 1, 1998 and will include new BLAST similarity search engines and pattern matching capabilities.

When viewing an entry on the PIR Web site, lists of other entries in the same superfamily or sharing the same keywords can be obtained, and alignments of the superfamily or homology domains in PIR-ALN can be displayed through hypertext links. Sample PIR and PIR-ALN entries are shown above. In addition, the entry sequence, published sequence, and any tagged feature or homology domain can automatically be viewed and submitted for a BLAST search.


SUPERFAMILY AND HOMOLOGY DOMAIN CLASSIFICATION

SUMMARY


ACKNOWLEDGEMENTS

The work presented here is due to the team effort of all the staff members at PIR as well as our collaborators at MIPS and JIPID. The authors would like to thank C. Marzec and B. Orcutt for programming support, L. Yeh for annotation, L. Arminski and S. Shivakumar for system administration, H. Huang for setting up search engines on the web server, K. Sidman and D. Goins for administrative support and web page development.