

Peptide Match Command Line Tool

The Peptide Match command line tool lets users query peptide sequences against their own custom protein sequence database.

The tool provides two main functions:

  1. Given a protein sequence database in FASTA format, build a Lucene index for it.
  2. Query peptide sequences against that index. The query can be either:
    • a single peptide sequence or a comma-separated list of peptide sequences, or
    • a file in either FASTA format or a list of peptide sequences, one sequence per line.
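The two query-file layouts can be illustrated with a short shell sketch. The file names match the tutorial below, but the contents are illustrative only: the peptides are taken from the tutorial output, and the FASTA header names (`pep_1` to `pep_3`) are hypothetical. The sketch assumes each FASTA record keeps its sequence on a single line.

```shell
# Plain list: one peptide per line (queried with -Q query.list -l).
# Peptides are taken from the tutorial output; the selection is illustrative.
cat > query.list <<'EOF'
AAFGGSGGR
ELEVQSEDGTFAK
GVPDIR
EOF

# FASTA: a '>' header line followed by the sequence (queried with -Q query.fasta).
# The header names pep_1..pep_3 are hypothetical.
cat > query.fasta <<'EOF'
>pep_1
AAFGGSGGR
>pep_2
ELEVQSEDGTFAK
>pep_3
GVPDIR
EOF
```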

From Native OS

The runnable jar can be downloaded here, and the source code is also available here. The software is released under the GNU General Public License.

Run from executable jar

$ java -jar PeptideMatchCMD_1.1.jar -h
Command line options: -h 
usage: java -jar PeptideMatchCMD_1.1.jar [options]
            Available options:
            ------------------
 -a,--action <arg>       The action to perform ("index" or "query").
 -d,--dataFile <arg>     The path to a FASTA file to be indexed.
 -e,--LeqI               Treat Leucine (L) and Isoleucine (I) as
                         equivalent (default: no).
 -f,--force              Overwrite the indexDir (default: no).
 -h,--help               Print this message.
 -i,--indexDir <arg>     The directory where the index is stored.
 -l,--list               The query peptide sequence file is a list of
                         peptide sequences, one sequence per line
                         (default: no).
 -o,--outputFile <arg>   The path to the query result file.
 -Q,--queryFile <arg>    The path to the query peptide sequence file in
                         either FASTA format or a list of peptide
                         sequences, one sequence per line.
 -q,--query <arg>        One peptide sequence or a comma-separated list of
                         peptide sequences.
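Both `-q` and `-Q` accept multiple peptides. For a long list, passing the file with `-Q` and `-l` is the safer route, since a very long comma-separated `-q` value can hit the shell's command-line length limit. For a short one-off, a one-per-line file can be joined into a `-q` value with `paste`. A minimal sketch, where `peptides.txt` is a hypothetical input file:

```shell
# Hypothetical one-peptide-per-line input file.
printf 'AAFGGSGGR\nGVPDIR\n' > peptides.txt

# Join all lines with commas to build a single -q value.
QUERY=$(paste -sd, peptides.txt)
echo "$QUERY"    # prints AAFGGSGGR,GVPDIR

# The resulting query (needs the jar and an existing index, so left commented):
# java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q "$QUERY" -o out.txt
```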

Compile from source

$ unzip PeptideMatchCMD_src_1.1.zip
$ cd PeptideMatchCMD_src_1.1
$ ant
$ java -jar PeptideMatchCMD_1.1.jar -h

Tutorial

  • Create a Lucene index from a protein sequence database in FASTA format:
    $ java -jar PeptideMatchCMD_1.1.jar -a index -d uniprot_sprot.fasta -i sprot_index 
    Command line: -a index -d uniprot_sprot.fasta -i sprot_index 
    Indexing to directory "sprot_index" ...
    Indexing "uniprot_sprot.fasta" ...
    Indexing "uniprot_sprot.fasta" finished
    Time used: 00 hours, 06 mins, 31.215 seconds
    
  • Query a peptide sequence:
    $ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR -o out.txt 
    Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.457 seconds
    
    $ cat out.txt 
    #Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
    ##Query	Subject	SubjectLength	MatchStart	MatchEnd
    AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524
    
  • Query a list of peptide sequences:
    $ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
    Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    GVPDIR	has 4 matches
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.493 seconds
    
    $ cat out.txt 
    #Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
    ##Query	Subject	SubjectLength	MatchStart	MatchEnd
    AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
    GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
    GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
    GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
    GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
    
  • Query a list of peptide sequences and treat Leucine (L) and Isoleucine (I) as equivalent:
    $ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
    Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    GVPDIR	has 13 matches
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.513 seconds
    
    $ cat out.txt 
    #Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
    ##Query	Subject	SubjectLength	MatchStart	MatchEnd	MatchedLEqIPositions
    AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
    GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
    GVPDIR	sp|A0R5Z2|GLFT1_MYCS2	302	182	187	186
    GVPDIR	sp|Q7D4V6|GLFT1_MYCTU	304	179	184	183
    GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
    GVPDIR	sp|A5GDX3|RECF_GEOUR	364	126	131	130
    GVPDIR	sp|P96919|EX5A_MYCTU	575	138	143	142
    GVPDIR	sp|Q17QV2|MON1A_BOVIN	555	441	446	445
    GVPDIR	sp|Q2QZ37|OBGM_ORYSJ	528	500	505	504
    GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
    GVPDIR	sp|Q9M1G3|LRK16_ARATH	669	595	600	599
    GVPDIR	sp|Q5U3H2|SV421_DANRE	808	575	580	579
    GVPDIR	sp|A6H5Y3|METH_MOUSE	1253	1147	1152	1151
    GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
    
  • Query peptides in a FASTA file:
    $ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -Q query.fasta -e -o out_fasta.txt 
    Command line: -a query -i sprot_index -Q query.fasta -e -o out_fasta.txt 
    Quering...
    
    example_1	has 1 match
    example_2	has 1 match
    example_3	has 1 match
    example_4	has 1 match
    example_5	has 1 match
    example_6	has 1 match
    example_7	has 1 match
    example_8	has 1 match
    example_9	has 1 match
    example_10	has 1 match
    
    Query is finished.
    The result is saved in "out_fasta.txt".
    Time used: 00 hours, 00 mins, 00.724 seconds
    
  • Query peptides in a list file, one peptide per line:
    $ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -Q query.list -l -e -o out_list.txt 
    Command line: -a query -i sprot_index -Q query.list -l -e -o out_list.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    ELEVQSEDGTFAK	has 1 match
    FEDPAEGEDTLVEK	has 1 match
    FSDGLITPDFLAK	has 1 match
    GAPEFWAAR	has 1 match
    GVIEANGGKVEK	has 1 match
    HIPVYVSEEMVGHKFGEFSPTR	has 1 match
    HNDVNFGTQDHNR	has 1 match
    IGFYLTTCPR	has 1 match
    ILVGQGNDGVAFVK	has 1 match
    
    Query is finished.
    The result is saved in "out_list.txt".
    Time used: 00 hours, 00 mins, 00.752 seconds
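The result file is tab-separated, with `#`-prefixed comment and header lines, so it is easy to post-process. As a minimal sketch (not part of the tool), the awk command below counts result rows per query peptide. It rebuilds the rows from the "Query a list of peptide sequences" step with `printf` so it runs without the jar; on that data it should reproduce the console summary (1 match and 4 matches). Peptides with no rows in the file simply do not appear in this summary.

```shell
# Rebuild the rows of out.txt from the "Query a list of peptide sequences" step.
# Columns are tab-separated; lines starting with '#' are comments/headers.
{
  printf '#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt\n'
  printf '##Query\tSubject\tSubjectLength\tMatchStart\tMatchEnd\n'
  printf 'AAFGGSGGR\tsp|P35908|K22E_HUMAN\t639\t516\t524\n'
  printf 'GVPDIR\tsp|Q9CK59|Y1775_PASMU\t92\t45\t50\n'
  printf 'GVPDIR\tsp|B1Y8E7|PYRB_LEPCP\t320\t194\t199\n'
  printf 'GVPDIR\tsp|B4SHE6|MURD_PELPB\t464\t252\t257\n'
  printf 'GVPDIR\tsp|Q6FX42|ATR_CANGA\t2379\t1135\t1140\n'
} > out.txt

# Count result rows per query peptide (column 1), skipping '#' lines.
# awk's "for (q in n)" order is unspecified, so the summary is sorted.
awk -F'\t' '!/^#/ { n[$1]++ } END { for (q in n) print q "\t" n[q] }' out.txt |
  sort > match_counts.txt

cat match_counts.txt
# prints two tab-separated lines: AAFGGSGGR 1, then GVPDIR 4
```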
    
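With `-e`, the extra `MatchedLEqIPositions` column is filled only for hits where the L/I equivalence was needed. Judging from the sample output (not from tool documentation, so treat this as an assumption), the value is a 1-based position on the subject sequence: for `GLFT1_MYCS2` the match spans 182 to 187 and the position is 186, the fifth residue of `GVPDIR`, which is its `I`. The sketch below keeps only such rows and prints the query residue at that position. It assumes a single position per row; how several positions would be written is not shown on this page.

```shell
# A few rows rebuilt from the -e tutorial output (column 6 = MatchedLEqIPositions).
{
  printf '##Query\tSubject\tSubjectLength\tMatchStart\tMatchEnd\tMatchedLEqIPositions\n'
  printf 'GVPDIR\tsp|Q9CK59|Y1775_PASMU\t92\t45\t50\t\n'
  printf 'GVPDIR\tsp|A0R5Z2|GLFT1_MYCS2\t302\t182\t187\t186\n'
  printf 'GVPDIR\tsp|Q7D4V6|GLFT1_MYCTU\t304\t179\t184\t183\n'
} > out_leqi.txt

# Keep rows whose column 6 is non-empty (hits that used the L/I equivalence).
# Assumption: column 6 holds a single 1-based subject position, so
# position - MatchStart + 1 is the corresponding position inside the query peptide.
awk -F'\t' '!/^#/ && $6 != "" {
  offset = $6 - $4 + 1
  print $2 "\t" $6 "\t" substr($1, offset, 1)
}' out_leqi.txt > leqi_hits.txt

cat leqi_hits.txt
# prints subject, position, and query residue (I for both rows)
```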

From Docker Container

  • Set up a local working directory to hold the input and output files. It will be mounted into the Docker container.
    $ mkdir /your/localworkdir/
    
    $ cd /your/localworkdir/
    
    $ ls 
    uniprot_sprot.fasta query.list query.fasta
    
  • Create a Lucene index from a protein sequence database in FASTA format:
    $ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
    	-a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f
    Unable to find image 'chenc/peptidematch:latest' locally
    latest: Pulling from chenc/peptidematch
    7448db3b31eb: Pull complete 
    c36604fa7939: Pull complete 
    29e8ef0e3340: Pull complete 
    a0c934d2565d: Pull complete 
    a360a17c9cab: Pull complete 
    cfcc996af805: Pull complete 
    2cf014724202: Pull complete 
    4bc402a00dfe: Pull complete 
    1da5b1324a69: Pull complete 
    Digest: sha256:923a488fad501b35de6629309a02f6aa786d42edb7aa0666691aa861bbfd831f
    Status: Downloaded newer image for chenc/peptidematch:latest
    Command line options: -a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f 
    Indexing to directory "/workdir/uniprot_sprot_index" ...
    Indexing "/workdir/uniprot_sprot.fasta" ...
    Indexing "/workdir/uniprot_sprot.fasta" finished
    Time used: 00 hours, 03 mins, 31.116 seconds
    
  • Query a peptide sequence:
    $ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
    	-a query -q NEKKQQMGKEYREKIEAEL -i /workdir/uniprot_sprot_index -o /workdir/single_query_out.txt
    Command line options: -a query -q NEKKQQMGKEYREKIEAEL -i /workdir/uniprot_sprot_index -o /workdir/single_query_out.txt 
    Quering...
    
    NEKKQQMGKEYREKIEAEL	has 6 matches
    
    Query is finished.
    The result is saved in "/workdir/single_query_out.txt".
    Time used: 00 hours, 00 mins, 00.935 seconds
    
  • Query a list of peptide sequences:
    $ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
    	-a query -q NEKKQQMGKEYREKIEAEL,EAFEISKKE -i /workdir/uniprot_sprot_index -o /workdir/multi_query_out.txt
    Command line options: -a query -q NEKKQQMGKEYREKIEAEL,EAFEISKKE -i /workdir/uniprot_sprot_index -o /workdir/multi_query_out.txt 
    Quering...
    
    NEKKQQMGKEYREKIEAEL	has 6 matches
    EAFEISKKE	has 15 matches
    
    Query is finished.
    The result is saved in "/workdir/multi_query_out.txt".
    Time used: 00 hours, 00 mins, 00.685 seconds
    
  • Query peptides in a FASTA file:
    $ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
    	-a query -Q /workdir/query.fasta -i /workdir/uniprot_sprot_index -o /workdir/fasta_query_out.txt
    Command line options: -a query -Q /workdir/query.fasta -i /workdir/uniprot_sprot_index -o /workdir/fasta_query_out.txt 
    Quering...
    
    example_1	has 1 match
    example_2	has 1 match
    example_3	has 1 match
    example_4	has 1 match
    example_5	has 1 match
    example_6	has 1 match
    example_7	has 1 match
    example_8	has 1 match
    example_9	has 1 match
    example_10	has 1 match
    
    Query is finished.
    The result is saved in "/workdir/fasta_query_out.txt".
    Time used: 00 hours, 00 mins, 01.733 seconds
    
  • Query peptides in a list file, one peptide per line:
    $ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
    	-a query -Q /workdir/query.list -l -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt
    Command line options: -a query -Q /workdir/query.list -l -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    ELEVQSEDGTFAK	has 1 match
    FEDPAEGEDTLVEK	has 1 match
    FSDGLITPDFLAK	has 1 match
    GAPEFWAAR	has 1 match
    GVIEANGGKVEK	has 1 match
    HIPVYVSEEMVGHKFGEFSPTR	has 1 match
    HNDVNFGTQDHNR	has 1 match
    IGFYLTTCPR	has 1 match
    ILVGQGNDGVAFVK	has 1 match
    
    Query is finished.
    The result is saved in "/workdir/list_query_out.txt".
    Time used: 00 hours, 00 mins, 01.432 seconds
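Every Docker example repeats the same `docker run -v ... chenc/peptidematch` prefix. A small shell function can wrap it; this is a hypothetical helper (the name `peptidematch` and the `DOCKER` override are illustrative, not part of the image). It mounts the current directory at `/workdir` and adds Docker's `--rm` flag so each finished container is removed, while output files still land in the mounted host directory. Setting `DOCKER=echo` prints the command instead of running it, which is handy for checking paths.

```shell
# Hypothetical wrapper around the Docker examples above.
# --rm removes the container when it exits; results stay in the mounted directory.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead of running it.
peptidematch() {
  ${DOCKER:-docker} run --rm -v "$(pwd)":/workdir chenc/peptidematch "$@"
}

# Dry run from inside the local working directory: show the command for the list query.
DRY_RUN=$(DOCKER=echo peptidematch -a query -Q /workdir/query.list -l \
  -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt)
echo "$DRY_RUN"
```

Run from `/your/localworkdir/` without the `DOCKER` override, the indexing step would then be `peptidematch -a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f`.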
    
©2018 Protein Information Resource