hmmscan(1)			 HMMER Manual			    hmmscan(1)

NAME
       hmmscan - search sequence(s) against a profile database

SYNOPSIS
       hmmscan [options] <hmmdb> <seqfile>

DESCRIPTION
       hmmscan	is  used  to search sequences against collections of profiles.
       For each sequence in <seqfile>, use that query sequence to  search  the
       target  database of profiles in <hmmdb>, and output ranked lists of the
       profiles with the most significant matches to the sequence.

       The <seqfile> may contain more than one query sequence. It  can	be  in
       FASTA  format,  or several other common sequence file formats (genbank,
       embl, and uniprot, among others), or in alignment file formats  (stock‐
       holm,  aligned  fasta, and others). See the --qformat option for a com‐
       plete list.

       The <hmmdb> must be compressed and indexed ("pressed") with hmmpress
       before it can be searched with hmmscan. hmmpress creates four binary
       files, suffixed .h3{fimp} (that is, .h3f, .h3i, .h3m, and .h3p).
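
       As a minimal sketch of the requirement above, the following Python
       helper checks whether the four pressed files sit next to a profile
       database. The function name and the example path are hypothetical;
       only the four suffixes come from the text above.

```python
from pathlib import Path

# The four suffixes that .h3{fimp} expands to.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pressed_files(hmmdb):
    """Return the pressed files that are absent for the given <hmmdb> path."""
    return [
        hmmdb + suffix
        for suffix in PRESSED_SUFFIXES
        if not Path(hmmdb + suffix).is_file()
    ]
```

       If missing_pressed_files("mydb.hmm") returns a non-empty list, the
       database probably still needs to be run through hmmpress.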

       The output format is designed to be human-readable,  but	 is  often  so
       voluminous  that	 reading  it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular for‐
       mats  that are concise and easier to parse.  The -o option allows redi‐
       recting the main output, including throwing it away in /dev/null.
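
       The options just described can be combined in one invocation. The
       sketch below only assembles an argument list and does not run
       anything; the helper name and its keyword arguments are illustrative
       assumptions, while the flags themselves (-o, --tblout, --domtblout)
       are the ones documented on this page.

```python
import os


def build_hmmscan_argv(hmmdb, seqfile, tblout=None, domtblout=None,
                       discard_main=False):
    """Assemble an hmmscan command line as a list of arguments."""
    argv = ["hmmscan"]
    if discard_main:
        # Throw the human-readable main output away (/dev/null on POSIX).
        argv += ["-o", os.devnull]
    if tblout is not None:
        argv += ["--tblout", tblout]
    if domtblout is not None:
        argv += ["--domtblout", domtblout]
    # Positional arguments follow the options: <hmmdb> then <seqfile>.
    argv += [hmmdb, seqfile]
    return argv
```

       The resulting list can be handed to subprocess.run(argv, check=True)
       on a machine where hmmscan is installed.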

OPTIONS
       -h     Help; print a brief reminder  of	command	 line  usage  and  all
	      available options.

OPTIONS FOR CONTROLLING OUTPUT
       -o <f> Direct  the  main human-readable output to a file <f> instead of
	      the default stdout.

       --tblout <f>
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-target  output,  with	 one  data  line per homologous target
	      model found.

       --domtblout <f>
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-domain  output,  with	 one  data  line per homologous domain
	      detected in a query sequence for each homologous model.

       --acc  Use accessions instead of names in the main output, where avail‐
	      able for profiles and/or sequences.

       --noali
	      Omit  the	 alignment  section  from  the	main  output. This can
	      greatly reduce the output volume.

       --notextw
	      Unlimit the length of each line in the main output. The  default
	      is a limit of 120 characters per line, which helps in displaying
	      the output cleanly on terminals and in editors, but can truncate
	      target profile description lines.

       --textw <n>
	      Set  the	main  output's line length limit to <n> characters per
	      line. The default is 120.
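
       A rough sketch of reading a --tblout file in Python follows. It
       assumes the per-target column order used by HMMER 3 (target name,
       target accession, query name, query accession, full-sequence E-value,
       score, and bias, then further fields, with a free-text description
       last) and that comment lines start with '#'. Check these assumptions
       against the header comments in your own file before relying on them.

```python
def parse_tblout(lines):
    """Parse per-target table lines into a list of dictionaries."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # Split into at most 19 fields so that a description containing
        # spaces stays in one piece as the last field.
        fields = line.split(None, 18)
        hits.append({
            "target": fields[0],
            "target_acc": fields[1],
            "query": fields[2],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
            "description": fields[18].strip() if len(fields) > 18 else "",
        })
    return hits
```

       With a file written by --tblout, parse_tblout(open(path)) yields one
       dictionary per data line.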

OPTIONS FOR REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the main output, --tblout, and --domtblout).

       -E <x> In the per-target output, report target profiles with an E-value
	      of <= <x>.  The default is 10.0, meaning that on average,	 about
	      10  false	 positives  will be reported per query, so you can see
	      the top of the noise and decide  for  yourself  if  it's	really
	      noise.

       -T <x> Instead of thresholding per-profile output on E-value, report
              target profiles with a bit score of >= <x>.

       --domE <x>
	      In the per-domain output, for target profiles that have  already
	      satisfied the per-profile reporting threshold, report individual
	      domains with a conditional E-value of <= <x>.   The  default  is
	      10.0.   A conditional E-value means the expected number of addi‐
	      tional false positive domains in the  smaller  search  space  of
	      those comparisons that already satisfied the per-profile report‐
	      ing threshold (and thus must have at least one homologous domain
	      already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
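
       The reporting rules in this section can be summarized as two small
       predicates. This is a sketch of the logic as described here, not
       HMMER's own code; the function and parameter names are hypothetical.

```python
def target_is_reported(evalue, score, E=10.0, T=None):
    """Per-target rule: a bit score threshold (-T), if given, replaces -E."""
    if T is not None:
        return score >= T
    return evalue <= E


def domain_is_reported(target_reported, c_evalue, score, domE=10.0, domT=None):
    """Per-domain rule, applied only to targets that were already reported."""
    if not target_reported:
        return False
    if domT is not None:
        return score >= domT
    return c_evalue <= domE
```

       For example, a target with E-value 3.0 passes the default -E of 10.0,
       but would be dropped under -T 50 if its bit score were only 20.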

OPTIONS FOR INCLUSION THRESHOLDS
       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round.
       hmmscan has neither alignment output (unlike hmmsearch or phmmer) nor
       iterative search steps (unlike jackhmmer), so inclusion thresholds
       have little effect. They only affect which domains get marked as
       significant (!) or questionable (?) in domain output.

       --incE <x>
	      Use an E-value of <= <x> as the per-target inclusion  threshold.
	      The default is 0.01, meaning that on average, about 1 false pos‐
	      itive would be expected in every	100  searches  with  different
	      query sequences.

       --incT <x>
              Instead of using E-values to set the inclusion threshold, use
              a bit score of >= <x> as the per-target inclusion threshold.
              It would be unusual to use bit score thresholds with hmmscan,
              because a single score threshold is not expected to work for
              different profiles; different profiles have slightly different
              expected score distributions.

       --incdomE <x>
	      Use a conditional E-value of <= <x> as the per-domain  inclusion
	      threshold,  in  targets  that have already satisfied the overall
	      per-target inclusion threshold.  The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold. As with --incT above, it would
              be unusual to use a single bit score threshold in hmmscan.
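
       As a simplified illustration of the (!) and (?) marking described at
       the top of this section, the sketch below applies the two E-value
       inclusion thresholds in order. It covers only the E-value variants
       (not --incT or --incdomT), and the function name is hypothetical.

```python
def domain_marker(target_evalue, domain_c_evalue, incE=0.01, incdomE=0.01):
    """Return '!' for an included domain and '?' for a questionable one."""
    # A domain is only considered for inclusion if its target has already
    # satisfied the per-target inclusion threshold.
    target_included = target_evalue <= incE
    domain_included = target_included and domain_c_evalue <= incdomE
    return "!" if domain_included else "?"
```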

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds  for
       each profile, superseding any thresholding based on statistical signif‐
       icance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
       --domT <x2> --incdomT <x2> had been applied specifically using each
       model's curated thresholds.

       --cut_ga
	      Use the GA (gathering) bit scores	 in  the  model	 to  set  per-
	      sequence	(GA1)  and  per-domain	(GA2)  reporting and inclusion
	      thresholds. GA thresholds are generally  considered  to  be  the
	      reliable	curated	 thresholds  defining  family  membership; for
	      example, in Pfam, these thresholds define what gets included  in
	      Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
	      Use  the	NC (noise cutoff) bit score thresholds in the model to
	      set per-sequence (NC1) and per-domain (NC2) reporting and inclu‐
	      sion  thresholds.	 NC  thresholds are generally considered to be
	      the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
	      set per-sequence (TC1) and per-domain (TC2) reporting and inclu‐
	      sion thresholds. TC thresholds are generally  considered	to  be
	      the  score  of  the  lowest-scoring  known true positive that is
	      above all known false positives.
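
       The equivalence stated at the start of this section can be written
       out for a single model. The helper below is a hypothetical
       illustration: given one model's curated pair of scores, it returns
       the explicit threshold options that the cutoff would behave like for
       that model. hmmscan itself applies the cutoffs per model, which a
       single set of command-line thresholds cannot reproduce across a whole
       database.

```python
def equivalent_threshold_args(x1, x2):
    """Spell out a curated (per-sequence, per-domain) score pair as options."""
    return [
        "-T", str(x1),        # per-sequence reporting threshold
        "--incT", str(x1),    # per-sequence inclusion threshold
        "--domT", str(x2),    # per-domain reporting threshold
        "--incdomT", str(x2), # per-domain inclusion threshold
    ]
```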

CONTROL OF THE ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step	filter	pipeline:  the
       MSV  filter, the Viterbi filter, and the Forward filter. The first fil‐
       ter is the fastest and most approximate; the last is the	 full  Forward
       scoring	algorithm.  There  is  also a bias filter step between MSV and
       Viterbi. Targets that pass all the steps in the	acceleration  pipeline
       are then subjected to postprocessing -- domain identification and scor‐
       ing using the Forward/Backward algorithm.

       Changing filter thresholds only removes or includes targets  from  con‐
       sideration;  changing  filter  thresholds does not alter bit scores, E-
       values, or alignments, all of which are determined solely  in  postpro‐
       cessing.

       --max  Turn  off	 all  filters, including the bias filter, and run full
	      Forward/Backward postprocessing on every target. This  increases
	      sensitivity somewhat, at a large cost in speed.

       --F1 <x>
	      Set  the P-value threshold for the MSV filter step.  The default
	      is 0.02, meaning that roughly 2% of the highest  scoring	nonho‐
	      mologous targets are expected to pass the filter.

       --F2 <x>
	      Set  the	P-value	 threshold  for	 the Viterbi filter step.  The
	      default is 0.001.

       --F3 <x>
	      Set the P-value threshold for  the  Forward  filter  step.   The
	      default is 1e-5.

       --nobias
	      Turn  off	 the bias filter. This increases sensitivity somewhat,
	      but can come at a high cost in speed, especially	if  the	 query
	      has  biased  residue  composition (such as a repetitive sequence
	      region, or if it is a membrane protein  with  large  regions  of
	      hydrophobicity). Without the bias filter, too many sequences may
	      pass the filter with biased  queries,  leading  to  slower  than
	      expected	performance  as	 the  computationally  intensive  For‐
	      ward/Backward algorithms shoulder an abnormally heavy load.
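
       To get a feel for the default filter thresholds, the following
       back-of-envelope sketch estimates how many nonhomologous targets
       would be expected to survive each filter. It assumes that a P-value
       threshold p lets through roughly a fraction p of nonhomologous
       targets, as the --F1 description states for the MSV filter; the
       names used are illustrative.

```python
# Default P-value thresholds for --F1, --F2, and --F3, as listed above.
DEFAULT_FILTERS = (("MSV", 0.02), ("Viterbi", 0.001), ("Forward", 1e-5))


def expected_nonhomologous_survivors(n_targets, thresholds=DEFAULT_FILTERS):
    """Estimate how many nonhomologous targets pass each filter stage."""
    return {name: n_targets * p for name, p in thresholds}
```

       For a database of 10,000 profiles this gives roughly 200, 10, and 0.1
       expected survivors for the three stages, which is why the expensive
       Forward/Backward postprocessing normally sees very few targets.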

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
	      for  the	purposes  of per-sequence E-value calculations, rather
	      than the actual number of targets seen.

       --domZ <x>
	      Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-domain conditional E-value calculations,
	      rather than the number of	 targets  that	passed	the  reporting
	      thresholds.

       --seed <n>
	      Set the random number seed to <n>.  Some steps in postprocessing
	      require Monte Carlo simulation.  The default is to use  a	 fixed
	      seed  (42),  so that results are exactly reproducible. Any other
	      positive integer will give  different  (but  also	 reproducible)
	      results. A choice of 0 uses an arbitrarily chosen seed.

       --qformat <s>
	      Assert  that  the query sequence file is in format <s>. Accepted
	      formats include fasta, embl, genbank, ddbj, uniprot,  stockholm,
	      pfam, a2m, and afa.

       --cpu <n>
	      Set  the	number of parallel worker threads to <n>.  By default,
	      HMMER sets this to the number of CPU cores it  detects  in  your
	      machine  -  that is, it tries to maximize the use of your avail‐
	      able processor cores. Setting <n>	 higher	 than  the  number  of
	      available	 cores	is of little if any value, but you may want to
	      set it to something less. You can also control  this  number  by
	      setting an environment variable, HMMER_NCPU.

	      This  option  is only available if HMMER was compiled with POSIX
	      threads support. This is the  default,  but  it  may  have  been
	      turned off for your site or machine for some reason.

       --stall
	      For  debugging the MPI master/worker version: pause after start,
	      to enable the developer to attach debuggers to the running  mas‐
	      ter  and worker(s) processes. Send SIGCONT signal to release the
	      pause.  (Under gdb: (gdb) signal SIGCONT)

	      (Only available if optional MPI support was enabled at  compile-
	      time.)

       --mpi  Run in MPI master/worker mode, using mpirun.

	      (Only  available if optional MPI support was enabled at compile-
	      time.)
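
       The effect of the -Z option described earlier in this section can be
       illustrated with a little arithmetic. The sketch assumes that a
       per-sequence E-value is the P-value multiplied by the number of
       targets, so that an E-value computed for one search space size can be
       rescaled to another. The function name is hypothetical.

```python
def rescale_evalue(evalue, z_old, z_new):
    """Rescale an E-value from a search space of z_old targets to z_new."""
    if z_old <= 0 or z_new <= 0:
        raise ValueError("search space sizes must be positive")
    # E = P * Z, so changing Z scales the E-value linearly.
    return evalue * z_new / z_old
```

       For instance, an E-value of 0.5 obtained against 1,000 profiles
       corresponds to an E-value of 10 if the search space is asserted to be
       20,000 profiles.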

SEE ALSO
       See hmmer(1) for a master man page with a list of  all  the  individual
       man pages for programs in the HMMER package.

       For  complete  documentation,  see  the	user guide that came with your
       HMMER  distribution  (Userguide.pdf);  or  see  the  HMMER   web	  page
       (@HMMER_URL@).

COPYRIGHT
       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For  additional	information  on	 copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or  see  the	 HMMER
       web page (@HMMER_URL@).

AUTHOR
       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@		 @HMMER_DATE@			    hmmscan(1)