Provided by: hmmer2_2.3.2+dfsg-12_amd64

NAME

       hmm2build - build a profile HMM from an alignment

SYNOPSIS

       hmm2build [options] hmmfile alignfile

DESCRIPTION

       hmm2build reads a multiple sequence alignment file alignfile, builds a new profile HMM, and saves the
       HMM in hmmfile.

       alignfile may be in ClustalW, GCG MSF, SELEX, Stockholm, or aligned FASTA alignment format. The format is
       automatically detected.

       By default, the model is configured to find one or more nonoverlapping alignments to the complete
       model: multiple alignments that are global with respect to the model and local with respect to the
       sequence. This is analogous to the behavior of the hmmls program of HMMER 1. To configure the model
       for multiple alignments that are local with respect to both the model and the sequence, like the old
       hmmfs program, use the -f (fragment) option. More rarely, you may want to configure the model for a
       single global alignment (global with respect to both model and sequence) with the -g option, or for
       a single local/local alignment (like standard Smith/Waterman, or the old hmmsw program) with the -s
       option.

OPTIONS

       -f     Configure  the  model  for finding multiple domains per sequence, where each domain can be a local
              (fragmentary) alignment. This is analogous to the old hmmfs program of HMMER 1.

       -g     Configure the model for finding a single global alignment to a target sequence, analogous  to  the
              old hmms program of HMMER 1.

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -n <s> Name this HMM <s>. <s> can be any string of non-whitespace characters (i.e. one "word"). There
              is no length limit (at least not one imposed by HMMER; your shell will complain about command line
              lengths first).

       -o <f> Re-save the starting alignment to <f>, in Stockholm format.  The columns which  were  assigned  to
              match  states  will be marked with x's in an #=RF annotation line.  If either the --hand or --fast
              construction options were chosen, the alignment may have been slightly altered  to  be  compatible
              with Plan 7 transitions, so saving the final alignment and comparing to the starting alignment can
              let  you  view  these  alterations.  See the User's Guide for more information on this arcane side
              effect.

       -s     Configure the model for finding a single local alignment per target sequence. This is analogous to
              the standard Smith/Waterman algorithm or the hmmsw program of HMMER 1.

       -A     Append this model to an existing hmmfile rather than creating hmmfile.  Useful  for  building  HMM
              libraries (like Pfam).

       -F     Force  overwriting  of  an existing hmmfile.  Otherwise HMMER will refuse to clobber your existing
              HMM files, for safety's sake.
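
       The options above can be combined when hmm2build is driven from a script. The following is a minimal
       sketch, not part of HMMER: the helper name build_hmm2build_argv, its keyword arguments, and the file
       names globin.hmm and globins50.msf are assumptions made for illustration. It only assembles an
       argument list that follows the SYNOPSIS and does not run the program.

```python
import shlex

# Mode names are hypothetical labels; the flags they map to are documented above.
MODE_FLAGS = {None: [], "fragment": ["-f"], "global": ["-g"], "local": ["-s"]}


def build_hmm2build_argv(hmmfile, alignfile, name=None, mode=None,
                         append=False, force=False):
    """Assemble 'hmm2build [options] hmmfile alignfile' as an argument list."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["hmm2build"] + MODE_FLAGS[mode]
    if name is not None:
        # -n accepts any non-empty string without whitespace.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("HMM name must be one word without whitespace")
        argv += ["-n", name]
    if append:
        argv.append("-A")   # append to an existing hmmfile
    if force:
        argv.append("-F")   # allow overwriting an existing hmmfile
    argv += [hmmfile, alignfile]
    return argv


argv = build_hmm2build_argv("globin.hmm", "globins50.msf",
                            name="globin", mode="fragment", force=True)
print(shlex.join(argv))
```

       The resulting list can be passed to subprocess.run() on a system where hmm2build is installed.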

EXPERT OPTIONS

       --amino
              Force the sequence alignment to be interpreted as amino acid sequences. Normally HMMER autodetects
              whether the alignment is protein or DNA, but sometimes alignments are so small that  autodetection
              is ambiguous. See --nucleic.

       --archpri <x>
              Set  the  "architecture  prior"  used  by  MAP  architecture  construction  to <x>, where <x> is a
              probability between 0 and 1. This parameter governs a  geometric  prior  distribution  over  model
              lengths.  As  <x>  increases, longer models are favored a priori.  As <x> decreases, it takes more
              residue conservation in a column to make  a  column  a  "consensus"  match  column  in  the  model
              architecture.  The 0.85 default has been chosen empirically as a reasonable setting.

       --binary
              Write the HMM to hmmfile in HMMER binary format instead of readable ASCII text.

       --cfile <f>
              Save the observed emission and transition counts to <f> after the architecture has been determined
              (i.e. after residues/gaps have been assigned to match, delete, and insert states).  This option is
              used  in HMMER development for generating data files useful for training new Dirichlet priors. The
              format of count files is documented in the User's Guide.

       --fast Quickly and heuristically determine the architecture of the model by assigning all columns with
              more than a certain fraction of gap characters to insert states. By default this fraction is 0.5,
              and it can be changed using the --gapmax option. The default construction algorithm is a maximum
              a posteriori (MAP) algorithm, which is slower.

       --gapmax <x>
              Controls the --fast model construction algorithm; it has no effect unless --fast is also used.
              If more than a fraction <x> of the symbols in a column are gaps, the column is assigned to an
              insert column. <x> is a fraction from 0 to 1, and by default is set to 0.5. Higher values of <x>
              mean more columns get assigned to consensus, and models get longer; lower values of <x> mean
              fewer columns get assigned to consensus, and models get shorter.
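
              The gap-fraction rule used by --fast and --gapmax can be sketched as follows. This is an
              illustration of the rule as described here, not HMMER's implementation: the function name
              fast_architecture, the set of characters treated as gaps, and the toy alignment are
              assumptions.

```python
# Assumption: these characters count as gaps in the alignment.
GAP_CHARS = set("-._~")


def fast_architecture(alignment, gapmax=0.5):
    """Label each column 'M' (match/consensus) or 'I' (insert).

    A column becomes an insert column when more than `gapmax` of its
    symbols are gaps, mirroring the --fast / --gapmax description.
    """
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for col in range(ncols):
        gaps = sum(row[col] in GAP_CHARS for row in alignment)
        gap_fraction = gaps / len(alignment)
        labels.append("I" if gap_fraction > gapmax else "M")
    return "".join(labels)


toy = ["ACD-F",
       "AC--F",
       "A-D-F",
       "ACDEF"]
for x in (0.2, 0.5, 0.8):
    print(x, fast_architecture(toy, gapmax=x))
```

              On this toy input, raising the threshold turns more columns into match columns, which is the
              "models get longer" behavior described above.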

       --hand Specify  the  architecture  of the model by hand: the alignment file must be in SELEX or Stockholm
              format, and the reference annotation line (#=RF in SELEX, #=GC RF in Stockholm) is used to specify
              the architecture. Any column marked with a non-gap symbol  (such  as  an  'x',  for  instance)  is
              assigned as a consensus (match) column in the model.
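
              As a hedged sketch of how a reference line maps to an architecture: the helper below takes the
              text of an RF annotation (without its #=RF or #=GC RF tag) and returns the indices of the
              columns that would become match columns. The function name and the set of characters treated
              as gaps are assumptions, not taken from HMMER.

```python
def match_columns_from_rf(rf_line, gap_chars="-._~ "):
    """Return 0-based indices of columns marked with a non-gap symbol.

    Assumption: '-', '.', '_', '~' and space count as gap symbols;
    anything else (for instance 'x') marks a consensus column.
    """
    return [i for i, ch in enumerate(rf_line) if ch not in gap_chars]


rf = "xx..xxx.x"
cols = match_columns_from_rf(rf)
print(cols)        # indices of match columns
print(len(cols))   # model length implied by this annotation
```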

       --idlevel <x>
              Controls  both  the  determination  of effective sequence number and the behavior of the --wblosum
              weighting option. The sequence alignment is clustered by  percent  identity,  and  the  number  of
              clusters  at a cutoff threshold of <x> is used to determine the effective sequence number.  Higher
              values of <x> give more clusters and higher effective sequence numbers; lower values of  <x>  give
              fewer  clusters  and  lower  effective  sequence  numbers.   <x> is a fraction from 0 to 1, and by
              default is set to 0.62 (corresponding to the clustering level used in  constructing  the  BLOSUM62
              substitution matrix).
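
              The relationship between <x> and the number of clusters can be illustrated with a small
              sketch. It assumes single-linkage clustering and a pairwise identity computed over columns
              where neither sequence has a gap; both choices, along with the function names and the toy
              sequences, are assumptions for illustration and may differ from what HMMER does internally.

```python
def fractional_identity(a, b, gap_chars="-."):
    """Fraction of identical residues over columns where both are non-gap."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in gap_chars and y not in gap_chars]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def count_clusters(alignment, idlevel=0.62):
    """Count single-linkage clusters at an identity cutoff of `idlevel`."""
    n = len(alignment)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if fractional_identity(alignment[i], alignment[j]) >= idlevel:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


toy = ["ACDEFGHIKL",
       "ACDEFGHIKV",
       "MNPQRSTVWY",
       "MNPQRSTVAA"]
for x in (0.62, 0.85, 0.95):
    print(x, count_clusters(toy, idlevel=x))
```

              In this toy case the cluster count rises as the cutoff rises, matching the statement that
              higher values of <x> give more clusters.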

       --informat <s>
              Assert that the input alignfile is in format <s>; do not run Babelfish format autodetection. This
              increases the reliability of the program  somewhat,  because  the  Babelfish  can  make  mistakes;
              particularly  recommended  for  unattended,  high-throughput  runs  of HMMER. Valid format strings
              include FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the User's
              Guide for a complete list.

       --noeff
              Turn off the effective sequence number calculation, and use the true number of sequences  instead.
              This will usually reduce the sensitivity of the final model (so don't do it without good reason!).

       --nucleic
              Force  the alignment to be interpreted as nucleic acid sequence, either RNA or DNA. Normally HMMER
              autodetects whether the alignment is protein or DNA, but sometimes alignments are  so  small  that
              autodetection is ambiguous. See --amino.

       --null <f>
              Read a null model from <f>.  The default for protein is to use average amino acid frequencies from
              Swissprot 34 and p1 = 350/351; for nucleic acid, the default is to use 0.25 for each base and p1 =
              1000/1001.  For  documentation of the format of the null model file and further explanation of how
              the null model is used, see the User's Guide.
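
              To give a feel for the default p1 values, the sketch below treats p1 as the probability of
              emitting another residue and 1 - p1 as the probability of stopping, so that sequence length
              follows a geometric distribution. That reading, the function names, and the scoring formula
              are assumptions made for illustration; the User's Guide is the authority on how the null
              model is actually used.

```python
import math


def expected_null_length(p1):
    """Mean length if P(L) = p1**L * (1 - p1) for L = 0, 1, 2, ... (assumed form)."""
    return p1 / (1.0 - p1)


def null_log2_prob(seq, freqs, p1):
    """log2 probability of `seq` under the assumed null model.

    Each residue contributes log2(p1 * freqs[residue]); ending the
    sequence contributes log2(1 - p1).
    """
    per_residue = sum(math.log2(freqs[ch]) for ch in seq)
    return per_residue + len(seq) * math.log2(p1) + math.log2(1.0 - p1)


uniform_dna = {base: 0.25 for base in "ACGT"}
print(round(expected_null_length(350 / 351)))     # protein default p1
print(round(expected_null_length(1000 / 1001)))   # nucleic acid default p1
print(null_log2_prob("ACGT", uniform_dna, 1000 / 1001))
```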

       --pam <f>
              Apply a heuristic PAM- (substitution matrix-) based prior on match emission probabilities  instead
              of the default mixture Dirichlet. The substitution matrix is read from <f>.  See --pamwgt.

              The  default  Dirichlet state transition prior and insert emission prior are unaffected. Therefore
              in principle you could combine --prior with --pam but this isn't recommended, as  it  hasn't  been
              tested. (--pam itself hasn't been tested much!)

       --pamwgt <x>
              Controls  the weight on a PAM-based prior. Only has effect if --pam option is also in use.  <x> is
              a positive real number, 20.0 by default.  <x> is the number of "pseudocounts" contributed by  the
              heuristic prior. Very high values of <x> can force a scoring system that is entirely driven by the
              substitution matrix, making HMMER somewhat approximate Gribskov profiles.
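
              The effect of a pseudocount weight can be sketched with a generic estimator that mixes
              observed counts with a prior distribution. This is not HMMER's PAM prior: there the prior
              distribution is derived from the substitution matrix, whereas here it is supplied directly.
              The function name and the two-letter toy alphabet are assumptions for illustration.

```python
def mix_counts_with_prior(counts, prior_probs, weight=20.0):
    """Estimate emission probabilities from counts plus `weight` pseudocounts.

    p(a) = (count(a) + weight * prior(a)) / (total_count + weight)
    """
    total = sum(counts.get(a, 0.0) for a in prior_probs)
    return {a: (counts.get(a, 0.0) + weight * q) / (total + weight)
            for a, q in prior_probs.items()}


observed = {"A": 8, "B": 2}       # toy counts in one match column
prior = {"A": 0.5, "B": 0.5}      # toy prior distribution
for w in (1.0, 20.0, 1e6):
    print(w, mix_counts_with_prior(observed, prior, weight=w))
```

              With a very large weight the estimate approaches the prior, which corresponds to the remark
              above that very high values of <x> give a scoring system driven by the substitution matrix.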

       --pbswitch <n>
              For  alignments  with  a  very  large  number of sequences, the GSC, BLOSUM, and Voronoi weighting
              schemes are slow; they're O(N^2) for N sequences. Henikoff position-based weights (PB weights) are
              more efficient. At or above a certain threshold sequence number <n>, hmm2build will switch from
              GSC, BLOSUM, or Voronoi weights to PB weights. To disable this switching behavior (at the cost of
              compute time), set <n> to be something larger than the number of sequences in your alignment. <n>
              is a positive integer; the default is 1000.
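
              The switching rule can be written down as a short decision function. The scheme labels and
              the function name are hypothetical; only the rule itself (switch at or above <n> for the GSC,
              BLOSUM, and Voronoi schemes) comes from the description above.

```python
# Schemes described above as O(N^2) and therefore subject to the switch.
SLOW_SCHEMES = {"gsc", "blosum", "voronoi"}


def effective_weighting(requested, nseq, pbswitch=1000):
    """Return the weighting scheme that would be used for `nseq` sequences."""
    if requested in SLOW_SCHEMES and nseq >= pbswitch:
        return "pb"
    return requested


print(effective_weighting("gsc", 999))
print(effective_weighting("gsc", 1000))
print(effective_weighting("gsc", 5000, pbswitch=10000))
```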

       --prior <f>
              Read  a  Dirichlet  prior  from <f>, replacing the default mixture Dirichlet.  The format of prior
              files is documented in the User's Guide, and an example is given in the  Demos  directory  of  the
              HMMER distribution.

       --swentry <x>
              Controls  the  total  probability  that  is  distributed  to  local entries into the model, versus
              starting at the beginning of the model as in a global alignment.  <x> is a probability from  0  to
              1,  and by default is set to 0.5.  Higher values of <x> mean that hits that are fragments on their
              left (N or 5'-terminal) side will be penalized  less,  but  complete  global  alignments  will  be
              penalized  more.   Lower values of <x> mean that fragments on the left will be penalized more, and
              global alignments on this side will be favored.  This option only affects the configurations  that
              allow  local  alignments,  i.e.   -s  and  -f; unless one of these options is also activated, this
              option has no effect.  You have independent control over local/global alignment behavior  for  the
              N/C (5'/3') termini of your target sequences using --swentry and --swexit.

       --swexit <x>
              Controls the total probability that is distributed to local exits from the model, versus ending an
              alignment at the end of the model as in a global alignment.  <x> is a probability from 0 to 1, and
              by  default  is set to 0.5.  Higher values of <x> mean that hits that are fragments on their right
              (C or 3'-terminal) side will be penalized less, but complete global alignments will  be  penalized
              more.   Lower  values  of  <x> mean that fragments on the right will be penalized more, and global
              alignments on this side will be favored.  This option only affects the configurations  that  allow
              local  alignments, i.e.  -s and -f; unless one of these options is also activated, this option has
              no effect.  You have independent control over local/global alignment behavior for the N/C  (5'/3')
              termini of your target sequences using --swentry and --swexit.
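
              One way to picture --swentry is as a split of entry probability between the first model
              position and all later positions. The sketch below assumes the local-entry share <x> is
              spread uniformly over positions 2..M; this page does not say how the share is distributed,
              so the uniform split, the function name, and the toy model length are assumptions.

```python
def local_entry_distribution(model_length, swentry=0.5):
    """Entry probability for each model position 1..M under the assumed split.

    Position 1 keeps 1 - swentry (the global-style start); the remaining
    swentry is divided evenly among positions 2..M (assumption).
    """
    if model_length < 2:
        raise ValueError("need at least two model positions")
    if not 0.0 <= swentry <= 1.0:
        raise ValueError("swentry must be a probability")
    share = swentry / (model_length - 1)
    return [1.0 - swentry] + [share] * (model_length - 1)


probs = local_entry_distribution(5, swentry=0.5)
print(probs)
print(sum(probs))
```

              The same picture applies to --swexit with exits from the model in place of entries.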

       --verbose
              Print  more  possibly  useful  stuff,  such  as  the  individual  scores  for each sequence in the
              alignment.

       --wblosum
              Use the BLOSUM filtering algorithm to weight the sequences, instead of the default.   Cluster  the
              sequences  at  a  given percentage identity (see --idlevel); assign each cluster a total weight of
              1.0, distributed equally amongst the members of that cluster.
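
              Given cluster assignments, the weighting rule stated above is simple to sketch. The function
              name and the example cluster labels are assumptions; how the clusters themselves are formed
              is governed by --idlevel and is not shown here.

```python
from collections import Counter


def blosum_weights(cluster_ids):
    """Give each cluster total weight 1.0, split equally among its members."""
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


# Six sequences in three clusters of sizes 3, 1, and 2.
weights = blosum_weights([0, 0, 0, 1, 2, 2])
print(weights)
print(sum(weights))   # equals the number of clusters, up to rounding
```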

       --wgsc Use the Gerstein/Sonnhammer/Chothia ad hoc sequence  weighting  algorithm.  This  is  already  the
              default,  so  this  option  has no effect (unless it follows another option in the --w family, in
              which case it overrides it).

       --wme  Use the Krogh/Mitchison maximum entropy algorithm to "weight" the sequences. This  supersedes  the
              Eddy/Mitchison/Durbin  maximum  discrimination algorithm, which gives almost identical weights but
              is less robust. ME weighting seems to give a marginal increase in sensitivity over the default GSC
              weights, but takes a fair amount of time.

       --wnone
              Turn off all sequence weighting.

       --wpb  Use the Henikoff position-based weighting scheme.
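
              A minimal sketch of Henikoff-style position-based weighting follows. In each column, a
              sequence receives 1 / (k * n), where k is the number of distinct residue types in the column
              and n is the number of sequences sharing its residue; contributions are summed over columns.
              Skipping gap symbols and rescaling the weights to sum to the number of sequences are
              assumptions of this sketch, as is the function name.

```python
from collections import Counter


def position_based_weights(alignment, gap_chars="-."):
    """Position-based sequence weights for an aligned set of sequences."""
    nseq = len(alignment)
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    weights = [0.0] * nseq
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch not in gap_chars)
        k = len(counts)   # distinct residue types in this column
        for i, ch in enumerate(column):
            if ch not in gap_chars:
                weights[i] += 1.0 / (k * counts[ch])
    total = sum(weights)
    if total == 0.0:
        return [1.0] * nseq   # nothing but gaps: fall back to equal weights
    # Assumption: rescale so the weights sum to the number of sequences.
    return [w * nseq / total for w in weights]


toy = ["AAA",
       "AAA",
       "AAC"]
print(position_based_weights(toy))
```

              The third toy sequence, which differs from the other two in one column, ends up with the
              largest weight.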

       --wvoronoi
              Use the Sibbald/Argos Voronoi sequence weighting algorithm in place of the default GSC weighting.

SEE ALSO

       Master man page, with full list of and guide to the individual man pages: see hmmer2(1).

       For complete documentation, see the user guide
       (ftp://selab.janelia.org/pub/software/hmmer/2.3.2/Userguide.pdf); or see the HMMER web page,
       http://hmmer.janelia.org/.

COPYRIGHT

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/

HMMER 2.3.2                                         Oct 2003                                        hmm2build(1)