Ubuntu Manpage: run_tipp.py - an identification and phylogenetic profiling tool

NAME

       run_tipp.py - an identification and phylogenetic profiling tool

DESCRIPTION

       usage: run_tipp.py [-h] [-v] [-A N] [-P N] [-F N] [--distance DISTANCE]
              [-M DIAMETER] [-S DECOMP] [-p DIR] [-rt] [-o OUTPUT] [-d OUTPUT_DIR] [-c CONFIG] [-t TREE]
              [-r RAXML] [-a ALIGN] [-f FRAG] [-m MOLECULE] [--ignore-overlap] [-x N] [-cp CHCK_FILE]
              [-cpi N] [-seed N] [-R N] [-at N] [-D] [-pt N] [-PD N] [-tx TAXONOMY] [-txm MAPPING]
              [-adt TREE] [-C N]

       This script runs the SEPP algorithm on an input tree, alignment, fragment file, and RAxML info file. It
       uses a reference dataset, which must be downloaded from
       https://obj.umiacs.umd.edu/tipp/tipp2-refpkg.tar.gz

       If the local administrator has not set the path to this reference dataset in /etc/tipp/tipp.config, you
       should copy this file to ~/.tipp/ and put the path to the dataset in the reference section of the
       configuration file; see tipp.config(5).

   optional arguments:
       -h, --help
              show this help message and exit

       -v, --version
              show program's version number and exit

   DECOMPOSITION OPTIONS:
              These options determine the alignment decomposition size and taxon insertion size. If no size is
              given, the default is to align and place at 10% of the total taxa. The alignment decomposition
              size must be less than the taxon insertion size.

       -A N, --alignmentSize N
              max  alignment  subset size of N [default: 10% of the total number of taxa or the placement subset
              size if given]

       -P N, --placementSize N
              max placement subset size of N [default: 10% of the total number of taxa or the alignment length
              (whichever is bigger)]
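
              The interaction between the -A and -P defaults can be sketched as follows. This is only an
              illustration of the rules stated above, not code taken from TIPP: the function name, the
              rounding of "10%", the reading of "alignment length" as the alignment subset size, and the
              acceptance of equal sizes are all assumptions.

```python
def default_subset_sizes(total_taxa, alignment_size=None, placement_size=None):
    """Resolve -A and -P values following the defaults described in the option text."""
    # Assumption: "10%" is rounded down, with a floor of one taxon.
    ten_percent = max(1, total_taxa // 10)

    # -A defaults to the placement size when -P is given, else to 10% of the taxa.
    if alignment_size is None:
        alignment_size = placement_size if placement_size is not None else ten_percent

    # -P defaults to the larger of 10% of the taxa and the alignment subset size
    # (assumption: this is what "alignment length" refers to).
    if placement_size is None:
        placement_size = max(ten_percent, alignment_size)

    # Assumption: equal sizes are accepted, because the two defaults coincide.
    if alignment_size > placement_size:
        raise ValueError("alignment subset size must not exceed placement subset size")
    return alignment_size, placement_size


if __name__ == "__main__":
    print(default_subset_sizes(1000))
    print(default_subset_sizes(1000, placement_size=50))
```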

       -F N, --fragmentChunkSize N
              maximum fragment chunk size of N. Helps control memory usage. [default: 20000]

       --distance DISTANCE
              minimum p-distance before stopping the decomposition [default: 1]

       -M DIAMETER, --diameter DIAMETER
              maximum tree diameter before stopping the decomposition [default: None]

       -S DECOMP, --decomp_strategy DECOMP
              decomposition strategy [default: using tree branch length]

   OUTPUT OPTIONS:
              These options control output.

       -p DIR, --tempdir DIR
              Temporary files will be written to DIR. Full path required. [default: /tmp/sepp]

       -rt, --remtemp
              Remove the temporary directory. [default: disabled]

       -o OUTPUT, --output OUTPUT
              output files with prefix OUTPUT. [default: output]

       -d OUTPUT_DIR, --outdir OUTPUT_DIR
              output to OUTPUT_DIR directory. Full path required. [default: .]

   INPUT OPTIONS:
              These options control input. Running SEPP requires a backbone tree (in newick format), a
              RAxML_info file (generated by RAxML during estimation of the backbone tree; pplacer uses this
              info file to set model parameters), a backbone alignment file (in fasta format), and a fasta
              file containing the fragments. The input sequences are assumed to be DNA unless specified
              otherwise.
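
              As a rough illustration, the four required inputs map onto the -t, -r, -a and -f options
              listed below. The sketch only assembles the argument list and prints it; it does not run
              the tool. The helper name and all file names are hypothetical placeholders.

```python
import shlex


def build_tipp_command(tree, raxml_info, alignment, fragments, molecule="dna"):
    """Assemble a run_tipp.py argument list from the four required input files."""
    # The option text lists exactly these three molecule types.
    if molecule not in ("amino", "dna", "rna"):
        raise ValueError("molecule must be amino, dna, or rna")
    return [
        "run_tipp.py",
        "-t", tree,         # backbone tree in newick format
        "-r", raxml_info,   # RAxML_info file from backbone tree estimation
        "-a", alignment,    # backbone alignment in fasta format
        "-f", fragments,    # fasta file with the fragments
        "-m", molecule,
    ]


# Hypothetical file names, for illustration only.
command = build_tipp_command(
    "backbone.tree", "RAxML_info.backbone", "backbone.fasta", "fragments.fasta"
)
print(shlex.join(command))
```

              The resulting list could be handed to subprocess.run() once the files exist.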

       -c CONFIG, --config CONFIG
              A  config  file,  including  options used to run SEPP.  Options provided as command line arguments
              overwrite config file values for those options. [default: None]

       -t TREE, --tree TREE
              Input tree file (newick format) [default: None]

       -r RAXML, --raxml RAXML
              RAxML_info file including model parameters, generated by RAxML. [default: None]

       -a ALIGN, --alignment ALIGN
              Aligned fasta file [default: None]

       -f FRAG, --fragment FRAG
              fragment file [default: None]

       -m MOLECULE, --molecule MOLECULE
              Molecule type of sequences. Can be amino, dna, or rna [default: dna]

       --ignore-overlap
              When a query sequence has the same name as a backbone sequence, ignore the query sequence and
              keep the backbone sequence [default: False]

   OTHER OPTIONS:
              These options control how SEPP is run

       -x N, --cpu N
              Use N cpus [default: number of cpus available on the machine]

       -cp CHCK_FILE, --checkpoint CHCK_FILE
              checkpoint file [default: no checkpointing]

       -cpi N, --interval N
              Interval  (in  seconds)  between  checkpoint  writes. Has effect only with -cp provided. [default:
              3600]

       -seed N, --randomseed N
              random seed number. [default: 297834]

   TIPP OPTIONS:
              These arguments set settings specific to TIPP

       -R N, --reference_pkg N
              Use a pre-computed reference package [default: None]

       -at N, --alignmentThreshold N
              Enough alignment subsets are selected to reach a cumulative probability of N. This should be a
              number between 0 and 1 [default: 0.95]

       -D, --dist
              Treat fragments as distribution

       -pt N, --placementThreshold N
              Enough placements are selected to reach a cumulative probability of N. This should be a number
              between 0 and 1 [default: 0.95]
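
              The idea of a cumulative probability threshold, as used by -pt (and, by analogy, -at), can be
              sketched as below. This is a minimal illustration, not TIPP's implementation: the function
              name, the edge labels, and the choice to take candidates in order of decreasing probability
              are assumptions.

```python
def select_by_cumulative_probability(candidates, threshold=0.95):
    """Keep the most probable candidates until their probabilities sum to the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: candidates are considered from most to least probable.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    selected = []
    total = 0.0
    for name, probability in ranked:
        selected.append((name, probability))
        total += probability
        if total >= threshold:
            break
    return selected


# Hypothetical placements of one fragment: (edge label, probability).
placements = [("edge_b", 0.3), ("edge_a", 0.6), ("edge_d", 0.02), ("edge_c", 0.08)]
for name, probability in select_by_cumulative_probability(placements, 0.95):
    print(name, probability)
```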

       -PD N, --push_down N
              Whether to classify based on children below or above insertion point. [default: True]

       -tx TAXONOMY, --taxonomy TAXONOMY
              A file describing the taxonomy. This is a comma-separated text file that has the following
              fields: taxon_id,parent_id,taxon_name,rank. If there are other columns, they are ignored. The
              first line is also ignored.
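
              A file in this layout could be read along the following lines. The sketch follows only the
              description above (four leading fields, extra columns ignored, first line skipped); the
              function name, the sample rows, and the skipping of short rows are assumptions.

```python
import csv
import io


def read_taxonomy(handle):
    """Read taxon_id,parent_id,taxon_name,rank rows into a dict keyed by taxon_id."""
    reader = csv.reader(handle)
    next(reader, None)  # the first line is ignored
    taxonomy = {}
    for row in reader:
        if len(row) < 4:
            continue  # assumption: blank or short rows are skipped
        taxon_id, parent_id, taxon_name, rank = row[:4]  # extra columns are ignored
        taxonomy[taxon_id] = {"parent": parent_id, "name": taxon_name, "rank": rank}
    return taxonomy


# Hypothetical sample content, for illustration only.
sample = io.StringIO(
    "taxon_id,parent_id,taxon_name,rank,comment\n"
    "1,1,root,root,top of the tree\n"
    "2,1,Bacteria,superkingdom,\n"
)
taxonomy = read_taxonomy(sample)
print(taxonomy["2"]["name"], taxonomy["2"]["rank"])
```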

       -txm MAPPING, --taxonomyNameMapping MAPPING
              A comma-separated text file mapping alignment sequence names to taxonomic ids. Format (each
              line): sequence_name,taxon_id. If there are other columns, they are ignored. The first line is
              also ignored.
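
              The mapping file can be loaded in the same spirit. Again this is a sketch based solely on the
              format described above; the function name and the sample sequence names are hypothetical.

```python
import csv
import io


def read_name_mapping(handle):
    """Read sequence_name,taxon_id rows into a dict of sequence name to taxon id."""
    reader = csv.reader(handle)
    next(reader, None)  # the first line is ignored
    # Only the first two columns are used; assumption: shorter rows are skipped.
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Hypothetical sample content, for illustration only.
mapping_sample = io.StringIO(
    "sequence_name,taxon_id,note\n"
    "seq_0001,2,backbone sequence\n"
    "seq_0002,2\n"
)
name_to_taxon = read_name_mapping(mapping_sample)
print(name_to_taxon["seq_0001"])
```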

       -adt TREE, --alignmentDecompositionTree TREE
              A newick tree file used for decomposing taxa into alignment subsets. [default: the backbone tree]

       -C N, --cutoff N
              Placement probability requirement to count toward  the  distribution.  This  should  be  a  number
              between 0 and 1 [default: 0.0]
