Provided by: seqan-raptor_3.0.1+ds-9_amd64

NAME

       Raptor-layout  -  A fast and space-efficient pre-filter for querying very large collections of nucleotide
       sequences.

DESCRIPTION

       Computes a hierarchical interleaved Bloom filter (HIBF) layout that tries to minimize the disk space
       consumption of the resulting index. The space is estimated from a k-mer count per user bin, which
       represents the potential density in a technical bin of an interleaved Bloom filter. You can pass the
       resulting layout to raptor (https://github.com/seqan/raptor) to build the index and conduct queries.

OPTIONS

   Main options:
       --input-file (std::filesystem::path)
              The input must be a file containing the paths to the sequence data you wish to estimate, one file
              path per line. If the file contains auxiliary information (e.g. species IDs), its columns must be
              tab-separated.

       Example file:

       ```
       /absolute/path/to/file1.fasta
       /absolute/path/to/file2.fa.gz
       ```
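
       As a minimal sketch of the format described above, the following Python snippet writes such an input
       file with one path per line and tab-separated auxiliary columns. The helper name write_layout_input,
       the file name user_bins.tsv and the species IDs are hypothetical and only serve as illustration.

```python
import tempfile
from pathlib import Path


def write_layout_input(entries, destination):
    """Write one user bin per line: the path first, then optional extra columns."""
    lines = []
    for path, *extras in entries:
        # Columns are joined with tabs, as required when auxiliary data is present.
        lines.append("\t".join([str(path), *map(str, extras)]))
    Path(destination).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical paths and species IDs, for illustration only.
entries = [
    ("/absolute/path/to/file1.fasta", "species_A"),
    ("/absolute/path/to/file2.fa.gz", "species_B"),
]
input_file = Path(tempfile.mkdtemp()) / "user_bins.tsv"
write_layout_input(entries, input_file)
print(input_file.read_text(encoding="utf-8"), end="")
```

       Entries without auxiliary information can be passed as one-element tuples, which yields a plain list
       of paths.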

       --kmer-size (unsigned 8 bit integer)
              The k-mer size influences the size estimates of the input. Choosing a k-mer size that is too small
              for your data will result in files appearing more similar than they really are. Likewise, a k-mer
              size that is too large might miss certain similarities. For DNA sequences, a k-mer size in the
              range [16,32] has proven to work well. Default: 19.
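
              The effect of the k-mer size on apparent similarity can be illustrated with a small Python
              sketch. It compares the plain k-mer sets of two toy sequences using the Jaccard index; the
              sequences and the helper names are assumptions for illustration, and the tool itself may
              process k-mers differently (for example by using canonical k-mers).

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(seq_a, seq_b, k):
    """Jaccard index of the two k-mer sets: shared k-mers / all distinct k-mers."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


# Two toy sequences (hypothetical data).
seq_a = "ACGTACGTAC"
seq_b = "ACGTTGCAAC"

for k in (1, 2, 4, 6):
    print(f"k={k}: similarity {jaccard(seq_a, seq_b, k):.2f}")
```

              With very small k, almost every k-mer occurs in both sequences, so they look nearly identical;
              with growing k, fewer k-mers are shared and the measured similarity drops.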

       --num-hash-functions (unsigned 64 bit integer)
              The number of hash functions to use when  building  the  HIBF  from  the  resulting  layout.  This
              parameter is needed to correctly estimate the index size when computing the layout. Default: 2.

       --false-positive-rate (double)
              The  false  positive  rate  you  aim  for  when  building the HIBF from the resulting layout. This
              parameter is needed to correctly estimate the index size when computing the layout. Default: 0.05.
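
              To see why both the number of hash functions and the false positive rate enter the size
              estimate, consider the textbook relation for a single Bloom filter with n elements, h hash
              functions and false positive rate p: p = (1 - exp(-h*n/m))^h, solved for the number of bits m.
              The Python sketch below implements only this standard formula; it is an assumption that it
              approximates the tool's estimate, which may apply further corrections for the HIBF.

```python
import math


def bloom_filter_bits(num_kmers, num_hash_functions=2, false_positive_rate=0.05):
    """Bits needed by a standard Bloom filter (textbook formula, not Raptor's code)."""
    if num_kmers <= 0:
        return 0
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    # Solve p = (1 - exp(-h * n / m)) ** h for m.
    per_hash = false_positive_rate ** (1.0 / num_hash_functions)
    return math.ceil(-num_hash_functions * num_kmers / math.log(1.0 - per_hash))


# The defaults mirror the documented defaults of the two options.
for fpr in (0.05, 0.01):
    bits = bloom_filter_bits(1_000_000, false_positive_rate=fpr)
    print(f"fpr={fpr}: about {bits / 8 / 1024:.0f} KiB for one million k-mers")
```

              A lower false positive rate requires more bits per k-mer, so the same layout leads to a larger
              index.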

       --output-filename (std::filesystem::path)
              A file name for the resulting layout. Default: "binning.out".

       --threads (unsigned 64 bit integer)
              The number of threads to use. Currently, only merging of sketches is parallelized, so if the  flag
              --disable-rearrangement  is set, --threads will have no effect. Default: 1. Value must be in range
              [1,18446744073709551615].
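
              The main options can be combined into a single call. The following Python sketch only
              assembles and prints such a command line from the options documented above; it does not run
              anything. The executable name raptor-layout and the input file name user_bins.txt are
              assumptions and may differ on your system.

```python
import shlex


def layout_command(input_file, output_filename="binning.out", kmer_size=19,
                   num_hash_functions=2, false_positive_rate=0.05, threads=1,
                   executable="raptor-layout"):  # executable name is an assumption
    """Build an argument list from the documented main options and their defaults."""
    return [
        executable,
        "--input-file", str(input_file),
        "--kmer-size", str(kmer_size),
        "--num-hash-functions", str(num_hash_functions),
        "--false-positive-rate", str(false_positive_rate),
        "--output-filename", str(output_filename),
        "--threads", str(threads),
    ]


# Hypothetical input file name; all other values are the documented defaults.
cmd = layout_command("user_bins.txt", threads=4)
print(shlex.join(cmd))
```

              The resulting list could be passed to subprocess.run once the executable name has been
              verified.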

   HyperLogLog Sketches:
       To improve the layout, you can estimate the sequence similarities using HyperLogLog sketches.

       --disable-estimate-union
              The sketches are used to estimate the sequence similarity among a set of user bins. This improves
              the layout computation because merging user bins that do not increase technical bin sizes is
              preferred. Estimating unions may use more RAM, so this flag disables it for RAM-critical
              environments. Attention: This also disables rearrangement, which depends on union estimates.
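
              How a sketch can estimate the size of a union without storing the k-mers themselves can be
              shown with a minimal HyperLogLog in Python, following the construction in [1] with a
              linear-counting correction for small cardinalities. This is an illustrative sketch, not the
              implementation used by the tool; the class name, the hash function (SHA-1) and the register
              count are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustration only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Use the first 64 bits of SHA-1 as the hash value.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                 # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Sketch of the union: register-wise maximum."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw


# Two hypothetical user bins that share 3000 of their 6000 k-mers each.
sketch_a, sketch_b = HyperLogLog(), HyperLogLog()
for i in range(6000):
    sketch_a.add(f"kmer-{i}")
for i in range(3000, 9000):
    sketch_b.add(f"kmer-{i}")

union_estimate = sketch_a.merge(sketch_b).estimate()
print(f"estimated union: {union_estimate:.0f} (true union: 9000)")
```

              Because the estimated union is much smaller than the sum of the two individual estimates,
              placing these two user bins in the same technical bin would cost less space than their sizes
              alone suggest.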

       --disable-rearrangement
              As  a  preprocessing  step,  rearranging  the order of the given user bins based on their sequence
              similarity may lead to favourable small unions and thus a smaller index. Depending on  the  number
              of  input samples (user bins), this may be time-consuming and can thus be disabled if a suboptimal
              layout is sufficient.
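
              The idea of placing similar user bins next to each other can be sketched with a simple greedy
              ordering in Python. This is not the rearrangement algorithm used by the tool; it uses exact
              sets instead of sketches and a nearest-neighbour heuristic, both chosen purely for
              illustration. All names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets."""
    return len(a & b) / len(a | b)


def greedy_order(user_bins):
    """Order bins so that each bin is followed by its most similar remaining bin."""
    names = list(user_bins)
    order = [names.pop(0)]  # start with the first bin as given
    while names:
        last = user_bins[order[-1]]
        # max() returns the first of several equally similar candidates.
        best = max(names, key=lambda name: jaccard(last, user_bins[name]))
        names.remove(best)
        order.append(best)
    return order


# Toy k-mer sets, represented by integers.
user_bins = {
    "a": {1, 2, 3},
    "b": {7, 8, 9},
    "c": {2, 3, 4},
    "d": {8, 9, 10},
}
order = greedy_order(user_bins)
print(order)
```

              After reordering, similar bins ("a" and "c", "b" and "d") are adjacent, so merging neighbours
              yields small unions. The pairwise comparisons also show why this step can become
              time-consuming for many user bins.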

   Parameter Tweaking:
   Special options

REFERENCES

       [1] Philippe Flajolet, Éric Fusy, Olivier Gandouet, Frédéric Meunier. HyperLogLog: the analysis of a
       near-optimal cardinality estimation algorithm. AofA: Analysis of Algorithms, Jun 2007, Juan les Pins,
       France. pp. 137-156. hal-00406166v2, https://doi.org/10.46298/dmtcs.3545

   Common options
       -h, --help
              Prints the help page.

       -hh, --advanced-help
              Prints the help page including advanced options.

       --version
              Prints the version information.

       --copyright
              Prints the copyright/license information.

       --export-help (std::string)
              Export the help page information. Value must be one of [html, man, ctd, cwl].

VERSION

       Last update: Unavailable
       Raptor-layout version: 3.0.1 (commit unavailable)
       Sharg version: 1.1.1
       SeqAn version: 3.4.0-rc.3

URL

       https://github.com/seqan/raptor

LEGAL

       Raptor-layout Copyright: BSD 3-Clause License
       Author: Svenja Mehringer
       Contact: svenja.mehringer@fu-berlin.de
       SeqAn Copyright: 2006-2023 Knut Reinert, FU-Berlin; released under the 3-clause BSDL.
       In your academic works please cite: Raptor: A fast and space-efficient pre-filter for querying very large
       collections of nucleotide sequences; Enrico Seiler, Svenja Mehringer, Mitra Darvish,  Etienne  Turc,  and
       Knut Reinert; iScience 2021 24 (7): 102782. doi: https://doi.org/10.1016/j.isci.2021.102782
       For full copyright and/or warranty information see --copyright.

raptor-layout 3.0.1 (commit unavailable)           Unavailable                                  RAPTOR-LAYOUT(1)