Ubuntu Manpage: bamtofastq - convert SAM, BAM or CRAM files to FastQ

Provided by: biobambam2_2.0.185+ds-2_amd64

NAME

       bamtofastq - convert SAM, BAM or CRAM files to FastQ

SYNOPSIS

       bamtofastq [options]

DESCRIPTION

       bamtofastq  reads  a  SAM,  BAM or CRAM file from standard input and converts it to the FastQ format. The
       output can be split into multiple files according to the pair flags of the reads involved. bamtofastq can
       collate the source reads according to their read names, i.e. place pairs of reads next to each  other  in
       the  output.  bamtofastq writes its output to the standard output channel by default. All output channels
       can be compressed using gzip.

       The following key=value pairs can be given:

       F=<stdout>: output file for the first mates of pairs if collation is active.

       F2=<stdout>: output file for the second mates of pairs if collation is active.

       S=<stdout>: output file for single end reads if collation is active.

       O=<stdout>: output file for unmatched (orphan) first mates if collation is active.

       O2=<stdout>: output file for unmatched (orphan) second mates if collation is active.

       collate=<0|1>: Valid values are

       1:     collate read pairs

       0:     output reads to standard output in the order in which they appear in the BAM file
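
       The output keys above can be combined into one collated invocation. The following is a minimal
       sketch in Python that only assembles the argument list; the helper name build_bamtofastq_argv and
       the file names are hypothetical, and only keys documented on this page are used.

```python
# Sketch: assemble a bamtofastq command line for collated output.
# Keys (filename, collate, gz, F, F2, S, O, O2) come from this page;
# the helper name and all file names are hypothetical examples.

def build_bamtofastq_argv(bam_path, prefix, gz=True):
    ext = ".fq.gz" if gz else ".fq"
    options = {
        "filename": bam_path,          # read from a file instead of stdin
        "collate": 1,                  # place mates next to each other
        "gz": 1 if gz else 0,          # gzip-compress all output files
        "F": f"{prefix}_1{ext}",       # first mates of complete pairs
        "F2": f"{prefix}_2{ext}",      # second mates of complete pairs
        "S": f"{prefix}_s{ext}",       # single end reads
        "O": f"{prefix}_o1{ext}",      # orphan first mates
        "O2": f"{prefix}_o2{ext}",     # orphan second mates
    }
    return ["bamtofastq"] + [f"{key}={value}" for key, value in options.items()]


argv = build_bamtofastq_argv("sample.bam", "sample")
print(" ".join(argv))
```

       The resulting list can be passed to subprocess.run on a system where bamtofastq is installed.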

       combs=<0|1>: print some counts after finishing collation based output

       filename=<stdin>: input file name (data is read from standard input if this option is not given)

       inputformat=<bam>: input file format. All versions of bamtofastq come with support for the BAM input
       format. If the program is additionally linked to the io_lib package, then the following options are valid:

       bam:   BAM (see http://samtools.sourceforge.net/SAM1.pdf)

       sam:   SAM (see http://samtools.sourceforge.net/SAM1.pdf)

       cram:  CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)

       reference=: file name of the reference for CRAM input files. If this key is unset, the CRAM file
       header is scanned to obtain a reference file name.

       exclude=<SECONDARY>: Do not include reads in the output that have any of the given flags set.  The  flags
       are given separated by commas. Valid flags are:

       PAIRED:
              read was paired in sequencing

       PROPER_PAIR:
              read has been mapped as part of a proper pair

       UNMAP: read was not mapped

       MUNMAP:
              mate of read was not mapped

       REVERSE:
              read was mapped to the reverse strand

       MREVERSE:
              mate of read was mapped to the reverse strand

       READ1: read was first read of a pair during sequencing

       READ2: read was second read of a pair during sequencing

       SECONDARY:
              alignment is secondary, i.e. an alternative mapping to the primary alignment in the same file

       QCFAIL:
              read is marked as having failed quality control

       DUP:   read is marked as a duplicate of another read in the same file (see bammarkduplicates)

       SUPPLEMENTARY:
              read is marked as a supplementary alignment
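
       The flag names above correspond to the bits of the SAM FLAG field. The following sketch shows
       how a comma separated exclude list could be turned into a bit mask and tested against a read's
       flag. The bit values are those of the SAM specification; that bamtofastq maps the names to
       exactly these bits is an assumption based on the descriptions above, and the function names are
       hypothetical.

```python
# SAM FLAG bit values as defined in the SAM specification.
# Assumption: the names accepted by exclude= map onto these bits.
SAM_FLAGS = {
    "PAIRED": 0x1,
    "PROPER_PAIR": 0x2,
    "UNMAP": 0x4,
    "MUNMAP": 0x8,
    "REVERSE": 0x10,
    "MREVERSE": 0x20,
    "READ1": 0x40,
    "READ2": 0x80,
    "SECONDARY": 0x100,
    "QCFAIL": 0x200,
    "DUP": 0x400,
    "SUPPLEMENTARY": 0x800,
}


def exclude_mask(spec):
    """Combine a comma separated list of flag names into one bit mask."""
    mask = 0
    for name in spec.split(","):
        name = name.strip()
        if name:                      # tolerate an empty list
            mask |= SAM_FLAGS[name]
    return mask


def is_excluded(flag, spec):
    """True if the read's flag has any of the listed bits set."""
    return (flag & exclude_mask(spec)) != 0


print(hex(exclude_mask("SECONDARY,SUPPLEMENTARY")))
```

       A read is dropped as soon as any one of the listed bits is set, which matches the "any of the
       given flags" wording above.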

       disablevalidation=<0>: Valid values are

       0:     run input file validation on alignments (this is the default)

       1:     do  not check the validity of the input file (this may help for some broken input files, but it is
              a security risk as it can lead to the execution of arbitrary code through a forged input file).

       colhlog=<18>: base two logarithm of the size of the hash table used for collation. The default value
       of 18 should work reasonably well for most input files. Please see the biobambam paper at
       arxiv.org/abs/1306.0836 for details.

       colsbs=<128M>: size of the hash table overflow list in bytes. The default of 128MB should work
       reasonably well for most input files. Please see the biobambam paper at arxiv.org/abs/1306.0836 for
       details.
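
       Because colhlog is a base two logarithm, each increment doubles the hash table. A short sketch
       of the arithmetic follows; the function name is hypothetical, and the unit of the resulting size
       is not stated on this page.

```python
def hash_table_size(colhlog=18):
    """Size of the collation hash table for a given colhlog (2 to that power)."""
    return 1 << colhlog


for value in (18, 19, 20):
    print(value, hash_table_size(value))
```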

       T=<bamtofastq_hostname_pid_time>: file name of temporary file used for collation

       ranges=<>: coordinate ranges selected from input. This option is only available for BAM and CRAM input
       files that have a corresponding index file (.bai for BAM, .crai for CRAM) and only if input is read from
       a file (i.e. the filename argument is set). A valid range is one of the following:

       whole reference sequence:
              a whole reference sequence (e.g. "chr1")

       half open interval on reference sequence:
              an interval on a reference sequence  half  open  on  the  right  (e.g.  "chr1:50000"  which  means
              alignments overlapping chr1 from position 50000 to the end of chr1)

       interval on reference sequence:
              an  interval  on  a reference sequence (e.g. "chr1:50000-60000" which means alignments overlapping
              positions 50000 to 60000 on chr1)

       For  BAM  input  multiple  ranges  are  separated  by  space  characters  (e.g.  ranges="chr1:10000-20000
       chr1:30000-40000").  CRAM input supports a single range only.
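
       The three range forms can be told apart mechanically. The following sketch parses a ranges
       value into its parts; it is not the parser used by bamtofastq, the names Range, parse_range and
       parse_ranges are hypothetical, and it assumes that reference sequence names contain no colon.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    ref: str
    start: Optional[int]   # None: whole reference sequence
    end: Optional[int]     # None: open to the end of the reference


def parse_range(text):
    """Parse "chr1", "chr1:50000" or "chr1:50000-60000"."""
    ref, colon, coords = text.partition(":")
    if not colon:
        return Range(ref, None, None)
    start, dash, end = coords.partition("-")
    return Range(ref, int(start), int(end) if dash else None)


def parse_ranges(value):
    """Split a space separated ranges value (the BAM form) into Range tuples."""
    return [parse_range(token) for token in value.split()]


print(parse_ranges("chr1:10000-20000 chr1:30000-40000"))
```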

       gz=<0|1>: compress output files using gzip. By default output is uncompressed.

       level=<-1|0|1|9|11>: set compression level of the output FastQ/FastA files if gz=1. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If libmaus has been compiled with support for igzip (see
       https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
       then an additional valid value is

       11:    igzip compression

       fasta=<0|1>: output FastA instead of FastQ if fasta=1.

       outputperreadgroup=<0|1>: split output by read group if outputperreadgroup=1 (default is 0). When
       splitting by read group, no output is written to standard output; all data is written to files. The
       file names are generated from the outputdir and outputperreadgroupsuffix parameters and the read
       group names.

       outputdir=<>: output directory if outputperreadgroup=1. By default the output files are generated in
       the current directory.

       outputperreadgrouprgsm=<0|1>: include SM field of read group in output filenames if
       outputperreadgroup=1 (default is 0)

       outputperreadgroupprefix=: add given prefix ahead of file names if outputperreadgroup=1 (default is
       to add no prefix)

       outputperreadgroupsuffixF=<_1.fq>: output file name suffix for first mates of complete pairs if
       outputperreadgroup=1. Default is _1.fq if gz=0 and _1.fq.gz for gz=1.

       outputperreadgroupsuffixF2=<_2.fq>: output file name suffix for second mates of complete pairs if
       outputperreadgroup=1. Default is _2.fq if gz=0 and _2.fq.gz for gz=1.

       outputperreadgroupsuffixO=<_o1.fq>: output file name suffix for first mates of incomplete pairs if
       outputperreadgroup=1. Default is _o1.fq if gz=0 and _o1.fq.gz for gz=1.

       outputperreadgroupsuffixO2=<_o2.fq>: output file name suffix for second mates of incomplete pairs if
       outputperreadgroup=1. Default is _o2.fq if gz=0 and _o2.fq.gz for gz=1.

       outputperreadgroupsuffixS=<_s.fq>: output file name suffix for single end reads if
       outputperreadgroup=1. Default is _s.fq if gz=0 and _s.fq.gz for gz=1.

       tryoq=<0|1>: use content of OQ aux field if present instead of quality field when converting to FastQ. By
       default the quality field is used.  This option is currently mutually exclusive with the tags option.

       tags=<>: provide a comma separated list of aux fields which will  be  copied  from  the  input  alignment
       records  to  the comment section of the output FastQ records.  By default no aux fields are copied.  This
       option is currently mutually exclusive with the tryoq option.

       split=<0>: split named output files into chunks of this number of reads. The output file  names  will  be
       extended  by  _NNNNNN  if gz=0 and by _NNNNNN.gz if gz=1 where NNNNNN denotes the NNNNNN+1'th output file
       (i.e. numbers start with 000000).  The suffixes k, m, g, K, M and G  can  be  used  to  denote  that  the
       argument is to be multiplied by 1024, 1024^2, 1024^3, 1000, 1000^2 or 1000^3 respectively.
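
       The suffix rules and chunk numbering for split can be sketched as follows. This follows the
       description above literally (lower case suffixes are powers of 1024, upper case suffixes are
       powers of 1000); the function names and the example file name are hypothetical, and the code is
       not taken from bamtofastq itself.

```python
# Multipliers exactly as listed in the split description above.
MULTIPLIERS = {
    "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3,
    "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3,
}


def parse_split(value):
    """Turn a split argument such as "4k" or "2M" into a number of reads."""
    if value and value[-1] in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[value[-1]]
    return int(value)


def chunk_name(base, index, gz=False):
    """Extend a named output file by _NNNNNN (plus .gz if gz=1); index starts at 0."""
    return f"{base}_{index:06d}" + (".gz" if gz else "")


print(parse_split("4k"), parse_split("4K"))
print(chunk_name("reads_1.fq", 0), chunk_name("reads_1.fq", 1, gz=True))
```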

       cols=<>: If set to an unsigned number then wrap the sequence and quality lines at this number of columns.
       By default no wrapping is performed.
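
       The effect of cols on a sequence or quality line can be illustrated with a small sketch. The
       function name wrap_columns is hypothetical and the code only mirrors the behaviour described
       above; it is not the implementation used by bamtofastq.

```python
def wrap_columns(line, cols=None):
    """Break a sequence or quality string into pieces of at most cols characters.

    With cols unset (the default) the line is returned unchanged.
    """
    if not cols:
        return [line]
    return [line[i:i + cols] for i in range(0, len(line), cols)] or [line]


for piece in wrap_columns("ACGTACGTAC", 4):
    print(piece)
```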

       splitprefix=<bamtofastq_split>: file prefix if split>0 and collate=0.

       casava18=<0>:  produce  read  names  as expected by the c18pe input option of fastqtobam using the ne aux
       fields produced by fastqtobam.

       maxoutput=<>: produce no more than this number of output records.  By default there  is  no  limit.  This
       option is only active for collate=0.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2014  German  Tischler,  © 2011-2014 Genome Research Limited.  License GPLv3+: GNU GPL
       version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to  the  extent
       permitted by law.

BIOBAMBAM                                          March 2014                                      BAMTOFASTQ(1)