Provided by: qtltools_1.3.1+dfsg-4build4_amd64

NAME

       QTLtools mbv - Match genotypes in a VCF to a BAM file

SYNOPSIS

       QTLtools  mbv --bam [sample.bam|sample.sam|sample.cram] --vcf [in.vcf|in.bcf|in.vcf.gz] --out output_file
       [OPTIONS]

DESCRIPTION

       This mode checks whether the genotypes in the VCF are observed in the RNAseq reads in the BAM file, in
       order to quickly resolve sample mislabeling and to detect cross-sample contamination and PCR
       amplification bias.  The details of the method are described in
       <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044394/>.  In brief, we measure, for each individual in
       the VCF, the proportions of heterozygous and homozygous genotypes for which both alleles are captured
       by the sequencing reads in the BAM file.  A 'match' will have close to 100% concordance for both
       measures, whereas a 'mismatch' will have markedly lower concordance for both.  Increased cross-sample
       contamination leads to decreased homozygous concordance with no change in heterozygous concordance,
       while increased amplification bias leads to decreased heterozygous concordance with no change in
       homozygous concordance.  We recommend using uniquely mapping reads only, by setting
       --filter-mapping-quality to a value appropriate for your aligner.
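
       The interpretation rules above can be sketched in Python as follows.  This is only an illustration:
       the function names and the 0.9 cut-off are hypothetical and are not part of QTLtools, and concordance
       is assumed here to be the number of matching genotypes divided by the number of genotypes considered.

```python
from typing import Optional


def concordance(n_match: int, n_considered: int) -> Optional[float]:
    """Fraction of considered genotypes that match; None if nothing was considered."""
    if n_considered == 0:
        return None
    return n_match / n_considered


def interpret(het_conc: Optional[float], hom_conc: Optional[float],
              high: float = 0.9) -> str:
    """Label a sample from its two concordance fractions.

    `high` is a hypothetical cut-off chosen for illustration only.
    """
    if het_conc is None or hom_conc is None:
        return "insufficient data"
    het_ok = het_conc >= high
    hom_ok = hom_conc >= high
    if het_ok and hom_ok:
        return "match"
    if het_ok:
        # Heterozygous concordance unchanged, homozygous concordance reduced.
        return "possible cross-sample contamination"
    if hom_ok:
        # Homozygous concordance unchanged, heterozygous concordance reduced.
        return "possible amplification bias"
    return "mismatch"
```

       In practice a suitable cut-off depends on the dataset, so the two concordance values are better
       inspected across all samples than thresholded blindly.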

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz]
              Genotypes in VCF/BCF format.  Should contain all the samples in the dataset.  REQUIRED.

       --bam [in.bam|in.sam|in.cram]
              Sequence data in BAM/SAM/CRAM format.  REQUIRED.

       --out output
              Output file name.  REQUIRED.

       --reg chr:start-end
              Genomic region to be processed, e.g. chr4:12334456-16334456 or chr5.

       --filter-mapping-quality integer
              Minimum  mapping  quality  for  a  read  or  read pair to be considered.  Set this to only include
              uniquely mapped reads.  DEFAULT=10

       --filter-base-quality integer
              Minimum phred quality for a base to be considered.  DEFAULT=5

       --filter-binomial-pvalue float
              Binomial p-value threshold below which a heterozygous genotype is considered as exhibiting allelic
              imbalance.  DEFAULT=0.05

       --filter-minimal-coverage integer
              Minimum number of reads overlapping a genotype for it to be considered.  DEFAULT=10

       --filter-imputation-qual float
              Minimum imputation information score for a variant to be considered.  DEFAULT=0.9

       --filter-imputation-prob float
              Minimum posterior probability for a genotype to be considered.  DEFAULT=0.99

       --filter-keep-duplicates
              Keep reads designated as duplicate by the aligner.
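
       To illustrate the idea behind --filter-binomial-pvalue, the following Python sketch computes a binomial
       p-value for the reference and alternative read counts at a heterozygous site.  It assumes a two-sided
       exact test against an expected allelic ratio of 0.5; this page does not specify the exact test used by
       QTLtools, so treat both choices, and the function names, as assumptions.

```python
from math import exp, lgamma, log


def binomial_two_sided_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value for ref_reads successes out of all reads."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0

    def pmf(k: int) -> float:
        # Work in log space so that deep coverage does not overflow.
        log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))

    observed = pmf(ref_reads)
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small relative tolerance absorbs floating-point noise.
    total = sum(q for q in (pmf(k) for k in range(n + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def shows_allelic_imbalance(ref_reads: int, alt_reads: int,
                            threshold: float = 0.05) -> bool:
    """True when the p-value falls below the threshold (0.05 mirrors the DEFAULT above)."""
    return binomial_two_sided_pvalue(ref_reads, alt_reads) < threshold
```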

OUTPUT FILE COLUMNS

       --out filename
        This file has no header and contains the following columns:

         1   The sample ID in the VCF against which the sequence data has been matched
         2   The number of missing genotypes for this sample
         3   The total number of heterozygous genotypes examined
         4   The total number of homozygous genotypes examined
         5   The number of heterozygous genotypes considered for the matching, i.e. those that are covered by
             more than --filter-minimal-coverage reads
         6   The number of homozygous genotypes considered for the matching, i.e. those that are covered by
             more than --filter-minimal-coverage reads
         7   The number of heterozygous genotypes that match between this sample and the BAM file
         8   The number of homozygous genotypes that match between this sample and the BAM file
         9   The percentage of heterozygous genotypes that match between this sample and the BAM file
        10   The percentage of homozygous genotypes that match between this sample and the BAM file
        11   The number of heterozygous genotypes with significant allelic imbalance
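
       A minimal Python sketch for loading this file into named fields is given below.  It assumes that the
       columns are whitespace-separated, which this page does not state; the field names are invented for
       readability and do not come from QTLtools.

```python
from typing import List, NamedTuple


class MbvRow(NamedTuple):
    sample: str             # column 1
    n_missing: int          # column 2
    n_het_total: int        # column 3
    n_hom_total: int        # column 4
    n_het_considered: int   # column 5
    n_hom_considered: int   # column 6
    n_het_match: int        # column 7
    n_hom_match: int        # column 8
    pct_het_match: float    # column 9
    pct_hom_match: float    # column 10
    n_het_imbalanced: int   # column 11


def parse_mbv_line(line: str) -> MbvRow:
    """Parse one output line; assumes whitespace-separated columns."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, found {len(fields)}")
    counts = [int(value) for value in fields[1:8]]
    return MbvRow(fields[0], *counts, float(fields[8]), float(fields[9]), int(fields[10]))


def read_mbv(path: str) -> List[MbvRow]:
    """Read every non-empty line of an mbv output file."""
    with open(path) as handle:
        return [parse_mbv_line(line) for line in handle if line.strip()]
```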

EXAMPLES

       o Running mbv on an RNAseq sample mapped with GEM:

         QTLtools   mbv   --bam   HG00381.chr22.bam  --out  HG00381.chr22.mbv.txt  --vcf  genotypes.chr22.vcf.gz
         --filter-mapping-quality 150

         You can then plot column 9 vs. column 10 to identify the genotyped sample in the VCF that best
         matches your sequence data.
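
         As a non-graphical alternative to the plot, the Python sketch below ranks the samples so that one
         scoring high on both columns 9 and 10 comes first.  Ranking by the smaller of the two values is a
         heuristic chosen for this illustration, not something QTLtools does, and whitespace-separated
         columns are assumed.

```python
from typing import Iterable, List, Tuple


def rank_samples(lines: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Return (sample, column 9, column 10) tuples, best candidate first."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        rows.append((fields[0], float(fields[8]), float(fields[9])))
    # A true match should be high on BOTH measures, so rank by the weaker one.
    return sorted(rows, key=lambda row: min(row[1], row[2]), reverse=True)


def best_match(path: str) -> Tuple[str, float, float]:
    """Read an mbv output file and return its top-ranked sample."""
    with open(path) as handle:
        ranked = rank_samples(handle)
    if not ranked:
        raise ValueError("no usable rows found")
    return ranked[0]
```

         For instance, best_match("HG00381.chr22.mbv.txt") would return the top-ranked sample for the output
         of the command above.  It is worth comparing that sample with the runner-up before trusting the
         result, since a small gap between them suggests an ambiguous match.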

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Fort A., Panousis N. I., Garieri M. et al.  MBV: a method to solve sample mislabeling and detect
       technical bias in large combined genotype and sequencing assay datasets.  Bioinformatics 33(12), 1895,
       2017.  <https://doi.org/10.1093/bioinformatics/btx074>

AUTHORS

       Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)

QTLtools-v1.3                                      06 May 2020                                   QTLtools-mbv(1)