GATK4 - Estimate contamination

Description

Calculates the fraction of reads coming from cross-sample contamination, given results from GetPileupSummaries.

Parameters

Details

Calculates the fraction of reads coming from cross-sample contamination, given results from GetPileupSummaries (Chipster tool Tabulate pileup metrics for inferring contamination with GATK4). The resulting contamination table is used with FilterMutectCalls (Chipster tool Filter Mutect2 calls with GATK4).

This tool borrows from ContEst (Cibulskis et al.) the idea of estimating contamination from reference reads at homozygous alternate sites. However, ContEst uses a probabilistic model that assumes a diploid genotype with no copy number variation and independent contaminating reads; that is, it assumes that each contaminating read is drawn randomly and independently from a different human. This tool uses a simpler estimate of contamination that relaxes these assumptions. In particular, it works in the presence of copy number variations and with an arbitrary number of contaminating samples. In addition, it is designed to work well without matched normal data, although one can run GetPileupSummaries on a matched normal BAM file and input the result to this tool.
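The idea of estimating contamination from reference reads at homozygous alternate sites can be illustrated with a small Python sketch. This is a simplified illustration, not the actual CalculateContamination algorithm: the PileupSite class, the estimate_contamination function and the hom_alt_min_fraction threshold are hypothetical names chosen for this example, and the field names are only assumed to resemble the columns of a GetPileupSummaries table. The sketch assumes that, at a site where the sample is homozygous for the alternate allele, any reference read comes from a contaminant, and that a contaminating read carries the reference allele with probability one minus the population alternate allele frequency.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PileupSite:
    """One site of a pileup summary (hypothetical, simplified record)."""
    ref_count: int            # reads supporting the reference allele
    alt_count: int            # reads supporting the alternate allele
    allele_frequency: float   # population frequency of the alternate allele


def estimate_contamination(sites: Iterable[PileupSite],
                           hom_alt_min_fraction: float = 0.8) -> float:
    """Rough contamination estimate from ref reads at hom alt sites.

    Assumption for this sketch: a site counts as homozygous alternate
    when at least hom_alt_min_fraction of its reads support the
    alternate allele. This threshold is illustrative only.
    """
    observed_ref_reads = 0
    # Ref reads expected per unit of contamination, summed over sites.
    expected_ref_per_unit = 0.0
    for site in sites:
        depth = site.ref_count + site.alt_count
        if depth == 0:
            continue
        if site.alt_count / depth < hom_alt_min_fraction:
            continue  # not a hom alt site, so it is not informative here
        observed_ref_reads += site.ref_count
        expected_ref_per_unit += depth * (1.0 - site.allele_frequency)
    if expected_ref_per_unit == 0.0:
        return 0.0  # no informative sites
    return observed_ref_reads / expected_ref_per_unit


example_sites = [
    PileupSite(ref_count=2, alt_count=98, allele_frequency=0.5),
    PileupSite(ref_count=1, alt_count=99, allele_frequency=0.5),
    PileupSite(ref_count=50, alt_count=50, allele_frequency=0.5),  # het, skipped
]
example_estimate = estimate_contamination(example_sites)
```

In this toy input, the two hom alt sites contribute 3 reference reads against 100 expected per unit of contamination, so the estimate is about 3 %. Because the estimate only sums read counts over sites, it does not depend on a diploid genotype model or on the number of contaminating samples, which is the kind of relaxation described above.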

Based on the GATK4 CalculateContamination tool. For more detailed information, see the GATK documentation.

Output

References

This tool is based on the GATK4 package.

For instructions on citing GATK4, please see the GATK FAQ.