CNA-seq / Correct for GC content

Description

Takes the counts per bin for a CNA-seq data set and corrects them for GC content and mappability.

Parameters

Method (ratio, median, none) [ratio]
Degree of smoothing (...) [0.65]
Distribution family to use (gaussian, symmetric) [symmetric]

Details

Correction is necessary because the GC content of a genomic region affects enzyme chemistry and therefore also the depth of sequencing coverage in that region. Mappability represents the uniqueness of sequences in the genome. Bins with low mappability collect fewer uniquely aligned reads, so mappability also needs to be corrected for. The correction is performed with the R package QDNAseq.
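
The idea behind the correction can be illustrated with a minimal Python sketch. It is not the QDNAseq implementation: the function names (estimate_fit, correct_bins), the rounding of GC content and mappability to whole percentages, and the use of a plain per-stratum median without any smoothing are assumptions made for illustration only. The sketch estimates the expected count for each combination of GC content and mappability, and then either divides the counts by that expectation ("ratio") or shifts them by the difference between the overall median expectation and the bin's own expectation ("median").

```python
from collections import defaultdict
from statistics import median


def estimate_fit(counts, gc, mappability):
    """Expected count per (GC, mappability) stratum.

    Illustrative assumption: strata are formed by rounding both values to
    whole percentages, and the expectation is the plain median of the
    counts in each stratum (no smoothing is applied here).
    """
    strata = defaultdict(list)
    for c, g, m in zip(counts, gc, mappability):
        strata[(round(g), round(m))].append(c)
    return {key: median(values) for key, values in strata.items()}


def correct_bins(counts, gc, mappability, method="ratio"):
    """Correct per-bin counts with the chosen method (ratio, median, none)."""
    if method == "none":
        return list(counts)

    fit = estimate_fit(counts, gc, mappability)
    per_bin_fit = [fit[(round(g), round(m))] for g, m in zip(gc, mappability)]

    if method == "ratio":
        # Divide each count by the expectation for its stratum.
        return [c / f if f > 0 else float("nan")
                for c, f in zip(counts, per_bin_fit)]
    if method == "median":
        # Add the difference between the overall median expectation
        # and the expectation for the bin's own stratum.
        overall = median(per_bin_fit)
        return [c + (overall - f) for c, f in zip(counts, per_bin_fit)]
    raise ValueError("method must be 'ratio', 'median' or 'none'")


# Hypothetical toy data: six bins in three GC strata, equal mappability.
counts = [100, 110, 50, 60, 200, 220]
gc = [40, 40, 30, 30, 55, 55]
mappability = [95, 95, 95, 95, 95, 95]

ratio_corrected = correct_bins(counts, gc, mappability, method="ratio")
median_corrected = correct_bins(counts, gc, mappability, method="median")
```

In the toy data the three GC strata have very different raw counts, but after either correction the bins end up on a comparable scale, which is the purpose of the step.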

Output

The input data set with raw counts replaced by corrected ones. The data is also log2-transformed.
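
As a small illustration of the log2 transformation, the following Python sketch converts corrected values to the log2 scale. The function name log2_transform and the handling of non-positive values (returned as NaN) are assumptions for this example and may differ from what the tool actually does.

```python
import math


def log2_transform(values):
    """Return log2 of each value; non-positive values become NaN (assumption)."""
    return [math.log2(v) if v > 0 else float("nan") for v in values]


# A corrected value of 1.0 maps to 0, a doubling to +1, a halving to -1.
transformed = log2_transform([1.0, 2.0, 0.5])
```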

References

QDNAseq:
Scheinin et al. (2014) DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res 24: 2022-2032

GC content:
Benjamini and Speed (2012) Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40: e72

Mappability:
Koehler et al. (2011) The uniqueome: a mappability resource for short-tag sequencing. Bioinformatics 27: 272-274