CNA-seq / Correct for GC content


Takes the counts per bin for a CNA-seq data set and corrects them for GC content and mappability.



Correcting for GC content is necessary because GC content affects enzyme chemistry and therefore also the depth of sequencing coverage. Mappability represents the uniqueness of sequences in the genome: reads from repetitive regions cannot be placed unambiguously, so bins with low mappability receive fewer counts, and this also needs to be corrected for. The correction is performed with the R package QDNAseq.
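The idea behind the correction can be illustrated with a minimal sketch. This is not QDNAseq's implementation and does not use its API. It assumes a simplified stand-in technique in which bins are grouped into strata of similar GC content and mappability, and each count is divided by the median count of its stratum. The function names, the stratum widths (gc_step, map_step) and the example values are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def correct_bins(counts, gc, mappability, gc_step=1.0, map_step=5.0):
    """Divide each bin count by the median count of bins with similar
    GC content and mappability (a simplified stratified correction)."""

    def stratum(i):
        # Hypothetical stratum key: GC and mappability rounded to coarse steps.
        return (round(gc[i] / gc_step), round(mappability[i] / map_step))

    # Collect the counts of all bins that fall into the same stratum.
    groups = defaultdict(list)
    for i, count in enumerate(counts):
        groups[stratum(i)].append(count)

    # The expected count of a stratum is the median of its bins.
    expected = {key: median(values) for key, values in groups.items()}

    corrected = []
    for i, count in enumerate(counts):
        e = expected[stratum(i)]
        corrected.append(count / e if e > 0 else float("nan"))
    return corrected


def log2_transform(values):
    """Log2-transform corrected values; non-positive values become NaN."""
    return [math.log2(v) if v > 0 else float("nan") for v in values]


# Made-up example: two GC strata with different average coverage.
counts = [100, 110, 90, 200, 220, 180]
gc = [40, 40, 40, 50, 50, 50]             # percent GC per bin
mappability = [100, 100, 100, 100, 100, 100]  # percent mappability per bin

corrected = correct_bins(counts, gc, mappability)
log_ratios = log2_transform(corrected)
```

In this made-up example the bins with 50% GC have twice the coverage of the bins with 40% GC, yet after correction both groups are centered on 1 (0 on the log2 scale), so the remaining variation no longer tracks GC content.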


The output is the input data set with the raw counts replaced by corrected ones. The data are also log2-transformed.


Scheinin et al. (2014) DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res 24: 2022-2032

GC content:
Benjamini and Speed (2012) Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40: e72

Mappability:
Koehler et al. (2011) The uniqueome: a mappability resource for short-tag sequencing. Bioinformatics 27: 272-274