MACS2 performs several steps, described below, ranging from duplicate filtering and peak model building to the actual peak detection and multiple testing correction. It also has an option to link nearby peaks together in order to call broad peaks.
If the read length parameter is set to zero, MACS2 detects the read length automatically. MACS2 then proceeds to filter out duplicate reads. By default it calculates the maximum number of duplicate reads at a single position warranted by the sequencing depth, and removes redundant reads in excess of this number. Alternatively, you can choose to keep only one read per position, or to keep all duplicates.
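The default duplicate limit described above can be sketched as follows. This is a minimal illustration and not MACS2's own code: it assumes reads fall uniformly on the mappable genome, approximates the per-position read count with a Poisson distribution, and uses a hypothetical significance threshold of 1e-5. The function names and the read representation (chromosome, position, strand) are also assumptions made for the example.

```python
import math
from collections import Counter

def poisson_sf(k, lam):
    """Return P(X > k) for X ~ Poisson(lam), summing the lower tail directly."""
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= lam / i            # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def max_duplicates(n_reads, genome_size, p_threshold=1e-5):
    """Smallest k such that seeing more than k reads at one position
    is less likely than p_threshold under a uniform background.
    p_threshold is an illustrative assumption."""
    lam = n_reads / genome_size    # expected reads starting at one position
    k = 1
    while poisson_sf(k, lam) > p_threshold:
        k += 1
    return k

def filter_duplicates(reads, max_dup):
    """Keep at most max_dup reads per identical (chrom, position, strand)."""
    seen = Counter()
    kept = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_dup:
            kept.append(read)
    return kept

if __name__ == "__main__":
    limit = max_duplicates(20_000_000, 2.7e9)
    reads = [("chr1", 100, "+")] * 3 + [("chr1", 200, "-")]
    print(limit, filter_duplicates(reads, limit))
```

Under these assumptions the limit grows slowly with sequencing depth, which is why deeper libraries are allowed more reads per position before the excess is treated as redundant.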
MACS2 models the distance between the paired forward and reverse strand peaks from the data. It slides a window across the genome to find enriched regions, which have M-fold more reads than background. The size of the window is twice the bandwidth parameter. The expected background is the number of reads times their length divided by the mappable genome size. Note that the mappable genome size is always smaller than the real genome size because of repetitive sequences. The fold enrichment of a region must be higher than 10 and less than 30, but you can change these values if not enough regions are found. A smaller value for the lower cutoff provides more regions for model building, but it can also bring spurious data into the model and thereby adversely affect the peak finding results. MACS2 uses 1000 enriched regions to model the distance d between the forward and reverse strand peaks.
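The selection of model-building regions can be illustrated with a short sketch. It is not MACS2's implementation: the function names are hypothetical, and the bandwidth of 300 and the read counts in the usage lines are example values only. Fold enrichment is computed as the observed read count in a window divided by the count expected from a uniform background, so the read length cancels out of the ratio.

```python
def window_fold_enrichment(reads_in_window, window_size, total_reads, genome_size):
    """Observed reads in a window divided by the number expected
    if all reads were spread evenly over the mappable genome."""
    expected = total_reads * window_size / genome_size
    return reads_in_window / expected

def is_model_region(fold, lower=10, upper=30):
    """True if the fold enrichment lies strictly between the two cutoffs."""
    return lower < fold < upper

if __name__ == "__main__":
    bandwidth = 300                 # example value, an assumption
    window = 2 * bandwidth          # window size is twice the bandwidth
    fold = window_fold_enrichment(
        reads_in_window=40,
        window_size=window,
        total_reads=10_000_000,
        genome_size=2.7e9,          # example mappable genome size
    )
    print(fold, is_model_region(fold))
```

Lowering the `lower` cutoff admits more windows, which matches the trade-off described above: more regions for the model, but a higher chance that some of them are noise.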
In the actual peak detection phase, MACS2 extends the reads in the 3' direction to the fragment length obtained from modeling. If model building fails or is switched off, the reads are extended to the value of the extension size parameter. If a control sample is available, MACS2 scales the samples linearly to the same read number. It then selects candidate peaks by scanning the genome again, now using a window size which is twice the fragment length. MACS2 calculates a p-value for each peak using a dynamic Poisson distribution to capture local biases in read background levels. If a control sample is available, it is used to calculate the local background. Finally, q-values are calculated using the Benjamini-Hochberg correction.
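The scoring steps above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not MACS2's code: the local background rate is assumed to be the largest of the genome-wide rate and rates estimated from control reads in windows of several sizes around the peak, the control is scaled by the ratio of treatment to control read totals, and all function and parameter names are hypothetical. The Poisson tail is computed naively, so very small p-values lose precision.

```python
import math

def poisson_pvalue(observed, lam):
    """Return P(X >= observed) for X ~ Poisson(lam)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-lam)           # P(X = 0)
    cdf = 0.0
    for i in range(observed):       # accumulate P(X <= observed - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

def local_lambda(genome_rate, control_window_counts, peak_width, scale=1.0):
    """Expected reads in a peak under the background model.

    genome_rate: background reads per base pair over the whole genome.
    control_window_counts: {window_size: control reads in that window}.
    scale: treatment_total / control_total, for linear scaling.
    Taking the maximum over windows is an assumption of this sketch.
    """
    candidates = [genome_rate * peak_width]
    for window_size, count in control_window_counts.items():
        candidates.append(count * scale * peak_width / window_size)
    return max(candidates)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as pvalues."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):    # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

if __name__ == "__main__":
    lam = local_lambda(0.001, {1000: 5, 10000: 20}, peak_width=200)
    p = poisson_pvalue(12, lam)
    print(lam, p, benjamini_hochberg([p, 0.04, 0.03, 0.005]))
```

Using the largest of the candidate rates makes the test conservative in regions where the control shows locally elevated coverage, which is the kind of local bias the dynamic background is meant to absorb.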
The analysis results consist of the following files:
Please cite the following article and the MACS2 website.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. (2008) Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9(9):R137.