ChIP-seq / Find peaks using MACS, treatment only
Description
This tool identifies statistically significantly enriched genomic regions in ChIP-seq data.
The analysis is performed on a single treatment sample alone, without using samples from a control experiment.
Parameters
- File format (ELAND, BAM, BED) [BAM]
- MACS version (1.4.2, 2) [1.4.2]
- Mappable genome size (hg18, hg19, mm9, mm10, rn5, user-specified) [hg19]
- User-specified mappable genome size []
- Read length (1..200) [25]
- Bandwidth (1..1000) [200]
- P-value cutoff (0..1) [0.00001]
- Peak model (yes, no) [yes]
- Upper M-fold cutoff (1..100) [30]
- Lower M-fold cutoff (1..100) [10]
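To show how these parameters relate to the MACS 1.4 command-line program (`macs14`), the following sketch assembles an argument list and checks the ranges listed above. The option names (`-t`, `-f`, `-g`, `-n`, `-s`, `--bw`, `-p`, `-m`, `--nomodel`) are those of `macs14`. How this tool actually invokes MACS is an assumption, and so are the genome shortcut `hs`, the output name `positive` and the helper name `build_macs14_command`.

```python
from typing import List


def build_macs14_command(
    treatment: str,
    file_format: str = "BAM",
    genome_size: str = "hs",   # assumption: shortcut or a numeric mappable size
    read_length: int = 25,
    bandwidth: int = 200,
    pvalue: float = 1e-5,
    peak_model: bool = True,
    lower_mfold: int = 10,
    upper_mfold: int = 30,
    name: str = "positive",    # hypothetical output prefix
) -> List[str]:
    """Hypothetical mapping of this tool's parameters onto macs14 options."""
    # Range checks mirror the parameter list of this tool.
    if not 1 <= read_length <= 200:
        raise ValueError("read length must be in 1..200")
    if not 1 <= bandwidth <= 1000:
        raise ValueError("bandwidth must be in 1..1000")
    if not 0 < pvalue <= 1:
        raise ValueError("p-value cutoff must be in (0, 1]")
    if not 1 <= lower_mfold < upper_mfold <= 100:
        raise ValueError("need 1 <= lower M-fold < upper M-fold <= 100")

    cmd = ["macs14", "-t", treatment, "-f", file_format, "-g", genome_size,
           "-n", name, "-s", str(read_length), "--bw", str(bandwidth),
           "-p", str(pvalue)]
    if peak_model:
        # macs14 takes both M-fold limits as one "LOWER,UPPER" value.
        cmd += ["-m", f"{lower_mfold},{upper_mfold}"]
    else:
        cmd.append("--nomodel")
    return cmd


print(" ".join(build_macs14_command("chip.bam")))
```

The defaults above match this tool's defaults. Passing `peak_model=False` replaces the M-fold option with `--nomodel`.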
Details
This tool is based on the Model-based Analysis of ChIP-Seq (MACS) algorithm. The bandwidth, the p-value cutoff, whether to use a peak model, and the M-fold cutoffs usually have to be tuned iteratively to obtain optimal results.
The bandwidth determines the window size MACS uses to scan genomic regions. A good starting value is roughly half the average length of the DNA fragments.
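The rule of thumb above can be written as a small helper. This is a sketch that assumes you already have fragment length estimates, for example from a fragment size analysis of the library. The function name is hypothetical. The result is clamped to the 1..1000 range this tool accepts.

```python
from statistics import mean
from typing import Sequence


def suggest_bandwidth(fragment_lengths: Sequence[int], lo: int = 1, hi: int = 1000) -> int:
    """Starting bandwidth: about half the mean fragment length, clamped to [lo, hi]."""
    if not fragment_lengths:
        raise ValueError("need at least one fragment length")
    half = round(mean(fragment_lengths) / 2)
    return max(lo, min(hi, half))


print(suggest_bandwidth([180, 200, 220]))  # mean 200 bp, so a starting bandwidth of 100
```

Treat the value as a starting point for tuning, not a final setting.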
Lowering the p-value cutoff makes the reported peak regions more specific, but it may leave too few peaks for a meaningful analysis. Conversely, a less stringent cutoff improves sensitivity, but it may yield too high a proportion of false positives.
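The following sketch illustrates this trade-off. It applies several cutoffs to a handful of made-up candidate peaks and counts how many peaks survive each one. It also converts each cutoff to the -10*log10(p) scale that MACS 1.4 uses for p-values in its peak tables. Whether positive-peaks.tsv uses the same column layout is not assumed here.

```python
import math
from typing import List, Tuple


def peaks_passing(peaks: List[Tuple[str, float]], cutoff: float) -> List[str]:
    """Return the IDs of peaks whose p-value is at or below the cutoff."""
    return [peak_id for peak_id, p in peaks if p <= cutoff]


def minus10log10(p: float) -> float:
    """Convert a p-value to the -10*log10(p) scale."""
    return -10 * math.log10(p)


# Made-up candidate peaks (id, p-value), for illustration only.
candidates = [("peak1", 1e-12), ("peak2", 3e-6), ("peak3", 4e-5), ("peak4", 2e-3)]

for cutoff in (1e-3, 1e-5, 1e-8):
    kept = peaks_passing(candidates, cutoff)
    print(f"cutoff {cutoff:g} (score {minus10log10(cutoff):.0f}): {len(kept)} peaks")
```

As the cutoff becomes stricter, going from 1e-3 to 1e-5 to 1e-8, the number of retained peaks drops from 3 to 2 to 1.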
By default MACS tries to build a peak model from candidate regions selected using the bandwidth and the M-fold cutoff limits. It then uses this model to identify statistically significantly enriched genomic regions.
A small lower cutoff limit provides more candidate regions for model building, but it may also bring spurious regions into the model and degrade the peak-finding results.
Setting it too high improves the quality of the model, but could exclude so many candidate model regions that model building fails.
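The effect of the M-fold limits can be illustrated with a simplified selection step. According to the MACS paper, windows of twice the bandwidth are scanned, and a window is a model-building candidate when its fold enrichment over a uniform background lies between the lower and upper limits. The tag counts, genome size and function name below are hypothetical. Real MACS also does further processing, such as merging overlapping windows.

```python
from typing import Dict, List


def select_model_regions(window_counts: Dict[str, int], total_tags: int,
                         genome_size: int, bandwidth: int,
                         lower: float = 10, upper: float = 30) -> List[str]:
    """Keep windows whose fold enrichment lies strictly between the M-fold limits."""
    window = 2 * bandwidth                         # scanning window of 2 x bandwidth
    expected = total_tags * window / genome_size   # tags expected under uniform background
    selected = []
    for window_id, count in window_counts.items():
        fold = count / expected
        if lower < fold < upper:
            selected.append(window_id)
    return selected


# Hypothetical data: 1,000,000 tags on a 400 Mb genome gives 1 expected tag per 400 bp window.
counts = {"w1": 5, "w2": 15, "w3": 25, "w4": 50}
print(select_model_regions(counts, 1_000_000, 400_000_000, 200))            # ['w2', 'w3']
print(select_model_regions(counts, 1_000_000, 400_000_000, 200, lower=20))  # ['w3']
```

Raising the lower limit from 10 to 20 halves the candidate set in this toy example. This is how an overly strict setting can leave too few regions to build a model.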
If a good compromise setting is difficult to find, model building can be turned off altogether. In that case MACS uses a fixed shift size of 100 bp when searching for peaks.
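The role of the shift size can be sketched as follows. The MACS paper defines the fragment size d as the distance between the modes of the plus-strand and minus-strand tag distributions in the model regions, and it shifts tags by d/2 towards the fragment centre. Without a model, the fixed 100 bp shift is used instead. This is a simplified illustration: real MACS works on smoothed tag profiles, and the function names and input layout here are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple

# Hypothetical input layout: (plus-strand 5' ends, minus-strand 5' ends) per model region.
Region = Tuple[Sequence[int], Sequence[int]]


def estimate_d(regions: List[Region]) -> int:
    """Distance between the modes of plus- and minus-strand tags,
    after aligning each region on the midpoint of its two strand means."""
    plus_offsets: Counter = Counter()
    minus_offsets: Counter = Counter()
    for plus, minus in regions:
        centre = (mean(plus) + mean(minus)) / 2
        plus_offsets.update(round(p - centre) for p in plus)
        minus_offsets.update(round(m - centre) for m in minus)
    plus_mode = plus_offsets.most_common(1)[0][0]
    minus_mode = minus_offsets.most_common(1)[0][0]
    return minus_mode - plus_mode


def shift_tag(five_prime: int, strand: str, shift: int = 100) -> int:
    """Move a tag's 5' end by `shift` bp towards the fragment centre:
    downstream for plus-strand tags, upstream for minus-strand tags."""
    if strand == "+":
        return five_prime + shift
    if strand == "-":
        return five_prime - shift
    raise ValueError(f"unknown strand: {strand!r}")


regions = [([100, 100, 102], [300, 300, 298]), ([1000, 1000], [1200, 1200])]
d = estimate_d(regions)                      # 200 for this toy data
print(d, shift_tag(1000, "+", shift=d // 2), shift_tag(1000, "-"))
```

With a successful model, the shift would be d // 2. With model building turned off, the 100 bp default applies.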
Output
The analysis output consists of the following:
- positive-peaks.tsv: A table of all genomic regions found to be statistically significantly enriched at the user-defined p-value cutoff, with descriptive statistics for each region.
- analysis-log.txt: A text file containing the output of the various steps of the MACS algorithm. It is useful for diagnostics and for following the details of the analysis.
- model-plot.png: If model building is enabled and succeeds, a plot of the model is generated.
The shape of the modelled peaks, together with the estimated shift size, may help in assessing the quality of the model.
Smooth curves and a shift size compatible with the expected binding properties of the transcription factor in question are likely indicators of a successful model.
References
This tool uses the MACS package for peak detection. Please read the following article for more detailed information.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 2008;9(9):R137.