ChIP-seq and FAIRE-seq / Find peaks using F-seq

Description

This tool will search for statistically significantly enriched broad peaks, such as regions of open chromatin or transcription factor / histone binding sites, in sequencing data.

Parameters

File format (ELAND, BAM, BED) [BAM]
Score cutoff (0..100) [4]
Fragment size (-1..1000) [-1]
Feature length (0..1000) [800]
Extend alignments (0..1000) [0]

Details

F-seq identifies peaks by computing a kernel density estimate at each base. The estimated density is directly proportional to the probability of seeing a sequence read at that location. File format and fragment size are determined by how the experiment was performed, whereas the feature length, the score cutoff and whether to extend alignments should be tuned to obtain optimal results.
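
To illustrate what a per-base kernel density estimate looks like, here is a minimal sketch in Python. It assumes a Gaussian kernel with a fixed bandwidth and a hypothetical list of read start positions. The function name and the bandwidth value are illustrative only and do not reproduce the exact kernel or bandwidth selection used by F-seq.

```python
import math

def gaussian_kde_per_base(read_starts, region_start, region_end, bandwidth):
    """Return a Gaussian kernel density estimate for each base in
    the half-open interval [region_start, region_end)."""
    if not read_starts:
        raise ValueError("at least one read position is required")
    # Normalising constant so that the density integrates to ~1
    norm = 1.0 / (len(read_starts) * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for pos in range(region_start, region_end):
        total = 0.0
        for r in read_starts:
            z = (pos - r) / bandwidth
            total += math.exp(-0.5 * z * z)
        densities.append(norm * total)
    return densities

# Hypothetical read start positions: a cluster around base 50, one stray read
reads = [45, 48, 50, 50, 52, 55, 120]
density = gaussian_kde_per_base(reads, 0, 150, bandwidth=5.0)

# Base with the highest estimated density
peak_base = max(range(len(density)), key=density.__getitem__)
```

In this toy example the density is highest inside the cluster of reads and much lower around the isolated read, which is the signal a peak caller thresholds on.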

Increasing the score cutoff will increase the specificity of the found peak regions but may result in a restrictively low number of peaks. Conversely, applying a less stringent cutoff will improve sensitivity but may yield too high a proportion of false positives.
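
The trade-off can be sketched with a toy example. The code below assumes that the cutoff is interpreted as a number of standard deviations above the mean of the density signal. This interpretation, the function name and the signal values are assumptions made for illustration and are not taken from this manual.

```python
import statistics

def call_regions(density, cutoff):
    """Return half-open (start, end) intervals where the density exceeds
    mean + cutoff * standard deviation (assumed threshold definition)."""
    threshold = statistics.mean(density) + cutoff * statistics.pstdev(density)
    regions = []
    start = None
    for i, value in enumerate(density):
        if value > threshold:
            if start is None:
                start = i          # open a new region
        elif start is not None:
            regions.append((start, i))  # close the current region
            start = None
    if start is not None:
        regions.append((start, len(density)))
    return regions

# Hypothetical signal: flat background, one strong and one weak enrichment
signal = [1.0] * 100
signal[20:24] = [15.0, 18.0, 30.0, 15.0]
signal[70:73] = [7.0, 8.0, 7.0]

lenient = call_regions(signal, cutoff=1.0)
strict = call_regions(signal, cutoff=4.0)
```

With the lenient cutoff both enrichments are reported, while the strict cutoff keeps only the core of the strong one.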

By default, F-seq estimates the fragment size from the data. However, if your data contain fewer than 50000 mappable reads, this parameter must be set manually. A value close to the average fragment length of the DNA is a good starting point.

Feature length controls the smoothness of the kernel density estimate: larger values give a smoother estimate. When analyzing FAIRE-seq data, a value roughly the size of an open chromatin region serves as a good starting point (~800 in human cell lines). In the case of DNase-seq, set this parameter to 0.
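
The smoothing effect can be demonstrated with a small sketch. Here a Gaussian bandwidth stands in for the feature length. How F-seq converts feature length into a bandwidth is not described in this manual, so the values below are purely illustrative, as are the read positions and function names.

```python
import math

def density(read_positions, n_bases, bandwidth):
    """Unnormalised Gaussian kernel density at bases 0..n_bases-1."""
    return [
        sum(math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in read_positions)
        for x in range(n_bases)
    ]

def count_local_maxima(values):
    """Count positions that are higher than the left neighbour and
    at least as high as the right neighbour."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )

# Hypothetical reads forming two nearby clusters
reads = [100, 100, 102, 140, 142]

narrow_modes = count_local_maxima(density(reads, 250, bandwidth=5.0))
wide_modes = count_local_maxima(density(reads, 250, bandwidth=40.0))
```

With the narrow bandwidth the two clusters appear as separate maxima, while the wide bandwidth merges them into a single broad one. This is why a larger value suits broad features and a smaller value suits sharp ones.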

Artificially extending mapped reads to the full fragment size can improve prediction. This approach was applied, for example, in Gaulton et al. (2010) A map of open chromatin in human pancreatic islets, Nat Genet.
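
The general idea of read extension can be sketched as follows. The example assumes BED-style half-open coordinates and extends each read in its 3' direction, which depends on the strand. The function name is hypothetical, and the sketch does not claim to reproduce how this tool implements the Extend alignments parameter.

```python
def extend_read(start, end, strand, fragment_size):
    """Extend a read to fragment_size in its 3' direction.

    Coordinates are 0-based and half-open. Reads already at least as long
    as fragment_size are returned unchanged. The result is clipped at
    position 0 but not at the chromosome end, which is unknown here.
    """
    if fragment_size <= end - start:
        return start, end
    if strand == "+":
        return start, start + fragment_size
    if strand == "-":
        return max(0, end - fragment_size), end
    raise ValueError("strand must be '+' or '-'")

# Hypothetical 36 bp reads extended to a 200 bp fragment
plus_read = extend_read(100, 136, "+", 200)
minus_read = extend_read(500, 536, "-", 200)
near_start = extend_read(50, 86, "-", 200)
```

Forward-strand reads grow to the right and reverse-strand reads grow to the left, so that both cover the fragment they were sequenced from.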

Output

The analysis output consists of the following:

peaks.bed: A table of all genomic regions found to be significantly enriched at the user-defined score cutoff, together with a number of associated descriptive statistics.
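
For downstream inspection, the regions can be read with a few lines of Python. This sketch relies only on the first three tab-separated BED columns (chromosome, start, end), because the layout of the additional statistics columns is not specified here. The function name and the example coordinates are made up for illustration.

```python
import io

def read_bed_regions(handle):
    """Read (chrom, start, end, length) tuples from a BED-formatted handle,
    skipping blank lines, comments and track/browser header lines."""
    regions = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.append((chrom, start, end, end - start))
    return regions

# Hypothetical file content; a real file would be opened with open("peaks.bed")
example = io.StringIO("chr1\t1000\t1800\nchr1\t5000\t5400\nchr2\t200\t1200\n")
regions = read_bed_regions(example)

# Widest region in the example
widest = max(regions, key=lambda r: r[3])
```

Comparing the number and width of regions across runs is a simple way to see the effect of changing the score cutoff or the feature length.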

References

This tool uses the F-seq package for peak detection. Please cite the following article:

Boyle AP, Guinney J, Crawford GE, Furey TS. (2008) F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics 24(21):2537.