ChIP-seq and FAIRE-seq / Find motifs with GADEM and match to JASPAR
Description
Given a set of genomic regions, this tool discovers sequence motifs and matches them against the JASPAR database of known transcription factor binding sites.
For each motif, the 10 best matches to JASPAR are reported.
Parameters
- P-value cutoff (0...1) [0.0002]
- E-value cutoff (0...1) [0.01]
- Genome ([BSgenome.Hsapiens.UCSC.hg18: hg18, BSgenome.Hsapiens.UCSC.hg19, BSgenome.Mmusculus.UCSC.mm9, BSgenome.Mmusculus.UCSC.mm10, BSgenome.Rnorvegicus.UCSC.rn5, BSgenome.Dmelanogaster.UCSC.dm3]) [BSgenome.Hsapiens.UCSC.hg19]
Details
For a thorough description of the technical details please consult the original publications cited in the References section below. Briefly, the analysis proceeds through the following steps:
- Given a set of genomic regions, the tool first performs unseeded de novo motif discovery.
The starting set of position weight matrices (PWMs) is obtained by combining spaced dyads with an expectation-maximization (EM) procedure.
The motifs are then optimized further with a genetic algorithm.
- The discovered motifs are matched against the PWMs of known transcription factors in the JASPAR database.
- For each motif, the ten best-matching transcription factors are reported, logo plots are created, and E-values scoring the strength of each match are calculated.
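The matching step can be illustrated with a minimal Python sketch. It is not the tool's implementation: it assumes a simplified scoring scheme (ungapped sliding alignment scored by the mean column-wise Pearson correlation), and the helper names `columns`, `best_ungapped_score`, `top_matches` and the toy database entries are hypothetical.

```python
from math import sqrt

def columns(seq, hit=0.85):
    """Build a toy PWM (list of [A, C, G, T] columns) from a DNA string."""
    miss = (1.0 - hit) / 3
    return [[hit if base == b else miss for b in "ACGT"] for base in seq]

def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_ungapped_score(query, target, min_overlap=4):
    """Slide query along target without gaps; return the best mean column correlation."""
    min_overlap = min(min_overlap, len(query), len(target))
    best = float("-inf")
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        # Pair up the columns that overlap at this offset.
        pairs = [(query[i], target[i + offset])
                 for i in range(len(query))
                 if 0 <= i + offset < len(target)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(q, t) for q, t in pairs) / len(pairs)
        best = max(best, score)
    return best

def top_matches(query, database, top=10):
    """Rank every database PWM against the query and keep the `top` best."""
    scored = [(name, best_ungapped_score(query, pwm)) for name, pwm in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]

# Hypothetical toy database; real JASPAR matrices would be loaded from file.
toy_db = {
    "exact": columns("TGACGTCA"),
    "partial": columns("TGACGAAA"),
    "unrelated": columns("CCCCGGGG"),
}
for name, score in top_matches(columns("TGACGTCA"), toy_db, top=2):
    print(name, round(score, 3))
```

The sketch only ranks by similarity. It does not compute the E-values that the tool reports, which require a background distribution of scores.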
Output
The analysis output consists of the following:
- motif-analysis-summary.tsv: This file summarizes the analysis with information on
- - number of consensus motifs
- - number of matches per motif
- - distribution of matches
- - the IUPAC nucleotide sequences for the motifs
- - number of occurrences of the consensus sequence for each motif
- - the position weight matrices for each motif
- logo-plot-(...).pdf: For each consensus motif, the ten best matching transcription factors from the JASPAR database are represented with a logo plot.
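To show how an IUPAC nucleotide sequence relates to a position weight matrix, the following Python sketch derives a consensus string from a matrix. The rule used here (keep every base whose frequency is at least `min_freq`) is an assumption chosen for illustration and may differ from the rule the tool applies; the function name `pwm_to_iupac` is hypothetical.

```python
# Standard IUPAC nucleotide codes, keyed by the sorted set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}

def pwm_to_iupac(pwm, min_freq=0.25):
    """Convert a PWM (list of [A, C, G, T] frequency columns) to an IUPAC string.

    Assumed rule: a base is part of the code if its frequency is >= min_freq.
    """
    consensus = []
    for column in pwm:
        bases = "".join(b for b, p in zip("ACGT", column) if p >= min_freq)
        if not bases:
            # Fall back to the most frequent base if nothing passes the cutoff.
            bases = "ACGT"[column.index(max(column))]
        consensus.append(IUPAC[bases])
    return "".join(consensus)

example_pwm = [
    [0.90, 0.05, 0.03, 0.02],  # strongly A
    [0.50, 0.50, 0.00, 0.00],  # A or C
    [0.25, 0.25, 0.25, 0.25],  # no preference
    [0.00, 0.10, 0.45, 0.45],  # G or T
]
print(pwm_to_iupac(example_pwm))
```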
References
This tool uses the following Bioconductor packages:
- MotIV
- rGADEM
- BSgenome annotation packages
For more details refer to these publications:
S. Mahony, P.V. Benos, "STAMP: a web tool for exploring DNA-binding motif similarities", Nucleic Acids Res (2007) 35:W253-W258
S. Mahony, P.E. Auron, P.V. Benos, "DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies", PLoS Computational Biology (2007) 3(3):e61
L. Li, "GADEM: a genetic algorithm guided formation of spaced dyads coupled with an EM algorithm for motif discovery", J Comput Biol (2009) 16(2):317-329