Dimont is a universal tool for de-novo motif discovery. Dimont has successfully been applied to ChIP-seq, ChIP-exo and protein-binding microarray (PBM) data.
Input sequences must be supplied in an annotated FastA format as generated using the "Dimont sequence extractor" tool. In the annotation of each sequence, you need to provide a value that reflects the confidence that this sequence is bound by the factor of interest. Such confidences may be peak statistics (e.g., number of fragments under a peak) for ChIP data or signal intensities for PBM data. In addition, you need to provide an anchor position within the sequence. In case of ChIP data, this anchor position could for instance be the peak summit. For instance, an annotated FastA file for ChIP-seq data comprising sequences of length 1000 centered around the peak summit could look like:
> peak: 500; signal: 515
> peak: 500; signal: 199
where the anchor point is given as 500 for the first two sequences, and the confidence amounts to 515 and 199, respectively. The FastA comment may contain additional annotations of the format
key1 : value1; key2: value2;....code>
Accordingly, you would need to set the parameter "Position tag" to
peak and the parameter "Value tag" to
signal for the input file.
For the standard deviation of the position prior, the initial motif length and the number of pre-optimization runs, we provide default values that worked well in our studies on ChIP and PBM data. However, you may want adjust these parameters to meet your prior information.
The parameter "Markov order of the motif model" sets the order of the inhomogeneous Markov model used for modeling the motif. If this parameter is set to 0, you obtain a position weight matrix (PWM) model. If it is set to 1, you obtain a weight array matrix (WAM) model. You can set the order of the motif model to at most 3.
The parameter "Markov order of the background model" sets the order of the homogeneous Markov model used for modeling positions not covered by a motif. If this parameter is set to -1, you obtain a uniform distribution, which worked well for ChIP data. For PBM data, orders of up to 4 resulted in an increased prediction performance in our case studies. The maximum allowed value is 5.
The parameter "Weighting factor" defines the proportion of sequences that you expect to be bound by the targeted factor with high confidence. For ChIP data, the default value of 0.2 typically works well. For PBM data, containing a large number of unspecific probes, this parameter should be set to a lower value, e.g. 0.01.
The "Equivalent sample size" reflects the strength of the influence of the prior on the model parameters, where higher values smooth out the parameters to a greater extent.
The parameter "Delete BSs from profile" defines if BSs of already discovered motifs should be deleted, i.e., "blanked out", from the sequence before searching for futher motifs.
dimont-log.txt: Logfile, logfile of the Dimont run.
dimont-predictions-*.txt: Predictions, binding sites predicted by Dimont.
dimont-logo-rc-*.png: Sequence logo (rc), the sequence logo of the reverse complement of the motif discovered by Dimont.
dimont-logo-*.png: Sequence logo, the sequence logo of the motif discovered by Dimont.
dimont-model-*.xml: Dimont model, the model (as XML) of the motif discovered by Dimont. Can be used in "DimontPredictor".
If you use Dimont, please cite
J. Grau, S. Posch, I. Grosse, and J. Keilwagen. A general approach for discriminative de-novo motif discovery from high-throughput data. Nucleic Acids Research, 41(21):e197, 2013.