DimontPredictor predicts binding sites in new data using a previously trained Dimont model. To train a Dimont model, see the tool "Dimont".
This tool may be useful if you want, for instance, to predict binding sites of previously discovered motifs in other data sets, or to try different p-values for filtering predictions.
Input sequences must be supplied in an annotated FastA format as generated using the "Dimont sequence extractor" tool. In the annotation of each sequence, you need to provide a value that reflects the confidence that this sequence is bound by the factor of interest. Such confidences may be peak statistics (e.g., number of fragments under a peak) for ChIP data or signal intensities for PBM data.
For instance, an annotated FastA file for ChIP-seq data comprising sequences of length 1000 centered around the peak summit could look like:
> peak: 500; signal: 515
> peak: 500; signal: 199
where the anchor point is given as 500 for the first two sequences, and the confidence amounts to 515 and 199, respectively. The FastA comment may contain additional annotations of the format
key1: value1; key2: value2;...
Accordingly, you would need to set the parameter "Value tag" to signal for the input file.
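To make the annotation format concrete, the following is a minimal sketch of how such a FastA comment line could be split into its key/value pairs. The function name parse_annotation is hypothetical and not part of Dimont or DimontPredictor; the sketch only assumes the "key: value" pairs separated by semicolons described above.

```python
def parse_annotation(header):
    """Split a FastA comment such as '> peak: 500; signal: 515' into a dict.

    Hypothetical helper for illustration; not part of DimontPredictor.
    """
    body = header.lstrip(">").strip()
    annotations = {}
    for field in body.split(";"):
        if not field.strip():
            continue  # tolerate a trailing semicolon
        key, sep, value = field.partition(":")
        if not sep:
            raise ValueError(f"malformed annotation field: {field!r}")
        annotations[key.strip()] = value.strip()
    return annotations


ann = parse_annotation("> peak: 500; signal: 515")
anchor = int(ann["peak"])          # anchor point within the sequence
confidence = float(ann["signal"])  # value selected by "Value tag" = signal
```

Values are kept as strings by the helper and converted by the caller, since additional annotations in the comment need not be numeric.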
The parameter "Weighting factor" defines the proportion of sequences that you expect to be bound by the targeted factor with high confidence. For ChIP data, the default value of 0.2 typically works well. For PBM data, which contain a large number of nonspecific probes, this parameter should be set to a lower value, e.g., 0.01.
The parameter "p-value" defines a threshold on the p-values of predicted binding sites, and only binding sites with a lower p-value are reported by DimontPredictor. The Dimont tool uses a p-value threshold of 1E-3, which is also the default value of DimontPredictor.
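The effect of the threshold can be sketched as a simple filter. The Prediction record and its field names below are hypothetical and do not reflect the actual column layout of the predictions file; the p-values are made-up illustration data. The sketch only assumes that each predicted site carries a p-value and that sites with a p-value strictly below the threshold are kept.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """Hypothetical record for one predicted binding site."""
    sequence_index: int
    position: int
    p_value: float


def filter_by_p_value(predictions, threshold=1e-3):
    """Keep only predictions whose p-value is below the threshold."""
    return [p for p in predictions if p.p_value < threshold]


# Made-up example values, for illustration only.
sites = [
    Prediction(0, 493, 2.5e-5),
    Prediction(1, 510, 4.0e-3),
    Prediction(2, 120, 9.9e-4),
]
default_hits = filter_by_p_value(sites)        # default threshold 1E-3
strict_hits = filter_by_p_value(sites, 1e-4)   # stricter threshold
```

Re-running DimontPredictor with a different "p-value" setting corresponds to changing the threshold here: a stricter value reports fewer sites.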
dimont-predictor-log.txt: Logfile, the log file of the DimontPredictor run.
dimont-predictor-predictions.txt: Predictions, the binding sites predicted by DimontPredictor.
dimont-predictor-logo-rc.png: Sequence logo (rc), the sequence logo of the reverse complement of the predictions.
dimont-predictor-logo.png: Sequence logo, the sequence logo of the predictions.
If you use Dimont, please cite
J. Grau, S. Posch, I. Grosse, and J. Keilwagen. A general approach for discriminative de-novo motif discovery from high-throughput data. Nucleic Acids Research, 41(21):e197, 2013.