DimontPredictor

Description

DimontPredictor allows for predicting binding sites in new data using a previously trained Dimont model. For training a Dimont model see tool "Dimont".

Inputs

A Dimont model as available from the output of a Dimont run.
An annotated FastA file with sequences, e.g., from ChIP-seq experiments; can be generated using the "Dimont data extractor" tool.

Parameters

p-value: The maximum p-value allowed for predicted binding sites.
Value tag: The tag for the value information in the FastA-annotation of the input file, default as generated by "Dimont data extractor".
Weighting factor: The value for weighting the data, a value between 0 and 1. Recommended values: 0.2 for ChIP-seq/ChIP-exo, 0.01 for PBM data.

Details

This tool may be useful if you, for instance, want to predict binding sites of previously discovered motifs in other data sets, or if you want to try different p-values for filtering predictions.

Input sequences must be supplied in an annotated FastA format as generated using the "Dimont data extractor" tool. In the annotation of each sequence, you need to provide a value that reflects the confidence that this sequence is bound by the factor of interest. Such confidences may be peak statistics (e.g., number of fragments under a peak) for ChIP data or signal intensities for PBM data.

For instance, an annotated FastA file for ChIP-seq data comprising sequences of length 1000 centered around the peak summit could look like:

> peak: 500; signal: 515
ggccatgtgtatttttttaaatttccac...
> peak: 500; signal: 199
GGTCCCCTGGGAGGATGGGGACGTGCTG...
...

where the anchor point is given as 500 for the first two sequences, and the confidence amounts to 515 and 199, respectively. The FastA comment may contain additional annotations of the format key1: value1; key2: value2; ...

Accordingly, you would need to set the parameter "Value tag" to signal for the input file.
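To make the annotation format concrete, the following Python sketch reads such a file and extracts the confidence value for a given tag. It is not part of DimontPredictor; the function names parse_annotation and read_annotated_fasta are hypothetical, and the sketch assumes that every header follows the key: value; key: value pattern described above. The example sequences are the truncated ones shown earlier.

```python
def parse_annotation(header):
    """Split a header such as '> peak: 500; signal: 515' into a dict of strings."""
    fields = {}
    for part in header.lstrip(">").split(";"):
        if ":" not in part:
            continue  # skip empty or malformed fragments
        key, value = part.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def read_annotated_fasta(lines, value_tag="signal"):
    """Yield (confidence, sequence) pairs; value_tag plays the role of "Value tag"."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield float(parse_annotation(header)[value_tag]), "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield float(parse_annotation(header)[value_tag]), "".join(chunks)


# Truncated example records, for illustration only.
example = [
    "> peak: 500; signal: 515",
    "ggccatgtgtatttttttaaatttccac",
    "> peak: 500; signal: 199",
    "GGTCCCCTGGGAGGATGGGGACGTGCTG",
]
records = list(read_annotated_fasta(example, value_tag="signal"))
```

If the file used a different key for the confidence, only the value_tag argument would change, mirroring how the "Value tag" parameter is set in the tool.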

The parameter "Weighting factor" defines the proportion of sequences that you expect to be bound by the targeted factor with high confidence. For ChIP data, the default value of 0.2 typically works well. For PBM data, containing a large number of unspecific probes, this parameter should be set to a lower value, e.g. 0.01.

The parameter "p-value" defines a threshold on the p-values of predicted binding sites, and only binding sites with a lower p-value are reported by DimontPredictor. The Dimont tool uses a p-value threshold of 1E-3, which is also the default value of DimontPredictor.
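The effect of the threshold can be illustrated with a small Python sketch. This is not DimontPredictor's implementation and does not reflect the layout of its predictions file; the record fields (position, p_value) and the values below are made up purely for illustration. The only thing taken from the description above is that a site is kept when its p-value is lower than the threshold, with 1E-3 as the default.

```python
def filter_sites(sites, max_p_value=1e-3):
    """Keep only the sites whose p-value is strictly below the threshold."""
    return [site for site in sites if site["p_value"] < max_p_value]


# Hypothetical candidate sites with invented p-values.
candidates = [
    {"position": 487, "p_value": 2.5e-5},
    {"position": 120, "p_value": 4.0e-3},
    {"position": 733, "p_value": 8.0e-4},
]

default_hits = filter_sites(candidates)                   # threshold 1E-3
strict_hits = filter_sites(candidates, max_p_value=1e-4)  # stricter threshold
```

Lowering the threshold can only shrink the set of reported sites, which is why re-running the predictor with different p-values is a cheap way to trade sensitivity against specificity.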

Output

dimont-predictor-log.txt: Logfile, the logfile of the DimontPredictor run.
dimont-predictor-predictions.txt: Predictions, the binding sites predicted by DimontPredictor.
dimont-predictor-logo-rc.png: Sequence logo (rc), the sequence logo of the reverse complement of the predictions.
dimont-predictor-logo.png: Sequence logo, the sequence logo of the predictions.

Reference

If you use Dimont, please cite

J. Grau, S. Posch, I. Grosse, and J. Keilwagen. A general approach for discriminative de-novo motif discovery from high-throughput data. Nucleic Acids Research, 41(21):e197, 2013.