The Dimont sequence extractor prepares an annotated FastA file, as required by Dimont, from a genome (in FastA format) and a tabular file (e.g., BED, GTF, or narrowPeak).
This version of the tool allows the use of custom genomes.
The regions specified in the tabular file are used to determine the center of the extracted sequences. All extracted sequences have the same length, which is specified by the parameter "Width".
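The centering step can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function names are hypothetical, and it assumes 0-based, half-open coordinates (as in BED), the region midpoint as the center, and that windows running past the sequence ends are skipped.

```python
def centered_window(region_start, region_end, width):
    """Return (start, end) of a window of `width` bases centered on the region.

    Assumption: the center is the region midpoint; the tool may instead use
    another column (e.g., a peak summit) as the center.
    """
    center = (region_start + region_end) // 2
    start = center - width // 2
    return start, start + width


def extract(chrom_seq, region_start, region_end, width):
    """Cut the centered window out of a chromosome sequence given as a string.

    Returns None if the window does not fit entirely inside the sequence
    (an assumption of this sketch, not documented tool behavior).
    """
    start, end = centered_window(region_start, region_end, width)
    if start < 0 or end > len(chrom_seq):
        return None
    return chrom_seq[start:end]


if __name__ == "__main__":
    # A region spanning positions 1000-1200 with Width = 1000
    print(centered_window(1000, 1200, 1000))
```

With this convention every extracted sequence has exactly "Width" bases, and the center falls at offset Width // 2 within the sequence.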
In the case of ChIP data, the center position could, for instance, be the peak summit. An annotated FastA file for ChIP-seq data comprising sequences of length 1000 centered around the peak summit might look like:
> peak: 500; signal: 515
ggccatgtgtatttttttaaatttccac...
> peak: 500; signal: 199
GGTCCCCTGGGAGGATGGGGACGTGCTG...
...
where the anchor point ("peak") is given as 500 for the first two sequences, and the confidence ("signal") amounts to 515 and 199, respectively.
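To make the header layout concrete, the following Python sketch writes and reads annotation lines of the form shown above. The keys "peak" and "signal" are taken from the example; the helper names and the parsing rule (fields separated by ";", key and value separated by ":") are assumptions of this sketch and not a description of Dimont's own parser.

```python
def format_record(sequence, peak, signal):
    """Build one annotated FastA record in the layout of the example above."""
    return f"> peak: {peak}; signal: {signal}\n{sequence}\n"


def parse_annotation(header):
    """Parse a header such as '> peak: 500; signal: 515' into a dict of floats."""
    fields = {}
    for part in header.lstrip(">").split(";"):
        key, sep, value = part.partition(":")
        if sep:  # ignore fragments that contain no 'key: value' pair
            fields[key.strip()] = float(value)
    return fields


if __name__ == "__main__":
    record = format_record("ggccatgtgtattttttt", 500, 515)
    header = record.splitlines()[0]
    print(parse_annotation(header))
```

Parsing the values as floats keeps the sketch usable for confidence values that are not integers, such as scores from a narrowPeak file.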
extracted.fa
: Extracted sequences, i.e., the sequences extracted from the given genome using the supplied region specifications.
If you use Dimont, please cite:
J. Grau, S. Posch, I. Grosse, and J. Keilwagen. A general approach for discriminative de-novo motif discovery from high-throughput data. Nucleic Acids Research, 41(21):e197, 2013.