Weeder

Description

This tool retrieves promoter sequences for a set of genes and finds sequence motifs shared among them. Currently the tool works only for human, mouse, rat, Drosophila and yeast genes.

Parameters

Details

This tool retrieves upstream sequences for the specified genes and submits them to the Weeder program. It needs access to the chip-specific annotations, so it will not work if the chiptype was not specified during normalization (of Illumina data, for example). RefSeq IDs are used to retrieve promoter sequences that were constructed and annotated by the UCSC Genome Browser staff. The same promoter sequences can be downloaded as a single FASTA-formatted file from the UCSC Golden Path folder. The user can define the length of the promoter sequences used in the analysis:

		Human		Mouse		Rat		Drosophila	Yeast
Small		1000 bp		1000 bp		1000 bp		1000 bp 	500 bp
Medium		2000 bp		2000 bp		2000 bp		2000 bp		1000 bp
Large		5000 bp		5000 bp		5000 bp		5000 bp		2500 bp
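
The table above can be read as a lookup from size setting and species to promoter length. A minimal sketch in Python follows; the names PROMOTER_LENGTH_BP and promoter_length are hypothetical and are not part of the tool itself.

```python
# Promoter lengths in bp, copied from the table above.
# The dictionary and function names are illustrative assumptions only.
PROMOTER_LENGTH_BP = {
    "small": {"human": 1000, "mouse": 1000, "rat": 1000,
              "drosophila": 1000, "yeast": 500},
    "medium": {"human": 2000, "mouse": 2000, "rat": 2000,
               "drosophila": 2000, "yeast": 1000},
    "large": {"human": 5000, "mouse": 5000, "rat": 5000,
              "drosophila": 5000, "yeast": 2500},
}


def promoter_length(species: str, size: str) -> int:
    """Return the upstream sequence length (bp) for a species and size setting."""
    try:
        return PROMOTER_LENGTH_BP[size.lower()][species.lower()]
    except KeyError:
        raise ValueError(
            f"unsupported species/size combination: {species!r}, {size!r}"
        ) from None
```

For example, promoter_length("yeast", "large") looks up the last cell of the table.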

The retrieved promoter sequences are submitted to the Weeder program, which finds common motifs (putative transcription factor binding sites). Weeder can search for motifs with two different size settings, which are the same for all species. The length of the common motif and the number of mismatches allowed in it vary with the setting:

			Length		Mismatches
Small			6		1
			8		2
Medium			10		3
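
To illustrate what a motif length and a number of allowed mismatches mean, the sketch below counts how many sequences contain at least one window that differs from a given motif in no more than the allowed number of positions. It is a brute-force illustration only and is not the algorithm Weeder uses; the function names and the example sequences are made up for this purpose.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, sequence: str, max_mismatches: int) -> bool:
    """True if some window of the sequence matches the motif
    with at most max_mismatches substitutions."""
    k = len(motif)
    return any(
        hamming(motif, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def count_sequences_with_motif(motif, sequences, max_mismatches):
    """Number of sequences in which the motif occurs at least once."""
    return sum(occurs(motif, s, max_mismatches) for s in sequences)


# Hypothetical toy promoters, far shorter than real upstream sequences.
promoters = ["TTGACGTCAAT", "GGTGACCTCAG", "CCCCCCCCCCC"]

# A motif of length 6 with 1 allowed mismatch, as in the first table row.
hits = count_sequences_with_motif("TGACGT", promoters, max_mismatches=1)
```

Allowing more mismatches for longer motifs, as in the table, keeps the search tolerant of the variation typically seen in binding sites.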

In addition, the user can define:

Output

The tool returns the specified number of common motifs, written to an HTML file. These can then be analyzed further to check whether they are known transcription factor binding sites.

Reference

Pavesi et al. (2004) Nucleic Acids Res. 32 (Web Server issue): W199-W203