Transform read counts

Description

Transforms raw read counts using the DESeq2 Bioconductor package. The input file must contain all genes, not just the differentially expressed ones. Note that the transformed values are suitable only for visualization and clustering, not for differential expression analysis, which requires raw counts.

Parameters

Details

This tool takes a table of raw counts as input. The count table must be associated with a phenodata file describing the experimental groups. These files are best created with the tool "Utilities / Define NGS experiment", which combines the count files of the individual samples into one table and creates a phenodata file for it.

Both the variance stabilizing transformation (VST) and the regularized log transformation (rlog) aim to remove the dependence of the variance on the mean. In particular, genes with low expression levels, and therefore low read counts, still show high variance after an ordinary logarithmic transformation, which does not remove this effect efficiently. VST and rlog remove the experiment-wide trend of variance over mean, as estimated by the DESeq2 dispersion calculation. This dispersion calculation ignores the experimental group information, which is why the transformation is said to be blind.
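To make the variance-over-mean trend concrete, the sketch below compares how much an ordinary log2(x + 1) transformation and a variance stabilizing transformation spread out counts at different mean levels. It is a minimal illustration, not DESeq2's code: it assumes negative binomial counts with a single constant dispersion `alpha` (variance = mu + alpha * mu^2), whereas DESeq2 estimates a dispersion trend from the data, and rlog uses a different, shrinkage-based approach. The function names and the value of `ALPHA` are hypothetical. The VST used here is the closed-form one for this mean-variance relation, rescaled so that it behaves like log2 at high counts.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial probability of count k, with mean mu and
    dispersion alpha (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha          # NB "size" parameter
    p = r / (r + mu)         # success probability
    log_prob = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_prob)


def log2_transform(x, alpha):
    # Ordinary log with a pseudocount of 1; alpha is unused here but kept
    # so that both transforms share the same signature.
    return math.log2(x + 1)


def vst_transform(x, alpha):
    # Closed-form VST for variance = mu + alpha*mu^2, scaled so that it
    # approaches log2(4 * alpha * x) for large counts.
    return 2.0 * math.asinh(math.sqrt(alpha * x)) / math.log(2)


def transformed_variance(transform, mu, alpha):
    """Exact variance of transform(X) for X ~ NB(mu, alpha), obtained by
    summing over the probability mass function."""
    sd = math.sqrt(mu + alpha * mu * mu)
    k_max = int(mu + 20 * sd) + 20   # far into the right tail
    m1 = m2 = 0.0
    for k in range(k_max + 1):
        w = nb_pmf(k, mu, alpha)
        y = transform(k, alpha)
        m1 += w * y
        m2 += w * y * y
    return m2 - m1 * m1


ALPHA = 0.05  # assumed constant dispersion
for mu in (5, 50, 500):
    v_log = transformed_variance(log2_transform, mu, ALPHA)
    v_vst = transformed_variance(vst_transform, mu, ALPHA)
    print(f"mean {mu:>3}: log2(x+1) variance {v_log:.3f}, VST variance {v_vst:.3f}")
```

With log2(x + 1) the variance at a mean of 5 is several times larger than at a mean of 500, while the VST keeps it roughly constant across the three mean levels.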

VST runs faster than rlog. If the library sizes of the samples, and therefore their size factors, vary widely, the rlog transformation is a better option than VST. Both options produce log2-scale data that have been normalized for library size using the DESeq2 method. You can also choose to skip the transformation and obtain only the normalized data.
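DESeq2's default library-size normalization (its `estimateSizeFactors` step) is the median-of-ratios method: each sample's size factor is the median, over genes, of the ratio between the gene's count in that sample and the gene's geometric mean across all samples; genes with a zero count in any sample are skipped. The sketch below is a plain-Python illustration, not the package's code, and assumes a small in-memory count table given as a dict of gene ID to per-sample counts; the function names and the toy data are hypothetical. Because each factor is a median across genes, it assumes that most genes do not differ between groups, which is also why the input should contain all genes.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero. At least one gene must have no zeros.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of the log ratios, transformed back to the ratio scale.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / s for c, s in zip(row, factors)]
            for gene, row in counts.items()}


# Hypothetical toy table: sample B was sequenced twice as deeply as sample A.
counts = {
    "gene1": [10, 20],
    "gene2": [100, 200],
    "gene3": [5, 10],
    "gene4": [0, 3],   # skipped: zero count in sample A
}
factors = size_factors(counts)
normalized = normalize(counts, factors)
print(factors)
print(normalized["gene2"])
```

In this toy table the second size factor is twice the first, so after normalization each gene has the same value in both samples. The log2-scale output of VST and rlog is computed from counts corrected in this way.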

It is very important that the majority of the input genes do not show large count differences between the experimental groups. Therefore the input file should contain all genes, not just the differentially expressed ones. If you need transformed counts for differentially expressed genes, first perform the transformation on the whole count table, and then use the interactive Venn diagram to intersect the transformed count table with the list of differentially expressed genes:

  1. Perform the transformation on the whole count table containing all the genes.
  2. Select both the transformed table and the list of differentially expressed genes (use ctrl/cmd to select multiple files).
  3. Select the interactive Venn diagram as the visualization method.
  4. Click the intersection area in the graph, go to the Selected tab, and click the Create dataset from selected button.
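If you work with the exported files outside the Venn diagram, the same intersection can be done with a short script. This is a minimal sketch that assumes both files are tab-delimited with one header row and the gene ID in the first column; that layout, the function name and the example data are hypothetical and may not match the actual output files.

```python
import csv
import io


def intersect_with_de_genes(transformed_file, de_file):
    """Return the header and the rows of the transformed table whose gene ID
    (first column) also appears in the first column of the DE gene list.

    Both inputs are text streams of tab-delimited data with one header row
    (an assumed layout).
    """
    de_rows = csv.reader(de_file, delimiter="\t")
    next(de_rows, None)  # skip the header line of the DE list
    de_ids = {row[0] for row in de_rows if row}

    table = csv.reader(transformed_file, delimiter="\t")
    header = next(table)
    kept = [row for row in table if row and row[0] in de_ids]
    return header, kept


# Hypothetical example inputs.
transformed = io.StringIO(
    "gene\tA\tB\n"
    "gene1\t3.2\t3.4\n"
    "gene2\t7.9\t8.1\n"
    "gene3\t1.1\t5.6\n"
)
de_list = io.StringIO("gene\tlog2FoldChange\ngene3\t2.3\n")
header, rows = intersect_with_de_genes(transformed, de_list)
print(header)
print(rows)
```

For real files, pass streams opened with `open(path, newline="")` instead of the `io.StringIO` objects.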

Output

Depending on the transformation method chosen, one of the following result files is produced.


References

This tool uses the DESeq2 package. Please read the following article for more detailed information:

M Love, W Huber and S Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 2014 15:550