Differential expression analysis using DESeq2


This tool performs differential expression analysis using the DESeq2 Bioconductor package. It supports more than two experimental groups and can account for a second experimental factor.



This tool takes as input a table of raw counts. The count table has to be associated with a phenodata file describing the experimental groups. These files are best created by the tool "Utilities / Define NGS experiment", which combines the count files for different samples into one table and creates a phenodata file for it.

DESeq2 performs an internal normalization in which the geometric mean of the counts is calculated for each gene across all samples. The count for a gene in each sample is then divided by this mean. The median of these ratios in a sample is the size factor for that sample. This procedure corrects for library size and RNA composition bias, which can arise, for example, when a small number of genes are very highly expressed in one experimental condition but not in the other.
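The median-of-ratios procedure described above can be illustrated with a minimal Python sketch. This is not the DESeq2 implementation (DESeq2 is an R package). The function name size_factors and the toy count table are hypothetical, and the sketch assumes that genes with a zero count in any sample are left out, because their geometric mean is zero.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene), each a list of raw counts
    (one per sample). Illustrative helper, not part of DESeq2.
    """
    n_samples = len(counts[0])
    # Log of the geometric mean per gene. Genes with a zero count in any
    # sample are skipped (assumption), since their geometric mean is zero.
    log_geo_means = []
    for row in counts:
        if all(c > 0 for c in row):
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
        else:
            log_geo_means.append(None)
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(counts, log_geo_means)
                      if lgm is not None]
        # The median ratio over genes is the size factor of sample j.
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy table: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],   # contains a zero, so it is ignored
]
factors = size_factors(toy_counts)
print(factors)
# Normalized counts are the raw counts divided by the sample's size factor.
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
print(normalized)
```

In this toy table the two size factors differ by a factor of two, so the normalized counts of the first three genes become equal across the samples.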

As small numbers of replicates make it impossible to estimate within-group variance reliably, DESeq2 uses shrinkage estimation for dispersions and fold changes. A dispersion value is estimated for each gene through a model fit procedure. You need to have biological replicates of each experimental condition in order to estimate dispersion properly. If there are no replicates, DESeq2 will estimate dispersion using the samples from the different conditions as replicates.

DESeq2 fits a negative binomial generalized linear model for each gene and uses the Wald test for significance testing. In addition to the group information, you can include an additional experimental factor, such as pairing, in the analysis.
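As a rough illustration of the Wald test step only, the sketch below divides an estimated log2 fold change by its standard error and compares the result to a standard normal distribution. The function name wald_test and the input numbers are hypothetical. The model fitting that produces the estimate and its standard error is not shown.

```python
import math

def wald_test(log2_fold_change, standard_error):
    """Two-sided Wald test for one coefficient (illustrative helper)."""
    # Wald statistic: the estimate divided by its standard error.
    z = log2_fold_change / standard_error
    # Two-sided p-value under a standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimate and standard error for a single gene.
z, p = wald_test(1.5, 0.5)
print(z, p)
```

A larger standard error for the same fold change gives a smaller statistic and a larger p-value, which is why genes with noisy estimates are less likely to be reported as significant.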

DESeq2 automatically detects count outliers using Cook's distance and removes these genes from the analysis. It also automatically removes genes whose mean of normalized counts is below a threshold determined by an optimization procedure. Removing these low-count genes improves detection power by making the multiple testing adjustment of the p-values less severe.
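The effect of filtering on the multiple testing adjustment can be seen in a small Python sketch. It assumes that the p-values are adjusted with the Benjamini-Hochberg procedure. The function name benjamini_hochberg and the p-values are made up for illustration. The threshold optimization itself is not shown.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (illustrative helper)."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the adjusted
    # values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values. The last four are assumed to come from low-count genes
# that have little chance of reaching significance.
all_p = [0.01, 0.02, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
kept_p = all_p[:4]  # the same genes after the low-count genes are removed

print(benjamini_hochberg(all_p)[:2])
print(benjamini_hochberg(kept_p)[:2])
```

With all eight genes, the two smallest p-values are adjusted to 0.08. After the four low-count genes are removed, the same two p-values are adjusted to 0.04, because fewer tests enter the correction. This gain is valid only when the filtering criterion, here the mean of normalized counts, does not depend on the test result itself.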


The analysis output consists of the following files. Note that if you have more than two experimental groups, the output figures summarize information from all pairwise comparisons.


This tool uses the DESeq2 package. Please read the following article for more detailed information:

M Love, W Huber and S Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 2014 15:550