RNA-seq / Differential expression analysis using DESeq
Description
Differential expression analysis using the DESeq Bioconductor package. If you have more than 2 experimental groups, please use DESeq2.
Both DESeq and DESeq2 can accommodate a second experimental factor, such as pairing.
Parameters
- Column describing groups [group]
- Column describing additional experimental factor [EMPTY]
- Apply normalization (yes, no) [yes]
- Dispersion estimation method (parametric, local) [parametric]
- Use fitted dispersion values (when higher than original values, always) [when higher than original values]
- Multiple testing correction (none, Bonferroni, Holm, Hochberg, BH, BY) [BH]
- P-value cutoff (0-1) [0.05]
- Plot width (200-3200) [600]
- Plot height (200-3200) [600]
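The default multiple testing correction, BH (Benjamini-Hochberg), and the p-value cutoff work together: each gene's p-value is adjusted for the number of genes tested, and genes whose adjusted p-value is at or below the cutoff are reported as significant. The following is a minimal sketch of that step only, assuming plain Python lists of p-values; the function names are illustrative and are not part of the tool, which relies on R for this.

```python
"""Sketch of Benjamini-Hochberg adjustment followed by a p-value cutoff."""
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # keeps the adjusted values monotone in the p-value ranking.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1.0 also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: List[float], cutoff: float = 0.05) -> List[bool]:
    """Flag genes whose BH-adjusted p-value is at or below the cutoff."""
    return [q <= cutoff for q in bh_adjust(pvalues)]
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.20 become 0.04, about 0.053, about 0.053 and 0.20, so with the default cutoff of 0.05 only the first gene is flagged.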
Details
This tool takes as input a table of raw counts from the different samples. The count file has to be associated with a phenodata file describing the experimental groups.
These files are best created with the tool "Utilities / Define NGS experiment", which combines the count files of different samples into one table
and creates a phenodata file for it. You can perform the analysis with DESeq or DESeq2, the former being more conservative.
When normalization is enabled, a size factor is calculated for each sample as follows: first, the geometric mean of each gene's counts across all samples is calculated.
The count of a gene in each sample is then divided by this mean, and the median of these ratios in a sample is the size factor for that sample. This procedure
corrects for RNA composition bias, which can arise, for example, when a small number of genes are very highly expressed in one experimental condition but not in the other.
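The size factor calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the count table is a list of rows (one per gene) with one raw count per sample; the function name is hypothetical. It works on the log scale and skips genes with a zero count in any sample, because their geometric mean is zero.

```python
"""Sketch of median-of-ratios size factors for a raw count table."""
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """counts[g][j] is the raw count of gene g in sample j."""
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        # A zero anywhere makes the geometric mean zero, so skip the gene.
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            # log(count / geometric mean) for this gene in sample j
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor is the median ratio, transformed back from log scale.
    return [math.exp(median(r)) for r in log_ratios]
```

If the second sample was sequenced exactly twice as deeply as the first, every ratio reflects that, and the resulting size factors differ by a factor of two. Normalized counts are then the raw counts divided by the size factor of their sample.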
In addition to the experimental groups, you can include an additional factor, such as pairing, in the analysis. If it is given, the p-values in the output table come
from a likelihood ratio test comparing a model that includes both the experimental groups and the additional factor against a model that includes only the additional factor.
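The final step of such a likelihood ratio test turns the two model fits into a p-value: twice the difference in log-likelihoods is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters in the full model (for example, number of groups minus one). The sketch below shows only this step and assumes the two log-likelihoods are already known; the model fitting itself is done inside DESeq/DESeq2, and the function names here are hypothetical. The chi-square tail probability uses the closed forms available for integer degrees of freedom.

```python
"""Sketch of the likelihood ratio test step, given two fitted log-likelihoods."""
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half + (i - 0.5) * math.log(half) - math.lgamma(i + 0.5))
    return total


def lrt_pvalue(loglik_full: float, loglik_reduced: float, df: int) -> float:
    """Compare 2 * (logL_full - logL_reduced) with chi-square(df)."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return chi2_sf(statistic, df)
```

With two experimental groups the full model has one extra parameter, so a statistic of about 3.84 corresponds to p = 0.05.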
A dispersion value is estimated for each gene through a model fit procedure, which can be performed in a "local" or "parametric" mode.
The local mode is more robust and should be used when there are no replicates.
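In DESeq, the parametric mode fits the dispersion as a function of the mean of the normalized counts, of the form a0 + a1 / mean. The sketch below illustrates the shape of that curve with an ordinary least-squares fit on 1 / mean. This is a deliberate simplification: DESeq itself fits the curve with an iterative gamma-family GLM that also discards outliers, and the function names here are hypothetical.

```python
"""Simplified sketch of a parametric dispersion-mean fit: disp = a0 + a1 / mean."""
from typing import List, Tuple


def fit_parametric(means: List[float], raw_disps: List[float]) -> Tuple[float, float]:
    """Ordinary least squares of raw dispersion on 1 / mean; returns (a0, a1)."""
    # Genes with zero mean carry no information about the curve.
    pairs = [(1.0 / m, d) for m, d in zip(means, raw_disps) if m > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def fitted_dispersion(mean: float, a0: float, a1: float) -> float:
    """Evaluate the fitted curve at a given mean normalized count."""
    return a0 + a1 / mean
```

The local mode replaces this fixed functional form with a local regression, which is why it copes better with dispersion-mean relationships that do not follow the curve.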
You can choose to replace the original dispersion values with the fitted ones either always, or only when the fitted value is higher than the original one
(the more conservative option).
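The two choices for using the fitted values can be expressed as a simple per-gene rule. This sketch assumes they correspond to DESeq's sharing modes "maximum" (when higher than original values) and "fit-only" (always); the mapping and the function name are assumptions for illustration.

```python
"""Sketch of how fitted dispersions replace the original per-gene estimates."""
from typing import List


def final_dispersions(raw: List[float], fitted: List[float], mode: str = "maximum") -> List[float]:
    if mode == "maximum":
        # "when higher than original values": keep the larger value per gene
        return [max(r, f) for r, f in zip(raw, fitted)]
    if mode == "fit-only":
        # "always": use the fitted value regardless of the original estimate
        return list(fitted)
    raise ValueError(f"unknown mode: {mode}")
```

Taking the maximum never lowers a gene's dispersion below its fitted value, which makes the test less likely to call a gene significant; that is why this option is the more conservative one.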
You need to have biological replicates of each experimental condition in order to estimate dispersion properly.
If you have biological replicates only for one condition, DESeq will estimate dispersion using the replicates of that single
condition. If there are no replicates at all, DESeq will estimate dispersion using the samples from the different conditions as replicates.
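Which samples count as replicates determines the raw per-gene dispersion estimate. In the cases above, the replicate set is either the samples of the single replicated condition or, without any replicates, all samples. A minimal method-of-moments sketch, assuming the counts of one gene and the size factors of the chosen samples are given, subtracts the Poisson (shot noise) part from the observed variance of the normalized counts. Clamping negative values at zero is a choice of this sketch, and the function name is hypothetical.

```python
"""Sketch of a method-of-moments raw dispersion estimate for one gene."""
from statistics import mean, variance
from typing import List


def raw_dispersion(counts: List[int], size_factors: List[float]) -> float:
    """counts and size_factors refer to the samples treated as replicates (at least two)."""
    normalized = [c / s for c, s in zip(counts, size_factors)]
    q = mean(normalized)
    if q == 0:
        return 0.0  # no expression, nothing to estimate in this sketch
    w = variance(normalized)  # sample variance, n - 1 denominator
    # Expected Poisson contribution to the variance of normalized counts.
    z = q * mean([1.0 / s for s in size_factors])
    # Extra-Poisson variance relative to the squared mean, clamped at zero.
    return max((w - z) / q ** 2, 0.0)
```

Treating samples from different conditions as replicates inflates the variance for genes that are truly differentially expressed, so the resulting dispersion estimates, and therefore the test, become more conservative.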
Output
The analysis output consists of the following files:
- de-list-deseq.tsv: Table containing the results of the statistical testing, including fold change estimates and p-values.
- de-list-deseq.bed: The BED version of the results table contains genomic coordinates and log2 fold change values.
- deseq_report.pdf: A PDF file containing:
  - A scatter plot in which the significantly differentially expressed genes are highlighted.
  - A plot of the dispersion estimates as a function of the count values, with the fitted model overlaid.
  - Plots of the raw and adjusted p-value distributions of the statistical test.
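The log2 fold change reported in the results table and the BED file compares the mean normalized counts of the two groups. The sketch below computes it as the plain ratio of the means, as DESeq's two-group test does; DESeq2 estimates fold changes from its model fit, so its values can differ. The BED formatting is purely hypothetical: it assumes a five-column record with the log2 fold change in the score column, which may not match the tool's actual column layout.

```python
"""Sketch of a log2 fold change from normalized counts and a hypothetical BED record."""
import math
from statistics import mean
from typing import List


def log2_fold_change(norm_a: List[float], norm_b: List[float]) -> float:
    """log2(mean normalized count in group B / mean in group A).

    Raises an error if either group mean is zero (fold change undefined here).
    """
    return math.log2(mean(norm_b) / mean(norm_a))


def bed_line(chrom: str, start: int, end: int, name: str, log2fc: float) -> str:
    """Hypothetical 5-column BED record with the log2 fold change as the score."""
    return "\t".join([chrom, str(start), str(end), name, f"{log2fc:.3f}"])
```

For example, a gene with mean normalized counts of 10 in group A and 40 in group B has a log2 fold change of 2.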
References
This tool uses the DESeq and DESeq2 packages for statistical analysis. Please read the following articles for more detailed information:
S Anders and W Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106.
M Love, W Huber and S Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15:550.