RNA-seq / Differential expression using edgeR for multivariate experiments


Differential expression analysis for multifactor experiments using the generalized linear models (glm) -based statistical methods of the edgeR Bioconductor package.



This tool takes as input a table of raw counts from the different samples. The count file has to be associated with a phenodata file describing the experimental factors. These files are best created by the tool "Utilities / Define NGS experiment", which combines count files for different samples to one table and created a phenodata file for it.

You can choose to ignore in statistical testing those genes which are not expressed in either experimental group, or are expressed in very low levels (less than 5 counts). These genes have little chance of showing significant evidence for differential expression, and removing them reduces the severity of multiple testing adjustment of p-values.

Trimmed mean of M-values (TMM) normalization is used to calculate normalization factors in order to reduce RNA composition effect, which can arise for example when a small number of genes are very highly expressed in one experiment condition but not in the other.

Dispersion is estimated using Cox-Reid profile-adjusted likelyhood (CR) method. Trended dispersions are estimated prior to estimating tagwise dispersions.

Statistical analysis to identify differentially expressed genomic features (genes, miRNAs,...) is performed using a multivariate regression model. A maximum of three different variables, and their interactions can be specified for the model. It is highly recommended to always have at least two biological replicates for each experiment condition.

Note that unlike with basic edgeR or DESeq2, no filtering is done to the results table. Please notice that in the results table you will have all the comparisons between the different effects defined in the parameters -section. You can filter the table for example based on the adjusted p-values (FDR-columns) using the tool Utilities / Filter table by column value.

Include interactions

You can choose whether you want to include only the main effects ("Include interactions in the model" = "no"), the main effects and all the interaction terms ("yes") or the main effect 1 and its interactions with main effects 2 and 3 ("nested"). The "nested" option is needed when you want to do comparisons both between and within subjects (you might want to see the chapter 3.5. in EdgeR userguide. If you are using "nested" option, note that you need to choose the effects accordingly:
Main effect 1 = the group in which the comparison is done between the subjects ("disease" in the example)
Main effect 2 = the group in which the comparison is done within the subjects ("treatment" in the example)
Main effect 3 = the subjects / column explaining the pairing of the samples ("patient" in the example)

Example of the use of the "nested" option: We have an experiment with 3 cancer patients and 3 controls. There are 2 samples from each patient, before and after the treatment. The normal-cancer comparison is thus done between individuals, while the untreated-treated comparison is done within individuals. In the phenodata-file we mark create corresponding groups:
disease = normal/cancer
treatment = untreated/treated
patient = the individuals

Note that you want to mark the "control"-situation (normal patient, untreated) with smaller index.

In the parameters-section, we choose the "nested" -option and the effects:

In the resulting edger_glm.tsv -file there will be several columns corresponding to the different comparisons. In the example situation:
as.factor(disease) = comparison between the cancer patients and the controls
as.factor(disease)1:as.factor(treatment)2 = comparison between the untreated and treated control patients
as.factor(disease)2:as.factor(treatment)2 = comparison between the untreated and treated cancer patients



This tool uses the edgeR package for statistical analysis. Please cite the following articles:

MD Robinson, DJ McCarthy and GK Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26 (1):139-40, Jan 2010.

DJ McCarthy, Y Chen and GK Smyth. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res, 40 (10):4288-97, May 2012.