Statistics / Linear modelling

Description

Statistical testing of multiple dependent variables at the same time using linear modelling.

Parameters

main.effect1 [group]
main.effect2 [EMPTY]
main.effect3 [EMPTY]
technical.replication [EMPTY]
pairing [EMPTY]
treat.main.effect1.as.factor (yes,no) [no]
treat.main.effect2.as.factor (yes,no) [no]
treat.main.effect3.as.factor (yes,no) [no]
adjust.p.values (yes,no) [yes]
p.value.adjustment.method (none, Bonferroni, Holm, Hochberg, BH, BY) [BH]
Interactions (main, two-way, three-way) [main]

Details

This tool brings linear modelling, as implemented in the limma package, into Chipster. To use it, you need to define your experimental setup in the phenodata file by adding new columns to the table. Experimental factors are then described in these columns using numbers. For example, in a study comparing expression between males and females as well as between cancerous and non-cancerous tissue, two phenodata columns are needed: gender and group.

You can have a maximum of three factors (parameters main.effect1...main.effect3). The parameter technical.replication lets you specify which samples are technical replicates (the same RNA hybridized on different chips); these are handled separately in the linear model (using a mixed model). The parameter pairing lets you describe which samples are paired (for example, derived from the same individual); pairing is also handled separately in the model.

Factors can be treated either as continuous (e.g., time) or as categorical (e.g., cell type). A factor treated as continuous (linear) enters the model as a continuous variable. To treat a factor as categorical (non-linear), set the corresponding parameter treat.main.effect1.as.factor...treat.main.effect3.as.factor to "yes". When an effect is treated as a factor, dummy contrasts comparing each of the other factor levels to the first one are constructed automatically.
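The difference between the two treatments can be illustrated with a minimal Python sketch. It is not the tool's own code (the tool relies on limma in R); the function name design_columns and the example codes are hypothetical, and the intercept column is left out for brevity. The sketch assumes that the lowest code is the reference level and that dummy columns are named after the effect and the level, as in the result column p.adjusted.main12.

```python
def design_columns(values, as_factor, name="main1"):
    """Build the design-matrix columns for one phenodata column.

    Hypothetical helper for illustration only; the intercept column
    is omitted.
    """
    if not as_factor:
        # Continuous (linear): the numeric codes enter the model unchanged.
        return [name], [[float(v)] for v in values]
    levels = sorted(set(values))
    # Assumption: the lowest code is the reference level and gets no column.
    others = levels[1:]
    names = [f"{name}{level}" for level in others]
    # One 0/1 indicator column per non-reference level.
    rows = [[1.0 if v == level else 0.0 for level in others] for v in values]
    return names, rows


group = [1, 1, 2, 2, 3, 3]  # made-up codes from a phenodata column
factor_names, factor_rows = design_columns(group, as_factor=True)
linear_names, linear_rows = design_columns(group, as_factor=False)
print(factor_names, factor_rows)
print(linear_names, linear_rows)
```

With three levels, the factor treatment yields two indicator columns (level 2 versus 1 and level 3 versus 1), whereas the continuous treatment yields a single numeric column.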

The model can also include interactions between the factors, and you can choose which kind of interactions are reported. The model can contain either the main effects of the factors only (no interactions), or all two-way or three-way interactions. Interactions are added to the model with respect to marginality, so a model that contains interactions always contains the corresponding main effects as well.
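As a rough illustration of the marginality rule, the following Python sketch lists the terms a model would contain for each setting of the Interactions parameter. The function model_terms and the colon notation for interactions are assumptions made for this example; they are not taken from the tool itself.

```python
from itertools import combinations


def model_terms(effects, interactions="main"):
    """List model terms up to the requested interaction order.

    Hypothetical helper: lower-order terms are always kept, so every
    interaction is accompanied by the terms it is built from.
    """
    max_order = {"main": 1, "two-way": 2, "three-way": 3}[interactions]
    terms = []
    for order in range(1, min(max_order, len(effects)) + 1):
        for combo in combinations(effects, order):
            terms.append(":".join(combo))
    return terms


effects = ["group", "gender", "time"]  # example phenodata columns
main_terms = model_terms(effects, "main")
two_way_terms = model_terms(effects, "two-way")
three_way_terms = model_terms(effects, "three-way")
print(two_way_terms)
```

Under these assumptions, the three-way model contains the three main effects, the three two-way interactions, and the single three-way interaction.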

Output

The analysis produces two tab-delimited text files. The "limma-design.tsv" table contains the design matrix for the linear model as required by the limma package, and can be used to check that the analysis was set up appropriately. The "limma.tsv" table combines the p-values and fold changes with the expression values and annotation information for the probes tested, in a summary table suitable for further processing or analysis.

This table contains p-values for all contrasts of each main effect, with the number of the effect and the factor level reflected in the column headers. For example, a column labeled "p.adjusted.main13" refers to the comparison of level 3 to level 1 (the reference level) for the first (1) of the main effects. The same applies to the fold changes.

When an effect has been set up to be treated as a "factor", the fold change is essentially the log2 of the ratio between the averaged expression values of the samples at a particular factor level and those at the first (reference) factor level. When the effect is treated as "linear", the fold change is an estimate of the slope of the linear regression fitted to the data.
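The two meanings of the fold change can be checked by hand on a single probe. The Python sketch below uses made-up log2 expression values and assumes the simplest case: one main effect, no pairing or technical replication, and an ordinary least-squares fit. The function names are hypothetical, and limma's own estimates and moderated statistics are not reproduced here.

```python
from statistics import mean


def factor_fold_change(log2_values, levels, level, reference):
    """Difference of mean log2 expression between a level and the reference."""
    level_mean = mean(v for v, l in zip(log2_values, levels) if l == level)
    ref_mean = mean(v for v, l in zip(log2_values, levels) if l == reference)
    return level_mean - ref_mean


def linear_slope(log2_values, x):
    """Least-squares slope of log2 expression regressed on a numeric effect."""
    x_bar, y_bar = mean(x), mean(log2_values)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log2_values))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Made-up log2 expression values for one probe across six samples.
expression = [7.0, 7.2, 8.1, 8.3, 9.0, 9.2]
group = [1, 1, 2, 2, 3, 3]

fc_3_vs_1 = factor_fold_change(expression, group, level=3, reference=1)
slope = linear_slope(expression, group)
print(fc_3_vs_1, slope)
```

Because the values are already on the log2 scale, a difference of means corresponds to a log2 ratio. In this example the level 3 versus level 1 fold change is about 2, while the slope is about 1 per unit of the group code.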

Note that you usually want to pick the appropriate p.adjusted column from the result table for further analysis. For example, if you are interested in the differences between the levels of the first main effect (usually the “group” column in phenodata), sort and filter the table by the p.adjusted.main12 column. (You can use the Preprocessing / Filter using a column value tool for this.)
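Outside Chipster, the same sort-and-filter step could look roughly like the Python sketch below. The table contents, the "probe" header, and the 0.05 cutoff are invented for the example; only the column name p.adjusted.main12 comes from the description above, and a real limma.tsv file contains many more columns.

```python
import csv
import io

# Invented miniature stand-in for a result table (not real output).
example_tsv = (
    "probe\tp.adjusted.main12\n"
    "probe_a\t0.001\n"
    "probe_b\t0.40\n"
    "probe_c\t0.03\n"
)


def filter_by_column(tsv_text, column, cutoff):
    """Keep rows whose value in `column` is at most `cutoff`, sorted ascending."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if float(row[column]) <= cutoff]
    return sorted(kept, key=lambda row: float(row[column]))


# Assumed cutoff of 0.05 on the adjusted p-value.
significant = filter_by_column(example_tsv, "p.adjusted.main12", cutoff=0.05)
for row in significant:
    print(row["probe"], row["p.adjusted.main12"])
```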

The intercept is a baseline: the expression level of the group in which all the main effects are at their reference level (marked with the smallest number in the phenodata file). The test on the intercept therefore asks whether this baseline is zero, that is, whether the gene is expressed at all. The baseline is usually not zero, so the p-values in the intercept column are usually very small. This is, however, rarely of any interest.

References

This tool uses the Bioconductor package limma. Please cite the following articles:

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Vol. 3, No. 1, Article 3.

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.