Pathways / Gene set test

Description

Groups genes into gene sets such as KEGG pathways or GO categories, and tests differential expression for each gene set rather than for individual genes, using the Globaltest Bioconductor package.

Parameters

Details

Traditional pathway analysis approaches look for pathway enrichment in a list of differentially expressed genes that has been obtained beforehand using a statistical test. If the expression changes of the individual genes are small, they might not be detected, and consequently the pathway they belong to is not found. Gene set tests, on the other hand, can also detect pathways in which a large number of genes exhibit a small change. The test statistic for a pathway is a weighted average of the test statistics for each gene.
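
The idea of averaging per-gene statistics into one gene set statistic can be illustrated with a minimal Python sketch. It is not the Globaltest implementation: the function names, the equal default weights, the scaling by the response variance and the toy data are all assumptions made for illustration only.

```python
def per_gene_statistics(expression, response):
    """Return one squared-score statistic per gene.

    expression: list of samples, each a list of expression values (one per gene).
    response:   list of outcome values, one per sample (e.g. 0/1 group labels).
    """
    n = len(response)
    mean_y = sum(response) / n
    residuals = [y - mean_y for y in response]
    # Assumed scaling factor: the variance of the response.
    scale = sum(r * r for r in residuals) / n
    n_genes = len(expression[0])
    stats = []
    for g in range(n_genes):
        # Score: how strongly gene g co-varies with the response residuals.
        score = sum(expression[s][g] * residuals[s] for s in range(n))
        stats.append(score * score / scale)
    return stats


def gene_set_statistic(expression, response, weights=None):
    """Weighted average of the per-gene statistics (equal weights by default)."""
    stats = per_gene_statistics(expression, response)
    if weights is None:
        weights = [1.0] * len(stats)
    return sum(w * q for w, q in zip(weights, stats)) / sum(weights)


# Toy data (hypothetical): 4 samples, 3 genes, two experimental groups.
toy_expression = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.0],
    [1.6, 4.9, 2.0],
    [1.8, 5.0, 2.0],
]
toy_response = [0, 0, 1, 1]

toy_gene_stats = per_gene_statistics(toy_expression, toy_response)
toy_q = gene_set_statistic(toy_expression, toy_response)
```

In this toy example only the first gene changes noticeably between the groups, yet every gene contributes its own term to the gene set statistic, so many small changes in the same pathway can add up.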

This tool should be run on normalized, unfiltered data so that pathways include all the genes. You can choose to ignore pathways that contain fewer than a given number of genes. In addition to a result table, plots showing the contributing genes are produced for both GO and KEGG results. You can select for how many pathways such a plot is produced. You can also select which multiple testing correction to apply; the default is the Benjamini and Hochberg method, which transforms p-values into false discovery rates. Note that in addition to testing for GO or KEGG pathways, you can also run this tool using the current list of genes as a "pathway".
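
The two options described above, the minimum gene set size and the Benjamini and Hochberg correction, can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code the tool runs; the function names, gene set names and p-values are hypothetical.

```python
def filter_gene_sets(gene_sets, min_size):
    """Keep only gene sets that contain at least min_size genes."""
    return {name: genes for name, genes in gene_sets.items() if len(genes) >= min_size}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene sets and raw p-values, for illustration only.
example_sets = {"set_a": ["g1", "g2", "g3"], "set_b": ["g4"]}
kept_sets = filter_gene_sets(example_sets, min_size=2)

raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```

Each adjusted value is the raw p-value multiplied by the number of tests and divided by its rank, then capped so that a smaller raw p-value never receives a larger adjusted value than a bigger one.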

Output

  1. A table summarizing the analysis results per gene set.
     The table includes the size of the gene set, how many individual genes in that gene set were tested, the Q statistic representing the gene set, the expected Q value based on an asymptotic distribution, the variance in the expected Q statistic, the unadjusted p-values and the p-values adjusted for multiple testing. If you would like to retrieve the genes listed as "tested", you can run the tool "Utilities / Extract genes from GO term" on the normalized data set.

  2. An image illustrating the contributing genes in a barplot.
     This image shows which genes (or groups of genes) contribute to the significance of a given pathway. The bars are p-values of the individual genes. The genes are ordered in a hierarchical clustering graph using absolute correlation as the distance measure. The genes marked with black branches are most significantly associated with the experimental group tested for. Red bars indicate a negative association (downregulation) and green bars a positive association (upregulation).
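
As a rough illustration of clustering on absolute correlation, the Python sketch below computes a distance between two gene profiles. It assumes the distance is defined as one minus the absolute Pearson correlation, which is a common choice but is not confirmed by this page; the function names and profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def abs_correlation_distance(x, y):
    """Assumed distance: 1 - |r|, so strong correlation of either sign is 'close'."""
    return 1.0 - abs(pearson(x, y))


# Hypothetical profiles across four samples.
gene_up = [1.0, 2.0, 3.0, 4.0]
gene_down = [8.0, 6.0, 4.0, 2.0]   # mirrors gene_up in the opposite direction
gene_other = [1.0, 3.0, 2.0, 4.0]

dist_up_down = abs_correlation_distance(gene_up, gene_down)
dist_up_other = abs_correlation_distance(gene_up, gene_other)
```

With such a distance, an upregulated gene and a downregulated gene that mirror each other end up next to each other in the clustering, which is why red and green bars can appear in the same branch.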

References

This tool uses the Bioconductor package Globaltest. Please cite the following articles:

Goeman, J. J., Oosting, J., Cleton-Jansen, A. M., Anninga, J. K., and van Houwelingen, J. C. (2005). Testing association of a pathway with survival using gene expression data. Bioinformatics, 21(9):1950–1957.

Goeman, J. J., van de Geer, S. A., de Kort, F., and van Houwelingen, J. C. (2004). A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20(1):93–99.

Goeman, J. J., van de Geer, S. A., and van Houwelingen, J. C. (2006). Testing against a high-dimensional alternative. Journal of the Royal Statistical Society Series B-Statistical Methodology, 68(3):477–493.