Statistical analysis for marker gene studies
Description
This tool produces several visualizations (rarefaction and rank-abundance curves and an RDA plot) and compares diversity between groups of samples using several ANOVA-type analyses.
It also performs indicator species analysis and applies the contribution diversity approach.
Parameters
- Method for standardizing species abundance values (total, normalize, pa, chi.square, hellinger, log) [hellinger]
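As a rough illustration of what the standardization does, the sketch below implements three of the listed methods (total, pa and hellinger) in plain Python, following their usual definitions. The tool itself relies on R packages; the function name `standardize` and the example table are assumptions made for this illustration only.

```python
import math

def standardize(counts, method="hellinger"):
    """Standardize a samples-by-taxa count table given as a list of rows.

    Only 'total', 'pa' and 'hellinger' are covered by this sketch.
    """
    result = []
    for row in counts:
        total = sum(row)
        if method == "pa":
            # Presence/absence: 1 if the taxon was observed, else 0.
            new_row = [1 if x > 0 else 0 for x in row]
        elif method == "total":
            # Relative abundance: divide each count by the sample total.
            new_row = [x / total if total else 0.0 for x in row]
        elif method == "hellinger":
            # Hellinger: square root of the relative abundance.
            new_row = [math.sqrt(x / total) if total else 0.0 for x in row]
        else:
            raise ValueError(f"method not covered by this sketch: {method}")
        result.append(new_row)
    return result

# Hypothetical count table: 2 samples (rows) x 3 taxa (columns).
table = [[1, 0, 3], [4, 4, 8]]
hellinger = standardize(table, "hellinger")
```

After the Hellinger transformation the squared values in each sample sum to one, which reduces the weight of very abundant taxa in the ordination.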
Details
This tool takes two input files: a count table that contains the counts of the identified species or operational taxonomic units (OTUs) in each sample, and a phenodata table that describes the grouping of the samples. The statistical tests only work for datasets that contain two or three groups.
The analyses are based on the functionality of the R packages vegan, rich, BiodiversityR, pegas and labdsv.
Visual data analysis
- The rarefaction curve allows you to assess how much the sampling efficiency affects the number of observed taxa or OTUs, e.g. did you sample both groups equally well?
- The rank-abundance curve displays the relative species abundances, or evenness, across the groups.
- The Redundancy Analysis (RDA) ordination plot shows whether the explanatory variable (group) explains the differences between samples.
Before running the RDA analysis, the species abundance values are standardized using the method specified by the parameter.
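The quantities behind the rarefaction and rank-abundance curves can be sketched in a few lines of Python. This is a minimal sketch and not the tool's own code: it assumes the classical expected-richness formula for subsampling without replacement, and the function names and the example sample are made up for the illustration.

```python
from math import comb

def rarefy(counts, n):
    """Expected number of taxa in a random subsample of n reads,
    drawn without replacement from one sample."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the sample total")
    denom = comb(total, n)
    # A taxon with c reads is missed with probability
    # C(total - c, n) / C(total, n); sum the complements over taxa.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)

def rank_abundance(counts):
    """Relative abundances of the observed taxa, most abundant first."""
    total = sum(counts)
    return sorted((c / total for c in counts if c > 0), reverse=True)

# Hypothetical sample with four taxa, one of them unobserved.
sample = [5, 3, 2, 0]
# Points of the rarefaction curve: (subsample size, expected taxa).
curve = [(n, rarefy(sample, n)) for n in range(sum(sample) + 1)]
ranks = rank_abundance(sample)
```

A curve that is still rising steeply at the full sample size suggests that more sequencing would reveal more taxa; a steep rank-abundance curve indicates low evenness.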
Statistical data analysis
- ANOVA-type statistical analyses, which compare the diversity or standardized abundance between the groups (for more information, please see the references below):
- Analysis of Molecular Variance (AMOVA). This is based on a Euclidean distance matrix between sequences. Distances (or their variance, to be more exact) are partitioned according to a grouping variable into within-group and between-group variance (this is similar to standard one-way ANOVA).
- Permutational Multivariate Analysis of Variance Using Distance Matrices.
- Multivariate homogeneity of groups dispersions (variances). The betadisper function is a multivariate analogue of Levene's test for homogeneity of variances.
Non-Euclidean distances between objects and group centroids are handled by reducing the original distances to principal coordinates. This procedure has also been used as a means of assessing beta diversity.
- Indicator species analysis tries to find the taxa that best differentiate the groups:
- Indicator species analysis using the Dufrene-Legendre method tries to find the "species" (taxa or OTUs) that have a high specificity to a single group. It calculates the indicator value (fidelity and relative abundance) of each species.
- Indicator Species Analysis Minimizing Intermediate Occurrences calculates the degree to which a species is consistently present or absent in a given group of samples.
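To make two of the computations listed above more concrete, the following Python sketch shows a distance-based pseudo-F statistic with a permutation p-value (the idea behind the permutational analysis of variance) and a Dufrene-Legendre style indicator value for a single taxon. It is a minimal sketch under stated assumptions, not the code of the R packages: the function names, the example data and the 0-1 scaling of the indicator value are choices made for this illustration.

```python
import random

def pseudo_f(dist, groups):
    """Pseudo-F statistic from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same per group, divided by group size.
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2
                    for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += pairs / len(members)
    ss_between = ss_total - ss_within
    a = len(labels)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permutation_p(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and its p-value from shuffling the group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)

def indval(abundances, groups):
    """Indicator value of one taxon per group: specificity * fidelity (0-1)."""
    labels = sorted(set(groups))
    mean_abund, fidelity = {}, {}
    for g in labels:
        values = [x for x, lab in zip(abundances, groups) if lab == g]
        mean_abund[g] = sum(values) / len(values)                    # mean abundance
        fidelity[g] = sum(1 for x in values if x > 0) / len(values)  # share of samples occupied
    total_mean = sum(mean_abund.values())
    return {g: (mean_abund[g] / total_mean if total_mean else 0.0) * fidelity[g]
            for g in labels}

# Hypothetical data: four samples in two groups, placed on a line.
points = [0.0, 1.0, 10.0, 11.0]
labels = ["A", "A", "B", "B"]
dist = [[abs(p - q) for q in points] for p in points]
f_obs, p_value = permutation_p(dist, labels)
iv = indval([8, 4, 0, 4], labels)
```

With so few samples the permutation p-value cannot become small, so the example only demonstrates the mechanics; real datasets need enough samples per group for the test to be informative.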
Output
The analysis output consists of the following:
- rank-abundance_rarefaction_RDA.pdf: plots
- stat-results.txt: test results
References
Consult the documentation of the R packages vegan, rich, BiodiversityR, pegas and labdsv for more details about the methods.
Excoffier, L., Smouse, P. E. and Quattro, J. M. (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131, 479-491.
Anderson, M.J. (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26, 32-46.
Anderson, M.J. (2006) Distance-based tests for homogeneity of multivariate dispersions. Biometrics, 62, 245-253.
Lu, H.P., Wagner, H.H. and Chen, X.Y. (2007) A contribution diversity approach to evaluate species diversity. Basic and Applied Ecology, 8, 1-12.
Dufrene, M. and Legendre, P. (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol. Monogr., 67(3), 345-366.
Aho, K., Roberts, D.W. and Weaver, T.W. (2008) Using geometric and non-geometric internal evaluators to compare eight vegetation classification methods. J. Veg. Sci. In press.