Produce species count table and phenodata

Description

This tool generates a count table and a phenodata file, which you can use to assign samples to different experimental groups for further statistical analysis.

Parameters

Cutting level for taxonomic names (0-9) [0]
Rarefy counts (yes, no) [no]
Produce binary table instead of counts (yes, no) [no]

Details

You need to give the following two input files (they can be produced using the tool Classify sequences to taxonomic units):

sequences-taxonomy-assignment.txt
picked.count_table

You can set the cutting level for taxonomic names (0 means output all levels).
If you set the parameter Produce binary table instead of counts = yes, sequence counts are converted to 1 and 0, indicating the presence and absence of species, respectively. This kind of binary table is typically used when studying the co-occurrence of species.
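The binary conversion can be illustrated with a minimal Python sketch. It is not the tool's own code: the function name to_binary, the dictionary layout and the example counts are assumptions made for illustration only.

```python
def to_binary(table):
    """Convert counts to presence (1) / absence (0).

    `table` maps sample name -> {taxon name -> sequence count}.
    This layout is an assumption made for the example.
    """
    return {
        sample: {taxon: int(count > 0) for taxon, count in counts.items()}
        for sample, counts in table.items()
    }


# Hypothetical example data, not real results.
counts = {
    "sample1": {"Bacteroides": 12, "Prevotella": 0},
    "sample2": {"Bacteroides": 0, "Prevotella": 3},
}
binary = to_binary(counts)
```

Any count above zero becomes 1, so information about abundance is lost and only presence is kept.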

Typically your samples have different numbers of sequences. If you choose to rarefy counts, the count table is subsampled so that each sample has the same number of sequences as your smallest sample. Note that rarefying might not be the best way to normalize microbiome data; alternative methods, including those implemented in the DESeq2 and edgeR Bioconductor packages, have been shown to work well. In Chipster these packages are available in the following tools, which you can use for differential abundance analysis, PCA and heatmaps of microbiome data by giving the file counttable_transposed.tsv as input:

Quality control / PCA and heatmap of samples with DESeq2
RNA-seq / Differential expression using DESeq2
RNA-seq / Differential expression using edgeR
RNA-seq / Differential expression using edgeR for multivariate experiments
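The idea behind rarefying, as described above, can be sketched in Python as follows. This is a minimal illustration that assumes subsampling without replacement down to the depth of the smallest sample; the tool itself may use a different implementation. The function names, the dictionary layout, the seed parameter and the example counts are all hypothetical.

```python
import random


def rarefy_sample(counts, depth, seed=0):
    """Subsample one sample without replacement down to `depth` sequences."""
    rng = random.Random(seed)
    # Expand the counts into one label per sequence,
    # e.g. {"A": 2, "B": 1} -> ["A", "A", "B"].
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(pool, depth)
    # Count how many of the drawn sequences belong to each taxon.
    return {taxon: drawn.count(taxon) for taxon in counts}


def rarefy_table(table, seed=0):
    """Rarefy every sample to the sequence total of the smallest sample."""
    depth = min(sum(counts.values()) for counts in table.values())
    return {
        sample: rarefy_sample(counts, depth, seed)
        for sample, counts in table.items()
    }


# Hypothetical example data: sample1 has 100 sequences, sample2 has 40.
table = {
    "sample1": {"Bacteroides": 50, "Prevotella": 30, "Blautia": 20},
    "sample2": {"Bacteroides": 10, "Prevotella": 5, "Blautia": 25},
}
rarefied = rarefy_table(table)
```

After rarefying, both samples contain 40 sequences. The smallest sample is left unchanged, while sequences from the larger sample are discarded, which is one reason why rarefying is not always the preferred normalization method.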

Output

The analysis output consists of the following:

counttable.tsv: Count table where rows are samples and columns are taxonomic groups (species).
counttable_transposed.tsv: Count table where rows are species and columns are samples. This file is suitable for the DESeq2- and edgeR-based tools in Chipster. Note that if you request to rarefy counts or produce a binary table, this file is not produced.
phenodata.tsv: Phenodata file allowing you to assign samples to experimental groups.
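
The relationship between counttable.tsv and counttable_transposed.tsv can be illustrated with a short Python sketch. It works on an in-memory dictionary instead of the actual files, and the function name transpose and the example counts are assumptions made for illustration.

```python
def transpose(table):
    """Turn a sample -> taxon -> count table into taxon -> sample -> count."""
    # Collect every taxon seen in any sample, in a stable order.
    taxa = sorted({taxon for counts in table.values() for taxon in counts})
    return {
        taxon: {sample: counts.get(taxon, 0) for sample, counts in table.items()}
        for taxon in taxa
    }


# Hypothetical example data: rows are samples, columns are taxa.
by_sample = {
    "sample1": {"Bacteroides": 12, "Prevotella": 0},
    "sample2": {"Bacteroides": 4, "Prevotella": 3},
}
# Rows are now taxa and columns are samples.
by_taxon = transpose(by_sample)
```

The values are identical in both layouts; only the orientation of rows and columns changes.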