Seurat - Compare clusters
Description
Find the marker genes for a specific cluster compared to another cluster or clusters.
Parameters
- Number of the cluster of interest [1]
- Cluster to compare to [all others]
- Min.pct [0.25]
- Differential expression threshold for a cluster marker gene [0.25]
- Which test to use for finding marker genes [wilcox]
Details
The Seurat function FindMarkers is used to identify positive and negative marker genes for the cluster of interest,
which is chosen by the user.
By default, differentially expressed genes are tested between the cluster of interest and all the other cells.
You can also compare the cluster of interest to another cluster or clusters by typing
the numbers of those clusters in the parameter field.
When comparing to a group of clusters, separate the cluster numbers with a comma (,).
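The comma-separated parameter value described above could be interpreted roughly as in the following Python sketch. The function name parse_comparison_clusters is hypothetical, and treating the text "all others" as the default is an assumption based on the parameter list; this is not the tool's actual code.

```python
from typing import List, Optional


def parse_comparison_clusters(field: str) -> Optional[List[int]]:
    """Parse the 'Cluster to compare to' field (hypothetical helper).

    Returns None for the default value (compare against all other cells),
    otherwise the list of cluster numbers typed by the user.
    """
    text = field.strip()
    # Assumption: an empty field or the text "all others" means the default.
    if text == "" or text.lower() == "all others":
        return None
    # Split on commas and ignore empty pieces such as a trailing comma.
    return [int(part) for part in text.split(",") if part.strip()]
```

For example, the value "2, 3,5" would be read as clusters 2, 3 and 5.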
You can filter out genes prior to statistical testing by requiring that a gene has to be expressed in at
least a certain fraction of cells in either of the two groups
(min.pct=0.25).
You can also require that the change in expression between the groups is at least a certain amount
on the log scale (thresh.test=0.25).
Both of these parameters can be set to 0, but this increases the running time dramatically, because a large
number of genes that are unlikely to be highly discriminatory will then be tested.
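The two pre-filters described above can be illustrated with a small Python sketch for a single gene. All function names here are hypothetical, and the fold change formula (difference of the log of the mean expression, assuming log1p-normalized input values) is an assumption about how such a value is typically computed, not a copy of Seurat's code.

```python
import math
from typing import Sequence


def fraction_expressing(values: Sequence[float]) -> float:
    """Fraction of cells with non-zero expression for one gene."""
    return sum(1 for v in values if v > 0) / len(values)


def log_mean(values: Sequence[float]) -> float:
    """Log of the mean expression, assuming log1p-transformed input values."""
    return math.log1p(sum(math.expm1(v) for v in values) / len(values))


def passes_prefilter(
    cluster: Sequence[float],
    others: Sequence[float],
    min_pct: float = 0.25,
    logfc_threshold: float = 0.25,
) -> bool:
    """Return True if the gene would go on to statistical testing."""
    # min.pct: the gene must be expressed in enough cells of either group.
    if max(fraction_expressing(cluster), fraction_expressing(others)) < min_pct:
        return False
    # Threshold on the absolute log fold change, so that both positive
    # and negative markers are kept.
    logfc = log_mean(cluster) - log_mean(others)
    return abs(logfc) >= logfc_threshold
```

With both thresholds set to 0, every gene passes, which is why the number of tests and the running time grow.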
The marker genes for the cluster of interest are written to the markers.tsv file.
Seurat currently implements the following tests:
- "wilcox": Wilcoxon rank sum test (default)
- "bimod": Likelihood-ratio test for single cell gene expression (McDavid et al., Bioinformatics, 2013)
- "roc": Standard AUC classifier
- "t": Student's t-test
- "tobit": Tobit-test for differential gene expression (Trapnell et al., Nature Biotech, 2014)
- "poisson": Likelihood ratio test assuming an underlying Poisson distribution. Use only for UMI-based
datasets
- "negbinom": Likelihood ratio test assuming an underlying negative binomial distribution. Use only for
UMI-based datasets
- "MAST": GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology,
2015)
The "poisson" and "negbinom" options should ONLY be used on UMI datasets, as they assume an underlying
Poisson and negative binomial distribution, respectively.
Please note that the DESeq2 method has not been included, because it was not designed for situations where there
are thousands of samples (cells) and it is therefore very slow.
The markers.tsv result file contains the marker genes and their associated statistics:
- p-val = p-value for the differentially expressed gene (the larger the p-value, the higher the
likelihood that the gene is in the list just by chance)
- avg_logFC = average log fold change (how much higher or lower the expression of this gene is in
the particular cluster, compared to all the other cells)
- pct.1 = what percentage of the cells in the particular cluster show some expression for this gene
- pct.2 = what percentage of the cells not in the particular cluster (= all the other cells)
show some expression for this gene
- p-val_adj = adjusted/corrected p-value. This value is multiple testing corrected:
when we test thousands of genes, we start getting some significantly differentially
expressed genes just by chance. There are different methods to correct for this;
here a Bonferroni correction is used. When filtering the table and reporting your results,
use this value.
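As an illustration of the Bonferroni correction mentioned above, each p-value is multiplied by the number of tests and capped at 1. The Python sketch below uses a hypothetical function name, and which number of tests the tool actually uses (for example all genes in the dataset or only the tested ones) is left as a parameter, since that is an assumption here.

```python
from typing import List, Optional, Sequence


def bonferroni(p_values: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """Bonferroni-adjust p-values: p_adj = min(1, p * n_tests)."""
    # Assumption: if no number of tests is given, use the number of p-values.
    if n_tests is None:
        n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]
```

With 100 tests, a raw p-value of 0.00001 becomes 0.001 after correction, while a raw p-value of 0.5 is capped at 1.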
For more details, please check the Seurat tutorials.
Output
- markers.tsv : Marker genes for the cluster of interest
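Filtering the result table by the adjusted p-value could look like the following Python sketch. The column names in the header ("gene", "p_val_adj" and so on) and the toy rows are assumptions made up for illustration; check the header of your own markers.tsv file before using anything like this.

```python
import csv
import io
from typing import Dict, List, TextIO


def significant_markers(
    handle: TextIO, alpha: float = 0.05, column: str = "p_val_adj"
) -> List[Dict[str, str]]:
    """Keep the rows whose adjusted p-value is below alpha."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row[column]) < alpha]


# Toy table with made-up genes and values, standing in for markers.tsv.
toy = io.StringIO(
    "gene\tp_val\tavg_logFC\tpct.1\tpct.2\tp_val_adj\n"
    "GENE_A\t1e-30\t1.2\t0.9\t0.2\t1e-26\n"
    "GENE_B\t0.001\t0.3\t0.4\t0.3\t1.0\n"
)
kept = significant_markers(toy)
print([row["gene"] for row in kept])
```

For a real file, the io.StringIO object would be replaced with an opened file, for example open("markers.tsv", newline="").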