Seurat - Clustering and detection of cluster marker genes

Description

This tool clusters cells, visualizes the result in a tSNE plot, and finds marker genes for the clusters.

Parameters

Number of principal components to use [10]
Resolution for granularity [0.6]
Perplexity, expected number of neighbors for tSNE plot [30]
Point size in tSNE plot [30]
Min fraction of cells where a cluster marker gene is expressed [0.25]
Differential expression threshold for a cluster marker gene [0.25]
Which test to use for finding marker genes [wilcox]

Details

Cells are clustered using principal components (PCs) rather than genes. Therefore, the input must be the Seurat R-object produced by the Seurat PCA tool. That tool also produces PC heatmaps and the elbow plot, which help you decide how many PCs to use.

Graph-based clustering is performed using the Seurat function FindClusters, which first constructs a KNN (k-nearest neighbor) graph using the Euclidean distance in PCA space, and then refines the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard distance). It then partitions the graph into clusters using the Louvain algorithm, which optimizes the standard modularity function (please see the links below for more information).
The resolution parameter sets the granularity, with higher values leading to more clusters. Values between 0.6 and 1.2 typically give good results for single cell datasets of around 3K cells. The optimal resolution is often higher for larger datasets. Use a value above 1.0 to obtain a larger number of communities, or a value below 1.0 to obtain a smaller number.
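
The neighborhood-overlap idea described above can be illustrated with a small Python sketch. This is not Seurat's implementation: the function names knn and jaccard_weights, the toy coordinates and the value of k are assumptions made for illustration, and the Louvain step is not shown.

```python
import math

def knn(points, k):
    """For each point, return the set of indices of its k nearest
    neighbors by Euclidean distance (the point itself is included)."""
    neighbors = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def jaccard_weights(neighbors):
    """Weight each pair of cells by the Jaccard overlap of their
    neighbor sets; pairs with no shared neighbors get no edge."""
    weights = {}
    n = len(neighbors)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(neighbors[i] & neighbors[j])
            if shared:
                union = len(neighbors[i] | neighbors[j])
                weights[(i, j)] = shared / union
    return weights

# Toy "PC coordinates": two well-separated groups of three cells each.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
neighbors = knn(cells, k=3)
weights = jaccard_weights(neighbors)
```

In this toy example, cells within a group share all their neighbors and get heavy edges, while cells from different groups share none and get no edge. A community detection step would then separate the two groups.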

To visualize the clusters, non-linear dimensional reduction is performed using tSNE on the same PCs that were used for the graph-based clustering, and the tSNE plot is then colored by the clustering results. Cells belonging to the same cluster should co-localize on the tSNE plot, because tSNE aims to place cells with similar local neighborhoods in high-dimensional space close together in low-dimensional space. The perplexity parameter is a guess about the number of close neighbors each cell has, so it allows you to balance attention between local and global aspects of the data. If you have a low number of cells, try lowering the perplexity value.

Next, the Seurat function FindAllMarkers is used to identify positive and negative marker genes for the clusters. These genes are differentially expressed between a cluster and all the other cells. You can filter out genes prior to statistical testing by requiring that a gene is expressed in at least a certain fraction of cells in either of the two groups (min.pct=0.25). You can also require that the expression differs between the groups by at least a certain amount (thresh.test=0.25). Both of these parameters can be set to 0, but this dramatically increases the running time, because many genes that are unlikely to be highly discriminatory will then be tested. The marker genes for each cluster are written to the markers.tsv file.
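
The pre-filtering logic can be sketched in Python as follows. This is a simplified illustration, not Seurat's code: the function name passes_prefilter and the toy expression values are hypothetical, a gene is assumed to be "expressed" in a cell when its value is above zero, and the expression difference is taken as a plain difference of group means, which may differ from how Seurat computes it.

```python
def passes_prefilter(cluster_expr, other_expr, min_pct=0.25, threshold=0.25):
    """Return True if a gene would go on to statistical testing, given its
    expression values in the cluster and in all the other cells."""
    # Fraction of cells in each group where the gene is detected (value > 0).
    pct_cluster = sum(v > 0 for v in cluster_expr) / len(cluster_expr)
    pct_other = sum(v > 0 for v in other_expr) / len(other_expr)
    # The gene must be detected often enough in at least one of the groups.
    if max(pct_cluster, pct_other) < min_pct:
        return False
    # Simplified expression difference: difference of the group means.
    mean_cluster = sum(cluster_expr) / len(cluster_expr)
    mean_other = sum(other_expr) / len(other_expr)
    # Absolute value, because both positive and negative markers are reported.
    return abs(mean_cluster - mean_other) >= threshold

# Hypothetical genes: clearly higher in the cluster, rarely detected, unchanged.
gene_a = passes_prefilter([2.0, 1.5, 0.0, 1.0], [0.0, 0.0, 0.2, 0.0])
gene_b = passes_prefilter([0.0] * 9 + [0.5], [0.0] * 10)
gene_c = passes_prefilter([1.0, 1.1, 0.9], [1.0, 1.0, 1.0])
```

Only gene_a passes both filters here. Setting min_pct and threshold to 0 lets every gene through, which is why the running time grows.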

Seurat currently implements the following tests:

"wilcox": Wilcoxon rank sum test (default)
"bimod": Likelihood ratio test for single cell gene expression (McDavid et al., Bioinformatics, 2013)
"roc": Standard AUC classifier
"t": Student's t-test
"tobit": Tobit test for differential gene expression (Trapnell et al., Nature Biotech, 2014)
"poisson": Likelihood ratio test assuming an underlying Poisson distribution. Use only for UMI-based datasets
"negbinom": Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets
"MAST": GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015)

The "poisson" and "negbinom" options should be used ONLY on UMI datasets, as they assume an underlying Poisson and negative binomial distribution, respectively. Please note that the DESeq2 method has not been included, because it was not designed for situations where there are thousands of samples (cells), and it is therefore very slow.
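
To give an idea of what the default "wilcox" option computes, here is a minimal Python sketch of a two-sided Wilcoxon rank sum test. It is an illustration only and not the code that Seurat runs: the function name rank_sum_test is hypothetical, the p-value comes from the normal approximation with a tie correction but without a continuity correction, and both groups are assumed to be non-empty.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, midranks
    for ties). Returns the U statistic of x and the p-value."""
    # Pool the values, remembering the group of each (0 = x, 1 = y).
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(combined)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values are identical, no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Hypothetical expression values of one gene in a cluster and in other cells.
u_stat, p_val = rank_sum_test([3, 4, 5], [0, 1, 2])
```

The test compares ranks rather than raw values, so it makes no assumption about the distribution of the expression values.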

The result file contains marker genes for all the clusters. You can retrieve markers for a specific cluster using the tool Utilities / Filter table by column value. For example, to get the markers for cluster 2, fill in the parameters accordingly:
Column to filter by = cluster
Does the first column have a title = no
Cutoff = 2
Filtering criteria = equal-to
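
If you prefer to do the same filtering outside Chipster, it can be sketched in Python as below. The table contents are made up: only the column name "cluster" comes from this manual, while the other column names, the gene names and the function name markers_for_cluster are hypothetical. For simplicity every column in the toy table has a title, unlike in the real file, where the first column has none.

```python
import csv
import io

# Hypothetical excerpt of a markers table (tab-separated).
markers_tsv = (
    "gene\tcluster\tp_val\n"
    "GENE_A\t1\t0.001\n"
    "GENE_B\t1\t0.020\n"
    "GENE_C\t2\t0.003\n"
    "GENE_D\t2\t0.040\n"
    "GENE_E\t3\t0.010\n"
)

def markers_for_cluster(tsv_text, cluster):
    """Return the rows whose "cluster" column equals the given cluster."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["cluster"] == str(cluster)]

# Keep only the markers of cluster 2.
cluster2 = markers_for_cluster(markers_tsv, 2)
cluster2_genes = [row["gene"] for row in cluster2]
```

To read a real file instead of the in-memory text, pass the contents of markers.tsv to the function and adjust for the untitled first column.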

For more details, please check:
The Seurat tutorials

The Seurat clustering approach was heavily inspired by two manuscripts, SNN-Cliq (Xu and Su, Bioinformatics, 2015) and PhenoGraph (Levine et al., Cell, 2015), which applied graph-based clustering approaches to scRNA-seq data and CyTOF data, respectively.

Output

seurat_obj.Robj: The Seurat R-object to pass to the next Seurat tool, or to import to R. Not viewable in Chipster.
tSNEplot.pdf: Cluster visualization in a tSNE plot, and a heatmap showing the expression of the top ten marker genes (in terms of fold change) for each cluster.
markers.tsv: Top marker genes