Seurat -Clustering


This tool clusters the cells, performs non-linear dimensional reduction (tSNE) and finds marker genes for the clusters.



As input, give the Seurat R-object (Robj) from the Seurat PCA -tool.

If your data contains a low number of cells, try lowering the perplexity parameter (the expected number of neighbours).

The Seurat function FindClusters first constructs a KNN (k-nearest neighbour) graph based on the Euclidean distance in PCA space, and then refines the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity). To cluster the cells, the function uses the smart local moving algorithm to iteratively group cells together, with the goal of optimizing the standard modularity function (see the links below for more information).
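The graph construction described above can be illustrated with a minimal Python sketch. It is not Seurat's implementation: the function names knn_sets and jaccard_edges, the toy coordinates and the choice k = 2 are all assumptions made for illustration. Each cell gets a neighbourhood of its k nearest cells in a 2D stand-in for PCA space, and an edge between two cells is weighted by the overlap of their neighbourhoods.

```python
import math

def knn_sets(points, k):
    """For each point, return the set holding the point itself and its
    k nearest neighbours by Euclidean distance (hypothetical helper)."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append({i} | set(others[:k]))
    return neighbours

def jaccard_edges(neighbours):
    """Weight each pair of cells by |A & B| / |A | B| of their
    neighbourhoods; pairs with no shared neighbours get no edge."""
    edges = {}
    for i in range(len(neighbours)):
        for j in range(i + 1, len(neighbours)):
            shared = len(neighbours[i] & neighbours[j])
            if shared:
                edges[(i, j)] = shared / len(neighbours[i] | neighbours[j])
    return edges

# Toy "PCA space": two well-separated groups of three cells each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
weights = jaccard_edges(knn_sets(points, k=2))
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 2))
```

In this toy example, cells in the same group share their whole neighbourhood and get strongly weighted edges, while cells from different groups share no neighbours and are left unconnected. This is the structure that the clustering step then exploits.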
The resolution parameter sets the 'granularity' of the downstream clustering, with increased values leading to a greater number of clusters. We find that setting this parameter between 0.6 and 1.2 typically returns good results for single cell datasets of around 3K cells. Optimal resolution often increases for larger datasets. Use a value above 1.0 if you want to obtain a larger number of communities, and a value below 1.0 if you want a smaller number.
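To give some intuition for why a higher resolution yields more clusters, here is a small Python sketch of a commonly used generalized modularity, in which the resolution multiplies the expected-edges penalty term. That this matches the exact form used inside Seurat is an assumption, and the function name modularity and the toy graph are hypothetical.

```python
def modularity(edges, communities, resolution=1.0):
    """Generalized modularity of a partition of an undirected,
    unweighted graph: sum over communities of
    (internal edges / m) - resolution * (community degree / 2m)^2."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for members in communities:
        members = set(members)
        internal = sum(1 for u, v in edges if u in members and v in members)
        total_degree = sum(degree[node] for node in members)
        q += internal / m - resolution * (total_degree / (2 * m)) ** 2
    return q

# Toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
merged = [[0, 1, 2, 3, 4, 5]]
split = [[0, 1, 2], [3, 4, 5]]

for resolution in (0.1, 1.0):
    print(resolution,
          round(modularity(edges, merged, resolution), 3),
          round(modularity(edges, split, resolution), 3))
```

For this graph, the single merged community scores higher at a low resolution, while the two-community split scores higher at resolution 1.0. An optimizer that maximizes this score will therefore tend to return more, smaller communities as the resolution grows.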
The user needs to decide which principal components to use, based on the PCAplots from the Seurat PCA -tool.

Non-linear dimensional reduction (tSNE) is used for visualizing and exploring the dataset. Cells within the graph-based clusters determined above should co-localize on the tSNE plot. This is because tSNE aims to place cells with similar local neighborhoods in high-dimensional space together in low-dimensional space. The same user-defined PCs are used here as in the clustering step. The tSNE result is visualized in the tSNEplot.pdf file.
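The perplexity parameter of tSNE can be read as an effective number of neighbours. In the tSNE formulation, perplexity is 2 raised to the Shannon entropy of a cell's neighbour probability distribution. The following Python sketch only illustrates that definition; the function name perplexity and the example distributions are assumptions, and the sketch does not run tSNE itself.

```python
import math

def perplexity(probabilities):
    """Perplexity = 2 ** (Shannon entropy in bits) of a distribution."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy

# A cell that spreads its attention evenly over 30 neighbours.
uniform_over_30 = [1 / 30] * 30
# A cell that puts almost all weight on one neighbour.
peaked = [0.9] + [0.1 / 29] * 29

print(round(perplexity(uniform_over_30), 2))
print(round(perplexity(peaked), 2))
```

An even distribution over 30 neighbours gives a perplexity of about 30, while a distribution dominated by a single neighbour gives a much smaller value. This suggests why a dataset with few cells calls for a lower perplexity setting: there are simply fewer cells available to act as neighbours.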

Next, the Seurat function FindAllMarkers is used to identify the positive and negative markers for each cluster compared to all other cells. The min.pct parameter requires a gene to be detected in at least a minimum percentage of cells in either of the two groups, and the thresh.test parameter requires a gene to be differentially expressed (on average) by some amount between the two groups. Both of these parameters can be set to 0, but this dramatically increases the running time, since a large number of genes that are unlikely to be highly discriminatory will then be tested. The marker genes for each recognized cluster are written to the markers.tsv file.
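The effect of the two pre-filtering parameters can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Seurat's code: the function name passes_prefilter, the default values, the use of a plain difference of means, and the toy expression values are all hypothetical.

```python
def passes_prefilter(group1, group2, min_pct=0.25, thresh=0.25):
    """Return True if a gene would be kept for testing.

    group1, group2: expression values of one gene in the cells of the
    cluster and in all other cells. A value above 0 counts as detected.
    """
    pct1 = sum(1 for x in group1 if x > 0) / len(group1)
    pct2 = sum(1 for x in group2 if x > 0) / len(group2)
    mean_diff = abs(sum(group1) / len(group1) - sum(group2) / len(group2))
    # Detected often enough in either group, and different enough on average.
    return max(pct1, pct2) >= min_pct and mean_diff >= thresh

clear_marker = ([2.0, 3.0, 0.0, 2.5], [0.0, 0.0, 0.0, 0.1])
rare_gene = ([0.0] * 9 + [1.0], [0.0] * 10)

print(passes_prefilter(*clear_marker))
print(passes_prefilter(*rare_gene))
print(passes_prefilter(*rare_gene, min_pct=0, thresh=0))
```

With the hypothetical defaults, the rarely detected gene is skipped before any statistical test is run. Setting both thresholds to 0 lets it through, which is why doing so makes the marker search test many more genes.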

For testing, Seurat currently implements "bimod" (likelihood-ratio test for single cell gene expression, McDavid et al., Bioinformatics, 2011, default), "roc" (standard AUC classifier), "t" (Student's t-test), "tobit" (Tobit-test for differential gene expression, as in Trapnell et al., Nature Biotech, 2014), "poisson", and "negbinom". The latter two options should only be used on UMI datasets, and they assume an underlying Poisson or negative-binomial distribution, respectively.
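As an illustration of what an AUC-based test measures, the following Python sketch computes the area under the ROC curve for a single gene used as a classifier between a cluster and all other cells. It is a generic pairwise formulation with a hypothetical function name (auc) and made-up expression values, and it does not reproduce Seurat's own code or output columns.

```python
def auc(cluster_values, other_values):
    """Probability that a random cell from the cluster has higher
    expression than a random cell from outside it (ties count 0.5)."""
    wins = 0.0
    for a in cluster_values:
        for b in other_values:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cluster_values) * len(other_values))

print(auc([3, 4, 5], [0, 1, 2]))   # gene always higher in the cluster
print(auc([0, 1, 2], [3, 4, 5]))   # gene always lower in the cluster
print(auc([1, 1], [1, 1]))         # gene carries no information
```

An AUC near 1 indicates a strong positive marker, a value near 0 a strong negative marker, and a value around 0.5 a gene that does not separate the cluster from the other cells.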

To get the markers for a specific cluster, you can use the tool Utilities / Filter table by column value. For example, to get the markers for cluster 2, fill in the parameters as follows:
Column to filter by = cluster
Does the first column have a title = no
Cutoff = 2
Filtering criteria = equal-to
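If you prefer to filter the table outside the tool, the same selection can be sketched in Python. The example below works on a small in-memory table; only the cluster column name comes from the description above, while the other column names, the gene names, the values and the presence of a title for every column are assumptions made for illustration.

```python
import csv
import io

# Hypothetical, simplified stand-in for the contents of markers.tsv.
MARKERS_TSV = (
    "gene\tcluster\tp_val\n"
    "GENE_A\t1\t0.001\n"
    "GENE_B\t2\t0.002\n"
    "GENE_C\t2\t0.010\n"
)

def markers_for_cluster(tsv_text, cluster):
    """Return the rows whose 'cluster' column equals the given cluster."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["cluster"] == str(cluster)]

cluster2 = markers_for_cluster(MARKERS_TSV, 2)
print([row["gene"] for row in cluster2])
```

To apply this to a real file, read the file contents into a string first and check how its header line is laid out, since a table whose first column has no title needs slightly different handling.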

For more details, please check:
The Seurat tutorials

The Seurat clustering approach was heavily inspired by these manuscripts, which applied graph-based clustering approaches to scRNA-seq data [SNN-Cliq, Xu and Su, Bioinformatics, 2015] and CyTOF data [PhenoGraph, Levine et al., Cell, 2015].

Smart local moving algorithm:
[SLM, Blondel et al., Journal of Statistical Mechanics]