This tool clusters the cells, performs non-linear dimensional reduction (tSNE), and finds marker genes for the clusters.

- Number of principal components to use [10]
- Resolution for granularity [0.6]
- Cluster biomarker gene has to be expressed in at least this fraction of cells [0.25]
- Differential expression threshold for a cluster biomarker gene [0.25]
- Which test to use for finding marker genes [bimod, roc, t, tobit, poisson, negbinom]
- Only positive changes [TRUE]

As input, give the Seurat R-object (Robj) from the Seurat PCA tool.

The Seurat function *FindClusters* first constructs a KNN graph based on the Euclidean distance in PCA space,
and then refines the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity).
To cluster the cells, the function uses the smart local moving (SLM) algorithm to iteratively group cells together
with the goal of optimizing the standard modularity function
(see the links below for more information).
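The graph construction described above can be sketched in a few lines of Python. This is a minimal illustration and not Seurat's implementation: the helper names `knn_sets` and `jaccard_weights`, the toy coordinates, and the choice to count each cell as part of its own neighborhood are assumptions made for the example.

```python
import math

def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors

def jaccard_weights(neighbors):
    """Weight each KNN edge by the Jaccard overlap of the two cells' neighborhoods."""
    weights = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            a = neighbors[i] | {i}  # assumption: a cell belongs to its own neighborhood
            b = neighbors[j] | {j}
            weights[(min(i, j), max(i, j))] = len(a & b) / len(a | b)
    return weights

# Toy "PCA space": two well-separated groups of three cells each (made-up coordinates).
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
neighbors = knn_sets(cells, k=2)
weights = jaccard_weights(neighbors)
```

In this toy example every edge connects cells of the same group, and cells that share all their neighbors get the maximum weight of 1.0. A community detection step would then operate on these weighted edges.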

The **resolution** parameter sets the 'granularity' of the downstream clustering,
with increased values leading to a greater number of clusters.
According to the Seurat developers, setting this parameter between 0.6 and 1.2 typically returns good results for single cell datasets
of around 3K cells. The optimal resolution often increases for larger datasets.
Use a value above (below) 1.0 if you want to obtain a larger (smaller) number of communities.
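To see why a higher resolution favors more clusters, the sketch below evaluates a resolution-scaled modularity score on a toy graph. It assumes the commonly used form in which the resolution multiplies the expected-edges term. Whether Seurat optimizes exactly this expression is an assumption, and the function name `modularity` and the graph are illustrative only.

```python
def modularity(edges, membership, resolution=1.0):
    """Resolution-scaled modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}      # number of edges inside each community
    total_degree = {}  # summed node degree of each community
    for u, v in edges:
        for node in (u, v):
            c = membership[node]
            total_degree[c] = total_degree.get(c, 0) + 1
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

for resolution in (0.2, 1.0):
    print(resolution, modularity(edges, split, resolution), modularity(edges, merged, resolution))
```

With a resolution of 1.0 the two-cluster partition scores higher than the single merged cluster, whereas with a resolution of 0.2 the merged partition wins, so lowering the resolution pushes the optimum toward fewer, larger communities.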

The **principal components to use** need to be decided by the user based on the PCA plots from the Seurat PCA tool.

Non-linear dimensional reduction (tSNE) is used for visualizing and exploring the dataset. Cells within the graph-based
clusters determined above should co-localize on the tSNE plot, because tSNE aims to place cells with similar
local neighborhoods in high-dimensional space together in low-dimensional space.
The same user-defined PCs are used here as in the clustering step.
The tSNE result is visualized in **tSNEplot.pdf**.

Next, the Seurat function *FindAllMarkers* is used to identify the positive and negative markers for each cluster compared
to all other cells.
The **min.pct** parameter requires a gene to be detected in at least a minimum fraction of cells in either of the two groups,
and the **thresh.test** parameter requires a gene to be differentially expressed (on average) by some amount
between the two groups. Both of these parameters can be set to 0, but this dramatically increases the running time,
since a large number of genes that are unlikely to be highly discriminatory will then be tested.
The marker genes for each recognized cluster are written to the **markers.tsv** file.

By default, only positive markers are listed.
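The effect of the detection-fraction threshold, the differential expression threshold and the positive-only option can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not Seurat's code: the function name `passes_prefilter` is hypothetical, "detected" is taken to mean an expression value above zero, and the average difference is computed as a plain difference of means.

```python
def passes_prefilter(cluster_expr, other_expr, min_pct=0.25, threshold=0.25, only_pos=True):
    """Decide whether a gene is worth testing as a marker for a cluster."""
    # Fraction of cells in each group where the gene is detected (assumed: value > 0).
    pct_cluster = sum(x > 0 for x in cluster_expr) / len(cluster_expr)
    pct_other = sum(x > 0 for x in other_expr) / len(other_expr)
    if max(pct_cluster, pct_other) < min_pct:
        return False
    # Average difference between the cluster and all other cells (assumed: difference of means).
    diff = sum(cluster_expr) / len(cluster_expr) - sum(other_expr) / len(other_expr)
    if only_pos:
        return diff >= threshold
    return abs(diff) >= threshold

# Made-up expression values for three genes.
marker = passes_prefilter([2.0, 1.5, 0.0, 2.5], [0.0, 0.0, 0.5, 0.0])
rare = passes_prefilter([0.0, 0.0, 0.0, 0.0, 3.0], [0.0] * 10)
down = passes_prefilter([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
down_allowed = passes_prefilter([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], only_pos=False)
```

Here the first gene passes, the second is dropped because it is detected in too few cells of either group, and the third passes only when negative changes are allowed. Setting both thresholds to 0 lets every gene through, which is why the running time grows.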

For testing, Seurat currently implements "bimod" (likelihood-ratio test for single cell gene expression, McDavid et al., Bioinformatics, 2011, default),
"roc" (standard AUC classifier), "t" (Student's t-test),
"tobit" (Tobit test for differential gene expression, as in Trapnell et al., Nature Biotech, 2014),
"poisson", and "negbinom".
The latter two options should only be used on UMI datasets, as they assume an underlying Poisson or negative binomial distribution.

To get the markers for a specific cluster, you can use the tool Utilities / Filter table by column value.
For example, to get the markers for cluster 2, fill in the parameters accordingly:

Column to filter by = cluster

Does the first column have a title = no

Cutoff = 2

Filtering criteria = equal-to
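If you prefer to filter the table outside Chipster, the same selection can be sketched in Python. The toy table below only mimics the layout implied by the parameters above (a first column without a title); its column names and values are made up for illustration and may differ from those in your markers.tsv, and the function name `filter_by_cluster` is hypothetical.

```python
import csv
import io

# Toy stand-in for markers.tsv. The header row is one field shorter than the
# data rows because the first column (row names) has no title.
toy_tsv = (
    "p_val\tavg_diff\tcluster\tgene\n"
    "GeneA\t0.001\t1.2\t0\tGeneA\n"
    "GeneB\t0.002\t0.9\t2\tGeneB\n"
    "GeneC\t0.010\t0.5\t2\tGeneC\n"
    "GeneD\t0.020\t0.4\t12\tGeneD\n"
)

def filter_by_cluster(tsv_text, cluster):
    """Return the rows whose 'cluster' column equals the given cluster."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = ["rowname"] + rows[0]  # placeholder title for the untitled first column
    records = [dict(zip(header, row)) for row in rows[1:]]
    return [r for r in records if r["cluster"] == str(cluster)]

cluster2 = filter_by_cluster(toy_tsv, 2)
print([r["gene"] for r in cluster2])
```

The comparison is an exact match on the cluster value, which corresponds to the equal-to criterion, so cluster 12 is not picked up when filtering for cluster 2. To read a real file, pass its contents (for example from `open("markers.tsv").read()`) instead of `toy_tsv`.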

For more details, please check:

The Seurat tutorials

The Seurat clustering approach was heavily inspired by manuscripts that applied graph-based clustering approaches
to scRNA-seq data [SNN-Cliq, Xu and Su, Bioinformatics, 2015]
and CyTOF data [PhenoGraph, Levine et al., Cell, 2015].

Smart local moving algorithm:

[SLM, Blondel et al., Journal of Statistical Mechanics]

- seurat_obj.Robj: The Seurat R-object to pass to the next Seurat tool, or to import to R. Not viewable in Chipster.
- tSNEplot.pdf: tSNE plot of the clustered cells (explained in detail above)
- markers.tsv: Top markers