Seurat -PCA


This tool performs principal component analysis (PCA) on the highly variable genes across the single cells, selected using the tool Filtering, regression and detection of variable genes.



As input, give the Seurat R-object (Robj) from the tool Seurat -Filtering, regression and detection of variable genes.

To overcome the extensive technical noise in any single gene for scRNA-seq data, Seurat clusters cells based on their PCA scores, with each PC essentially representing a 'metagene' that combines information across a correlated gene set. Determining how many PCs to include downstream is therefore an important step, but it can be challenging. The result file PCAplots.pdf contains several plots which can help to explore the primary sources of heterogeneity in the dataset and to decide which principal components to include in the downstream analysis.
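To make the 'metagene' idea concrete, the following minimal Python sketch computes the score of a single cell on one PC as a weighted sum of its mean-centred expression values. The gene names, loadings, and expression values are hypothetical illustration data, and the function pc_score is not part of Seurat; Seurat also scales the data before PCA, which this sketch leaves out.

```python
def pc_score(expression, gene_means, loadings):
    """Score of one cell on one PC: loading-weighted sum of centred expression."""
    return sum(loadings[g] * (expression[g] - gene_means[g]) for g in loadings)


# Hypothetical loadings of three genes on one principal component.
loadings = {"GeneA": 0.6, "GeneB": 0.6, "GeneC": -0.5}
# Hypothetical mean expression of each gene across all cells.
gene_means = {"GeneA": 2.0, "GeneB": 1.0, "GeneC": 3.0}

# Two hypothetical cells with opposite expression patterns.
cell_1 = {"GeneA": 4.0, "GeneB": 2.0, "GeneC": 1.0}
cell_2 = {"GeneA": 1.0, "GeneB": 0.5, "GeneC": 4.0}

score_1 = pc_score(cell_1, gene_means, loadings)
score_2 = pc_score(cell_2, gene_means, loadings)
print(score_1, score_2)
```

Because GeneA and GeneB have positive loadings and GeneC a negative one, a cell with high GeneA/GeneB and low GeneC gets a positive score, and the opposite pattern gets a negative score. No single gene decides the score, which is why it is less sensitive to noise in any one gene.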

The first plot shows the top genes associated with the first two principal components. The second plot shows the cells in the dataset plotted along principal components 1 and 2, and the third plot is a heatmap focusing on principal component 1. The PCAgenes.txt file lists the top 5 genes associated with high/low loadings for each of the first 5 PCs.
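As an illustration of how such a gene list can be derived, the Python sketch below picks the genes with the highest and lowest loadings for one PC. The gene names and loading values are hypothetical, and top_genes is an illustrative helper, not the function this tool uses.

```python
def top_genes(loadings, n=5):
    """Return (genes with the n highest loadings, genes with the n lowest loadings)."""
    ranked = sorted(loadings, key=loadings.get, reverse=True)
    highest = ranked[:n]
    lowest = ranked[::-1][:n]  # most negative loading first
    return highest, lowest


# Hypothetical loadings of six genes on one principal component.
pc1_loadings = {
    "G1": 0.40,
    "G2": -0.35,
    "G3": 0.10,
    "G4": 0.25,
    "G5": -0.05,
    "G6": -0.20,
}

high, low = top_genes(pc1_loadings, n=2)
print("high:", high)
print("low:", low)
```

Genes at the two ends of the ranking pull the PC in opposite directions, so looking at both lists together helps to interpret what biological or technical signal the PC captures.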

The next plots can be used to decide which principal components to use. By default, 12 heatmaps are plotted, one for each of the first 12 principal components, in which both cells and genes are sorted by their principal component scores. You can change the number of heatmaps to plot. Pay particular attention to the PCs you choose for downstream analysis: the heatmaps display the 'extremes' across both genes and cells, and can be useful for excluding PCs that may be driven primarily by ribosomal/mitochondrial or cell cycle genes.

The last plot shows the standard deviations of the principal components. A suitable cutoff is the point where the curve forms an 'elbow': the PCs after it each add comparatively little variation.
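The elbow is normally judged by eye, but a simple heuristic can serve as a sanity check. The Python sketch below picks the PC whose point lies farthest from the straight line joining the first and last points of the curve. This heuristic is an assumption made for illustration, not something the tool or Seurat computes, and the standard deviations are hypothetical values.

```python
def find_elbow(stdevs):
    """Return the 1-based PC index farthest from the line joining the curve's endpoints."""
    n = len(stdevs)
    x1, y1 = 1, stdevs[0]
    x2, y2 = n, stdevs[-1]
    dx, dy = x2 - x1, y2 - y1
    offset = x2 * y1 - y2 * x1

    def distance(pc, sd):
        # Proportional to the perpendicular distance from the chord;
        # the constant denominator is omitted because only the ranking matters.
        return abs(dy * pc - dx * sd + offset)

    return max(range(1, n + 1), key=lambda pc: distance(pc, stdevs[pc - 1]))


# Hypothetical standard deviations for the first 10 PCs.
pc_stdevs = [8.0, 5.0, 3.5, 2.0, 1.8, 1.7, 1.65, 1.6, 1.58, 1.55]

elbow = find_elbow(pc_stdevs)
print("Elbow at PC", elbow)
```

Treat the result as a starting point only: real curves are often less sharply bent than this example, and it is usually safer to err on the side of including a few more PCs and to cross-check the choice against the heatmaps.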

For more details, please check the Seurat tutorials.