Chipster manual

Chipster training courses

We run several bioinformatics courses on different topics every year in Finland and abroad. The courses at CSC are open to everybody, but you can also contact us to discuss options for hosting a course at your site.

We make our course materials publicly available so that anyone can download them for their own use. The materials include slides and exercises, and many courses also have lecture videos. The exercise data are available as example sessions on the Chipster server, and we also provide ready-made analysis sessions which you can use as a reference when doing exercises on your own.

We also provide training accounts for our Chipster server in Finland. Note that users from Finnish universities and research institutes can use their HAKA and VIRTU logins to access Chipster, so they do not need a training account. However, we kindly ask you to inform us about upcoming Chipster courses (number of participants, type of analysis jobs) so that we can add computing resources to the server if needed.

We currently run the following Chipster courses (follow the links for more info). CSC also runs other bioinformatics courses; you can find the upcoming ones in our course calendar.

RNA-seq data analysis
Single cell RNA-seq data analysis
Virus detection using small RNA-seq
Microbial community analysis of amplicon sequencing data (16S)
Detection and annotation of genomic variants
ChIP-seq data analysis
Microarray data analysis

RNA-seq data analysis

This course introduces RNA-seq data analysis methods, tools and file formats. It covers all the steps from quality control and alignment to quantification and differential expression analysis, and it also discusses experimental design. The course takes two days (or one long day if you omit exercise sheets 3 and 4). You will learn how to

check the quality of reads with FastQC and MultiQC
remove bad quality data with Trimmomatic
infer strandedness with RSeQC
align reads to the reference genome with HISAT2 and STAR
perform alignment level quality control using RSeQC and SAMtools
quantify expression by counting reads per gene using HTSeq
check the experiment level quality with PCA plots and heatmaps
analyze differential expression with DESeq2 and edgeR
take multiple factors (including batch effects) into account in differential expression analysis
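
To make the quantification step more concrete, here is a minimal sketch of counting reads per gene. It is a toy illustration only and not how HTSeq is implemented: the gene names, coordinates and the `count_reads` helper are hypothetical, all features are assumed to lie on one chromosome, and a read that overlaps more than one gene is simply reported as ambiguous.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (start, end), 1-based inclusive,
# all on the same chromosome. Real annotations come from a GTF file.
GENES = {"geneA": (100, 500), "geneB": (450, 900), "geneC": (1500, 2000)}


def count_reads(reads, genes):
    """Count reads per gene; reads are (start, end) alignment intervals."""
    counts = Counter({name: 0 for name in genes})
    for read_start, read_end in reads:
        # A read hits a gene if the two intervals share at least one base.
        hits = [
            name
            for name, (gene_start, gene_end) in genes.items()
            if read_start <= gene_end and read_end >= gene_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return dict(counts)


reads = [(120, 170), (460, 510), (600, 650), (1000, 1050), (1600, 1650)]
print(count_reads(reads, GENES))
```

The resulting table of counts per gene is the kind of input that the differential expression tools mentioned above work on.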

Course material (2020):

slides
lecture videos (2019)
exercises (data is available on Chipster server in the example sessions listed in the exercise sheets)

ENCODE data with two samples. These exercises cover the whole workflow.
Drosophila pasilla dataset. These exercises focus on differential expression analysis and how to take confounding factors into account.
Parathyroid dataset. As before, but even harder.
Lung and lymph node comparison. Test your new skills with minimal instructions!

additional material: We have collected instructions for RNA-seq and written a book which might be useful for you when getting started.

Single cell RNA-seq data analysis

There are two courses for single cell RNA-seq data analysis. The more recent one focuses on the analysis of 10X data starting from the digital gene expression matrix (DGE), while the older one also covers the preprocessing of DropSeq data from raw reads to a DGE. Both courses show how to find sub-populations of cells using clustering with the Seurat tools, but the older course uses Seurat v2 instead of v3. You will also learn how to compare two samples and detect conserved cluster markers and differentially expressed genes in them. The course takes one day.

Course 1 (October 2020)
You will learn how to

create a Seurat v3 object
perform QC and filter out low quality cells
normalize expression values
detect highly variable genes using VST
scale data and regress out unwanted variability
perform principal component analysis (PCA) and select PCs to be used for clustering
cluster cells and find marker genes for a cluster
visualize clusters with UMAP and tSNE
run canonical correlation analysis (CCA) to identify common sources of variation between two datasets
integrate samples using the mutual nearest neighbor approach (anchors)
find conserved cluster marker genes for two samples
find differentially expressed genes in a cluster between two samples
visualize genes with cell type specific responses in two samples
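
As an illustration of the normalization step, the following minimal sketch applies a global-scaling log normalization to the counts of one cell: each count is divided by the cell's total count, multiplied by a scale factor, and transformed with log(1 + x). The function name, the gene names and the scale factor of 10,000 are assumptions made for this example; the course itself uses the Seurat tools in Chipster, not this code.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Normalize raw counts of one cell: log1p(count / total * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        # An empty cell has nothing to scale; return zeros for every gene.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical raw counts for a single cell.
cell = {"CD3E": 2, "MS4A1": 0, "LYZ": 8}
for gene, value in log_normalize(cell).items():
    print(gene, round(value, 3))
```

Scaling by the cell total makes cells with different sequencing depths comparable, and the log transform dampens the influence of very highly expressed genes.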

Course material:

slides (2020)
lecture videos (2020)
exercises: Clustering 10X Genomics data and Integrated analysis of two samples (2020). The data are available on Chipster server in the example sessions listed in the exercise sheet.
Note: excellent lecture videos covering scRNA-seq theory from another course (May 2019)

Course 2 (March 2019)
You will learn how to

check the quality of reads with FastQC
tag reads with cell and molecular barcodes
trim and filter reads
align reads to the reference genome with HISAT2 and STAR
tag reads with gene names
visualize aligned reads in genomic context using the Chipster genome browser
estimate the number of usable cells by checking the inflection point
detect bead synthesis errors
create and filter a DGE
create a Seurat v2 object
regress out unwanted variability
detect variable genes and perform principal component analysis
cluster cells and find marker genes for a cluster
run canonical correlation analysis (CCA) to identify common sources of variation between two datasets
align two samples for integrated analysis
find conserved cluster markers for two samples
find differentially expressed genes in a cluster between two samples
visualize genes with cell type specific responses in two samples

Course material:

slides (March 2019)
exercises 1: Preprocessing DropSeq data. The data is available on Chipster server in the example session listed in the exercise sheet.
exercises 2: Clustering 10X Genomics data. The data is available on Chipster server in the example session listed in the exercise sheet.
exercises 3: Integrated analysis of two samples using 10X Genomics data. The data is available on Chipster server in the example session listed in the exercise sheet.

Virus detection using small RNA-seq

This course introduces the VirusDetect pipeline covering all the analysis steps and file formats. VirusDetect allows you to detect known viruses and identify new ones by sequencing small RNAs (siRNA) in host samples. siRNA sequences are assembled into contigs and compared to known virus sequences. The course takes about 5 hours. You will learn how to

run VirusDetect and interpret the result files
subtract reads originating from the host genome
set parameters for filtering the BLAST matches
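
To give an idea of what filtering BLAST matches means in practice, here is a minimal sketch that filters hits in the standard 12-column BLAST tabular format by percent identity and E-value. It is not part of VirusDetect: the `filter_hits` helper, the threshold values and the example hits are hypothetical and chosen only for illustration.

```python
# Column names of the default BLAST tabular output (outfmt 6).
BLAST_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(lines, min_identity=90.0, max_evalue=1e-5):
    """Keep hits with sufficient identity and a small enough E-value."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_FIELDS, line.split("\t")))
        if float(hit["pident"]) >= min_identity and float(hit["evalue"]) <= max_evalue:
            kept.append(hit)
    return kept


# Hypothetical hits: contig vs. virus reference sequence.
example = [
    "contig1\tvirusX\t98.5\t200\t3\t0\t1\t200\t50\t249\t1e-90\t350",
    "contig2\tvirusY\t75.0\t80\t20\t0\t1\t80\t10\t89\t1e-3\t60",
]
for hit in filter_hits(example):
    print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["evalue"])
```

Stricter thresholds reduce false matches to known viruses, while looser ones may be needed to pick up divergent sequences.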

Course material (2018):

slides
lecture video from a webinar
exercises (data is available on Chipster server in the example sessions listed in the exercise sheets)

Potato dataset with different amounts of reads.
Raspberry dataset demonstrates how to provide the host genome.

Microbial community analysis of 16S amplicon sequencing data

This course introduces microbial community analysis of (16S rRNA) amplicon sequencing data. It covers preprocessing, alignment to reference, taxonomic classification, and statistical analysis. The course takes one day. You will learn how to

check the quality of reads with FastQC and MultiQC
remove bad quality data with Trimmomatic and Mothur
remove redundancy, perform alignment and preclustering with Mothur
do taxonomic classification with Mothur
perform statistical analysis (based on R packages vegan, rich, BiodiversityR, pegas, labdsv and DESeq2)
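
One common quantity in the statistical analysis of community data is a diversity index. As a minimal sketch, the Shannon index H = -sum(p_i * ln p_i) can be computed from per-taxon read counts as shown below. The function name and the example counts are made up for this illustration; in the course the analysis is run with the Chipster tools listed above, not with this code.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln(p)) over taxa with non-zero counts."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in abundances
        if n > 0
    )


# Hypothetical read counts per taxon in two samples.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(round(shannon_index(even_sample), 3))
print(round(shannon_index(skewed_sample), 3))
```

A sample whose reads are spread evenly over many taxa gets a higher index than one dominated by a single taxon.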

Course material (2020):

lecture videos
slides

Community analysis with Chipster
Introduction to community analysis (2017) by Anu Mikkonen

exercises. The data is available on Chipster server in the example session listed in the exercise sheet.

Detection and annotation of genomic variants

This course covers variant analysis from raw sequence reads to variant annotation, introducing the theory, analysis tools and file formats involved. The course takes one day. You will learn how to

perform quality control with FastQC and PRINSEQ
remove bad quality data with Trimmomatic
align reads to the reference genome with BWA
perform alignment level quality control with SAMtools
mark duplicates with Picard
call and filter variants with SAMtools, BCFtools and VCFtools. Note that the GATK variant calling platform will be integrated in Chipster in 2019.
annotate variants with Variant Effect Predictor (VEP)
visualize variants in genomic context with the Chipster genome browser
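
To illustrate what variant filtering does, here is a minimal sketch that keeps only the records of a VCF file whose QUAL value (the sixth tab-separated column) reaches a threshold. The `filter_vcf` helper, the threshold of 30 and the example records are assumptions for this illustration; in the course the filtering is done with the tools listed above.

```python
def filter_vcf(lines, min_qual=30.0):
    """Yield header lines unchanged and records with QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # columns: CHROM POS ID REF ALT QUAL FILTER INFO
        # A missing quality is written as "." in VCF; such records are dropped.
        if qual != "." and float(qual) >= min_qual:
            yield line


# Hypothetical VCF content.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\t.\tDP=20",
    "chr1\t2000\t.\tC\tT\t10\t.\tDP=5",
    "chr1\t3000\t.\tG\tA\t.\t.\tDP=8",
]
for line in filter_vcf(vcf):
    print(line)
```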

Course material (2016):

slides

Variant analysis with Chipster
Ensembl variation data resources (Variant Effect Predictor VEP described from slide 52 onwards) by Ben Moore
Effects of variants on protein structure by Hanka Venselaar

lecture videos
exercises. The data is available on Chipster server in the example session listed in the exercise sheet.

ChIP-seq data analysis

This course covers ChIP-seq analysis from quality control and alignment to peak calling, motif detection, and pathway analysis. It introduces the theory, analysis tools and file formats involved. The course takes one day. You will learn how to

perform quality control with FastQC
align reads to the reference genome with Bowtie
call peaks with MACS2
filter peaks
visualize reads and peaks in genomic context with the Chipster genome browser
retrieve nearby genes
perform pathway analysis
detect motifs

Course material (2016):

slides
exercises. The data is available on Chipster server in the example session listed in the exercise sheet.

Microarray data analysis

This course covers microarray data analysis from quality control and normalization to differential expression analysis, clustering and pathway analysis. It introduces the theory, analysis tools and file formats involved. The course takes one and a half days. You will learn how to

perform quality control
normalize data
find differentially expressed genes and take batch effects into account
perform clustering
visualize results in different ways
perform pathway analysis

Course material (2018):

slides (including exercises). The data is available on Chipster server in the example sessions listed in the exercises.