Quality control / RNA-seq strandedness inference and inner distance estimation using RSeQC

Description

Infers the strandedness of RNA-seq reads. For paired-end reads, it also produces an inner distance plot.

Parameters

Details

A subset of 200 000 reads is first taken from the input FASTQ files. This makes processing much faster and has only a negligible effect on the results.
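The subsetting step can be illustrated with a minimal sketch. How the tool actually selects its reads is not stated here, so this example simply assumes that the first records of each file are taken. The function names take_first_reads and subset_fastq are hypothetical, and the plain four-line FASTQ layout is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator


def take_first_reads(lines: Iterable[str], n_reads: int) -> Iterator[str]:
    """Yield the lines belonging to the first n_reads FASTQ records."""
    # Assumption: every record occupies exactly four lines
    # (header, sequence, '+', qualities), with no wrapped sequences.
    return islice(lines, 4 * n_reads)


def subset_fastq(in_path: str, out_path: str, n_reads: int = 200_000) -> None:
    """Write the first n_reads records of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(take_first_reads(src, n_reads))
```

For paired-end data, calling subset_fastq with the same n_reads on both files keeps the mates in the same order, provided the input files are already synchronised.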

Next, the reads are aligned against the selected reference genome using Bowtie2. The alignment is then compared to the reference annotation to infer the strandedness of the reads.
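As a rough illustration of these two steps, the sketch below builds the command lines for Bowtie2 and for the RSeQC script infer_experiment.py. It is not the tool's actual implementation: the function name build_commands, the file names and the choice of SAM as the intermediate format are assumptions, and the annotation is assumed to be available in BED format.

```python
from typing import List, Optional


def build_commands(index_prefix: str, bed_file: str, reads_1: str,
                   reads_2: Optional[str] = None,
                   sam_file: str = "subset.sam") -> List[List[str]]:
    """Return the alignment and strandedness commands as argument lists."""
    # Bowtie2: -x is the index prefix, -1/-2 are paired read files,
    # -U is an unpaired read file, -S is the SAM output file.
    align = ["bowtie2", "-x", index_prefix]
    if reads_2 is None:
        align += ["-U", reads_1]
    else:
        align += ["-1", reads_1, "-2", reads_2]
    align += ["-S", sam_file]

    # RSeQC infer_experiment.py: -i is the alignment file,
    # -r is the reference gene model in BED format.
    infer = ["infer_experiment.py", "-i", sam_file, "-r", bed_file]
    return [align, infer]
```

Each returned list can be passed to subprocess.run(cmd, check=True) in order, assuming both programs are installed and on the PATH.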

For a non-strand-specific experiment, the ratio of the two explanations should be close to 50:50:

Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5074
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4926

For strand-specific experiments there are two scenarios:

Reads in file 1 are always on the same strand as the gene:

Fraction of reads failed to determine: 0.0072
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9441
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0487
 

Reads in file 2 are always on the same strand as the gene:

Fraction of reads failed to determine: 0.0011
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0053
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9936

Some portion of the reads will remain undetermined: they may not align at all, or they may align to a region for which strandedness cannot be determined. The distribution of the remaining reads between the two explanations gives the strandedness information. For non-strand-specific experiments we expect an even distribution between the explanations, and for strand-specific experiments we expect nearly all reads to fit one explanation. Some deviation from this is normal due to misaligned reads, sampling and so on, but a large deviation usually indicates a problem.
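The interpretation described above can be sketched in code. The example below parses the three "Fraction of reads" lines and classifies the result. The function names and the cut-off values (0.8 for a dominant explanation, 0.1 around an even split) are illustrative assumptions, not thresholds used by RSeQC or by this tool.

```python
from typing import Dict

SENSE_KEY = "1++,1--,2+-,2-+"      # reads in file 1 on the gene strand
ANTISENSE_KEY = "1+-,1-+,2++,2--"  # reads in file 2 on the gene strand


def parse_fractions(report: str) -> Dict[str, float]:
    """Extract the fractions from the 'Fraction of reads ...' lines."""
    fractions: Dict[str, float] = {}
    for line in report.splitlines():
        line = line.strip()
        if not line.startswith("Fraction of reads"):
            continue
        label, _, value = line.rpartition(":")
        if "failed to determine" in label:
            key = "undetermined"
        else:
            key = label.split('"')[1]  # text between the double quotes
        fractions[key] = float(value)
    return fractions


def classify(fractions: Dict[str, float],
             dominant: float = 0.8, balanced: float = 0.1) -> str:
    """Classify strandedness from the determined reads only."""
    sense = fractions.get(SENSE_KEY, 0.0)
    antisense = fractions.get(ANTISENSE_KEY, 0.0)
    determined = sense + antisense
    if determined == 0:
        return "undetermined"
    share = sense / determined  # share of determined reads in the first explanation
    if share >= dominant:
        return "file 1 on gene strand"
    if share <= 1 - dominant:
        return "file 2 on gene strand"
    if abs(share - 0.5) <= balanced:
        return "unstranded"
    return "ambiguous"
```

A result of "ambiguous" corresponds to the large deviation mentioned above and is a signal to inspect the data rather than to pick a setting automatically.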

A more in-depth explanation of the output format can be found in the RSeQC manual.

For paired-end data, an inner distance plot is also produced.
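To make the term concrete, the inner distance of a read pair is the gap between the end of the first mate and the start of the second. The sketch below is a simplified illustration only: it assumes 0-based, half-open coordinates with mate 1 upstream of mate 2 on a contiguous region, and it makes no attempt to handle pairs that span introns. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def inner_distance(mate1_end: int, mate2_start: int) -> int:
    """Gap between the two mates; negative when the mates overlap."""
    return mate2_start - mate1_end


def inner_distances(pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Compute the inner distance for each (mate1_end, mate2_start) pair."""
    return [inner_distance(end, start) for end, start in pairs]
```

A histogram of these values over all pairs gives the kind of distribution shown in an inner distance plot. Negative values indicate that the two mates overlap.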

Output

References

This tool uses the RSeQC package. Please cite the article:

Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics (2012) 28(16): 2184-2185. doi: 10.1093/bioinformatics/bts356

Please see the RSeQC homepage for more details.