Directional RNA-seq methods are gaining popularity. Several protocols and products are available for the library preparation step, and different tools and software packages handle them through different options. Because inconsistent parameter naming has caused a lot of confusion, we try to clarify the issue here.
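To select the right parameters for your data, you first need to know which library preparation method was used to generate it. In general, there are three types of library preps: non-directional, directional first strand, and directional second strand. They are described below, together with example kits and the matching parameters.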
Be extra careful to assign the files correctly in TopHat! The directional library-type parameters assume that the files are given in a specific order: read 1 first, then read 2. In Chipster, always check in the parameters window that your files are assigned correctly.
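One quick way to see which file holds which mate is to look at the first header line of each FASTQ file. The sketch below assumes one of two common Illumina header conventions: older headers ending in `/1` or `/2`, or Casava 1.8+ headers whose second field starts with `1:` or `2:`. Other header formats are not recognised, and `mate_number` and `check_pair_order` are hypothetical helpers, not part of Chipster or TopHat.

```python
import gzip
from typing import Optional


def mate_number(header: str) -> Optional[int]:
    """Return 1 or 2 if the FASTQ header encodes the mate, else None.

    Assumes either the older Illumina style ("@name/1") or the
    Casava 1.8+ style ("@name 1:N:0:ACGT").
    """
    header = header.strip()
    # Older style: the read name ends with /1 or /2
    if header.endswith("/1"):
        return 1
    if header.endswith("/2"):
        return 2
    # Casava 1.8+ style: the second whitespace-separated field starts with "1:" or "2:"
    fields = header.split()
    if len(fields) >= 2 and fields[1][:2] in ("1:", "2:"):
        return int(fields[1][0])
    return None


def first_header(path: str) -> str:
    """Read the first line of a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()


def check_pair_order(read1_path: str, read2_path: str) -> bool:
    """True if the files look like read 1 and read 2, in that order."""
    return (mate_number(first_header(read1_path)) == 1
            and mate_number(first_header(read2_path)) == 2)
```

Only the first record of each file is inspected, so this is a sanity check rather than a full validation of the pairing.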
Why is choosing the right parameter so important? If you use the wrong directionality parameter, the read counting step treats the reads as coming from the wrong strand. Where there is no gene on that opposite strand, the reads get no hits at all; where a gene does overlap on the opposite strand, the reads are counted for that wrong gene.
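The effect of a wrong choice can be sketched in a few lines. The hypothetical function `transcript_strand` below returns the strand of the transcript a read came from, given the library type (named as in TopHat's library-type values), the mate number and the genomic strand the read aligned to. For unstranded libraries the transcript strand cannot be recovered, so it returns `None`.

```python
from typing import Optional

FLIP = {"+": "-", "-": "+"}


def transcript_strand(library_type: str, mate: int, aligned_strand: str) -> Optional[str]:
    """Infer the strand of the transcript a read originates from.

    library_type: "fr-unstranded", "fr-firststrand" or "fr-secondstrand"
    mate: 1 for read 1, 2 for read 2
    aligned_strand: "+" or "-", the genomic strand the read aligned to
    """
    if library_type == "fr-unstranded":
        return None  # strand information was lost during library prep
    if library_type == "fr-secondstrand":
        # read 1 matches the transcript, read 2 is its reverse complement
        return aligned_strand if mate == 1 else FLIP[aligned_strand]
    if library_type == "fr-firststrand":
        # read 1 is the reverse complement of the transcript, read 2 matches it
        return FLIP[aligned_strand] if mate == 1 else aligned_strand
    raise ValueError(f"unknown library type: {library_type}")


# A read 1 from a first-strand (dUTP) library aligned to "-" comes from a "+" strand gene...
print(transcript_strand("fr-firststrand", 1, "-"))   # +
# ...but with the wrong setting it is attributed to a "-" strand gene instead.
print(transcript_strand("fr-secondstrand", 1, "-"))  # -
```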
How can I check that I chose correctly? It is a good idea to do so, and there are several ways:
Using the tool Infer strandedness and inner distance from FASTQ: We added this tool to the Quality control category to help you. The tool aligns a subset of the reads in the input FASTQ files to the reference genome and compares the alignments with the reference annotation to deduce the strandedness. See the help page of this tool for more information.
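The idea behind this kind of inference can be sketched as follows. The sketch is similar in spirit to RSeQC's infer_experiment.py and is not necessarily how the Chipster tool is implemented. Each aligned read overlapping a gene is summarised as (mate number, strand the read aligned to, strand of the gene). Read 1 on the gene's strand or read 2 on the opposite strand supports fr-secondstrand; the reverse pattern supports fr-firststrand. The function name and the 0.8 cut-off are illustrative assumptions, not standard values.

```python
from typing import Iterable, Tuple

Hit = Tuple[int, str, str]  # (mate, aligned strand, gene strand)


def infer_library_type(hits: Iterable[Hit], threshold: float = 0.8) -> str:
    """Guess the TopHat library-type from reads overlapping annotated genes.

    The threshold is an illustrative cut-off, not a standard value.
    """
    second = first = 0
    for mate, read_strand, gene_strand in hits:
        same = read_strand == gene_strand
        # read 1 on the gene strand, or read 2 opposite to it: second-strand pattern
        if (mate == 1) == same:
            second += 1
        else:
            first += 1
    total = first + second
    if total == 0:
        raise ValueError("no reads overlapping annotated genes")
    if second / total >= threshold:
        return "fr-secondstrand"
    if first / total >= threshold:
        return "fr-firststrand"
    # Roughly balanced support for both patterns: strand information is lost
    return "fr-unstranded"
```

Reads overlapping genes on both strands are ambiguous; the simplest choice is to leave them out of `hits`.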
In TopHat: Check the mapping rate after the TopHat alignment (in Chipster, open the file tophat-summary.txt). You can run TopHat with the different parameters on a small part of the data and then compare the results.
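Comparing such test runs can be scripted. This sketch assumes that the summary contains a line of the form `85.0% overall read mapping rate.`, as written in TopHat 2's align_summary.txt; whether Chipster's tophat-summary.txt uses exactly this layout is an assumption you should verify. The function names are hypothetical.

```python
import re
from typing import Dict

# Assumed line format: "85.0% overall read mapping rate."
RATE = re.compile(r"([\d.]+)%\s+overall read mapping rate")


def overall_mapping_rate(summary_text: str) -> float:
    """Extract the overall read mapping rate (in percent) from a TopHat summary."""
    match = RATE.search(summary_text)
    if match is None:
        raise ValueError("no 'overall read mapping rate' line found")
    return float(match.group(1))


def best_library_type(summaries: Dict[str, str]) -> str:
    """Return the library type whose test run had the highest mapping rate.

    summaries maps a library-type value to the text of that run's summary file.
    """
    return max(summaries, key=lambda lt: overall_mapping_rate(summaries[lt]))
```

If the rates of the different runs are close, confirm the choice with one of the other checks as well.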
In HTSeq: Check the number of reads that are not counted for any gene (the "no-feature" reads; in Chipster, open the file htseq-count-info.txt).
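The same comparison can be done with HTSeq. htseq-count reports its special counters, such as `__no_feature`, as tab-separated `name<TAB>count` lines; the sketch assumes htseq-count-info.txt uses this layout too, which you should check. Note that stranded set to no ignores the strand altogether and therefore tends to leave the fewest reads without a feature, so the informative comparison is between the yes and reverse runs. The function names are hypothetical.

```python
from typing import Dict


def parse_counts(text: str) -> Dict[str, int]:
    """Parse tab-separated 'name<TAB>count' lines into a dictionary."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.rsplit("\t", 1)
        counts[name] = int(value)
    return counts


def no_feature_fraction(counts: Dict[str, int]) -> float:
    """Share of all reported reads that fell into the no-feature counter.

    Leading underscores are stripped so that "__no_feature" and
    "no_feature" both match.
    """
    total = sum(counts.values())
    no_feature = sum(v for k, v in counts.items() if k.lstrip("_") == "no_feature")
    return no_feature / total if total else 0.0


def better_strand_setting(yes_text: str, reverse_text: str) -> str:
    """Compare runs with stranded yes and reverse; fewer no-feature reads wins."""
    yes = no_feature_fraction(parse_counts(yes_text))
    rev = no_feature_fraction(parse_counts(reverse_text))
    return "yes" if yes < rev else "reverse"
```

If the two fractions are similar, the library is probably non-directional.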
Below we list some common library preparation kits and the corresponding parameters in different tools. Is your kit missing from the list? If you have data generated with that kit and figure out its library type, please let us know so that we can add the kit to the list.
Non-directional (unstranded):
Strand information is not preserved (it is lost during the amplification of the cDNA fragments).
TruSeq RNA Sample Prep kit
TopHat / Cufflinks / Cuffdiff: library-type fr-unstranded
HTSeq: stranded no
Directional, first strand:
The second read (read 2) comes from the original RNA strand/template, and the first read (read 1) from the opposite strand. Strand information is preserved because dUTPs are incorporated during second-strand cDNA synthesis, and this dUTP-marked second strand is removed or blocked during amplification, so only the first strand is sequenced.
All dUTP methods, NSR, NNSR
TruSeq Stranded Total RNA Sample Prep Kit
TruSeq Stranded mRNA Sample Prep Kit
NEB Ultra Directional RNA Library Prep Kit
Agilent SureSelect Strand-Specific
TopHat / Cufflinks / Cuffdiff: library-type fr-firststrand
HTSeq: stranded reverse
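To make the first-strand orientation concrete, the sketch below derives the two reads of a pair from an RNA fragment written 5' to 3' in DNA letters: read 1 is the reverse complement of the fragment's 3' end, and read 2 is the fragment's 5' end in the sense direction. `first_strand_pair` is a hypothetical helper; adapters, sequencing errors and variable fragment lengths are ignored.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_strand_pair(fragment: str, read_length: int) -> Tuple[str, str]:
    """Return (read 1, read 2) for an fr-firststrand (e.g. dUTP) library."""
    read1 = reverse_complement(fragment[-read_length:])  # antisense, from the 3' end
    read2 = fragment[:read_length]                        # sense, from the 5' end
    return read1, read2


print(first_strand_pair("ATGCGTACCTTG", 4))  # ('CAAG', 'ATGC')
```

For a second-strand library the roles are swapped: read 1 is the sense 5' end and read 2 the reverse complement of the 3' end.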
Directional, second strand:
The first read (read 1) comes from the original RNA strand/template, and the second read (read 2) from the opposite strand. Directionality is preserved because different adapters are ligated to the two ends of the fragment.
Directional Illumina (Ligation), Standard SOLiD
ScriptSeq v2 RNA-Seq Library Preparation Kit
SMARTer Stranded Total RNA
Encore Complete RNA-Seq Library Systems
TopHat / Cufflinks / Cuffdiff: library-type fr-secondstrand
HTSeq: stranded yes
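When scripting runs outside Chipster, the parameter table above can be kept as a small lookup. The dictionary keys and the `command_line_options` name are hypothetical; the values are those listed above. On the command line they correspond to the `--library-type` option of TopHat, Cufflinks and Cuffdiff and to the `--stranded` option of htseq-count.

```python
from typing import Dict, List

# Library prep type -> strandedness values for each tool (from the table above)
PARAMETERS: Dict[str, Dict[str, str]] = {
    "non-directional": {"tuxedo": "fr-unstranded", "htseq": "no"},
    "first strand": {"tuxedo": "fr-firststrand", "htseq": "reverse"},
    "second strand": {"tuxedo": "fr-secondstrand", "htseq": "yes"},
}


def command_line_options(library: str) -> Dict[str, List[str]]:
    """Return the strandedness options for each tool as argument lists."""
    params = PARAMETERS[library]
    library_type = ["--library-type", params["tuxedo"]]
    return {
        "tophat": library_type,
        "cufflinks": library_type,
        "cuffdiff": library_type,
        "htseq-count": ["--stranded=" + params["htseq"]],
    }
```

Keeping both tools' values in one entry makes it harder to set TopHat and HTSeq inconsistently.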
Note also that the --fr/--rf/--ff ("Order of mates to align") parameter in Bowtie has similar-sounding options: --fr ("forward/reverse"), --rf ("reverse/forward") and --ff ("forward/forward"). These parameters are a different story, however: they describe how the paired-end reads are oriented relative to each other (-> <-, <- -> and -> ->, respectively). The default (--fr, -> <-) is appropriate for Illumina paired-end reads: it means that read 1 appears upstream of the reverse complement of read 2, or vice versa. When running TopHat, the library-type parameter is passed on to Bowtie, so you don't need to worry about this setting.
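The mate orientations can be illustrated with a small classifier. Given the leftmost genomic position and strand of each mate, the hypothetical function `mate_orientation` reports which orientation the pair shows: fr (-> <-), rf (<- ->) or ff (-> ->). It only illustrates the definitions; it ignores fragment-length constraints and the mate order that Bowtie additionally checks for --ff, and it is not Bowtie's own logic.

```python
from typing import Tuple

Mate = Tuple[int, str]  # (leftmost genomic position, strand "+" or "-")


def mate_orientation(mate1: Mate, mate2: Mate) -> str:
    """Classify a pair as 'fr' (-> <-), 'rf' (<- ->) or 'ff' (-> ->)."""
    (pos1, strand1), (pos2, strand2) = mate1, mate2
    if strand1 == strand2:
        return "ff"
    # Opposite strands: the orientation depends on which mate lies further left
    left_strand = strand1 if pos1 <= pos2 else strand2
    return "fr" if left_strand == "+" else "rf"
```

Both "read 1 forward and upstream" and "read 2 forward and upstream" give fr, matching the "or vice versa" in the --fr definition.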