Alignment / TopHat2 for single end reads and own genome

Description

This tool aligns Illumina single-end RNA-seq reads to a genome provided as a FASTA file. You need to supply the reads in one or more FASTQ files.

Supplying a GTF file containing known gene and exon locations is recommended, because it improves the alignment process.

Parameters

Details

TopHat2 maps Illumina RNA-seq reads to a genome in order to identify exon-exon splice junctions. The alignment process consists of several steps. If annotation is available as a GTF file, TopHat2 extracts the transcript sequences and uses Bowtie2 to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome are then mapped to the genome. The reads that still remain unmapped are split into shorter segments, which are then aligned to the genome. Segment mappings are used to find potential splice sites. Sequences flanking a splice site are concatenated, and unmapped segments are mapped to them. Segment alignments are then stitched together to form whole read alignments.
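The segment-splitting step described above can be illustrated with a minimal Python sketch. The function name split_into_segments is hypothetical, the 25-base segment length is used only as an illustrative value, and the way the leftover tail is handled is a simplification that may differ from what TopHat2 does internally.

```python
from typing import List


def split_into_segments(read: str, segment_length: int = 25) -> List[str]:
    """Cut a read into consecutive, non-overlapping segments.

    The last segment may be shorter than segment_length. This is a
    simplification for the sketch, not TopHat2's actual tail handling.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [read[i:i + segment_length] for i in range(0, len(read), segment_length)]


# A hypothetical 60-base read that failed to map as a whole.
read = "ACGT" * 15
segments = split_into_segments(read)
print([len(s) for s in segments])  # prints [25, 25, 10]
```

Each segment would then be aligned independently, and segments that map on either side of a gap point to a potential splice site.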

The "anchor length" is the minimum number of bases that a read must have on each side of a junction for TopHat2 to report that junction. Individual spliced alignments may span a junction with fewer bases than this on one side, but every reported junction is supported by at least one read with at least this many bases on each side. By default no mismatches are allowed in the anchor, but you can change this.
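The anchor length rule can be expressed as a small check. This is only a sketch of the rule as described here, not TopHat2's code: the function name junction_is_reported and the representation of each spliced alignment as a (left, right) pair of base counts are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def junction_is_reported(overhangs: Iterable[Tuple[int, int]], anchor_length: int) -> bool:
    """Return True if at least one read has anchor_length or more bases
    on BOTH sides of the junction.

    overhangs holds one (bases_left, bases_right) pair per spliced
    alignment that spans the junction (a hypothetical representation).
    """
    return any(min(left, right) >= anchor_length for left, right in overhangs)


# One read covers the junction with only 3 bases on the left side,
# but a second read has 12 and 38 bases, so the junction is supported.
print(junction_is_reported([(3, 47), (12, 38)], anchor_length=8))  # prints True

# No read reaches 8 bases on both sides, so the junction is not reported.
print(junction_is_reported([(3, 47), (5, 45)], anchor_length=8))  # prints False
```

Note that in the first case the alignment with only 3 bases on one side can still be part of the output, because the junction itself is supported by the second read.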

TopHat2 ignores donor-acceptor pairs that are closer together than the minimum intron length or farther apart than the maximum intron length.
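As a minimal sketch, the intron length filter amounts to a range check on the distance between the donor and the acceptor. The function name, the coordinate convention and the example limits of 70 and 500000 bases are assumptions for illustration; how TopHat2 counts the distance internally is not specified here.

```python
def intron_length_allowed(donor: int, acceptor: int,
                          min_intron_length: int, max_intron_length: int) -> bool:
    """Keep a donor-acceptor pair only if its distance lies within the limits.

    Pairs closer than the minimum or farther apart than the maximum are
    ignored. The distance is taken as the absolute difference of the two
    coordinates, which is an assumption of this sketch.
    """
    distance = abs(acceptor - donor)
    return min_intron_length <= distance <= max_intron_length


# Illustrative limits only; use the values you set in the tool parameters.
print(intron_length_allowed(1000, 1050, 70, 500000))     # prints False (too close)
print(intron_length_allowed(1000, 6000, 70, 500000))     # prints True
print(intron_length_allowed(1000, 900000, 70, 500000))   # prints False (too far)
```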

After running TopHat2, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.
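For readers who want to reproduce a similar run outside Chipster, the following Python sketch assembles the three command lines involved: building a Bowtie2 index from your own genome, running TopHat2, and indexing the resulting BAM file with SAMtools. It only builds the argument lists and does not run anything. The exact commands that Chipster executes are not documented here, so the function name build_commands, the file names and all parameter values are assumptions; the option names are those of the bowtie2-build, tophat2 and samtools command line tools, and the menu labels in Chipster may differ.

```python
from typing import List, Optional, Sequence


def build_commands(
    genome_fasta: str,
    reads: Sequence[str],
    output_dir: str,
    index_base: str = "own_genome",      # hypothetical index name
    gtf: Optional[str] = None,
    anchor_length: int = 8,              # illustrative values only
    anchor_mismatches: int = 0,
    min_intron_length: int = 70,
    max_intron_length: int = 500000,
    library_type: str = "fr-unstranded",
) -> List[List[str]]:
    """Return [index command, TopHat2 command, BAM indexing command]."""
    index_cmd = ["bowtie2-build", genome_fasta, index_base]

    tophat_cmd = [
        "tophat2",
        "--output-dir", output_dir,
        "--min-anchor-length", str(anchor_length),
        "--splice-mismatches", str(anchor_mismatches),
        "--min-intron-length", str(min_intron_length),
        "--max-intron-length", str(max_intron_length),
        "--library-type", library_type,
    ]
    if gtf is not None:
        # Known gene and exon locations, if available.
        tophat_cmd += ["--GTF", gtf]
    # Positional arguments: index base name, then comma-separated FASTQ files.
    tophat_cmd += [index_base, ",".join(reads)]

    # TopHat2 writes its alignments to accepted_hits.bam in the output directory.
    index_bam_cmd = ["samtools", "index", output_dir + "/accepted_hits.bam"]
    return [index_cmd, tophat_cmd, index_bam_cmd]


# Hypothetical file names.
for cmd in build_commands("genome.fa", ["reads_1.fq", "reads_2.fq"], "tophat_out", gtf="genes.gtf"):
    print(" ".join(cmd))
```

The commands could then be passed one at a time to subprocess.run, provided that the three tools are installed.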

If your RNA-seq data was produced with a stranded/directional protocol, it is important that you select the correct strandedness option in the parameter "Library type":

You can read more about stranded data here.

Output

This tool returns the following files:

If TopHat2 fails for some reason, the file tophat2.log is returned instead.

Reference

This tool is based on the TopHat package. Please cite the following article:

Kim D, Pertea G, Trapnell C, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology (2013) 14: R36.