Alignment / TopHat2 for single end reads

Description

This tool aligns Illumina single-end RNA-seq reads to publicly available genomes. You need to supply the reads in one or more FASTQ files. Chipster has GTF files for the genomes it offers, but you can also provide your own. In general, using a GTF file containing known gene and exon locations is recommended because it lets TopHat2 align reads to the known transcripts before it searches for novel splice junctions. If you would like us to add new reference genomes to Chipster, please contact us.
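To show what "known gene and exon locations" means in practice, here is a minimal Python sketch that reads exon records from GTF lines and groups them by transcript. It is not Chipster's or TopHat2's code, and the function name exons_by_transcript is hypothetical. It assumes the standard GTF layout: nine tab-separated columns, 1-based inclusive coordinates, and a transcript_id attribute on every exon line.

```python
from collections import defaultdict


def exons_by_transcript(gtf_lines):
    """Group GTF exon records by transcript_id (hypothetical helper).

    Returns {transcript_id: (chrom, strand, [(start, end), ...])} with the
    exons sorted by start. Coordinates stay 1-based and inclusive, as in GTF.
    """
    transcripts = defaultdict(lambda: [None, None, []])
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features define the transcript structure
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Attributes look like: gene_id "G1"; transcript_id "T1";
        tid = None
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "transcript_id":
                tid = value.strip('"')
        if tid is None:
            continue  # assumption: exon lines without transcript_id are ignored
        record = transcripts[tid]
        record[0], record[1] = chrom, strand
        record[2].append((start, end))
    return {tid: (c, s, sorted(ex)) for tid, (c, s, ex) in transcripts.items()}
```

Sorting the exons by start coordinate gives the order in which they would be concatenated on the plus strand; a minus-strand transcript would additionally need its spliced sequence reverse-complemented.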

Parameters

Details

TopHat2 maps Illumina RNA-seq reads to a genome in order to identify exon-exon splice junctions. The alignment process consists of several steps. If annotation is available as a GTF file, TopHat2 first extracts the transcript sequences and uses Bowtie2 to align the reads to this virtual transcriptome. Only the reads that do not map fully to the transcriptome are then mapped to the genome. Reads that still remain unmapped are split into shorter segments, which are aligned to the genome independently. The segment mappings are used to find potential splice sites. Sequences flanking each splice site are concatenated, and the unmapped segments are mapped to these junction sequences. Finally, the segment alignments are stitched together to form whole-read alignments.
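The segment-splitting step can be sketched in Python as follows. This is an illustration, not TopHat2's implementation: the function name split_into_segments is hypothetical, the default of 25 bases mirrors TopHat2's documented --segment-length default, and giving any leftover bases to the last segment is an assumption chosen so that every segment is at least segment_length long.

```python
def split_into_segments(read, segment_length=25):
    """Cut a read into consecutive segments, each at least segment_length long.

    Assumptions (not TopHat2's exact rule): leftover bases are appended to the
    last segment, and reads too short for two segments are returned unsplit.
    """
    n_segments = len(read) // segment_length
    if n_segments < 2:
        return [read]  # too short to split into two full-length segments
    segments = [
        read[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments - 1)
    ]
    # The last segment keeps the remainder, so no bases are dropped.
    segments.append(read[(n_segments - 1) * segment_length:])
    return segments
```

For a 76-base read this yields segments of 25, 25 and 26 bases, each of which can then be aligned to the genome on its own.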

The "anchor length" parameter means that TopHat2 reports only junctions spanned by reads with at least this many bases on each side of the junction. Individual spliced alignments may still span a junction with fewer bases on one side, but every junction involved in spliced alignments is supported by at least one read with this many bases on each side. By default, no mismatches are allowed in the anchor, but you can change this.
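The anchor rule can be illustrated with a small Python sketch. The SplicedHit record, the junction tuple layout and both function names are hypothetical; the defaults (8 bases, 0 mismatches) follow TopHat2's documented --min-anchor-length and --splice-mismatches defaults, while Chipster's defaults may differ. A junction is reported if at least one read anchors it well, and after that, alignments with a short anchor on the same junction are kept too.

```python
from typing import NamedTuple


class SplicedHit(NamedTuple):
    """Hypothetical summary of one spliced alignment across one junction."""
    junction: tuple            # e.g. (chrom, intron_start, intron_end)
    left_anchor: int           # aligned bases on the left side of the junction
    right_anchor: int          # aligned bases on the right side of the junction
    left_mismatches: int = 0   # mismatches within the left anchor
    right_mismatches: int = 0  # mismatches within the right anchor


def supported_junctions(hits, min_anchor=8, max_anchor_mismatches=0):
    """Return junctions with at least one read anchored well on both sides."""
    reported = set()
    for hit in hits:
        anchors_ok = hit.left_anchor >= min_anchor and hit.right_anchor >= min_anchor
        mismatches_ok = (hit.left_mismatches <= max_anchor_mismatches
                         and hit.right_mismatches <= max_anchor_mismatches)
        if anchors_ok and mismatches_ok:
            reported.add(hit.junction)
    return reported


def alignments_on_reported_junctions(hits, reported):
    """Keep every alignment whose junction is reported, even with a short anchor."""
    return [hit for hit in hits if hit.junction in reported]
```

In this model, a read with only 5 bases on one side is still kept when another read anchors the same junction with 8 or more bases on both sides.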

TopHat2 ignores donor-acceptor pairs that are closer together than the minimum intron length or farther apart than the maximum intron length.
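A minimal Python sketch of this filter follows. The function name is hypothetical, introns are assumed to be given as 1-based inclusive (first intron base, last intron base) pairs, and the defaults of 70 and 500,000 bases are TopHat2's documented --min-intron-length and --max-intron-length defaults; the values used in Chipster may differ.

```python
def filter_by_intron_length(introns, min_intron=70, max_intron=500_000):
    """Keep candidate introns whose length lies within [min_intron, max_intron].

    Each intron is (first_base, last_base), 1-based and inclusive (assumption).
    """
    kept = []
    for start, end in introns:
        length = end - start + 1
        if min_intron <= length <= max_intron:
            kept.append((start, end))
    return kept
```

With the defaults, candidate introns of 50, 1,000 and 699,000 bases are filtered down to the 1,000-base one.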

If your RNA-seq data was produced with a stranded (directional) protocol, it is important to select the correct strandedness option in the parameter "Library type".

You can read more about stranded data here.
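For single-end reads, the library type determines how the strand a read aligns to relates to the strand of the transcript it came from. The Python sketch below uses TopHat2's own --library-type values; Chipster may label the options differently, and the function name transcript_strand is hypothetical. In a first-strand protocol (for example dUTP) the read is the reverse complement of the RNA, so it aligns to the opposite strand; in a second-strand protocol it aligns to the same strand; an unstranded library carries no strand information.

```python
def transcript_strand(read_strand, library_type):
    """Infer the source transcript's strand from a single-end read's strand.

    read_strand: "+" or "-", the genomic strand the read aligned to.
    library_type: a TopHat2 --library-type value.
    Returns "+", "-", or None if the protocol carries no strand information.
    """
    flip = {"+": "-", "-": "+"}
    if library_type == "fr-unstranded":
        return None  # strand of origin cannot be recovered
    if library_type == "fr-firststrand":
        return flip[read_strand]  # read is the reverse complement of the RNA
    if library_type == "fr-secondstrand":
        return read_strand  # read has the same sequence as the RNA
    raise ValueError(f"unknown library type: {library_type}")
```

Selecting the wrong stranded option would place every read on the wrong strand in this model, so reads from a gene would look antisense.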

After TopHat2 has finished, Chipster indexes the BAM file using the SAMtools package, so the results are ready to be visualized in the genome browser.
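If you want to index a BAM file yourself outside Chipster, a minimal Python sketch calling the samtools command-line tool could look like this. It assumes samtools is installed and on PATH and that the BAM file is coordinate-sorted, as samtools index requires. The function names and the example file name accepted_hits.bam are hypothetical, and this is not how Chipster itself invokes SAMtools.

```python
import shutil
import subprocess
from pathlib import Path


def samtools_index_command(bam_path):
    """Build the command line that indexes a coordinate-sorted BAM file."""
    return ["samtools", "index", str(bam_path)]


def index_bam(bam_path):
    """Run samtools index and return the path of the index it writes.

    By default samtools writes the index next to the input as <file>.bam.bai.
    """
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    subprocess.run(samtools_index_command(bam_path), check=True)
    return Path(str(bam_path) + ".bai")
```

Keeping the command builder separate from the call that runs it makes the exact command easy to inspect or log before anything is executed.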

Output

This tool returns the following files:

If TopHat2 fails for some reason, the file tophat2.log is returned instead.

Reference

This tool is based on the TopHat package. Please cite the following article:

Kim D, Pertea G, Trapnell C, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology (2013) 14: R36.