HISAT2 for single-end reads

Description

This tool aligns Illumina single-end RNA-seq reads to publicly available genomes.

Parameters

Genome (list of supported genomes) [latest human]
RNA-strandness (unstranded, F, R) [unstranded]
How many hits to report per read (1-1000000) [5]
Base quality encoding used (phred+33, phred+64) [phred+33]
Minimum intron length [20]
Maximum intron length [500000]
Allow soft clipping (yes, no) [yes]
Are long anchors required (yes, no) [no]
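
For orientation, the parameters above roughly correspond to options of the hisat2 command-line program. The sketch below builds such a command as an argument list. It is an assumption about how the settings map to options, not Chipster's actual wrapper code; the function name build_hisat2_command, the index prefix and the file names are hypothetical.

```python
def build_hisat2_command(index_prefix, reads_fastq, output_sam,
                         strandness="unstranded", max_hits=5,
                         quality="phred+33", min_intron=20,
                         max_intron=500000, soft_clipping=True,
                         long_anchors=False):
    """Return a hisat2 argument list for single-end reads (illustrative only)."""
    if strandness not in ("unstranded", "F", "R"):
        raise ValueError("strandness must be 'unstranded', 'F' or 'R'")
    if quality not in ("phred+33", "phred+64"):
        raise ValueError("quality must be 'phred+33' or 'phred+64'")

    # -x: index prefix, -U: unpaired (single-end) reads, -S: SAM output file
    cmd = ["hisat2", "-x", index_prefix, "-U", reads_fastq, "-S", output_sam]

    # How many hits to report per read
    cmd += ["-k", str(max_hits)]

    # Base quality encoding
    cmd.append("--phred33" if quality == "phred+33" else "--phred64")

    # Minimum and maximum intron length
    cmd += ["--min-intronlen", str(min_intron),
            "--max-intronlen", str(max_intron)]

    # Strandness is only passed for stranded libraries
    if strandness != "unstranded":
        cmd += ["--rna-strandness", strandness]

    # Soft clipping is on unless explicitly disabled
    if not soft_clipping:
        cmd.append("--no-softclip")

    # Long anchors, intended for downstream transcriptome assembly
    if long_anchors:
        cmd.append("--dta")

    return cmd


# Hypothetical index prefix and file names, defaults for everything else
print(" ".join(build_hisat2_command("genome_index", "reads.fq", "hisat.sam")))
```

With the default settings, only the hit count, quality encoding and intron length options are added to the basic command.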

Details

HISAT2 (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ~64,000 bp.
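
As a rough sanity check on the figures above, the local indexes taken together should span about as many bases as the human genome (roughly 3 billion bp). The short calculation below uses only the two numbers quoted in the text; treating the regions as non-overlapping is a simplifying assumption.

```python
# Figures quoted in the description above
local_indexes = 48_000       # number of local FM indexes
bases_per_index = 64_000     # approximate region size per local index, in bp

# Total sequence spanned if the regions did not overlap (simplifying assumption)
total_bases = local_indexes * bases_per_index
print(f"{total_bases:,} bp")  # 3,072,000,000 bp
```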

HISAT2 searches by default for up to 5 distinct, primary alignments for each read, but you can change this number. A primary alignment is one whose alignment score is equal to or higher than that of every other alignment of the read, so several distinct alignments can be primary if they share the same score. Note that HISAT2 does not "find" alignments in any specific order, so for reads that have more than 5 distinct, valid alignments, HISAT2 does not guarantee that the 5 alignments reported are the best possible in terms of alignment score. Soft clipping is allowed by default, meaning that the ends of a read don't need to align if leaving them unaligned increases the alignment score.
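
The definition of a primary alignment can be illustrated with a toy example. The sketch below is not how HISAT2 searches for alignments; it only shows which candidates count as primary once their scores are known. The function name primary_alignments, the locations and the scores are all made up for illustration.

```python
def primary_alignments(candidates, max_hits=5):
    """Return up to max_hits candidates that share the best score.

    candidates is a list of (location, score) pairs; a higher score is better.
    """
    if not candidates:
        return []
    best_score = max(score for _, score in candidates)
    # Every candidate tied for the best score is a primary alignment
    tied = [c for c in candidates if c[1] == best_score]
    return tied[:max_hits]


# Made-up candidates for one read: two of them tie for the best score
candidates = [("chr1:100", -3), ("chr2:500", 0), ("chr5:42", 0)]
print(primary_alignments(candidates))  # [('chr2:500', 0), ('chr5:42', 0)]
```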

If you are planning to do transcriptome assembly afterwards, you should set the long anchor parameter to yes. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which significantly reduces the computation time and memory usage of transcript assemblers.

After running HISAT2, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.
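
Outside Chipster, a comparable post-processing step could look like the sketch below, which sorts the alignments into a BAM file and then indexes it with SAMtools. This is an assumption about the general workflow, not Chipster's own code; the function names and the input file name hisat.sam are hypothetical, and samtools must be installed for run_commands to work.

```python
import subprocess


def bam_indexing_commands(sam_path, bam_path="hisat.bam"):
    """Return the samtools commands that sort a SAM file into BAM and index it."""
    return [
        # Sort by coordinate and write BAM output
        ["samtools", "sort", "-o", bam_path, sam_path],
        # Create the index file (bam_path + ".bai") next to the BAM file
        ["samtools", "index", bam_path],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Show the commands without executing them
for cmd in bam_indexing_commands("hisat.sam"):
    print(" ".join(cmd))
```

Calling run_commands(bam_indexing_commands("hisat.sam")) would execute both steps and produce hisat.bam and hisat.bam.bai.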

Output

This tool returns the following files:

hisat.bam: BAM file containing the alignments
hisat.bam.bai: Index for the BAM file
hisat.log: Summary of the alignment results

Reference

This tool is based on the HISAT2 package. Please cite the following article: Kim D, Langmead B and Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 2015.