HISAT2 for paired end reads

Description

This tool aligns Illumina paired end reads to publicly available genomes.

You need to supply the reads in two or more FASTQ files. Files can be compressed with gzip. You may also need two list files to assign the FASTQ files to each direction.

List files are optional if you provide just two FASTQ files. Chipster will try to assign the files to directions based on file names. This assumes the files are named so that the beginning of the name is identical and the directions are specified with _1 and _2, e.g. Abc123_1, Abc123_2. If your files are named differently, you need to provide list files to make sure the files are assigned correctly (see Details for more information).

Parameters

Details

HISAT2 (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT2 uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ~64,000 bp.

HISAT2 searches by default for up to 5 distinct, primary alignments for each read, but you can change this number. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments. It is possible that multiple distinct alignments have the same score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Note that HISAT2 does not "find" alignments in any specific order, so for reads that have more than 5 distinct, valid alignments, HISAT2 does not guarantee that the 5 alignments reported are the best possible in terms of alignment score. By default soft clipping is allowed, meaning that the ends of the read don't need to align if this increases the alignment score.

If you are planning to do transcriptome assembly afterwards, you should set the long anchor parameter to yes. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which helps transcript assemblers improve significantly in computation and memory usage.

After running HISAT2, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.

Output

This tool returns the following files:

Reference

This tool is based on the HISAT2 package. Please cite the following article: Kim D, Langmead B and Salzberg SL. HISAT: a fast spliced aligner with low memory requirements Nature Methods 2015.