Bowtie for paired-end reads

Description

This tool aligns paired-end reads to publicly available genomes or transcriptomes. Supply the reads as two fastq files, one per mate, with the reads listed in the same order in both files. If you would like us to add new reference genomes to Chipster, please contact us. To align reads against your own reference sequences, please use the tool "Bowtie for paired-end reads and own genome".

Parameters

Details

Bowtie aligns reads to a reference sequence such as a genome or transcriptome. There are two alignment modes, which differ in where mismatches are counted: either throughout the read (the so-called v-mode when running Bowtie on the command line), or only in a user-defined seed region at the high-quality left end of the read (n-mode in command-line Bowtie). In n-mode, the base quality values at all mismatch positions are also taken into account.
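The difference between the two modes can be illustrated with a minimal Python sketch. It is not Bowtie's implementation: the function names are hypothetical, the read and reference are assumed to be ungapped strings of equal length, and the way quality values are used here (a cap on the summed qualities at mismatch positions) is an assumption made for illustration.

```python
from typing import List


def mismatch_positions(read: str, ref: str) -> List[int]:
    """Return the 0-based positions where the read differs from the reference."""
    return [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]


def passes_v_mode(read: str, ref: str, max_mismatches: int) -> bool:
    """v-mode sketch: mismatches are counted over the whole read."""
    return len(mismatch_positions(read, ref)) <= max_mismatches


def passes_n_mode(read: str, ref: str, quals: List[int], seed_len: int,
                  max_seed_mismatches: int, max_quality_sum: int) -> bool:
    """n-mode sketch: mismatches are counted only in the seed (left end).

    Assumption: qualities are taken into account by requiring the sum of
    quality values at ALL mismatch positions to stay within a limit.
    """
    positions = mismatch_positions(read, ref)
    seed_mismatches = sum(1 for p in positions if p < seed_len)
    quality_sum = sum(quals[p] for p in positions)
    return seed_mismatches <= max_seed_mismatches and quality_sum <= max_quality_sum


read = "ACGTACGTAC"
ref = "ACGTACGTTT"          # differs from the read at positions 8 and 9
quals = [30] * len(read)    # hypothetical per-base quality values

print(passes_v_mode(read, ref, max_mismatches=1))
print(passes_n_mode(read, ref, quals, seed_len=5,
                    max_seed_mismatches=0, max_quality_sum=70))
```

In this example the read has two mismatches near its right end, so it fails a v-mode limit of one mismatch but passes the n-mode check, because neither mismatch falls inside the 5-base seed.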

Bowtie needs you to specify the minimum and maximum insert size for valid paired-end alignments. The insert size covers both mates and the gap between them. For example, if 60 is specified as the minimum insert size and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them (20 + 20 + 20 = 60 bp), that alignment is considered valid. A 19-bp gap (59 bp in total) would not be valid in that case. Likewise, if 100 is specified as the maximum insert size and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them (100 bp in total), that alignment is considered valid. A 61-bp gap (101 bp in total) would not be valid in that case.
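The arithmetic behind these examples can be checked with a short Python sketch. The function names are hypothetical and the check only covers the insert size limits, not orientation or any other constraint.

```python
def insert_size(mate1_len: int, gap: int, mate2_len: int) -> int:
    """Length spanned by the pair: both mate alignments plus the gap between them."""
    return mate1_len + gap + mate2_len


def is_valid_insert(size: int, min_insert: int, max_insert: int) -> bool:
    """True if the insert size lies within the inclusive [min, max] limits."""
    return min_insert <= size <= max_insert


# Two 20-bp mates with the gaps used in the text, limits 60 and 100.
for gap in (19, 20, 60, 61):
    size = insert_size(20, gap, 20)
    print(gap, size, is_valid_insert(size, min_insert=60, max_insert=100))
```

With limits of 60 and 100, the 20-bp and 60-bp gaps give insert sizes of 60 and 100 and are accepted, while the 19-bp and 61-bp gaps give 59 and 101 and are rejected.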

You also need to specify the upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. The default setting "mate1 upstream of reverse complement of mate2 or vice versa" is appropriate for most Illumina datasets. With this setting, a candidate paired-end alignment is valid if mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met. It is also valid if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met.
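The default orientation rule can be sketched in Python as follows. This is an illustration only: the MateAlignment class, its fields and the coordinate convention are hypothetical, and special cases such as mates that start at the same position or that contain one another are not handled.

```python
from dataclasses import dataclass


@dataclass
class MateAlignment:
    start: int   # leftmost reference position of the alignment (hypothetical convention)
    strand: str  # "+" if the mate aligns as sequenced, "-" if as its reverse complement


def is_valid_default_orientation(mate1: MateAlignment, mate2: MateAlignment) -> bool:
    """Default setting: the upstream mate aligns to the forward strand and the
    downstream mate aligns as its reverse complement, whichever mate is upstream."""
    upstream, downstream = sorted((mate1, mate2), key=lambda m: m.start)
    return upstream.strand == "+" and downstream.strand == "-"


# mate1 upstream of the reverse complement of mate2
print(is_valid_default_orientation(MateAlignment(100, "+"), MateAlignment(160, "-")))
# mate2 upstream of the reverse complement of mate1
print(is_valid_default_orientation(MateAlignment(160, "-"), MateAlignment(100, "+")))
# both mates on the forward strand
print(is_valid_default_orientation(MateAlignment(100, "+"), MateAlignment(160, "+")))
```

The first two calls correspond to the two cases described above, and the third shows a pair that the default setting would reject. The insert size limits would be checked separately.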

A paired-end alignment is reported as a pair of mate alignments, each on its own line. The alignment for the mate that occurs closest to the beginning of the reference sequence (the "upstream" mate) is always printed before the alignment for the downstream mate.

If a read has more reportable alignments than the allowed maximum, the user can request that these multireads be written to a separate fastq file for further inspection. Similarly, the user can request that reads which do not align be written to a separate fastq file.

After running Bowtie, Chipster converts the alignment file to BAM format, and sorts and indexes it using the SAMtools package. This way the results are ready to be visualized in the genome browser.
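As a rough illustration of this post-processing, the Python sketch below builds the kind of SAMtools commands that convert, sort and index an alignment. The exact commands Chipster runs are not given here, so the function name, the file names and the choice of the SAMtools 1.3 or later command-line syntax are all assumptions. The sketch only constructs the commands and does not execute them.

```python
from typing import List


def post_alignment_commands(sam_path: str, prefix: str) -> List[List[str]]:
    """Build SAMtools commands for SAM-to-BAM conversion, sorting and indexing.

    Assumes SAMtools 1.3 or later; older releases use a different 'sort' syntax.
    """
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam_path],  # convert SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],      # sort by reference coordinate
        ["samtools", "index", sorted_bam],                # writes the .bai index file
    ]


# Hypothetical file names, for illustration only.
for command in post_alignment_commands("alignment.sam", "alignment"):
    print(" ".join(command))
```

Each command could be passed to subprocess.run on a system where SAMtools is installed. The sorted BAM file and its .bai index are the two files a genome browser needs.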

Output

This tool returns the alignment in BAM format and an index file for it (.bai). It also produces a log file, which shows what percentage of the reads aligned with the selected parameter settings. Optionally, fastq files are also produced for the unaligned reads and for the reads that align to multiple locations.

Reference

Langmead et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (2009) Genome Biology 10:R25