BWA for single end reads and own genome
Description
Aligns single-end reads to a reference genome sequence supplied by the user. The alignment is computed with the BWA aln algorithm.
The genome must be supplied in FASTA format and the reads in FASTQ format.
Parameters
- What pre-indexed genome is used as the reference. (Human genome (hg19), Mouse genome (mm9), Rat genome (rn4), Mouse miRBase17.) [Mouse genome]
- Seed length. How many bases from the beginning (left end) of the read, usually its best-quality part, are used as the seed region. If the seed length is longer than the reads, seeding is disabled. Corresponds to the command line parameter -l. [32]
- Maximum number of differences in the seed region. Maximum number of differences, such as mismatches or indels, allowed in the seed region. Corresponds to the command line parameter -k. [2]
- Maximum edit distance for the whole read. If the value is an integer (1 or more), it is used directly as the maximum edit distance. If it is a fraction between 0 and 1, it defines the tolerated fraction of missed alignments, assuming a uniform base error rate of 2%; in that case the maximum edit distance is chosen automatically for each read length. Corresponds to the command line parameter -n. [0.04]
- Quality value format used. Note that this parameter is taken into account only if you choose to apply the mismatch limit to the seed region. Specifies whether the quality values are in the Sanger format (ASCII character code equals the Phred quality plus 33) or in the Illumina Genome Analyzer Pipeline v1.3 or later format (ASCII character code equals the Phred quality plus 64). Corresponds to the command line parameter -I. [Sanger]
- Maximum number of gaps. Maximum number of gap openings for one read. Corresponds to the command line parameter -o. [1]
- Maximum number of gap extensions. Maximum number of gap extensions; -1 disables long gaps. Corresponds to the command line parameter -e. [-1]
- Gap opening penalty. Corresponds to the command line parameter -O. [11]
- Gap extension penalty. Corresponds to the command line parameter -E. [4]
- Mismatch penalty threshold. BWA will not search for suboptimal hits with a score lower than the best alignment score minus this value. Corresponds to the command line parameter -M. [3]
- Disallow gaps in region. Do not allow a long deletion within this many bp of the 3' end of the read. Corresponds to the command line parameter -d. [16]
- Disallow an indel within the given number of bp towards the ends. Do not place an indel within this many bp of either end of the read. Corresponds to the command line parameter -i. [5]
- Quality trimming threshold. Quality threshold for trimming low-quality bases from the 3' end of each read before alignment; reads are never trimmed shorter than 35 bp, and 0 disables trimming. Corresponds to the command line parameter -q. [0]
- Barcode length. Length of the barcode at the 5' end of each read. The barcode is trimmed from each read before mapping. Corresponds to the command line parameter -B. [0]
- How many valid alignments are reported per read. Maximum number of alignments to report per read; BWA lists the additional hits in the XA tag of the SAM record. Corresponds to the command line parameter -n of bwa samse. [3]
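How a fractional -n value turns into a per-length edit distance can be illustrated with a short sketch. It assumes, to the best of our understanding of BWA, that the number of base errors in a read is modelled as Poisson-distributed with mean read length × 0.02, and that the smallest edit distance whose probability of being exceeded falls below the -n fraction is chosen. The function name max_diff is hypothetical and this is not BWA's own code.

```python
import math


def max_diff(read_len: int, n: float, err: float = 0.02) -> int:
    """Hypothetical helper: maximum edit distance for a read under option -n.

    n >= 1: used directly as the maximum edit distance.
    0 < n < 1: tolerated fraction of missed alignments. Assumption: the number
    of base errors is Poisson(read_len * err); return the smallest k >= 1
    with P(errors > k) < n.
    """
    if n >= 1:
        return int(n)
    if n <= 0:
        raise ValueError("n must be positive")
    lam = read_len * err
    term = math.exp(-lam)   # P(errors == 0)
    cumulative = term       # running P(errors <= k)
    for k in range(1, 1000):
        term *= lam / k     # P(errors == k) derived from P(errors == k - 1)
        cumulative += term
        if 1.0 - cumulative < n:
            return k
    raise ValueError("n is too small to reach with this model")


# Print the read lengths at which the allowed edit distance changes.
last = None
for length in range(17, 251):
    k = max_diff(length, 0.04)
    if k != last:
        print(f"{length} bp reads: max_diff = {k}")
        last = k
```

With the default -n 0.04, this model allows an edit distance of 2 for 37 bp reads and 3 for 38 bp reads, which is why longer reads tolerate more differences under the same setting.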
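The quality format setting (-I) and the trimming threshold (-q) both depend on how FASTQ quality characters are decoded. The sketch below decodes a quality string with offset 33 (Sanger) or 64 (Illumina 1.3+), then trims the 3' end following the formula given in the BWA manual: keep the length x that maximises the sum of (threshold − quality) over the removed bases, applied only when the last base is below the threshold. The 35 bp floor comes from the parameter description above. Function names are hypothetical, and BWA's internal code may differ in edge cases.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string: offset 33 = Sanger, 64 = Illumina 1.3+."""
    scores = [ord(ch) - offset for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("quality string does not match the given offset")
    return scores


def trimmed_length(quals: list[int], threshold: int, min_len: int = 35) -> int:
    """Hypothetical helper: read length kept after 3'-end quality trimming."""
    n = len(quals)
    # No trimming if disabled, if the read is already short, or if the last base is good.
    if threshold <= 0 or n <= min_len or quals[-1] >= threshold:
        return n
    best_len, best_sum, running = n, 0, 0
    # running = sum of (threshold - q) over the bases that trimming to length x removes
    for x in range(n - 1, min_len - 1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_sum, best_len = running, x
    return best_len


quals = phred_scores("I" * 40 + "#" * 10)   # Sanger: 'I' = Q40, '#' = Q2
print(trimmed_length(quals, threshold=20))  # → 40
```

Decoding Sanger data with offset 64 produces negative scores, which is why the helper rejects them; a wrong quality format setting typically shows up this way.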
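To see how the parameters above map onto the command line, here is a sketch that builds bwa aln and bwa samse argument lists from the listed flags and bracketed defaults. The class and function names are hypothetical and only construct the lists; Chipster's own wrapper may assemble the commands differently.

```python
from dataclasses import dataclass


@dataclass
class AlnParams:
    """Hypothetical container for the parameters above (defaults as listed)."""
    seed_length: int = 32             # -l
    seed_max_diff: int = 2            # -k
    max_edit_distance: str = "0.04"   # -n, kept as text so "3" or "0.04" pass as entered
    illumina_qualities: bool = False  # adds -I (Phred+64) when True; Sanger otherwise
    max_gap_opens: int = 1            # -o
    max_gap_extensions: int = -1      # -e
    gap_open_penalty: int = 11        # -O
    gap_extension_penalty: int = 4    # -E
    mismatch_penalty: int = 3         # -M
    no_long_deletion_3p: int = 16     # -d
    no_indel_ends: int = 5            # -i
    trim_quality: int = 0             # -q
    barcode_length: int = 0           # -B


def aln_command(p: AlnParams, ref: str, reads: str) -> list[str]:
    """Build a `bwa aln` argument list; BWA writes the .sai result to stdout."""
    cmd = [
        "bwa", "aln",
        "-l", str(p.seed_length),
        "-k", str(p.seed_max_diff),
        "-n", p.max_edit_distance,
        "-o", str(p.max_gap_opens),
        "-e", str(p.max_gap_extensions),
        "-O", str(p.gap_open_penalty),
        "-E", str(p.gap_extension_penalty),
        "-M", str(p.mismatch_penalty),
        "-d", str(p.no_long_deletion_3p),
        "-i", str(p.no_indel_ends),
        "-q", str(p.trim_quality),
        "-B", str(p.barcode_length),
    ]
    if p.illumina_qualities:
        cmd.append("-I")  # input qualities are Phred+64
    return cmd + [ref, reads]


def samse_command(ref: str, sai: str, reads: str, max_hits: int = 3) -> list[str]:
    """Build a `bwa samse` argument list; BWA writes SAM to stdout."""
    return ["bwa", "samse", "-n", str(max_hits), ref, sai, reads]


print(" ".join(aln_command(AlnParams(seed_length=25), "genome.fa", "reads.fq")))
```

Keeping every parameter explicit, even at its default, makes the exact command reproducible from a saved parameter set.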
Details
This tool uses the BWA short-read aligner to align a set of FASTQ-formatted sequences against a FASTA-formatted reference sequence. Alignment is performed with the Burrows-Wheeler Transform based BWA aln algorithm, which allows gaps in the alignments. This algorithm is designed for short queries of up to ~200 bp with a low error rate (<3%).
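The Burrows-Wheeler Transform is what lets BWA search a whole genome quickly. As a toy illustration only, the sketch below builds the transform by sorting rotations and counts exact matches with FM-index backward search. BWA aln additionally backtracks through this search to allow mismatches and gaps, and uses compressed, sampled occurrence tables rather than the naive counting here. Function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations, with '$' as terminator."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of exact occurrences of pattern."""
    # C[c] = number of characters in the text that sort before c
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c: str, i: int) -> int:
        # occurrences of c in bwt_str[:i] (naive; real indexes sample this)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open interval of matching sorted rotations
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGTTACG")
print(count_occurrences(index, "ACG"))  # → 3
```

Each step of the search costs the same regardless of genome size once the occurrence counts are precomputed, which is the property that makes BWT-based aligners fast.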
It is possible to give the tool more than one FASTQ file. The tool runs the alignment for each file separately and finally merges the resulting BAM files.
Note that this aligner is more accurate but significantly slower than Bowtie.
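The per-file workflow can be sketched as a list of commands: index the FASTA genome, run bwa aln and bwa samse for each FASTQ file, convert the SAM output to BAM and sort it, then merge the sorted files and index the result. This is an illustration under assumptions, not Chipster's actual script: the helper names and intermediate file names are hypothetical, and `samtools sort -o` assumes samtools 1.3 or later. Nothing is executed unless run_steps is called.

```python
import subprocess
from pathlib import Path


def plan_alignment(ref: str, fastq_files: list[str], aln_options: list[str],
                   max_hits: int = 3, out_bam: str = "alignment.bam"):
    """Hypothetical helper: return (steps, final_bam); a step is (argv, stdout_path or None)."""
    if not fastq_files:
        raise ValueError("at least one FASTQ file is required")
    steps = [(["bwa", "index", ref], None)]  # index the user-supplied genome
    sorted_bams = []
    for i, fq in enumerate(fastq_files):
        prefix = f"{i}_{Path(fq).stem}"  # numeric prefix avoids name clashes
        sai, sam = f"{prefix}.sai", f"{prefix}.sam"
        unsorted_bam, sorted_bam = f"{prefix}.unsorted.bam", f"{prefix}.sorted.bam"
        steps.append((["bwa", "aln", *aln_options, ref, fq], sai))
        steps.append((["bwa", "samse", "-n", str(max_hits), ref, sai, fq], sam))
        steps.append((["samtools", "view", "-b", "-o", unsorted_bam, sam], None))
        steps.append((["samtools", "sort", "-o", sorted_bam, unsorted_bam], None))
        sorted_bams.append(sorted_bam)
    if len(sorted_bams) == 1:
        final_bam = sorted_bams[0]
    else:
        # merging coordinate-sorted BAM files keeps the result sorted
        steps.append((["samtools", "merge", out_bam, *sorted_bams], None))
        final_bam = out_bam
    steps.append((["samtools", "index", final_bam], None))
    return steps, final_bam


def run_steps(steps) -> None:
    """Execute planned commands, redirecting stdout to a file where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as fh:
                subprocess.run(argv, check=True, stdout=fh)
```

Separating planning from execution makes it easy to inspect or log the exact commands before running them, for example `plan_alignment("genome.fa", ["a.fq", "b.fq"], ["-l", "32"])`.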
Output
The tool returns a sorted and indexed BAM-formatted alignment, which is ready for viewing in the Chipster genome browser.