BWA MEM for single or paired end reads and own genome
Description
This tool aligns single-end or paired-end reads to a reference genome sequence supplied by the user.
The reads have to be supplied in FASTQ format. If two input files are selected, one of them is used as the reference genome and the other
is used as the reads file for single-end alignment.
If three input files are selected, paired-end alignment is performed.
Alignment parameters
- Minimum seed length Matches shorter than this will be missed
when looking for maximal exact matches (MEMs) in the first alignment phase (BWA MEM option -k).
- Maximum gap length Gaps longer than this will not be found. Note that the scoring matrix and hit
length also affect the maximum gap length, in addition to this band width parameter (BWA MEM option -w).
- Match score Score for a matching base (BWA MEM option -A).
- Mismatch penalty Penalty for a mismatching base (BWA MEM option -B).
- Gap opening penalty Gap opening penalty (BWA MEM option -O).
- Gap extension penalty Gap extension penalty (BWA MEM option -E).
- Penalty for end clipping Penalty for 5'- and 3'-end clipping. When performing the Smith-Waterman
extension of the seed alignments, BWA-MEM keeps track of the best score reaching the end of the read.
If this score is larger than the best SW score minus the clipping penalty, clipping will not be applied (BWA MEM option -L).
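As a rough illustration of how the parameters above map to BWA MEM options, the following Python sketch assembles a command line without running it. The function name, the file names and the default values are assumptions made for this example (the defaults are meant to mirror those in the BWA manual; the tool's own defaults may differ).

```python
import shlex


def build_bwa_mem_command(reference, reads, min_seed_length=19, band_width=100,
                          match_score=1, mismatch_penalty=4, gap_open=6,
                          gap_extend=1, clip_penalty=5):
    """Return the argument list for a `bwa mem` call. Nothing is executed."""
    cmd = [
        "bwa", "mem",
        "-k", str(min_seed_length),   # Minimum seed length
        "-w", str(band_width),        # Maximum gap length (band width)
        "-A", str(match_score),       # Match score
        "-B", str(mismatch_penalty),  # Mismatch penalty
        "-O", str(gap_open),          # Gap opening penalty
        "-E", str(gap_extend),        # Gap extension penalty
        "-L", str(clip_penalty),      # Penalty for end clipping
    ]
    cmd.append(reference)             # reference genome (must be indexed before a real run)
    cmd.extend(reads)                 # one FASTQ for single-end, two for paired-end
    return cmd


# Hypothetical file names, used only to show the resulting command line.
cmd = build_bwa_mem_command("genome.fa", ["reads_R1.fastq", "reads_R2.fastq"],
                            min_seed_length=25)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.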
Read group parameters
- Read group identifier If you want to add the read group line to the BAM file,
you have to define the read group identifier (ID:value).
- Sample name for read group The name of the sample sequenced in this read group (SM:value).
- Platform for read group With this setting you can define the platform or technology used to produce
the reads. Options: ILLUMINA, SOLID, LS454, HELICOS, PACBIO (PL:value).
- Library identifier for read group DNA preparation library identifier. The Mark Duplicates
tool uses this field to determine which read groups might contain molecular duplicates, in case the
same DNA library was sequenced on multiple lanes (LB:value).
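The read group settings end up as a single @RG header line. The sketch below shows one way such a line could be assembled from the four settings, using the SAM tag names ID, SM, PL and LB. The function name and the example values are hypothetical, and the literal `\t` separators follow the form that the `bwa mem -R` option accepts.

```python
ALLOWED_PLATFORMS = {"ILLUMINA", "SOLID", "LS454", "HELICOS", "PACBIO"}


def build_read_group(identifier, sample=None, platform=None, library=None):
    """Build an @RG string with literal backslash-t separators."""
    if not identifier:
        # Without an identifier no read group line is added at all.
        raise ValueError("a read group identifier (ID) is required")
    fields = ["@RG", f"ID:{identifier}"]
    if sample:
        fields.append(f"SM:{sample}")
    if platform:
        if platform not in ALLOWED_PLATFORMS:
            raise ValueError(f"unsupported platform: {platform}")
        fields.append(f"PL:{platform}")
    if library:
        fields.append(f"LB:{library}")
    # Join with the two characters backslash and t, not a real tab.
    return "\\t".join(fields)


# Hypothetical values for illustration only.
print(build_read_group("run1_lane1", sample="sampleA",
                       platform="ILLUMINA", library="libA"))
```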
More information: BWA manual
Details
It is possible to give the tool more than one FASTQ file/file pair. The tool will run the alignment for each
file/file pair separately, and finally merge the resulting BAM files.
If you provide two FASTQ files, the tool will by default perform a paired-end alignment with them. It will try to assign
the R1 and R2 reads correctly based on the file names.
If you have more than two FASTQ files (or wish to perform single-end alignment for two files), you will
need to provide the FASTQ file names as a list in a text file: one list for single-end alignment, and two for
paired-end alignment (one list for the R1 files and another one for the R2 files,
e.g. R1files.txt and R2files.txt). These lists can be generated with the tool
Utilities / Make a list of file names.
The read pairs must be ordered identically in both lists.
To run the tool, select the list file or files (R1files.txt and R2files.txt) and ALL the FASTQ files, and assign the list files correctly.
When assigning the list files, they are automatically inactivated in the "reads" file list.
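Because the read pairs must appear in the same order in both lists, it can help to check the lists before running the tool. The following sketch is only an illustration: it assumes that paired files differ solely by an `_R1` / `_R2` tag in the file name, which is a common convention but not something the tool requires, and the function name is made up for this example.

```python
def pair_fastq_lists(r1_names, r2_names):
    """Pair R1 and R2 file names by position and check that each pair matches."""
    if len(r1_names) != len(r2_names):
        raise ValueError("the R1 and R2 lists have different lengths")
    pairs = []
    for r1, r2 in zip(r1_names, r2_names):
        # Assumed naming convention: swapping _R1 for _R2 gives the mate's name.
        if r1.replace("_R1", "_R2", 1) != r2:
            raise ValueError(f"files do not look like a pair: {r1} / {r2}")
        pairs.append((r1, r2))
    return pairs


# Hypothetical contents of R1files.txt and R2files.txt, one file name per line.
r1_list = "sampleA_R1.fastq\nsampleB_R1.fastq\n".split()
r2_list = "sampleA_R2.fastq\nsampleB_R2.fastq\n".split()
for r1, r2 in pair_fastq_lists(r1_list, r2_list):
    print(r1, r2)
```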
Output
As a result the tool returns a sorted and indexed BAM-formatted alignment, which is ready for viewing in the Chipster genome browser.