BWA MEM for single or paired end reads and own genome

Description

This tool aligns single-end or paired-end reads to a reference genome sequence supplied by the user. The reads must be in FASTQ format. If two input files are selected, one is used as the reference genome and the other as the reads file for single-end alignment. If three input files are selected, paired-end alignment is performed.

Alignment parameters

Minimum seed length: Matches shorter than this will be missed when looking for maximal exact matches (MEMs) in the first alignment phase (BWA MEM option -k).
Maximum gap length: Gaps longer than this will not be found. Note that the scoring matrix and hit length also affect the maximum gap length, in addition to this band width parameter (BWA MEM option -w).
Match score: Score for a matching base (BWA MEM option -A).
Mismatch penalty: Penalty for a mismatching base (BWA MEM option -B).
Gap opening penalty: Penalty for opening a gap (BWA MEM option -O).
Gap extension penalty: Penalty for extending a gap (BWA MEM option -E).
Penalty for end clipping: Penalty for 5'- and 3'-end clipping. When performing the Smith-Waterman (SW) extension of the seed alignments, BWA-MEM keeps track of the best score reaching the end of the read. If this score is larger than the best SW score minus the clipping penalty, clipping will not be applied (BWA MEM option -L).
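
As a rough illustration of how the parameters above map onto a BWA MEM command line, here is a minimal Python sketch. The function name build_bwa_mem_command and the file names are hypothetical, and the default values are those given in the BWA manual; check them against the BWA version you have installed. The sketch only builds the argument list and does not run BWA.

```python
import shlex

# Defaults as listed in the BWA manual for `bwa mem` (verify for your version).
BWA_MEM_DEFAULTS = {
    "-k": 19,   # minimum seed length
    "-w": 100,  # band width (maximum gap length)
    "-A": 1,    # match score
    "-B": 4,    # mismatch penalty
    "-O": 6,    # gap opening penalty
    "-E": 1,    # gap extension penalty
    "-L": 5,    # penalty for end clipping
}


def build_bwa_mem_command(reference, reads, mates=None, **overrides):
    """Return the argument list for a `bwa mem` call.

    `overrides` maps option letters to values, e.g. k=25 sets `-k 25`.
    Passing `mates` adds a second FASTQ file for paired-end alignment.
    """
    options = dict(BWA_MEM_DEFAULTS)
    for letter, value in overrides.items():
        flag = "-" + letter
        if flag not in options:
            raise ValueError(f"unsupported option: {flag}")
        options[flag] = value

    command = ["bwa", "mem"]
    for flag, value in options.items():
        command += [flag, str(value)]
    command += [reference, reads]
    if mates is not None:
        command.append(mates)
    return command


if __name__ == "__main__":
    # Placeholder file names; prints the command instead of running it.
    cmd = build_bwa_mem_command("genome.fa", "reads_R1.fq", "reads_R2.fq", k=25)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if the list is later passed to subprocess.run.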

Read group parameters

Read group identifier: If you want to add a read group line to the BAM file, you have to define the read group identifier (ID:value).
Sample name for read group: The name of the sample sequenced in this read group (SM:value).
Platform for read group: With this setting you can specify the platform or technology used to produce the reads. Options: ILLUMINA, SOLID, LS454, HELICOS, PACBIO (PL:value).
Library identifier for read group: DNA preparation library identifier. The Mark Duplicates tool uses this field to determine which read groups might contain molecular duplicates, in case the same DNA library was sequenced on multiple lanes (LB:value).
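
The read group settings end up as a single @RG header line, which BWA MEM accepts through its -R option. The following Python sketch shows one way such a line could be assembled from the SAM tags ID, SM, PL and LB. The function name build_read_group is hypothetical, and the platform check simply mirrors the options listed above.

```python
# Platform values offered by the tool, as listed in the parameter description.
VALID_PLATFORMS = {"ILLUMINA", "SOLID", "LS454", "HELICOS", "PACBIO"}


def build_read_group(rg_id, sample=None, platform=None, library=None):
    """Build an @RG header line suitable for `bwa mem -R`.

    Fields are joined with a literal backslash-t sequence, which BWA
    converts to tab characters when it writes the header.
    """
    if not rg_id:
        raise ValueError("a read group identifier (ID) is required")
    if platform is not None and platform not in VALID_PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")

    fields = ["@RG", f"ID:{rg_id}"]
    if sample:
        fields.append(f"SM:{sample}")
    if platform:
        fields.append(f"PL:{platform}")
    if library:
        fields.append(f"LB:{library}")
    return "\\t".join(fields)


if __name__ == "__main__":
    # Example values are placeholders.
    print(build_read_group("run1_lane1", "sampleA", "ILLUMINA", "libA"))
```

The identifier is mandatory because the other fields are only meaningful as attributes of a named read group.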

More information: BWA manual

Details

It is possible to give the tool more than one FASTQ file/file pair. The tool will run the alignment for each file/file pair separately, and finally merge the resulting BAM files.
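
The align-then-merge flow can be pictured with the following Python sketch. It is not the tool's actual implementation: the function name plan_alignment_jobs and the intermediate file names are hypothetical, and the step that turns each alignment into a sorted BAM file is assumed rather than shown.

```python
def plan_alignment_jobs(reference, pairs, merged_bam="merged.bam"):
    """Plan one `bwa mem` run per file or file pair, plus a final merge.

    `pairs` is a list of (r1, r2) tuples; use r2=None for single-end reads.
    Returns (jobs, merge_command), where each job is a tuple of the
    alignment command and the BAM file assumed to hold its sorted output.
    """
    jobs = []
    bam_files = []
    for index, (r1, r2) in enumerate(pairs, start=1):
        command = ["bwa", "mem", reference, r1]
        if r2 is not None:
            command.append(r2)
        bam = f"alignment_{index}.bam"  # hypothetical intermediate name
        jobs.append((command, bam))
        bam_files.append(bam)

    # samtools merge expects its input BAM files to be sorted already.
    merge_command = ["samtools", "merge", merged_bam] + bam_files
    return jobs, merge_command


if __name__ == "__main__":
    jobs, merge = plan_alignment_jobs(
        "genome.fa",
        [("a_R1.fq", "a_R2.fq"), ("b_R1.fq", "b_R2.fq")],
    )
    for command, bam in jobs:
        print(" ".join(command), "->", bam)
    print(" ".join(merge))
```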

If you provide two FASTQ files, the tool will by default perform a paired-end alignment with them. It will try to assign the R1 and R2 reads correctly based on the file names.
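
The exact rules the tool uses for name-based assignment are not described here. The Python sketch below shows one plausible approach, assuming a hypothetical naming convention in which the mate number appears as _R1/_R2 or _1/_2 before the file extension. The function name assign_mates is likewise hypothetical.

```python
import re

# Assumed convention: "<stem>_R1.fastq.gz", "<stem>_2.fq", "<stem>_R1_001.fastq", ...
_MATE_PATTERN = re.compile(
    r"^(?P<stem>.+?)_R?(?P<mate>[12])(?P<rest>(_\d+)?\.(fastq|fq)(\.gz)?)$"
)


def assign_mates(file_a, file_b):
    """Return (r1_file, r2_file) for two FASTQ file names.

    Raises ValueError if the mate number cannot be read from a name, if the
    two files are not one R1 and one R2, or if the rest of the names differ.
    """
    parsed = []
    for name in (file_a, file_b):
        match = _MATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"cannot tell R1 from R2: {name}")
        parsed.append(
            (match.group("mate"), match.group("stem"), match.group("rest"), name)
        )

    if {mate for mate, _, _, _ in parsed} != {"1", "2"}:
        raise ValueError("expected exactly one R1 and one R2 file")
    if parsed[0][1:3] != parsed[1][1:3]:
        raise ValueError("file names do not belong to the same pair")

    by_mate = {mate: name for mate, _, _, name in parsed}
    return by_mate["1"], by_mate["2"]


if __name__ == "__main__":
    print(assign_mates("sample_R2.fastq.gz", "sample_R1.fastq.gz"))
```

If your file names do not follow a recognizable convention, assigning the files explicitly is the safer choice.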

If you have more than two FASTQ files (or wish to perform single-end alignment for two files), you will need to provide the FASTQ file names as text file lists: one list for single-end alignment, or two lists for paired-end alignment, one for the R1 files and another for the R2 files (e.g. R1files.txt and R2files.txt). These lists can be generated with the tool Utilities / Make a list of file names. The read pairs must be ordered identically in both lists.
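
If you prefer to prepare the lists outside Chipster, the following Python sketch writes two list files with identical pair ordering and checks them when reading back. It assumes a format of one file name per line, which is an assumption rather than something stated above. The function names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_pair_lists(pairs, r1_list="R1files.txt", r2_list="R2files.txt"):
    """Write one file name per line, keeping the pair order identical in both lists."""
    r1_names = [r1 for r1, _ in pairs]
    r2_names = [r2 for _, r2 in pairs]
    Path(r1_list).write_text("\n".join(r1_names) + "\n")
    Path(r2_list).write_text("\n".join(r2_names) + "\n")


def read_pair_lists(r1_list="R1files.txt", r2_list="R2files.txt"):
    """Read both lists and pair them line by line, ignoring blank lines."""
    def read_names(path):
        lines = Path(path).read_text().splitlines()
        return [line.strip() for line in lines if line.strip()]

    r1_names = read_names(r1_list)
    r2_names = read_names(r2_list)
    if len(r1_names) != len(r2_names):
        raise ValueError("R1 and R2 lists have different lengths")
    return list(zip(r1_names, r2_names))


if __name__ == "__main__":
    # Demonstration in a temporary directory with placeholder file names.
    with tempfile.TemporaryDirectory() as folder:
        r1_path = Path(folder) / "R1files.txt"
        r2_path = Path(folder) / "R2files.txt"
        write_pair_lists(
            [("a_R1.fq", "a_R2.fq"), ("b_R1.fq", "b_R2.fq")], r1_path, r2_path
        )
        print(read_pair_lists(r1_path, r2_path))
```

Writing both lists from the same sequence of pairs guarantees that line N of one list corresponds to line N of the other.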

To run the tool, select the list file or files (R1files.txt and R2files.txt) and ALL FASTQ files, and assign the list files correctly. Once the list files are assigned, they are automatically inactivated in the "reads" file list.

Output

As a result, the tool returns a sorted and indexed BAM-formatted alignment, which is ready for viewing in the Chipster genome browser.
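
For readers reproducing this output outside Chipster, sorting and indexing are typically done with samtools. The Python sketch below builds the two commands and is only an illustration: the function name build_postprocessing_commands and the file names are hypothetical, and it does not claim to match the tool's internal steps.

```python
def build_postprocessing_commands(alignment_file, sorted_bam):
    """Return the samtools commands that sort an alignment and index the result.

    `samtools sort -o` writes a coordinate-sorted BAM file, and
    `samtools index` creates the accompanying index file next to it.
    """
    return [
        ["samtools", "sort", "-o", sorted_bam, alignment_file],
        ["samtools", "index", sorted_bam],
    ]


if __name__ == "__main__":
    # Placeholder file names; prints the commands instead of running them.
    for command in build_postprocessing_commands("alignment.sam", "alignment.bam"):
        print(" ".join(command))
```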