Minimap2 for mapping reads to genomes

Description

Minimap2 is a fast general-purpose alignment program to map DNA and long mRNA sequences against a large reference database. It can be used for:

mapping of accurate short reads (preferably longer than 100 bases)
mapping 1kb genomic reads at error rate 15% (e.g. PacBio or Oxford Nanopore genomic reads)
mapping full-length noisy Direct RNA or cDNA reads
mapping and comparing assembly contigs or closely related full chromosomes of hundreds of megabases in length

Parameters

Minimap2 parameters

Genome: Choose a genome from the list to be used as the reference for mapping. Optionally, you can provide your own reference genome as the second input file.
Task type: Defines the mapping task to perform.

Read group parameters

Read group identifier: To add a read group line to the BAM file, you must define a read group identifier (ID:value).
Sample name for read group: The name of the sample sequenced in this read group (SM:value).
Platform for read group: The platform or technology used to produce the reads. Options: ILLUMINA, SOLID, LS454, HELICOS, PACBIO (PL:value).
Library identifier for read group: DNA preparation library identifier. The Mark Duplicates tool uses this field to determine which read groups might contain molecular duplicates, in case the same DNA library was sequenced on multiple lanes (LB:value).
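
The four read group parameters correspond to the fields of a SAM @RG header line. As an illustration only (how Chipster assembles the line internally is not described here), the hypothetical helper below builds such a line in the form accepted by minimap2's -R option, with the fields separated by a literal backslash followed by t.

```python
def build_read_group_line(rg_id, sample=None, platform=None, library=None):
    """Build a SAM @RG line, or return None when no identifier is given.

    Hypothetical helper for illustration; not part of Chipster or minimap2.
    """
    if not rg_id:
        # Without an identifier, no read group line is added.
        return None
    fields = ["@RG", f"ID:{rg_id}"]
    if sample:
        fields.append(f"SM:{sample}")
    if platform:
        fields.append(f"PL:{platform}")
    if library:
        fields.append(f"LB:{library}")
    # Join with the two characters backslash + t, which the aligner
    # is assumed to convert into real tab characters in the header.
    return "\\t".join(fields)


if __name__ == "__main__":
    print(build_read_group_line("run1", "sampleA", "PACBIO", "lib1"))
```

The returned string could then be passed as the value of -R on the minimap2 command line.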

Details

The Minimap2 aligner can be used for several different alignment and mapping tasks, including mapping read sets that contain very long reads (e.g. PacBio or Oxford Nanopore reads). The Minimap2 tool in Chipster is intended only for single-end mapping tasks where all the reads are in one input file. The reads can be in FASTQ or FASTA format.

The reference sequence set can be defined in two ways. 1) If only one input file is given, the reference genome is selected with the Genome parameter, which lists the genomes available in Chipster. 2) Alternatively, you can give the genome as a second input file in FASTA format. If you provide a reference sequence file, the value of the Genome parameter is ignored.
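
The precedence rule above can be sketched as follows. The function name and the file names are hypothetical and only illustrate that a user-supplied reference overrides the Genome parameter.

```python
def choose_reference(genome_parameter, user_reference=None):
    """Return the reference to map against.

    genome_parameter: the genome selected from the Genome list.
    user_reference:   optional second input file in FASTA format.
    Both names are assumptions made for this sketch.
    """
    if user_reference:
        # A user-provided FASTA file takes precedence;
        # the Genome parameter is ignored in this case.
        return user_reference
    return genome_parameter


if __name__ == "__main__":
    print(choose_reference("listed_genome"))
    print(choose_reference("listed_genome", "my_reference.fasta"))
```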

You must always select a task type from the task menu. Selecting a task type applies a predefined set of minimap2 parameters suited to a specific analysis task. The predefined task types include:

Task	Minimap2 parameters used
Map PacBio subreads to a genome	-ax map-pb
Map Oxford Nanopore reads to a genome	-ax map-ont
Map PacBio Iso-seq or traditional cDNA to reference	-ax splice -uf
Map Nanopore 2D cDNA-seq data to reference	-ax splice
Map Nanopore Direct RNA-seq to reference	-ax splice -uf -k14
Mapping against SIRV control reference	-ax splice --splice-flank=no
Aligning assembly to reference genome	-ax asm5

As there is no obvious default analysis for Minimap2, the task type does not have any default value. Instead this value must be defined by the user for each Minimap2 task.
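
To illustrate how a task type translates into a minimap2 command line, the sketch below stores the presets listed above in a dictionary and builds an argument list from them. The dictionary, the function name, and the file names are assumptions made for this example, not Chipster's actual implementation. Because there is no default task, an unknown or missing task raises an error.

```python
import shlex

# Task names and parameters as listed in the task table.
TASK_PRESETS = {
    "Map PacBio subreads to a genome": "-ax map-pb",
    "Map Oxford Nanopore reads to a genome": "-ax map-ont",
    "Map PacBio Iso-seq or traditional cDNA to reference": "-ax splice -uf",
    "Map Nanopore 2D cDNA-seq data to reference": "-ax splice",
    "Map Nanopore Direct RNA-seq to reference": "-ax splice -uf -k14",
    "Mapping against SIRV control reference": "-ax splice --splice-flank=no",
    "Aligning assembly to reference genome": "-ax asm5",
}


def minimap2_command(task, reference, reads):
    """Return the minimap2 argument list for a task (hypothetical helper)."""
    if task not in TASK_PRESETS:
        # No default exists, so the task must be chosen explicitly.
        raise ValueError(f"Unknown or missing task type: {task!r}")
    return ["minimap2", *shlex.split(TASK_PRESETS[task]), reference, reads]


if __name__ == "__main__":
    cmd = minimap2_command(
        "Map Nanopore Direct RNA-seq to reference", "ref.fa", "reads.fq"
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run, which avoids shell quoting problems with file names.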

Output

alignment.bam: Alignments sorted by chromosomal coordinates.
alignment.bam.bai: Index file for the alignments.
minimap2.log: Information about the minimap2 run.
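
A sorted BAM file and its index are typically produced by a coordinate sort followed by indexing. The sketch below shows one way to express those two steps as samtools commands. It assumes the aligner wrote SAM output to a hypothetical file named alignment.sam, and it does not claim to reproduce the commands Chipster runs internally.

```python
def postprocess_commands(sam_path="alignment.sam", bam_path="alignment.bam"):
    """Return samtools commands that sort and index an alignment file.

    The function name and the default file names are assumptions
    made for this sketch.
    """
    # Sort by chromosomal coordinate; the .bam extension of the
    # output name selects BAM as the output format.
    sort_cmd = ["samtools", "sort", "-o", bam_path, sam_path]
    # Indexing writes the index next to the BAM file (alignment.bam.bai).
    index_cmd = ["samtools", "index", bam_path]
    return [sort_cmd, index_cmd]


if __name__ == "__main__":
    for cmd in postprocess_commands():
        # To execute instead of print: subprocess.run(cmd, check=True)
        print(" ".join(cmd))
```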

References

This tool uses the Minimap2 aligner. Please cite the article:

Heng Li: Minimap2: pairwise alignment for nucleotide sequences

Please see the Minimap2 manual for more details.
