Minimap2 for mapping reads to genomes
Description
Minimap2 is a fast general-purpose alignment program to map DNA and long mRNA sequences against a large reference database. It can be used for:
- mapping of accurate short reads (preferably longer than 100 bases)
- mapping 1 kb genomic reads at an error rate of about 15% (e.g. PacBio or Oxford Nanopore genomic reads)
- mapping full-length noisy Direct RNA or cDNA reads
- mapping and comparing assembly contigs or closely related full chromosomes of hundreds of megabases in length
Parameters
Minimap2 parameters
- Genome: Choose a genome from the list to be used as the reference for mapping. Optionally, you can provide your own reference genome as the second input file.
- Task type: Defines the mapping task to perform.
Read group parameters
- Read group identifier: To add a read group line to the BAM file, you must define a read group identifier (ID:value).
- Sample name for read group: The name of the sample sequenced in this read group (SM:value).
- Platform for read group: The platform or technology used to produce the reads. Options: ILLUMINA, SOLID, LS454, HELICOS, PACBIO (PL:value).
- Library identifier for read group: DNA preparation library identifier. The Mark Duplicates tool uses this field to determine which read groups might contain molecular duplicates, in case the same DNA library was sequenced on multiple lanes (LB:value).
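The four read group parameters correspond to the fields of a single SAM @RG header line. A minimal sketch of how such a line could be assembled is shown below. The function name `build_read_group` and the example values are hypothetical, and handing the result to minimap2 through its `-R` option is an assumption about how the tool is wired, not something this page states.

```python
def build_read_group(rg_id, sample=None, platform=None, library=None):
    """Assemble a SAM @RG header line from the read group parameters.

    Fields are joined with a literal backslash-t, the form used in
    command-line read group strings such as '@RG\\tID:foo\\tSM:bar'.
    """
    if not rg_id:
        # The ID field is mandatory in an @RG line.
        raise ValueError("a read group line requires an identifier (ID)")
    fields = [("ID", rg_id), ("SM", sample), ("PL", platform), ("LB", library)]
    # Optional fields that were left empty are simply omitted.
    parts = ["@RG"] + [f"{tag}:{value}" for tag, value in fields if value]
    return "\\t".join(parts)


# Hypothetical example values
rg_line = build_read_group("run1", sample="sampleA", platform="PACBIO", library="lib1")
print(rg_line)
```

Leaving the identifier empty raises an error in this sketch, mirroring the rule that a read group line is only added when an identifier is defined.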
Details
The Minimap2 aligner can be used for several different alignment and mapping tasks, including mapping read sets that contain very
long reads (e.g. PacBio or Oxford Nanopore reads).
The Minimap2 tool in Chipster is intended only for single-end type mapping tasks where all the reads are in one input file.
The reads can be in FASTQ or FASTA format.
The reference sequence set can be defined in two ways:
1) If only one input file is given, the reference genome is selected with the Genome parameter, which lists the genomes available in Chipster.
2) Alternatively, you can give the genome as a second input file in FASTA format. If you provide your own reference sequence file, the value of the Genome parameter is ignored.
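The precedence between the two ways of defining the reference can be sketched as a small function. This is only an illustration of the rule described above: the function name `choose_reference` and the example genome and file names are hypothetical, not part of Chipster.

```python
def choose_reference(genome_parameter, user_fasta=None):
    """Return the reference to map against.

    A user-supplied FASTA file (the second input file) takes precedence;
    in that case the Genome parameter is ignored.
    """
    if user_fasta:
        return user_fasta
    if not genome_parameter:
        raise ValueError("no reference: select a genome or supply a FASTA file")
    return genome_parameter


# Hypothetical genome and file names
print(choose_reference("Homo_sapiens.GRCh38"))                       # Genome parameter is used
print(choose_reference("Homo_sapiens.GRCh38", "my_assembly.fasta"))  # Genome parameter is ignored
```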
The user must always select a task type from the task menu. Selecting a task type applies a predefined set of minimap2 parameters
that are suited to a specific analysis task. The predefined task types include:
Task | Minimap2 parameters used |
Map PacBio subreads to a genome | -ax map-pb |
Map Oxford nanopore reads to a genome | -ax map-ont |
Map PacBio Iso-seq or traditional cDNA to reference | -ax splice -uf |
Map Nanopore 2D cDNA-seq data to reference | -ax splice |
Map Nanopore Direct RNA-seq to reference | -ax splice -uf -k14 |
Mapping against SIRV control reference | -ax splice --splice-flank=no |
Aligning assembly to reference genome | -ax asm5 |
As there is no obvious default analysis for Minimap2, the task type does not have
a default value. Instead, the user must define this value for each Minimap2 task.
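The relation between task types and parameter sets can be sketched as a lookup that refuses to run without an explicit choice. The task names and option strings below are copied from the table above; the function name `build_minimap2_command` and the file names are hypothetical, and the real Chipster tool may assemble its command differently.

```python
# Task names and option strings copied from the task table.
TASK_PRESETS = {
    "Map PacBio subreads to a genome": ["-ax", "map-pb"],
    "Map Oxford nanopore reads to a genome": ["-ax", "map-ont"],
    "Map PacBio Iso-seq or traditional cDNA to reference": ["-ax", "splice", "-uf"],
    "Map Nanopore 2D cDNA-seq data to reference": ["-ax", "splice"],
    "Map Nanopore Direct RNA-seq to reference": ["-ax", "splice", "-uf", "-k14"],
    "Mapping against SIRV control reference": ["-ax", "splice", "--splice-flank=no"],
    "Aligning assembly to reference genome": ["-ax", "asm5"],
}


def build_minimap2_command(task, reference, reads):
    """Return the minimap2 argument list for one task type."""
    if task is None:
        # There is no default task type, so an unset value is an error.
        raise ValueError("task type has no default; select one explicitly")
    if task not in TASK_PRESETS:
        raise ValueError(f"unknown task type: {task!r}")
    return ["minimap2", *TASK_PRESETS[task], reference, reads]


# Hypothetical file names
cmd = build_minimap2_command("Map Oxford nanopore reads to a genome", "ref.fa", "reads.fq")
print(" ".join(cmd))  # minimap2 -ax map-ont ref.fa reads.fq
```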
Output
- alignment.bam: Alignments sorted by chromosomal coordinates.
- alignment.bam.bai: Index file for the alignments.
- minimap2.log: Information about the minimap2 run.
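A coordinate-sorted BAM file and its index are commonly produced from SAM output with samtools. The sketch below only builds the two command lines and does not run them. It assumes samtools is used for this step, which this page does not state, and the function name `postprocess_commands` and the intermediate file name are hypothetical.

```python
def postprocess_commands(sam_path, bam_path="alignment.bam"):
    """Return samtools commands that sort SAM output and index the result."""
    # Sort by chromosomal coordinates and write BAM.
    sort_cmd = ["samtools", "sort", "-o", bam_path, sam_path]
    # Create the index file; by default it is written as <bam_path>.bai.
    index_cmd = ["samtools", "index", bam_path]
    return [sort_cmd, index_cmd]


# Hypothetical intermediate file name
for cmd in postprocess_commands("minimap2_output.sam"):
    print(" ".join(cmd))
```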
References
This tool uses the Minimap2 aligner. Please cite the article:
Heng Li (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18):3094-3100.
Please see the Minimap2 manual for more details.