STAR for paired-end reads and human genome

Description

Aligns paired-end RNA-seq reads using the STAR aligner. If you have just one pair of read files, Chipster assigns the reads 1 file and the reads 2 file based on the file names. If you have several pairs of read files for one sample, you need to provide a list of FASTQ filenames for each direction (e.g. 1files.txt and 2files.txt). You can generate the lists with the tool "Utilities / Make a list of filenames". Alignment results are given in a BAM file, which is automatically indexed and hence ready to be viewed in the Chipster genome browser.

Parameters

Genome [Homo_sapiens.GRCh38]
Maximum alignments per read [10]
Maximum mismatches per read pair [10]

Details

This tool uses the STAR (Spliced Transcripts Alignment to a Reference) aligner, which can accurately detect annotated and novel splice junctions in RNA-seq data. The tool uses a 2-pass mapping process: STAR performs the 1st mapping pass, automatically extracts the splice junctions, inserts them into the genome index, and re-maps all reads in the 2nd mapping pass. This does not increase the number of detected novel junctions, but it allows more spliced reads to map to them.

Chipster offers an Ensembl GTF file for detecting annotated splice junctions, but you can also provide your own. For example, the GENCODE GTF is recommended by the STAR developers. The minimum required overhang is 1 base for annotated junctions and 8 bases for novel junctions. The tool uses the STAR parameter --outFilterType BySJout to filter out alignments which contain spurious novel junctions. It also uses the STAR parameter --outSAMstrandField intronMotif to add XS strand tags to spliced reads (the Cufflinks assembler needs these) and to remove unannotated non-canonical junctions for which the strand cannot be determined.
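The settings described above can be summarised as a STAR command line. The following Python sketch only assembles the argument list; it is an illustration and not necessarily the command that Chipster runs. The function name, the genome index directory, the file names and the thread count are hypothetical placeholders, and using --twopassMode Basic for the 2-pass mapping is an assumption. The two filter values correspond to the defaults listed under Parameters.

```python
def build_star_command(reads1, reads2, genome_dir, gtf=None,
                       max_alignments=10, max_mismatches=10, threads=4):
    """Assemble a STAR argument list for paired-end reads (illustrative only)."""
    if len(reads1) != len(reads2):
        raise ValueError("reads 1 and reads 2 lists must have the same length")
    cmd = [
        "STAR",
        "--runThreadN", str(threads),          # hypothetical thread count
        "--genomeDir", genome_dir,             # hypothetical index location
        # Several files per direction are passed as comma-separated lists,
        # reads 1 first and reads 2 second, in matching order.
        "--readFilesIn", ",".join(reads1), ",".join(reads2),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--twopassMode", "Basic",              # assumption: 2-pass mapping mode
        "--alignSJDBoverhangMin", "1",         # overhang for annotated junctions
        "--alignSJoverhangMin", "8",           # overhang for novel junctions
        "--outFilterType", "BySJout",
        "--outSAMstrandField", "intronMotif",
        "--outFilterMultimapNmax", str(max_alignments),
        "--outFilterMismatchNmax", str(max_mismatches),
    ]
    if gtf is not None:
        # Optional annotation used to insert known junctions into the index.
        cmd += ["--sjdbGTFfile", gtf]
    return cmd


# Example with made-up file names: two FASTQ files per direction.
example = build_star_command(
    ["sampleA_1.fq", "sampleB_1.fq"],
    ["sampleA_2.fq", "sampleB_2.fq"],
    genome_dir="star_index",
    gtf="annotation.gtf",
)
print(" ".join(example))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later handed to a process launcher.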

The "Maximum alignments per read" parameter sets the maximum number of loci a read is allowed to map to. All alignments of a read are output only if the read maps to no more loci than this value. Otherwise no alignments are output, and the read is counted as "mapped to too many loci" in the Log_final.txt file.
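The all-or-nothing behaviour of this filter can be pictured with a small Python sketch. It is a simplified model of the rule described above, not STAR's implementation; the function name, the category labels and the representation of a read's hits as a list of loci are hypothetical.

```python
def alignments_to_report(loci, max_alignments=10):
    """Return (alignments to output, category) for the loci one read maps to."""
    if len(loci) == 0:
        return [], "unmapped"
    if len(loci) > max_alignments:
        # Over the limit: none of the alignments are reported.
        return [], "mapped to too many loci"
    category = "uniquely mapped" if len(loci) == 1 else "mapped to multiple loci"
    # At or below the limit: every alignment is reported.
    return list(loci), category


# A read with 3 hits is reported in full when the limit is 10,
# but dropped entirely when the limit is 2.
hits = ["chr1:1000", "chr5:2500", "chrX:300"]
print(alignments_to_report(hits, max_alignments=10))
print(alignments_to_report(hits, max_alignments=2))
```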

The "Maximum mismatches per read pair" parameter filters out alignments which contain more mismatches than this number. Use the value 999 to switch off this filter.

Output

alignment.bam: alignments sorted by chromosomal coordinates.
alignment.bam.bai: index file for the alignments.
Log_final.txt: summary file listing the percentage of uniquely mapped reads etc.
Log_progress.txt: process summary which allows you to see, for example, whether the 2nd pass mapping was done.
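If you export Log_final.txt, its summary values can be read programmatically. The Python sketch below assumes the usual layout of STAR's final log, where each statistic is written as a name and a value separated by a "|" character. The function name is hypothetical, and the sample text and its numbers are made up for illustration.

```python
def parse_star_final_log(text):
    """Collect 'name | value' lines of a STAR final log into a dictionary."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headings and blank lines carry no value
        key, value = line.split("|", 1)
        key, value = key.strip(), value.strip()
        if key and value:
            stats[key] = value
    return stats


# Made-up excerpt in the assumed layout (name, "|", tab, value).
sample = """\
                          Number of input reads |\t1000
                                   UNIQUE READS:
                        Uniquely mapped reads % |\t90.00%
             % of reads mapped to too many loci |\t0.50%
"""

stats = parse_star_final_log(sample)
print(stats["Uniquely mapped reads %"])
print(stats["% of reads mapped to too many loci"])
```

The values are kept as strings so that percentages and plain counts can be converted separately as needed.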

References

This tool uses the STAR aligner. Please cite the article:

Alexander Dobin et al: STAR: ultrafast universal RNA-seq aligner (2013) Bioinformatics 29: 15-21.

Please see the STAR manual for more details.