Chipster manual

RNA-seq / Count reads per transcripts using eXpress

Description

Counts reads per transcript using the eXpress package.

Parameters

Quality scale used in the FASTQ file (phred + 33, phred + 64) [phred + 33]
How many valid alignments should Bowtie2 search per read (20, 100, 1000) [100]
Mean fragment length (10-10,000) [200]
Fragment length standard deviation (5-200) [60]
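
The quality scale parameter tells the aligner which ASCII offset was used to encode base qualities in the FASTQ file. As a minimal sketch of what the two options mean (the function name decode_qualities is hypothetical and not part of Chipster, Bowtie2 or eXpress), a quality string can be decoded like this:

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    offset is 33 for the "phred + 33" scale and 64 for "phred + 64".
    """
    return [ord(ch) - offset for ch in quality_string]


# The same Phred score of 40 is written as "I" with offset 33
# and as "h" with offset 64.
phred33_scores = decode_qualities("I", offset=33)
phred64_scores = decode_qualities("h", offset=64)
```

Choosing the wrong scale shifts every quality score by 31, so it is worth checking which encoding your FASTQ files use.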

Details

This tool takes as input either paired-end or single-end reads in FASTQ files, and transcripts in a multi-FASTA file. Both the read and transcript files can be zipped. Reads are aligned to the transcripts using Bowtie2, and you should allow as many multimappings as possible. Note, however, that Bowtie2 can be very slow if many alignments are allowed per read.

Counting reads at the transcript level is complicated by the fact that transcript isoforms typically have overlapping parts. To assign ambiguously mapping reads to different isoforms, an expectation maximization (EM) approach is used. This approach alternates between two steps: an expectation step, where reads are assigned to transcripts with a probability according to those transcripts’ abundances (which are initially assumed to be equal), and a maximization step, where the abundances are updated based on the assignment probabilities.
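
The two steps can be illustrated with a small toy example. This is only a sketch of a plain batch EM under simplifying assumptions: it ignores transcript lengths, bias and sequencing errors, and it is not the streaming algorithm that eXpress itself implements. The function name em_abundances and the input format (a list of transcript indices per read) are hypothetical.

```python
def em_abundances(read_hits, n_transcripts, n_iter=100):
    """Toy batch EM for assigning multi-mapping reads to transcripts.

    read_hits: one list per read, holding the indices of the
               transcripts that the read aligns to.
    Returns (abundances, expected_counts).
    """
    # Abundances are initially assumed to be equal.
    theta = [1.0 / n_transcripts] * n_transcripts
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # Expectation step: split each read across its candidate
        # transcripts in proportion to the current abundances.
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            if total == 0:
                continue  # read with no usable alignment
            for t in hits:
                counts[t] += theta[t] / total
        # Maximization step: update abundances from expected counts.
        n_assigned = sum(counts)
        theta = [c / n_assigned for c in counts]
    return theta, counts


# Three reads map only to transcript 0, one only to transcript 1,
# and two reads are ambiguous between the two.
reads = [[0], [0], [0], [1], [0, 1], [0, 1]]
abundances, expected_counts = em_abundances(reads, n_transcripts=2)
```

In this example the ambiguous reads end up being shared mostly with transcript 0, because the uniquely mapping reads already indicate that it is the more abundant one.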

eXpress can resolve multi-mappings of reads across transcripts and gene families, learn the fragment length distribution from the data, and correct for sequence-specific bias near the ends of fragments, which arises due to primers used in library preparation. eXpress also includes a model for sequencing errors, including indels.

The mean fragment length and standard deviation parameters are used by eXpress to parameterize the prior fragment length distribution. With paired-end reads, the empirical distribution is estimated from the data on the fly. If only single-end reads are available, the prior distribution is also used to determine the effective length.
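
To give an idea of how a fragment length distribution translates into an effective length, here is a rough sketch. It assumes a commonly used definition (the expected number of positions at which a fragment can start on the transcript) and a normal prior truncated to fragment lengths between 1 and the transcript length. Both are assumptions made for illustration; the exact computation in eXpress may differ, and the function name effective_length is hypothetical.

```python
import math


def effective_length(transcript_len, mean=200.0, sd=60.0):
    """Expected number of fragment start positions on a transcript.

    Assumes transcript_len >= 1 and a normal fragment length prior
    restricted to lengths 1..transcript_len and renormalized.
    """
    # Unnormalized normal density for each possible fragment length.
    weights = [math.exp(-0.5 * ((frag_len - mean) / sd) ** 2)
               for frag_len in range(1, transcript_len + 1)]
    total = sum(weights)
    # A fragment of length l can start at (L - l + 1) positions.
    return sum(w / total * (transcript_len - frag_len + 1)
               for frag_len, w in enumerate(weights, start=1))


long_transcript = effective_length(2000)   # close to 2000 - 200 + 1
short_transcript = effective_length(100)   # shorter than the mean fragment
```

For transcripts much longer than the mean fragment length, the effective length is roughly the transcript length minus the mean fragment length. For short transcripts the choice of prior matters much more.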

eXpress reports the following abundance measures for each transcript: estimated counts, effective counts, FPKM and TPM. The authors of eXpress recommend using rounded effective counts for differential expression analysis with tools like edgeR. You can combine several count files into a count table using the tool "Utilities / Create NGS experiment".
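
The relationship between counts, FPKM and TPM can be sketched as follows. The formulas are the commonly used definitions, and pairing them with effective lengths is an assumption; eXpress may compute its reported values somewhat differently. The function name fpkm_and_tpm and the example numbers are made up for illustration.

```python
def fpkm_and_tpm(counts, eff_lengths):
    """Compute FPKM and TPM from per-transcript counts and lengths.

    counts:      estimated fragment counts per transcript
    eff_lengths: effective transcript lengths in bases (all > 0)
    """
    total_fragments = sum(counts)
    # Fragments per base of each transcript.
    rates = [c / length for c, length in zip(counts, eff_lengths)]
    rate_sum = sum(rates)
    # FPKM: fragments per kilobase per million mapped fragments.
    fpkm = [1e9 * r / total_fragments for r in rates]
    # TPM: per-base rates rescaled so that they sum to one million.
    tpm = [1e6 * r / rate_sum for r in rates]
    return fpkm, tpm


# Two transcripts with the same coverage but different lengths
# receive the same FPKM and TPM values.
fpkm, tpm = fpkm_and_tpm(counts=[100, 300], eff_lengths=[1000, 3000])

# Rounded counts of the kind used for differential expression analysis.
rounded_counts = [round(c) for c in [12.4, 250.6]]
```

Because FPKM and TPM are normalized by length and library size, they are suited for comparing transcripts, whereas count-based tools like edgeR expect the rounded counts.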

Output

The analysis output consists of the following files:

full-express-output.tsv: The full result file of eXpress sorted by bundle identifier.
effective-counts-express.tsv: Result file containing rounded effective counts.

Reference

Please cite the following article if you use this tool:
A Roberts and L Pachter: Streaming fragment assignment for real-time analysis of sequencing experiments. Nature Methods 10, 71–73 (2013).