Differential expression using Cuffdiff
Description
This tool performs differential expression analysis using the Cufflinks package version 2.1.1.
For each group, you need to provide a list of the file names of the replicate BAM files belonging to that group. These
lists can be generated with the tool Utilities / Make a list of file names. The file names
of the lists are used as the group labels in the output files, so give them names that are relevant to
your use case, e.g. treatment.txt and control.txt.
Parameters
- Output type (concise, complete) [concise]
- Annotation GTF (Human (hg19), Mouse (mm10), Rat (rn5)) [Human (hg19)]
- Chromosome names in my BAM file look like (chr1, 1) [1]
- Allowed false discovery rate (0-1) [1]
- Enable multi-mapped read correction (yes, no) [no]
- Correct for sequence-specific bias (yes, no) [no]
- Genome used for bias correction (Human (hg19), Mouse (mm10), Mouse (mm9), Rat (rn4)) [Human (hg19)]
- Library type (fr-unstranded, fr-firststrand, fr-secondstrand) [fr-unstranded]
Details
Given GTF and BAM files, Cuffdiff performs differential expression analysis of genes and transcripts using the
Cufflinks algorithm.
Cufflinks can detect sequence-specific bias and correct for it in abundance estimation.
By default, Cufflinks divides each multi-mapped read uniformly among all of the positions it maps to.
If multi-mapped read correction is enabled, Cufflinks re-estimates the transcript abundances, dividing each multi-mapped read probabilistically based on the initial abundance estimation,
the inferred fragment length and, if bias correction is enabled, the fragment bias.
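The difference between the two ways of dividing a multi-mapped read can be illustrated with a small Python sketch. This is a simplified illustration and not the Cufflinks implementation: it weights the read only by hypothetical initial abundance estimates and ignores fragment length and fragment bias. The function names and the example values are assumptions made for this example.

```python
def uniform_split(n_positions):
    """Default behaviour: every mapped position gets an equal share of the read."""
    return [1.0 / n_positions] * n_positions


def weighted_split(abundances):
    """Correction enabled (simplified): shares are proportional to the
    initial abundance estimate at each mapped position."""
    total = sum(abundances)
    if total == 0:
        # No information to weight by, so fall back to the uniform split.
        return uniform_split(len(abundances))
    return [a / total for a in abundances]


# Hypothetical read that maps to three positions whose transcripts have
# initial abundance estimates of 30, 10 and 0 (arbitrary units).
initial = [30.0, 10.0, 0.0]
uniform = uniform_split(len(initial))
weighted = weighted_split(initial)

print("uniform: ", uniform)
print("weighted:", weighted)
```

With the uniform split each position receives one third of the read, whereas the weighted split assigns most of the read to the position with the highest initial abundance and nothing to the position with zero abundance.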
FDR-adjusted p-values (q-values) are calculated. The concise output files include only those genes or transcripts which have a q-value
lower than the given FDR. In all output files, the value of the Significant column (yes/no) is set according to the same threshold.
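How q-values and the yes/no flag relate to the allowed FDR can be sketched in Python as follows. The sketch assumes a Benjamini-Hochberg adjustment; the gene names, p-values and the FDR of 0.05 are made up for illustration, and the code is not taken from Cuffdiff itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the q-values
    # never increase as the p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


fdr = 0.05  # hypothetical value of the "Allowed false discovery rate" parameter
p_values = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.04, "geneD": 0.5}

names = list(p_values)
q = benjamini_hochberg([p_values[n] for n in names])

# Flag each gene, then keep only the flagged rows, as the concise output does.
rows = [(name, qv, "yes" if qv < fdr else "no") for name, qv in zip(names, q)]
concise = [row for row in rows if row[2] == "yes"]

for row in rows:
    print(row)
```

In this example geneC has a raw p-value below 0.05, but its q-value is above the threshold, so it is flagged as not significant and would be left out of the concise output.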
Output
The concise analysis output consists of the following files:
- de-genes-cufflinks.tsv: Table containing the results of the statistical testing for differentially expressed genes, including fold change estimates and p-values.
- de-genes-cufflinks.bed: The BED version of the results table for genes contains genomic coordinates and fold change estimates for quick navigation in the Genome Browser.
- de-isoforms-cufflinks.tsv: Table containing the results of the statistical testing for differentially expressed isoforms, including fold change estimates and p-values.
- de-isoform-cufflinks.bed: The BED version of the results table for isoforms contains genomic coordinates and fold change estimates for quick navigation in the Genome Browser.
- cufflinks.log: Contains useful information about the analysis run.
If the complete output is selected, the following optional outputs may also be generated:
1. FPKM tracking files
- genes.fpkm_tracking.tsv: Gene FPKMs. Tracks the summed FPKM of transcripts sharing each gene_id
- isoforms.fpkm_tracking.tsv: Transcript FPKMs
- tss_groups.fpkm_tracking.tsv: Primary transcript FPKMs. Tracks the summed FPKM of transcripts sharing each tss_id
- cds.fpkm_tracking.tsv: Coding sequence FPKMs. Tracks the summed FPKM of transcripts sharing each p_id, independent of tss_id
2. Count tracking files
- genes.count_tracking.tsv: Gene counts. Tracks the summed counts of transcripts sharing each gene_id
- isoforms.count_tracking.tsv: Transcript counts
- tss_groups.count_tracking.tsv: Primary transcript counts. Tracks the summed counts of transcripts sharing each tss_id
- cds.count_tracking.tsv: Coding sequence counts. Tracks the summed counts of transcripts sharing each p_id, independent of tss_id
3. Read group tracking files
- genes.read_group_tracking.tsv: Gene read group tracking. Tracks the summed expression and counts of transcripts sharing each gene_id in each replicate
- isoforms.read_group_tracking.tsv: Transcript read group tracking
- tss_groups.read_group_tracking.tsv: Primary transcript read group tracking. Tracks the summed expression and counts of transcripts sharing each tss_id in each replicate
- cds.read_group_tracking.tsv: Coding sequence read group tracking. Tracks the summed expression and counts of transcripts sharing each p_id, independent of tss_id, in each replicate
4. Differential expression tests
- gene_exp.diff.tsv: Gene differential FPKM. Tests differences in the summed FPKM of transcripts sharing each gene_id
- isoform_exp.diff.tsv: Transcript differential FPKM.
- tss_group_exp.diff.tsv: Primary transcript differential FPKM. Tests differences in the summed FPKM of transcripts sharing each tss_id
- cds_exp.diff.tsv: Coding sequence differential FPKM. Tests differences in the summed FPKM of transcripts sharing each p_id independent of tss_id
5. Differential splicing tests
- splicing.diff.tsv: This tab-delimited file lists, for each primary transcript, the amount of overloading detected among its isoforms, i.e. how much differential splicing exists between isoforms processed from a single primary transcript. Only primary transcripts from which two or more isoforms are spliced are listed in this file.
6. Differential coding output
- cds.diff.tsv: This tab-delimited file lists, for each gene, the amount of overloading detected among its coding sequences, i.e. how much differential CDS output exists between samples. Only genes producing two or more distinct CDS (i.e. multi-protein genes) are listed here.
7. Differential promoter use
- promoters.diff.tsv: This tab-delimited file lists, for each gene, the amount of overloading detected among its primary transcripts, i.e. how much differential promoter use exists between samples. Only genes producing two or more distinct primary transcripts (i.e. multi-promoter genes) are listed here.
8. Read group info
- read_groups.info.txt: This tab-delimited file lists, for each replicate, key properties used by Cuffdiff during quantification, such as library normalization factors.
9. Run info
- run.info.txt: This tab-delimited file lists the options and other details of a Cuffdiff run, to help track which settings were used.
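The gene-level and primary-transcript-level tracking files described above report values summed over the transcripts that share a gene_id or a tss_id. The following Python sketch illustrates that aggregation on a few made-up transcript records. The record layout, identifiers and FPKM values are hypothetical and do not reflect the actual column layout of the tracking files.

```python
from collections import defaultdict

# Hypothetical transcript-level records: (transcript_id, gene_id, tss_id, FPKM)
transcripts = [
    ("tx1", "geneA", "TSS1", 12.5),
    ("tx2", "geneA", "TSS1", 7.5),
    ("tx3", "geneA", "TSS2", 4.0),
    ("tx4", "geneB", "TSS3", 3.0),
]


def summed_fpkm(records, key_index):
    """Sum the FPKM (last field) of all records sharing the identifier at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[3]
    return dict(totals)


gene_fpkm = summed_fpkm(transcripts, 1)  # grouped by gene_id
tss_fpkm = summed_fpkm(transcripts, 2)   # grouped by tss_id

print(gene_fpkm)
print(tss_fpkm)
```

Here geneA has three transcripts from two transcription start sites, so its gene-level value is the sum of all three, while each primary transcript value covers only the transcripts sharing that tss_id.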
References
This tool uses the Cufflinks package for statistical analysis. Please read the following article for more detailed information:
Trapnell et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013 Jan;31(1):46-53.