Analysis tools and visualizations

Next-generation sequencing (NGS):

Chipster genome browser
  • View NGS reads in their genomic context using Ensembl annotations
  • Zoom in to nucleotide level
  • View automatically calculated coverage (total or strand-specific) as line graph or density graph
  • Highlight SNPs
  • Use BED, VCF and GTF files to jump from one location to another
Quality control
  • FastX
  • FastQC
  • PRINSEQ
  • RSeQC
  • MultiQC
Utilities
  • SAMtools
  • BEDtools
  • FastX
  • PRINSEQ
  • TagCleaner
  • Trimmomatic
  • Picard
Alignment
  • STAR
  • HISAT2
  • Minimap
  • Bowtie2
  • Bowtie
  • BWA
  • TopHat
RNA-seq
  • Count reads per gene with HTSeq
  • Count reads per transcript with eXpress
  • Differential expression with edgeR, DESeq2, DEXSeq and Cuffdiff
  • Assemble transcripts with Cufflinks
Single cell RNA-seq
  • Seurat
  • Drop-seq tools
miRNA-seq
  • Differential expression with edgeR and DESeq2
  • Pathway analysis for miRNA target genes
  • Correlate with gene expression
Community analysis of amplicon sequencing data (16S rRNA)
  • Mothur package
Virus detection using small RNA-seq data
  • VirusDetect
Variants
  • SAMtools
  • bcftools
  • VCFtools
  • Annotation with Bioconductor
ChIP-seq and FAIRE-seq
  • Detect peaks using MACS
  • Detect peaks using F-seq
  • Filter peaks based on p-value, number of reads, etc.
  • Find common sequence motifs and scan them against JASPAR
  • Find common sequence motifs with Dimont
  • Search sequences with a motif
  • Retrieve genes nearest to the peaks
  • Filter genes based on peak location and distance
  • GO enrichment for the nearby genes
MeDIP-seq
  • Methylated regions with MEDIPS
CNA-seq
  • Count reads in bins, segment, and call copy number aberrations
  • Plot copy number profiles
  • Identify common regions
  • Test for DNA copy number induced differential expression
  • Plot combined profiles of copy number and expression
  • Plot copy number-induced gene expression
  • Add cytogenetic bands
  • Count overlapping CNVs by comparing to the Database of Genomic Variants
Genomic region manipulations
  • BEDtools
  • In-house tools to find, fuse or remove overlapping regions
  • In-house tool to combine region files
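
As an illustration of what fusing overlapping regions involves, here is a minimal Python sketch. It is not the Chipster tool itself: the function name merge_overlapping and the BED-style (chromosome, start, end) tuples with half-open coordinates are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical region type: (chromosome, start, end), half-open like BED.
Region = Tuple[str, int, int]


def merge_overlapping(regions: List[Region]) -> List[Region]:
    """Fuse regions on the same chromosome that overlap each other."""
    merged: List[Region] = []
    # Sorting by chromosome, then start, puts overlapping regions next to each other.
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previously kept region: extend that region.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


example = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_overlapping(example))
```

Regions that merely touch (one ends where the next starts) are kept separate in this sketch; changing `<` to `<=` would fuse them as well.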

Microarrays and proteomics:

Normalization
  • Affymetrix 3' expression
    • mas5
    • plier
    • rma
    • gcrma
    • Li-Wong (dChip)
    • vsn with mas5 and plier
  • Affymetrix exon arrays
    • rma
  • Affymetrix SNP arrays
    • SNPRMA + CRLMM
  • cDNA / Agilent
    • Background correction
      • none
      • subtract
      • Edwards
      • normexp
    • Within-chip
      • none
      • median
      • loess
    • Between-chip
      • none
      • scale
      • quantile
      • aquantile
      • vsn
  • Illumina
    • none
    • median
    • quantile
    • vsn
    • rsn
    • loess
  • Removal of batch effects
    • Mixed modeling
Missing values
  • Imputation
    • Mean
    • Median
    • K-nearest neighbor
  • Removal
Quality control
  • Affymetrix
    • Average background
    • Scaling factor
    • 3'/5' ratio of QC probesets
    • RNA degradation plot
    • RLE plot
    • NUSE plot
  • cDNA
    • histogram
    • density plot
    • MA plot
  • Illumina
    • histogram
    • density plot
Filtering
  • Flags (P/M/A)
  • Expression value
  • Standard deviation
    • Chip
    • Gene
  • Coefficient of variation
  • Interquartile range
  • Free filtering by values in a column
    • Descriptive statistics
    • Fold change
    • P-value
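
To show the idea behind filtering by coefficient of variation, a minimal Python sketch follows. The function name filter_by_cv, the dictionary layout and the threshold are hypothetical choices for this example and do not reflect how Chipster implements the filter.

```python
from statistics import mean, stdev
from typing import Dict, List


def filter_by_cv(expression: Dict[str, List[float]], min_cv: float) -> Dict[str, List[float]]:
    """Keep genes whose coefficient of variation (sd / mean) is at least min_cv."""
    kept: Dict[str, List[float]] = {}
    for gene, values in expression.items():
        m = mean(values)
        if m == 0:
            # The CV is undefined for a zero mean; such genes are skipped here.
            continue
        cv = stdev(values) / abs(m)
        if cv >= min_cv:
            kept[gene] = values
    return kept


# Made-up expression values for two genes across three chips.
expression = {
    "geneA": [10.0, 10.1, 9.9],  # nearly constant
    "geneB": [2.0, 8.0, 14.0],   # strongly varying
}
print(list(filter_by_cv(expression, min_cv=0.1)))
```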
Search
  • Similarly expressed genes by correlation
    • Pearson
    • Spearman
  • Genes by common name
  • Genes by identifier (Affymetrix ID, Agilent ProbeName, etc.)
  • Chromosome location
Plot
  • Volcano plot (GUI)
  • Venn diagram (GUI)
  • Histogram (GUI)
  • Line graph, i.e., expression profile (GUI)
  • 2D Scatter plot (GUI)
  • 3D Scatter plot (GUI)
  • Chip image (GUI)
  • K-means clustering (GUI)
  • Hierarchical clustering (GUI)
  • Self-organizing maps (GUI)
  • Chromosomal location of genes
  • Chromosome specific idiogram
  • RNA degradation plot
  • Density plot
  • MA plot
  • RLE plot
  • NUSE plot
  • Boxplot
  • Correlogram
  • Gene set enrichment analysis
  • Heatmap
  • Annotated dendrogram
Annotate
  • Annotate a gene list using Entrez Gene, RefSeq, GO, KEGG, PubMed, and UniGene
  • Annotate miRNA (target genes from miRBase, MirTarget2, PicTar, TarBase and TargetScan)
Statistical testing
  • Single slide
    • Noise (sd) envelope
    • Newton method
  • One group
    • T-test
    • Wilcoxon rank sum test
  • Two groups
    • Empirical Bayes
    • T-test
    • F-test
    • Mann-Whitney U test
    • SAM
    • LPE
    • ROTS
  • Multiple groups
    • Empirical Bayes
    • ANOVA
    • Kruskal-Wallis test
    • SAM
  • Linear modeling
    • A maximum of three main effects and their interactions
    • Technical replication (one level)
    • Pairing (one level)
  • Time series
    • Periodically expressed genes
    • Independent component analysis
  • Association analysis for SNP data
    • Checking the Hardy-Weinberg equilibrium
    • Testing the genotype association
    • Testing the dominant model for inheritance
    • Testing the recessive model for inheritance
  • Multiple testing correction
    • None
    • Bonferroni
    • Holm
    • Hochberg
    • Benjamini and Hochberg
    • Benjamini and Yekutieli
  • Experimental design
    • Estimation of sample size
    • Estimation of power
    • Estimation of fold change
  • Correlation with phenodata
  • Correlation of miRNA with target gene expression
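
For readers unfamiliar with the multiple testing corrections listed above, the Benjamini and Hochberg adjustment can be sketched in a few lines of Python. This is a generic textbook version, not Chipster's code; the function name benjamini_hochberg and the example p-values are made up for this example.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is multiplied by the number of tests divided by its rank; the running minimum keeps the adjusted values monotone and caps them at 1.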
Clustering
  • K-means
  • Hierarchical
    • Distances
      • Euclidean
      • Manhattan
      • Pearson
      • Spearman
    • Methods
      • Single linkage
      • Average linkage
      • Complete linkage
      • Ward
    • Resampling testing of the result
      • Methods
        • Bootstrapping
  • Self-organizing map (SOM)
  • Quality threshold (QT) clustering
Ordination
  • Principal component analysis (PCA)
  • Non-metric multidimensional scaling (NMDS)
  • Detrended correspondence analysis (DCA)
Classification
  • K-nearest neighbor
    • Cross-validation
    • Prediction of test group membership
  • Discriminant analysis
  • Neural nets
  • Support vector machines
  • Naive Bayes
Pathway analysis
  • Gene set (globaltest)
    • Own gene list
    • Groups inferred from KEGG
    • Groups inferred from GO
    • Enrichment analysis of GO terms in a list of genes
  • Hypergeometric test for over- or underrepresentation
    • GO categories
    • KEGG pathways
    • ConsensusPathDB
    • Pfam categories
    • Cytobands
    • GO categories for miRNA targets
    • KEGG pathways for miRNA targets
  • SAFE
    • KEGG pathways
  • Mapping to Reactome pathways
  • Mapping to protein interactions from IntAct
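
The hypergeometric test for overrepresentation mentioned above can be illustrated with a short Python sketch. It is a generic formulation under stated assumptions, not the implementation used by Chipster; the function name hypergeom_enrichment_p and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, selected: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes in the selection.

    population: all genes considered (e.g. all genes on the chip)
    annotated:  genes in the population that belong to the category
    selected:   size of the gene list being tested
    hits:       genes in the list that belong to the category
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up numbers: 40 of 1000 genes are in the category,
# and 5 of the 50 genes in the list belong to it.
print(hypergeom_enrichment_p(population=1000, annotated=40, selected=50, hits=5))
```

A test for underrepresentation would sum the lower tail (from 0 up to `hits`) instead.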
Promoter analysis
  • Retrieve promoters from the UCSC genome database
  • Weeder (finds sequence motifs common to a set of promoters)
  • Cosmo (finds sequence motifs common to a set of promoters)
  • ClusterBuster (finds clusters of putative TF binding sites using JASPAR matrices)
aCGH analysis
  • Import from CanGEM database
  • Call copy number aberrations
  • Plot copy number profiles
  • Identify common regions
  • Test for DNA copy number induced differential expression
  • Plot combined profiles of copy number and expression
  • Plot copy number-induced gene expression
  • Fetch probe positions from CanGEM
  • Add cytogenetic bands
  • Count overlapping CNVs by comparing to the Database of Genomic Variants
  • Sample size calculations with an adapted BH method
Import / Export
  • Export and import in SOFT format for the GEO database
  • Export in Tab2MAGE format for the ArrayExpress database
  • Import from the ArrayExpress database
  • Import from the GEO database