Automating your data analysis with workflows

Data analysis typically involves several analysis steps. Once a suitable combination has been found and analysis completed, it is often desirable to save the steps taken as an automatic workflow. Reusing workflows serves many purposes. Firstly, it saves time as multi-step analysis can be executed with just one mouse click. Sharing workflows within a research group brings consistency to analysis and provides an easy way for bioinformaticians to help biologists. Sharing workflows in a wider context is also beneficial as providing a downloadable workflow file facilitates the reproduction of published results. In addition to supporting user-made workflows, Chipster also includes some ready-made workflows for finding and analyzing differentially expressed genes, miRNAs and proteins.

User-made workflows

Chipster keeps track of the analysis steps taken, and displays them visually in the Workflow-panel. You can experiment with different methods and parameters, and prune the resulting workflow by simply deleting the unwanted steps. When a satisfactory analysis pipeline is ready, you simply click on the beginning point of it and save the workflow by selecting Workflow->Save starting from selected. You can save the workflow file anywhere you like and change its name, but the ending has to be .bsh. The workflow is saved as a file, which contains instructions to run certain analysis tools with the selected parameter settings in a certain order. Note that you can also run a workflow for several identical input files such as FASTQ or BAM at the same time: Simply select all the input files, and the option "Run recent for each".

If you would like to apply the same workflow to another normalized dataset, select Workflow->Open and run or Workflow->Run recent. You can also share a workflow with others by simply giving them a copy of your workflow file.

Ready-made workflows

Chipster's ready-made workflows are started from the menu Workflows / Run from repository. They cover finding and analyzing differentially expressed genes, miRNAs and proteins as described below. Please note that these workflows should be run on normalized data and the experimental groups needs to be defined in the phenodata.

1. Find differentially expressed genes

This workflow performs quality control of the samples, finds differentially expressed genes, and runs annotation and clustering from them.

2. Find differentially expressed miRNAs

This workflow performs quality control of the samples, finds differentially expressed miRNAs, and runs annotation and clustering from them. It also performs pathway analysis using KEGG and GO databases for the potential targets of these miRNAs. Please use the miRNA systematic names (e.g. hsa-miR-100) as identifiers when importing your data to Chipster. Note that this workflow takes time to run because of the large pathway analysis part.

3. Find differentially expressed proteins

This workflow performs quality control of the samples, finds differentially expressed proteins, and runs pathway analysis and clustering for them. Please use UniProt accession numbers (e.g. Q5STU3) as identifiers when importing your data to Chipster. Marking also an annotation column in the Import tool allows Chipster to show these annotations in visualizations. Please note that if your data is not log-transformed, you can transform it in Chipster prior to running the workflow using the tool "Utilities/ Change interpretation".