Chipster manual

Chipster genome browser visualization

Chipster genome browser enables you to view next generation sequencing data and results in their genomic context using Ensembl annotations. You can zoom in to nucleotide level, highlight SNPs, view automatically calculated coverage and visualize BED scores and copy number aberrations.

Data import: file formats, preprocessing
Selecting files and annotations to be visualized
Navigating and viewing options
Viewing at different zoom levels
Using own genome file (instead of the ones provided in Chipster)
Viewing copy number data
Saving the session

Data import: file formats, preprocessing

Chipster genome browser supports SAM, BAM, BED, GTF, VCF and fasta files, as well as tsv files in which the first three columns are chr, start and end. SAM files need to be converted to BAM, BAM and fasta files need to be sorted and indexed, and BED, GTF, VCF and tsv files need to be sorted by chromosomal location. You can find sorting and indexing tools for the different file types in the Utilities category. If you import external files, the Chipster preprocessor sorts and indexes them during the data import step. If your data is already in the right format and your BAM file has a matching index file (.bai), you can skip the preprocessing step and import your data directly.
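
To illustrate what "sorted by chromosomal location" means for a tsv or BED file, here is a minimal Python sketch. It is not the Chipster sorting tool. The function names are hypothetical, and the chromosome order used here (numeric chromosomes first, then the rest alphabetically) is an assumption, since this manual does not state the exact order Chipster expects.

```python
def chrom_key(name):
    # Strip an optional "chr" prefix so that "chr2" and "2" sort alike.
    core = name[3:] if name.startswith("chr") else name
    # Assumed order: numeric chromosomes numerically, then X, Y, MT etc. alphabetically.
    return (0, int(core), "") if core.isdigit() else (1, 0, core)


def sort_by_location(rows):
    """Sort rows of (chr, start, end, ...) by chromosome, then start, then end."""
    return sorted(rows, key=lambda r: (chrom_key(r[0]), int(r[1]), int(r[2])))


rows = [
    ("chrX", "500", "600"),
    ("chr10", "100", "200"),
    ("chr2", "300", "400"),
    ("chr2", "50", "80"),
]
sorted_rows = sort_by_location(rows)
for row in sorted_rows:
    print("\t".join(row))
```

Coordinates are compared as integers rather than as text, so that position 50 sorts before position 300.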

Selecting files and annotations to be visualized

You can select files for viewing by clicking on them in the Datasets panel or in the Workflow panel (keep the control key down to select several files). Note that if you want to view a BAM file, it needs to have an index file (.bai) with the same name. If you select several BAM files, they should have different names so that Chipster can match them with the correct index files. Select Genome browser as the visualization method from the pulldown menu. In order to have more space for genome browsing you can maximise the visualization panel by clicking on the Maximise button.

Next you need to select the matching genome from the settings panel. If you are using the genome browser for the first time, you are asked to download annotations for the selected genome. Using downloaded gene and transcript annotations ensures optimal performance, and you need to download them only once. However, you can also opt to use the gene and transcript annotations over the internet. By default, the genome sequence itself is accessed over the internet.

Navigating and viewing options

There are several ways to move around:

Enter a gene symbol or select the desired chromosome and base pair location, and click "Go".
Clicking on the chromosome image will move you to that location.
In order to move sideways, you can pull with the mouse or use the arrow keys.
In order to zoom in and out, you can use the mouse wheel or the arrow keys.
Recommended: You can use BED, GTF, VCF and tsv files as guides to quickly view a list of features such as SNPs, ChIP-seq peaks or differentially expressed genes. First open the guide file as a spreadsheet and click the Detach button. Then select the files for visualization and open the genome browser. The start coordinates in the guide file now act as links: clicking on one moves the genome browser to that position. Note that in the spreadsheet view you can sort the data by clicking on a column name.

Viewing at different zoom levels

At every zoom level the grey bar on the chromosome indicates which area is shown below. The annotation track displays genes as blue boxes. Genes on the forward strand are shown above the ruler and genes on the reverse strand are shown below it. When zoomed out, the alignment tracks show rough coverage based on sampling summaries of the BAM files. You can turn off any track by unticking the corresponding box. Tracks corresponding to BED, VCF or tsv files show genomic regions as blue boxes. Clicking on a box displays details about the selected feature in the side panel. For BED files you can choose to visualize the scores (and the color, if available).

When zoomed in, the annotation track displays transcripts instead of genes. Translated regions are colored in blue and untranslated regions in yellow (please see the legend tab). Alignment tracks display the actual reads and the automatically calculated coverage. Reads colored in yellow indicate that the pile of reads has been truncated. To view all the reads, tick the Show all box. The Options in the side panel allow you to:

turn the reads off.
set the scale for viewing the automatically calculated coverage.
set the type of coverage to be calculated and displayed. Strand-specific counts reads that match the forward strand and the reverse strand separately. Strand-specific for RNA-seq counts reads that match transcripts on the forward strand and the reverse strand separately, based on the XS tag in the BAM file. This option requires that sequencing was performed with a strand-specific RNA-seq protocol and that the reads were mapped to the reference genome with the TopHat aligner.
view coverage as a density graph instead of a profile. This is a very compact way of viewing data and hence handy for a large number of samples.
mark low complexity regions in the annotations track.
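
The difference between total and strand-specific coverage can be illustrated with a small Python sketch. This is a simplified illustration, not the genome browser's own implementation. It assumes that each read is given as a hypothetical (start, end, strand) tuple with 0-based, half-open coordinates, and it ignores gaps and spliced alignments. For the RNA-seq option the strand would be taken from the XS tag rather than from the alignment orientation.

```python
def coverage(reads, region_start, region_end, strand=None):
    """Per-base read depth over [region_start, region_end).

    reads: iterable of (start, end, strand) tuples, strand being "+" or "-".
    strand: None counts all reads; "+" or "-" counts only that strand.
    """
    depth = [0] * (region_end - region_start)
    for start, end, read_strand in reads:
        if strand is not None and read_strand != strand:
            continue
        # Clip the read to the region before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


reads = [(0, 4, "+"), (2, 6, "+"), (3, 8, "-")]
total = coverage(reads, 0, 8)
forward = coverage(reads, 0, 8, "+")
reverse = coverage(reads, 0, 8, "-")
print(total)    # [1, 1, 2, 3, 2, 2, 1, 1]
print(forward)  # [1, 1, 2, 2, 1, 1, 0, 0]
print(reverse)  # [0, 0, 0, 1, 1, 1, 1, 1]
```

At every position the forward and reverse depths add up to the total depth.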

When zoomed in to the nucleotide level, reads are colored according to their nucleotide content (please see the legend tab). You can also choose to color only those bases which differ from the reference sequence by selecting the Highlight SNPs option. When both this option and total coverage have been selected, colored bars are added to the coverage track at those positions where more than 20% of the bases differ from the reference.
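
The 20% rule can be written out as a short Python sketch. The function name and the input format (a dictionary of base counts at one position) are assumptions made for illustration. The sketch only restates the threshold described above and is not the browser's actual code.

```python
def differs_from_reference(base_counts, reference_base, threshold=0.20):
    """Return True if more than `threshold` of the bases differ from the reference.

    base_counts: dict mapping a base ("A", "C", "G", "T") to the number of
    reads showing that base at the position.
    """
    total = sum(base_counts.values())
    if total == 0:
        # No coverage at this position, so there is nothing to flag.
        return False
    mismatches = total - base_counts.get(reference_base, 0)
    return mismatches / total > threshold


print(differs_from_reference({"A": 7, "G": 3}, "A"))  # 30% differ
print(differs_from_reference({"A": 8, "G": 2}, "A"))  # exactly 20% differ
```

Because the rule says "more than 20%", a position where exactly 20% of the bases differ is not flagged in this sketch.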

Using own genome file

If you would like to visualize a genome which is not available in Chipster, please do the following:

Import the genome fasta file containing all the chromosomes to Chipster. Unzip the file if it is compressed (.gz) using the tool "Utilities / Extract .gz file".
Select the fasta file and run the tool "Utilities / Index FASTA".
If you would like to visualize gene and transcript annotations, you need a GTF file which has the same format as the Ensembl GTF files. Unzip the GTF file as indicated above. You also need to sort the GTF file by chromosomal coordinates using the tool "Utilities / Sort GTF".
Select the fasta file, the GTF file and your analysis result files (BAM, BED, VCF etc.) and open the Genome browser. Note that all the files must use the same chromosome naming.
Your selected genome is now available in the Genome pull-down menu.
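
Before importing the files, you can check outside Chipster that the chromosome names agree. The following Python sketch is one way to do so. The helper names are hypothetical, and it assumes that the chromosome name is the first word of each fasta header line and the first column of each GTF, BED or tsv line.

```python
def fasta_chromosomes(fasta_lines):
    """Sequence names from fasta header lines: the text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def table_chromosomes(lines):
    """First column of tab-separated lines, skipping comment and blank lines."""
    return {
        line.split("\t")[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def unmatched_chromosomes(fasta_lines, annotation_lines):
    """Chromosome names used in the annotation but absent from the fasta file."""
    return table_chromosomes(annotation_lines) - fasta_chromosomes(fasta_lines)


fasta = [">1 dna:chromosome", "ACGT", ">2 dna:chromosome", "GGCC"]
gtf = ["#!genome-build example", "1\tsrc\tgene\t1\t4", "chr2\tsrc\tgene\t1\t4"]
missing = unmatched_chromosomes(fasta, gtf)
print(sorted(missing))  # ['chr2']
```

In this example the annotation uses "chr2" while the fasta file uses "2", so "chr2" is reported as unmatched.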

Viewing copy number data

Chipster has many analysis tools for microarray and sequencing based copy number data. In addition to pdf images, these tools produce tsv files with gain and loss frequencies, regions and log ratios, which can be visualized in the genome browser. For more information about the files, please see the copy number tutorial.

Saving the session

When you finish viewing your data, you can save the session by selecting "File / Save session". Saving your preprocessed data files this way allows you to continue with them next time. You can name the session file anything you like, but the file name has to end in .zip. If you would like to save an individual preprocessed file, please right-click on it in either the Datasets or the Workflow panel and select "Export".

References

This visualization uses the Picard package. Please see the Picard homepage for more details.