Hierarchical clustering creates a dendrogram describing the relationships between genes or chips in a selected gene list. Clustering consists of two separate steps. First, all pairwise distances between the objects (genes or chips) are calculated. The dendrogram is then built from these distances using the selected linkage method. At most 20 000 genes or samples can be clustered.
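The two steps can be illustrated with a minimal pure-Python sketch. It is not the tool's own implementation (the tool relies on the R packages amap and ape); it assumes Euclidean distance and average linkage, which is only one of several common linkage choices, and the function names are hypothetical. Step one fills a table of pairwise distances; step two repeatedly merges the two closest clusters, recording each merge and its height, which is exactly the information a dendrogram draws.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation; raises ZeroDivisionError for constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)


def pairwise_distances(rows, metric):
    """Step 1: distance for every pair (i, j) with i < j."""
    return {(i, j): metric(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}


def average_linkage(rows, metric=euclidean):
    """Step 2 (hypothetical helper): naive agglomerative clustering.

    Returns a list of merges (cluster_a, cluster_b, height), where clusters
    are frozensets of row indices. Cluster distance is the mean of all
    pairwise distances between members (average linkage).
    """
    dist = pairwise_distances(rows, metric)
    clusters = [frozenset([i]) for i in range(len(rows))]
    merges = []

    def cluster_dist(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the closest pair of current clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        height = cluster_dist(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b, height))
    return merges
```

For example, `average_linkage([[0, 0], [0, 1], [5, 5], [5, 6]])` first merges rows 0 and 1, then rows 2 and 3, and finally joins the two pairs. This naive version recomputes cluster distances at every step, so it only suits small inputs; production code updates a distance matrix incrementally instead.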
The results of hierarchical clustering can be assessed with bootstrap testing. Bootstrapping creates a user-specified number of pseudo-datasets from the original one. In each pseudo-dataset, each row or column (depending on whether genes or chips were clustered) can be present zero, one or several times. Every bootstrapped dataset is then converted into a dendrogram: if 100 bootstrap samples are used, for example, 100 trees are produced. A majority-rule consensus tree is then built from these trees and displayed to the user. In the consensus tree every node is labeled with a number: the number of bootstrap trees in which that node (the same group of objects) was present. The higher the number, the more strongly the node is supported. Note that bootstrapping can only be done using Euclidean distance or Pearson correlation. Bootstrap resampling is not possible for datasets larger than 1000 genes because of computing time limitations. Hierarchical clustering itself can still be run on datasets of up to 20 000 genes, provided the resampling option is turned off.
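The bootstrap procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: it assumes genes (rows) are clustered and the chips (columns) are the resampled units, uses Euclidean distance with average linkage, and the helper names are hypothetical. Each replicate draws columns with replacement, rebuilds a tree, and records its clades (groups of rows joined under one node); clades found in more than half of the replicates form the majority-rule consensus, each labeled with its count.

```python
import math
import random
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clades(rows):
    """Cluster rows with average linkage; return all clades with >= 2 members."""
    clusters = [frozenset([i]) for i in range(len(rows))]
    clades = set()

    def cluster_dist(c1, c2):
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        clades.add(a | b)
    return clades


def resample_columns(rows, rng):
    """Pseudo-dataset: each column may appear zero, one or several times."""
    n_cols = len(rows[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return [[row[c] for c in picks] for row in rows]


def bootstrap_support(rows, n_boot=100, seed=1):
    """Count clade occurrences over n_boot trees; keep the majority-rule clades."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        counts.update(average_linkage_clades(resample_columns(rows, rng)))
    # Majority rule: a clade is kept if it occurs in more than half of the trees.
    return {clade: n for clade, n in counts.items() if n > n_boot / 2}
```

With two well-separated pairs of genes, such as `[[0, 0, 0, 0], [0, 1, 0, 1], [10, 10, 10, 10], [10, 11, 10, 11]]`, every replicate recovers the same three clades, so each gets a support of 100 out of 100. Because every replicate repeats the full clustering, the cost grows with the number of bootstrap samples times the cost of one clustering, which is why a gene limit on resampling is reasonable.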
The output is a file with information on how to draw the tree. This file can be visualised using the interactive "Hierarchical clustering" visualisation. You can select genes in this visualisation by drawing a box in the heatmap area; clicking the "Selected" tab then lets you create a new dataset from your selection.
This tool uses the R packages ape and amap. The citation for ape is:
Paradis E., Claude J. & Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289–290.