The reproducibility-optimization procedure ROTS enables the selection of a suitable gene ranking statistic directly from the given dataset. The statistic is optimized among a family of t-type statistics, two special cases of which are the ordinary t-statistic and the signal log-ratio. The optimality is defined in terms of maximal overlap of top-ranked genes in group-preserving bootstrap datasets. Importantly, besides the group labels, no a priori information about the properties of the data is required and no fixed cutoff for the gene rankings needs to be specified. For more details about the reproducibility-optimization procedure, see Elo et al. (2008).
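To illustrate the idea, the following minimal sketch assumes that the family of statistics has the form d = |mean difference| / (a1 + a2 * s), where s is the pooled standard error. With a1 = 0 and a2 = 1 this gives the ordinary t-statistic, and with a1 = 1 and a2 = 0 the signal log-ratio. The parameter names a1 and a2, the function names and the data layout (one list of sample values per gene and group) are assumptions made for this example. The sketch only evaluates the average top-list overlap for one fixed parameter pair and list size; it omits the search over parameters and list sizes and is not the ROTS implementation itself.

```python
import math
import random
from statistics import mean, variance


def t_type_statistic(x, y, a1, a2):
    """Assumed t-type family: |mean(x) - mean(y)| / (a1 + a2 * s)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    s = math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))  # pooled standard error
    diff = abs(mean(x) - mean(y))
    denom = a1 + a2 * s
    if denom == 0:
        # Degenerate resample with zero variance and a1 = 0.
        return math.inf if diff > 0 else 0.0
    return diff / denom


def top_list(scores, k):
    """Indices of the k genes with the largest scores."""
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(order[:k])


def bootstrap_scores(x_rows, y_rows, a1, a2, rng):
    """Resample samples with replacement within each group, then score every gene."""
    nx, ny = len(x_rows[0]), len(y_rows[0])
    ix = [rng.randrange(nx) for _ in range(nx)]
    iy = [rng.randrange(ny) for _ in range(ny)]
    return [
        t_type_statistic([x[i] for i in ix], [y[i] for i in iy], a1, a2)
        for x, y in zip(x_rows, y_rows)
    ]


def reproducibility(x_rows, y_rows, a1, a2, k, n_boot=50, seed=0):
    """Average overlap of top-k lists between pairs of bootstrap datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boot):
        first = top_list(bootstrap_scores(x_rows, y_rows, a1, a2, rng), k)
        second = top_list(bootstrap_scores(x_rows, y_rows, a1, a2, rng), k)
        total += len(first & second) / k
    return total / n_boot
```

In such a sketch, one would call reproducibility for several candidate (a1, a2) pairs and top list sizes k and keep the combination with the highest value. The cost grows with the largest k considered, which is why lowering it shortens the run time.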
The user can adjust the largest top list size considered in the reproducibility calculations; lowering this size can markedly reduce the computation time. In general, we recommend using a size of several thousand.
ROTS assumes that the expression data contain no missing values. If entries are missing, we recommend imputing them, e.g., with K-nearest neighbour imputation.
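As a rough illustration of K-nearest neighbour imputation, the sketch below fills each missing entry of a gene with the average of that sample's values in the k most similar genes. The function name knn_impute, the use of None to mark missing entries and the distance measure (root mean squared difference over the samples observed in both genes) are assumptions made for this example; an established imputation tool is preferable for real data.

```python
import math


def knn_impute(matrix, k=3):
    """Fill None entries using the k nearest genes (rows). Returns a new matrix."""
    filled = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                # A neighbour must be another gene with sample j observed.
                if h == g or other[j] is None:
                    continue
                shared = [
                    (a, b) for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[g][j] = sum(nearest) / len(nearest)
    return filled
```

Entries for which no neighbour can be found are left as None in this sketch, so the result should be checked before it is passed on.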
The false discovery rate (FDR) for the optimized test statistic is calculated by permuting the sample labels. The results for all the genes can be obtained by setting the FDR cutoff to 1.
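The permutation idea can be sketched as follows. This is a simplified estimate, not necessarily the exact calculation used by ROTS: for each gene, the expected number of null statistics at least as large as the observed one (averaged over permutations of the sample labels) is divided by the number of genes called at that threshold. All function names and the data layout are assumptions made for this example, and a plain mean difference stands in for the optimized statistic.

```python
import random
from statistics import mean


def abs_mean_diff(x, y):
    """Stand-in statistic for the example; any two-group statistic works."""
    return abs(mean(x) - mean(y))


def permutation_fdr(data, labels, statistic, n_perm=100, seed=0):
    """data: one list of values per gene; labels: list of 0/1 per sample."""
    rng = random.Random(seed)

    def scores(lab):
        out = []
        for row in data:
            x = [v for v, l in zip(row, lab) if l == 0]
            y = [v for v, l in zip(row, lab) if l == 1]
            out.append(statistic(x, y))
        return out

    observed = scores(labels)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # permute the sample labels
        null.extend(scores(shuffled))

    fdr = []
    for d in observed:
        expected_false = sum(1 for v in null if v >= d) / n_perm
        called = sum(1 for v in observed if v >= d)  # at least 1: the gene itself
        fdr.append(min(1.0, expected_false / called))
    return observed, fdr


def select_genes(fdr, cutoff):
    """Indices of genes whose estimated FDR does not exceed the cutoff."""
    return [g for g, q in enumerate(fdr) if q <= cutoff]
```

Because every estimate is capped at 1, select_genes(fdr, 1.0) returns all genes, which mirrors the effect of setting the FDR cutoff to 1.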
The output consists of two text files. The first contains the gene expression values, the values of the optimized ROTS statistic for the genes, and the corresponding FDR values. The second contains the optimized parameters and the corresponding reproducibility values.
Please cite the following article if you use ROTS:
Elo, L.L., Filén, S., Lahesmaa, R. and Aittokallio, T. (2008) Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 5: 423-431.