Clustering / Classification


Classification tries to identify the genes that best differentiate between two or more groups of samples. Classification methods include varieties of discriminant analysis, neural nets, support vector machines and naive Bayes. The tool does not yet support validation with a separate test set; only cross-validation is available.



The available methods are k-nearest neighbors (knn), linear discriminant analysis (lda) and other forms of discriminant analysis (dlda, slda, qda), classification and regression trees (rpart), support vector machines (svm), neural nets (lvq, nnet), naive Bayes (naiveBayes) and bagging. The genes can be used as they are, or standardized to a common mean and standard deviation before the analysis. Prediction of novel samples is not currently available, but the results can be validated using leave-one-out (LOO) cross-validation. Optionally, feature selection can be built into the cross-validation, so that the significant genes used to build the classifier are selected separately in each cross-validation cycle.
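As a rough illustration of how leave-one-out cross-validation, standardization and per-cycle feature selection fit together, here is a minimal pure-Python sketch using k-nearest neighbors. It is not the tool's implementation: the function names (`loo_knn`, `f_score`, `column_stats`, `knn_predict`), the between/within-class variance ratio used to rank genes, and the choice to compute the standardization statistics from the training samples of each cycle are all assumptions made for this example.

```python
import math
from collections import Counter


def column_stats(rows):
    """Mean and standard deviation of each gene (column) across the given samples."""
    n = len(rows)
    means, sds = [], []
    for col in zip(*rows):
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        means.append(m)
        sds.append(sd if sd > 0 else 1.0)  # avoid dividing by zero for constant genes
    return means, sds


def f_score(values, labels):
    """Between-class over within-class variation for one gene (assumed ranking score)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def loo_knn(samples, labels, k=3, scale=True, n_genes=None):
    """Leave-one-out predictions: each sample is classified by a model built without it."""
    predictions = []
    for i, test_x in enumerate(samples):
        train_x = [s for j, s in enumerate(samples) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        if scale:
            # Standardize each gene using statistics from the training samples only.
            means, sds = column_stats(train_x)
            train_x = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in train_x]
            test_x = [(v - m) / s for v, m, s in zip(test_x, means, sds)]
        if n_genes is not None:
            # Re-select the top-scoring genes in every cycle, again without the held-out sample.
            genes = list(zip(*train_x))
            ranked = sorted(range(len(genes)), key=lambda g: f_score(genes[g], train_y), reverse=True)
            keep = ranked[:n_genes]
            train_x = [[row[g] for g in keep] for row in train_x]
            test_x = [test_x[g] for g in keep]
        predictions.append(knn_predict(train_x, train_y, test_x, k))
    return predictions


if __name__ == "__main__":
    # Hypothetical toy data: 6 samples x 3 genes; the third gene is noise.
    samples = [[1.0, 5.0, 3.0], [1.2, 4.8, 7.0], [0.9, 5.1, 2.0],
               [5.0, 1.0, 6.0], [5.2, 0.8, 2.5], [4.9, 1.1, 6.5]]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(loo_knn(samples, labels, k=3, n_genes=2))
```

Doing the scaling and the gene ranking inside the loop keeps the held-out sample out of every step that learns from the data. Selecting genes once on the full data set and only then cross-validating tends to give an overly optimistic error estimate.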


The result is a file listing all the samples together with their known and predicted classes.
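For illustration only, one possible layout for such a file is a tab-separated table with one row per sample. The column names and the `write_predictions` / `loo_accuracy` helpers below are hypothetical and not the tool's actual output format; the accuracy is simply the fraction of samples whose predicted class matches the known class.

```python
import csv
import io


def write_predictions(out, names, known, predicted):
    """Write one tab-separated row per sample: name, known class, predicted class."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "known_class", "predicted_class"])  # hypothetical header
    for row in zip(names, known, predicted):
        writer.writerow(row)


def loo_accuracy(known, predicted):
    """Fraction of samples whose predicted class equals the known class."""
    return sum(k == p for k, p in zip(known, predicted)) / len(known)


if __name__ == "__main__":
    buf = io.StringIO()
    write_predictions(buf, ["s1", "s2", "s3"], ["A", "B", "B"], ["A", "A", "B"])
    print(buf.getvalue())
    print(loo_accuracy(["A", "B", "B"], ["A", "A", "B"]))
```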