Clustering / Classification

Description

Classification tries to identify the genes that best differentiate between two or more groups of samples. Classification methods include varieties of discriminant analysis, neural nets, support vector machines and naive Bayes. The tool does not yet implement validation using a separate test set; only cross-validation is available.

Parameters

Method (knn, lda, dlda, slda, qda, rpart, svm, lvq, naiveBayes, nnet, bagging) [knn]
Standardize genes before analysis (yes, no) [yes]
Validation type (crossvalidate, predict) [crossvalidate]
Crossvalidation type (LOO) [LOO]
Feature selection in crossvalidation (yes, no) [no]
Feature selection threshold (0-1) [0.75]
Phenodata column describing the groups to test (empty,...) [empty]
Phenodata column describing the samples in the training groups (empty,...) [empty]

Details

The available methods are k-nearest neighbor (knn), linear discriminant analysis (lda) and other forms of discriminant analysis (dlda, slda, qda), classification and regression trees (rpart), support vector machines (svm), neural nets (lvq, nnet), naive Bayes (naiveBayes) and bagging (bagging). The genes can be used as such or standardized to the same mean and standard deviation before the analysis. Prediction of novel samples is not currently available, but the results can be validated using leave-one-out (LOO) cross-validation: each sample in turn is left out, the classifier is built on the remaining samples, and the left-out sample is then classified. When feature selection in cross-validation is enabled, the significant features used for building the classifier are selected again at each cross-validation cycle.
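
The following Python sketch illustrates the idea of leave-one-out cross-validation with optional feature selection inside each cycle. It is not the tool's own implementation. It uses a small k-nearest neighbor classifier written with the standard library only, and the function names, the made-up example data, the scoring rule in select_features and the n_keep parameter are all assumptions made for illustration. In particular, n_keep does not correspond to the tool's feature selection threshold (0-1), whose exact meaning is not described here.

```python
import math
import statistics
from collections import Counter


def standardize(rows):
    """Scale each gene (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # Constant genes have zero deviation; divide by 1.0 instead.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def select_features(rows, labels, n_keep):
    """Hypothetical scoring rule: rank genes by the spread of their
    group means and return the indices of the n_keep best genes."""
    groups = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        group_means = [
            statistics.mean(r[j] for r, lab in zip(rows, labels) if lab == g)
            for g in groups
        ]
        scores.append(max(group_means) - min(group_means))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep]


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(
        (math.dist(r, query), lab) for r, lab in zip(train_rows, train_labels)
    )
    votes = Counter(lab for _, lab in nearest[:k])
    return votes.most_common(1)[0][0]


def loo_cross_validate(rows, labels, k=3, n_keep=None):
    """Leave each sample out in turn and predict it from the others."""
    data = standardize(rows)  # standardize genes before the analysis
    predictions = []
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Feature selection uses the training samples of this cycle only.
        if n_keep is None:
            idx = list(range(len(data[0])))
        else:
            idx = select_features(train, train_labels, n_keep)
        train_sel = [[r[j] for j in idx] for r in train]
        query = [data[i][j] for j in idx]
        predictions.append(knn_predict(train_sel, train_labels, query, k))
    return predictions


# Made-up data: 6 samples (rows) x 3 genes (columns), two groups.
rows = [
    [1.0, 5.0, 3.0],
    [1.2, 3.0, 3.1],
    [0.9, 4.0, 2.9],
    [3.0, 4.5, 3.0],
    [3.1, 3.5, 3.2],
    [2.9, 5.5, 2.8],
]
labels = ["A", "A", "A", "B", "B", "B"]

predictions = loo_cross_validate(rows, labels, k=3, n_keep=1)
for sample, (known, predicted) in enumerate(zip(labels, predictions), start=1):
    print(sample, known, predicted)
```

Selecting the features inside the loop means that the left-out sample never influences which genes are chosen, so the cross-validated result is not biased by the selection step. The final loop prints one line per sample with its known and predicted class, similar in spirit to the output described below.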

Output

A file listing all the samples together with their known and predicted classes.