Validating clustering for gene expression data (Bioinformatics)

At the moment there do not seem to be any clear-cut guidelines regarding the choice of a clustering algorithm to be used for grouping genes based on their expression profiles.

In contrast to classical clustering techniques such as hierarchical clustering (Sokal and Michener, 1958) and k-means clustering (Hartigan and Wong, 1979), biclustering does not require genes in the same cluster to behave similarly over all experimental conditions.

Instead, a bicluster is defined as a subset of genes that exhibit compatible expression patterns over a subset of conditions.
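The distinction can be illustrated with a minimal sketch in Python. The gene names, the expression values, and the helper profile_similarity are hypothetical, and Pearson correlation is assumed here as one possible notion of "compatible" patterns; the text above does not specify a measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def profile_similarity(profile_a, profile_b, conditions):
    """Correlate two expression profiles over the given condition indices only."""
    sub_a = [profile_a[c] for c in conditions]
    sub_b = [profile_b[c] for c in conditions]
    return pearson(sub_a, sub_b)


# Hypothetical toy data: one value per experimental condition (6 conditions).
gene_a = [1.0, 2.0, 3.0, 4.0, 4.0, 1.0]
gene_b = [2.0, 3.0, 4.0, 5.0, 1.0, 5.0]

# Similarity over a subset of conditions (the candidate bicluster) ...
within = profile_similarity(gene_a, gene_b, conditions=[0, 1, 2, 3])
# ... versus similarity over all conditions (what classical clustering compares).
overall = profile_similarity(gene_a, gene_b, conditions=range(6))

print(f"subset of conditions: {within:.3f}")
print(f"all conditions:       {overall:.3f}")
```

In this toy example the two genes move together over conditions 0 to 3 but not over all six, so a method that compares full profiles would likely separate them, whereas a biclustering method could group them on the subset.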

While the ‘best’ method is dependent on the exact validation strategy and the number of clusters to be used, overall appears to be a solid performer.

Interestingly, the performances of correlation-based hierarchical clustering and model-based clustering (another method that has been advocated by a number of researchers) appear to lie at opposite extremes, depending on which validation measure one employs.
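For readers unfamiliar with the term, correlation-based hierarchical clustering can be sketched as follows. This is only an illustration, assuming a distance of 1 minus the Pearson correlation and average linkage; the gene profiles and function names are hypothetical, and it is not the implementation evaluated in the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """Assumed distance: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles, n_clusters):
    """Greedily merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [correlation_distance(profiles[a], profiles[b])
                 for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles: two rising genes and two falling genes.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [9.0, 7.0, 4.0, 2.0],
}

groups = average_linkage(profiles, n_clusters=2)
print(groups)
```

Because the distance depends on the shape of a profile rather than its magnitude, the rising genes end up in one group and the falling genes in the other, even though their absolute expression levels differ.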

Results: In this paper, we consider six clustering algorithms (of various flavors!

