Samuel Kaski, Janne Sinkkonen, and Arto Klami. Regularized Discriminative Clustering. In C. Molina, T. Adali, J. Larsen, M. Van Hulle, editors, Neural Networks for Signal Processing XIII, pages 289-298. IEEE, New York, NY, 2003. (postscript, gzipped postscript, pdf)

A generative distributional clustering model for continuous data is reviewed, and methods for optimizing and regularizing it are introduced and compared. Based on pairs of auxiliary and primary data, the primary data space is partitioned into Voronoi regions that are maximally homogeneous in terms of auxiliary data. As a result, only variation in the primary data that is associated with variation in the auxiliary data influences the clusters. Because the whole primary space is partitioned, new samples can be clustered easily in terms of primary data alone. In experiments, the approach is shown to produce more homogeneous clusters than alternative methods. Two regularization methods are demonstrated to further improve the results: an entropy-type penalty for unequal cluster sizes, and the inclusion of a K-means component in the model. The latter can alternatively be interpreted as a special kind of joint distribution modeling where the emphasis between discrimination and unsupervised modeling of primary data can be tuned.
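
The two ideas in the abstract that lend themselves to a small illustration are the Voronoi partition of the primary space and the notion of cluster homogeneity with respect to auxiliary data. The sketch below is not the paper's optimization algorithm or its objective function. It only assumes that the partition is defined by nearest-prototype assignment under Euclidean distance, that the auxiliary data are discrete class labels, and that homogeneity can be summarized by the empirical conditional entropy of the labels given the cluster. The function names `assign_to_voronoi` and `conditional_entropy` are hypothetical.

```python
import numpy as np


def assign_to_voronoi(x, prototypes):
    """Return, for each row of x, the index of the nearest prototype.

    Nearest-prototype assignment under Euclidean distance is what defines
    a Voronoi partition; it needs the primary data only, so new samples
    can be clustered without auxiliary data.
    """
    # Squared distances, shape (n_samples, n_prototypes).
    d2 = ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def conditional_entropy(clusters, labels, n_clusters, n_classes):
    """Empirical entropy (in nats) of the labels given the cluster.

    Assumed here as a simple homogeneity score: 0 means every cluster
    contains a single auxiliary class; larger values mean more mixing.
    """
    counts = np.zeros((n_clusters, n_classes))
    np.add.at(counts, (clusters, labels), 1)       # contingency table
    joint = counts / counts.sum()                  # p(cluster, class)
    marginal = joint.sum(axis=1, keepdims=True)    # p(cluster)
    mask = joint > 0                               # skip empty cells
    cond = joint[mask] / np.broadcast_to(marginal, joint.shape)[mask]
    return float(-(joint[mask] * np.log(cond)).sum())


# Toy primary data (two well-separated groups) and two prototypes.
x = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
clusters = assign_to_voronoi(x, prototypes)

# Auxiliary labels aligned with the partition versus mixed across it.
aligned = np.array([0, 0, 1, 1])
mixed = np.array([0, 1, 0, 1])
h_aligned = conditional_entropy(clusters, aligned, 2, 2)
h_mixed = conditional_entropy(clusters, mixed, 2, 2)
```

In this toy setting the aligned labels give a conditional entropy of zero, while the mixed labels give ln 2, the maximum for two equally likely classes. A discriminative clustering method in the spirit of the abstract would move the prototypes to lower such a score, with regularization keeping cluster sizes and prototype positions well behaved.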