Samuel Kaski, Janne Sinkkonen, and Arto Klami. Regularized Discriminative Clustering. In C. Molina, T. Adali, J. Larsen,
M. Van Hulle, editors, Neural Networks for Signal
Processing XIII, pages 289-298. IEEE, New York, NY, 2003.
A generative distributional clustering model for continuous data is
reviewed, and methods for optimizing and regularizing it are
introduced and compared. Given paired samples of primary and auxiliary
data, the model partitions the primary data space into Voronoi regions
that are maximally homogeneous in terms of the auxiliary data. As a
result, only the variation in the primary data that is associated with
variation in the auxiliary data influences the clusters. Because the
whole primary space is partitioned, new samples can easily be clustered
using the primary data alone. In experiments, the approach is shown to
produce more homogeneous clusters than alternative methods. Two
regularization methods are demonstrated to further improve the
results: an entropy-type penalty for unequal cluster sizes, and the
inclusion of a K-means component in the model. The latter can
alternatively be interpreted as a special kind of joint distribution
modeling in which the relative emphasis on discrimination and on
unsupervised modeling of the primary data can be tuned.
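To make the objective concrete, here is a minimal Python/NumPy sketch of one plausible form of the regularized cost. It rests on assumptions the abstract does not state: the auxiliary data are discrete class labels, samples are assigned to the nearest prototype (hard Voronoi regions), each region carries an additively smoothed class distribution as its distributional prototype, the entropy-type penalty is the entropy of the cluster proportions, and the K-means component is a weighted within-region squared error. The names `assign_voronoi` and `regularized_dc_cost` and the weights `alpha`, `lam_entropy`, and `lam_kmeans` are hypothetical. The sketch only evaluates the cost for given prototypes; it does not reproduce the optimization or the regularization methods actually compared in the paper.

```python
import numpy as np


def assign_voronoi(X, prototypes):
    """Index of the nearest prototype (Voronoi region) for each row of X."""
    # Squared Euclidean distances, shape (n_samples, n_prototypes).
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def regularized_dc_cost(X, c, prototypes, n_classes, alpha=1.0,
                        lam_entropy=0.0, lam_kmeans=0.0):
    """Hypothetical regularized discriminative clustering cost (higher is better).

    X: primary data, shape (n, d); c: integer auxiliary class labels, shape (n,).
    alpha must be > 0 so that every smoothed probability is strictly positive.
    """
    k = len(prototypes)
    y = assign_voronoi(X, prototypes)
    # Contingency table: counts[j, l] = samples in region j with auxiliary class l.
    counts = np.zeros((k, n_classes))
    np.add.at(counts, (y, c), 1)
    # Distributional prototypes: additively smoothed class proportions per region.
    psi = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * n_classes)
    # Discriminative term: log-likelihood of the auxiliary labels given the region.
    loglik = (counts * np.log(psi)).sum()
    # Entropy of cluster proportions; largest when clusters are equal-sized,
    # so adding it (assumed form) penalizes unequal cluster sizes.
    sizes = counts.sum(axis=1) / len(X)
    nz = sizes[sizes > 0]
    size_entropy = -(nz * np.log(nz)).sum()
    # Assumed K-means component: within-region squared error of the primary data.
    sq_err = ((X - prototypes[y]) ** 2).sum()
    return loglik + lam_entropy * size_entropy - lam_kmeans * sq_err


# Toy data: the auxiliary class depends only on the first primary coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
c = (X[:, 0] > 0).astype(int)

aligned = np.array([[-1.0, 0.0], [1.0, 0.0]])     # boundary x0 = 0 follows the classes
misaligned = np.array([[0.0, -1.0], [0.0, 1.0]])  # boundary x1 = 0 ignores the classes

print(regularized_dc_cost(X, c, aligned, 2) > regularized_dc_cost(X, c, misaligned, 2))  # True

# New samples are clustered from primary data alone: nearest prototype wins.
new = np.array([[2.0, 0.3], [-0.5, 5.0]])
print(assign_voronoi(new, aligned))  # [1 0]
```

Raising `lam_kmeans` in this sketch shifts weight from the discriminative term toward fitting the primary data, which loosely mirrors the tunable trade-off described above; no claim is made that this weighted sum matches the paper's exact formulation.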