Samuel Kaski. Discriminative clustering. In Bulletin of the International Statistical Institute. Invited
Paper Proceedings of the 54th Session, volume 2, pages 270-273.
International Statistical Institute, 2003.
Discriminative clustering (DC) uses auxiliary data to define what is
relevant in the primary data. It partitions the continuous primary
data space into local clusters that have maximally homogeneous
(categorical) auxiliary data. The task has several interpretations:
searching for maximally predictive clusters, clusters that maximize
mutual information with the auxiliary data, clusters for which
contingency tables optimally detect dependency on the auxiliary
data, or K-means clusters in the so-called Fisher or learning metric.
DC can be applied to adjust the resolution of an existing
classification, or to guide clustering with auxiliary data.
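
The mutual-information interpretation mentioned above can be illustrated with a small sketch. It is not the optimization procedure of the paper; it only evaluates, for a fixed set of centroids, how informative the resulting nearest-centroid partition is about the categorical auxiliary data. The function names (assign_clusters, contingency_table, mutual_information) and the toy data are assumptions made for this illustration.

```python
import numpy as np

def assign_clusters(x, centroids):
    """Assign each primary-data sample to its nearest centroid."""
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def contingency_table(clusters, labels, n_clusters, n_classes):
    """Count co-occurrences of cluster index and auxiliary class."""
    table = np.zeros((n_clusters, n_classes))
    np.add.at(table, (clusters, labels), 1)
    return table

def mutual_information(table):
    """Empirical mutual information (in nats) of a contingency table."""
    p = table / table.sum()                # joint distribution
    pc = p.sum(axis=1, keepdims=True)      # cluster marginal
    pl = p.sum(axis=0, keepdims=True)      # class marginal
    nz = p > 0                             # skip empty cells (0 log 0 = 0)
    return float((p[nz] * np.log(p[nz] / (pc @ pl)[nz])).sum())

# Toy primary data (hypothetical): the auxiliary class depends only on
# the first coordinate.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 0, 1, 1])

# Partition A splits along the first coordinate, B along the second.
centroids_a = np.array([[0.0, 0.5], [1.0, 0.5]])
centroids_b = np.array([[0.5, 0.0], [0.5, 1.0]])

mi_a = mutual_information(
    contingency_table(assign_clusters(x, centroids_a), labels, 2, 2))
mi_b = mutual_information(
    contingency_table(assign_clusters(x, centroids_b), labels, 2, 2))
print(mi_a, mi_b)
```

In this toy setting, partition A yields clusters that are homogeneous in the auxiliary data and so has the higher mutual information, while partition B mixes the classes evenly in each cluster. A discriminative clustering method would prefer centroids like those of A, although both partitions are equally good in the ordinary K-means sense on this data.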