
Samuel Kaski. Discriminative clustering. In Bulletin of the International Statistical Institute. Invited Paper Proceedings of the 54th Session, volume 2, pages 270-273. International Statistical Institute, 2003.

Discriminative clustering (DC) uses auxiliary data to define what is relevant in the primary data. It partitions the continuous primary data space into local clusters whose (categorical) auxiliary data are maximally homogeneous. The task has several interpretations: searching for maximally predictive clusters, for clusters that maximize mutual information with the auxiliary data, for clusters whose contingency tables optimally detect dependency with the auxiliary data, or for K-means clusters in the so-called Fisher or learning metric. DC can be applied to adjust the resolution of an existing classification, or to guide clustering with auxiliary data.
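
The mutual-information interpretation can be made concrete with a small sketch. The code below is not the optimization algorithm of the paper; it only evaluates the quantity that interpretation refers to, under the simplifying assumption that clusters are hard nearest-prototype (Voronoi) regions and that mutual information is estimated from the empirical contingency table. The function names, the toy points, and the prototype positions are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def assign_clusters(points, prototypes):
    """Hard-assign each point to its nearest prototype (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(prototypes)), key=lambda j: sqdist(p, prototypes[j]))
        for p in points
    ]


def mutual_information(clusters, labels):
    """Empirical mutual information (in nats) between cluster indices and labels."""
    n = len(clusters)
    joint = Counter(zip(clusters, labels))  # contingency table counts
    cluster_counts = Counter(clusters)      # row marginals
    label_counts = Counter(labels)          # column marginals
    mi = 0.0
    for (c, l), count in joint.items():
        mi += (count / n) * math.log(count * n / (cluster_counts[c] * label_counts[l]))
    return mi


# Toy primary data: the auxiliary label depends only on the first coordinate.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = ["a", "a", "b", "b"]

# Prototypes that split along the relevant (first) coordinate.
relevant_split = assign_clusters(points, [(0.0, 0.5), (1.0, 0.5)])
# Prototypes that split along the irrelevant (second) coordinate.
irrelevant_split = assign_clusters(points, [(0.5, 0.0), (0.5, 1.0)])

mi_relevant = mutual_information(relevant_split, labels)
mi_irrelevant = mutual_information(irrelevant_split, labels)
print(mi_relevant, mi_irrelevant)
```

In this toy setting the partition along the first coordinate gives clusters with homogeneous labels and a mutual information of log 2 nats, while the partition along the second coordinate gives zero, even though both partitions are equally good by plain Euclidean K-means distortion. A discriminative clustering method would prefer the former.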