Janne Sinkkonen, Samuel Kaski, and Janne Nikkilä.
Discriminative Clustering: Optimal Contingency Tables by Learning Metrics.
Proceedings of the ECML'02, 13th European Conference on Machine Learning.
The learning metrics principle describes a way to derive
metrics for the data space from paired data. Variation of the primary
data is assumed relevant only to the extent that it causes changes in the
auxiliary data. Discriminative clustering finds clusters of
primary data that are homogeneous in the auxiliary data. In this
paper, discriminative clustering using a mutual
information criterion is shown to be asymptotically equivalent to
vector quantization in learning metrics. We also present a new,
finite-data variant of discriminative clustering and show that it
builds contingency tables that optimally detect statistical
dependency between the clusters and the auxiliary data. A
finite-data algorithm is demonstrated to outperform the older mutual
information maximizing variant.
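As a rough illustration of the mutual information criterion, the sketch below computes the empirical mutual information of a contingency table. Rows are clusters of primary data and columns are classes of the auxiliary data. It assumes discrete auxiliary data and hard cluster assignments. The function name `contingency_mutual_information` is hypothetical. It only evaluates the criterion for a given table. It does not implement the paper's clustering algorithms or its new finite-data variant.

```python
import math

def contingency_mutual_information(table):
    """Empirical mutual information (in nats) between the cluster index (rows)
    and the auxiliary class (columns), given a table of co-occurrence counts.

    Hypothetical helper for illustration only; assumes hard cluster
    assignments and discrete auxiliary data.
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]          # samples per cluster
    col_sums = [sum(col) for col in zip(*table)]    # samples per auxiliary class
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # the term 0 * log 0 is taken to be 0
            p_ij = n_ij / total
            # p_ij * log(p_ij / (p_i * p_j)), written in terms of raw counts
            mi += p_ij * math.log(n_ij * total / (row_sums[i] * col_sums[j]))
    return mi

# Clusters that are homogeneous in the auxiliary data give high mutual information.
print(contingency_mutual_information([[20, 0], [0, 20]]))   # log 2, about 0.693
print(contingency_mutual_information([[10, 10], [10, 10]]))  # 0.0, no dependency
```

A clustering whose cells concentrate each cluster on few auxiliary classes scores higher. This is the sense in which clusters are "homogeneous in the auxiliary data".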