Janne Sinkkonen, Samuel Kaski, and Janne Nikkilä. Discriminative Clustering: Optimal Contingency Tables by Learning Metrics. Proceedings of the ECML'02, 13th European Conference on Machine Learning.

The learning metrics principle describes a way to derive metrics for the data space from paired data. Variation in the primary data is assumed relevant only to the extent that it causes changes in the auxiliary data. Discriminative clustering finds clusters of primary data that are homogeneous in the auxiliary data. In this paper, discriminative clustering using a mutual information criterion is shown to be asymptotically equivalent to vector quantization in learning metrics. We also present a new, finite-data variant of discriminative clustering and show that it builds contingency tables that optimally detect statistical dependency between the clusters and the auxiliary data. The finite-data algorithm is demonstrated to outperform the older mutual-information-maximizing variant.
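
To make the mutual information criterion concrete, the following minimal sketch computes the plug-in (empirical) mutual information of a contingency table whose rows are clusters and whose columns are auxiliary classes. It is only an illustration under stated assumptions: the function name `mutual_information` and the table layout are hypothetical, and the code is not the finite-data criterion proposed in the paper.

```python
import math


def mutual_information(table):
    """Plug-in mutual information (in nats) of a contingency table of counts.

    Assumption: table[i][j] is the number of samples assigned to cluster i
    whose auxiliary class is j. This is an illustrative estimate, not the
    paper's finite-data criterion.
    """
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("contingency table is empty")

    row_sums = [sum(row) for row in table]        # cluster marginals
    col_sums = [sum(col) for col in zip(*table)]  # auxiliary-class marginals

    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute nothing to the sum
                p_ij = n_ij / total
                mi += p_ij * math.log(n_ij * total / (row_sums[i] * col_sums[j]))
    return mi


# Clusters that match the auxiliary classes exactly versus clusters that
# carry no information about them.
pure = mutual_information([[10, 0], [0, 10]])
mixed = mutual_information([[5, 5], [5, 5]])
```

Here `pure` equals log 2 (the largest value possible for two balanced classes) and `mixed` equals 0, which reflects the idea that clusters homogeneous in the auxiliary data yield a table with strong statistical dependency.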