Clustering by maximizing the dependency between two paired, continuous-valued multivariate data sets is studied. The new method, associative clustering (AC), maximizes a Bayes factor between similarly parameterized models for dependent and independent cluster sets. The setup is analogous, but not identical, to that of the Information Bottleneck (IB); in this setting the Bayes factor offers a clustering criterion that is well founded for small data sets and well behaved asymptotically: with suitable prior assumptions it becomes equivalent to the hypergeometric probability of a contingency table, while for large data sets it approaches the standard mutual information. An optimization algorithm is introduced, with empirical comparisons to a combination of IB and K-means, and to plain K-means. Two case studies cluster genes (1) to find dependencies between gene expression and transcription factor binding, and (2) to find dependencies between expression in different organisms.
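
The asymptotic claim above can be illustrated numerically: by Stirling's approximation, minus the log hypergeometric probability of a contingency table, divided by the total count N, approaches the empirical mutual information of the table as N grows. The sketch below is an illustration of that relationship only (it is not the AC optimization algorithm); the function names and the example table are ours, and the log-factorials are computed with the standard-library `math.lgamma`.

```python
import math


def log_hypergeom_prob(table):
    """Log-probability of a contingency table with fixed margins:
    log [ prod_i r_i! * prod_j c_j! / (N! * prod_ij n_ij!) ]."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    N = sum(rows)
    lg = math.lgamma  # lgamma(n + 1) == log(n!)
    return (sum(lg(r + 1) for r in rows)
            + sum(lg(c + 1) for c in cols)
            - lg(N + 1)
            - sum(lg(n + 1) for row in table for n in row))


def mutual_information(table):
    """Empirical mutual information (in nats) of the joint counts n_ij."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    N = sum(rows)
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:
                mi += (n / N) * math.log(n * N / (rows[i] * cols[j]))
    return mi


# Scale a dependent 2x2 table up; -log P / N should approach the
# mutual information of the corresponding joint distribution.
base = [[4, 1], [1, 4]]
for scale in (1, 10, 100, 1000):
    t = [[n * scale for n in row] for row in base]
    N = sum(sum(row) for row in t)
    print(scale, -log_hypergeom_prob(t) / N, mutual_information(t))
```

Running the loop shows the gap between the two quantities shrinking roughly like (log N)/N, which is the sense in which the small-sample Bayes factor criterion and the large-sample mutual information criterion coincide.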