We study data fusion under the assumption that variation specific to
an individual data source is irrelevant and only variation shared
between the sources is relevant. Traditionally, the shared variation has been sought by
maximizing a dependency measure, such as the correlation of linear
projections in Canonical Correlation Analysis (CCA). In this traditional
framework, it is hard to tackle overfitting and model order selection;
we therefore turn to probabilistic generative modeling, which makes the
tools of Bayesian inference applicable. We introduce a family of
probabilistic models for the same task, and present conditions under
which they seek dependency. We show that probabilistic CCA is a
special case of the model family, and derive a new dependency-seeking
clustering algorithm as another example. The solution is computed with
variational Bayes.

This work was supported by the Academy of Finland,
decision no. 207467, and by the IST Programme of the European
Community, under the PASCAL Network of Excellence, IST-2002-506778.