Janne Sinkkonen, Juuso Parkkinen, Janne Aukia and Samuel Kaski.
A simple infinite topic mixture for rich graphs and relational data.
Presentation at the NIPS 2008 Workshop on Analyzing Graphs: Theory and Applications, December 12, Whistler, Canada. (pdf extended abstract)

We propose a simple component or "topic" model for relational data, that is, for heterogeneous collections of co-occurrences between categorical variables. Graphs are a special case, as collections of dyadic co-occurrences (edges) over a set of vertices. The model is especially suitable for finding global components from collections of massively heterogeneous data, where encoding all the relations to a more sophisticated model becomes cumbersome, as well as for quick-and-dirty modeling of graphs enriched with, e.g., link properties or nodal attributes. The model is here estimated with collapsed Gibbs sampling, which allows sparse data structures and good memory efficiency for large data sets. Other inference methods should be straightforward to implement. We demonstrate the model with various medium-sized data sets (scientific citation data, MovieLens ratings, protein interactions), with brief comparisons to a full relational model and other approaches.
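To make the idea of collapsed Gibbs sampling over edges with sparse counts concrete, here is a minimal illustrative sketch. It is not the authors' implementation. It assumes a simplified generative story (each edge picks a component, then draws both endpoints from that component's distribution over vertices) and, for brevity, swaps the infinite mixture for a fixed number of components with symmetric Dirichlet priors. The function name `collapsed_gibbs_edges` and the parameters `num_components`, `alpha`, and `beta` are hypothetical.

```python
import random
from collections import defaultdict


def collapsed_gibbs_edges(edges, num_nodes, num_components=2,
                          alpha=1.0, beta=0.1, sweeps=50, seed=0):
    """Assign each edge to a component by collapsed Gibbs sampling.

    Assumption (not from the abstract): a finite mixture with symmetric
    Dirichlet priors; alpha is the prior on components, beta on vertices.
    """
    rng = random.Random(seed)
    z = [rng.randrange(num_components) for _ in edges]

    # Sufficient statistics. Per-component vertex counts are stored
    # sparsely: only vertices that actually occur in a component get an entry.
    edge_count = [0] * num_components
    node_count = [defaultdict(int) for _ in range(num_components)]
    for (i, j), k in zip(edges, z):
        edge_count[k] += 1
        node_count[k][i] += 1
        node_count[k][j] += 1

    for _ in range(sweeps):
        for e, (i, j) in enumerate(edges):
            # Remove the current edge from the counts.
            k_old = z[e]
            edge_count[k_old] -= 1
            node_count[k_old][i] -= 1
            node_count[k_old][j] -= 1
            for v in {i, j}:
                if node_count[k_old][v] == 0:
                    del node_count[k_old][v]  # keep the table sparse

            # Conditional probability of each component, with the
            # component and vertex distributions integrated out.
            weights = []
            for k in range(num_components):
                total = 2 * edge_count[k] + num_nodes * beta
                m_i = node_count[k].get(i, 0) + beta
                # The second endpoint sees the first one already added.
                m_j = node_count[k].get(j, 0) + beta + (1.0 if i == j else 0.0)
                weights.append((edge_count[k] + alpha) * m_i * m_j
                               / (total * (total + 1.0)))

            # Resample the assignment and restore the counts.
            k_new = rng.choices(range(num_components), weights=weights)[0]
            z[e] = k_new
            edge_count[k_new] += 1
            node_count[k_new][i] += 1
            node_count[k_new][j] += 1

    return z, edge_count
```

Calling `collapsed_gibbs_edges(edges, num_nodes)` with `edges` as a list of vertex-index pairs returns one component label per edge together with the number of edges in each component. Link properties or nodal attributes could be handled in the same spirit by adding further sparse count tables per component, though how the paper does this is not stated in the abstract.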