Samuel Kaski, Janne Nikkilä, Janne Sinkkonen, Leo Lahti, Juha Knuuttila, and
Christophe Roos.
Associative clustering for exploring dependencies between functional
genomics data sets.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics.
Accepted for publication.
(preprint: pdf, ps, gzipped ps)
High-throughput genomic measurements, interpreted as co-occurring
data samples from multiple sources, open up a fresh problem for
machine learning: what do the different data sets have in common, that
is, what kinds of statistical dependencies exist between the
paired samples from the different sets? We introduce a clustering
algorithm for exploring the dependencies. Samples within each data
set are grouped such that the dependencies between groups of
different sets capture as much of the pairwise dependencies between the
samples as possible. We formalize this problem in a novel
probabilistic way, as optimization of a Bayes factor. The method is
applied to reveal commonalities and exceptions in gene expression between
organisms, and to suggest regulatory interactions in the form of
dependencies between gene expression profiles and regulator binding
patterns.
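
The idea of scoring the dependency between two groupings can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not state the form of the Bayes factor, so the sketch assumes a Dirichlet-multinomial style score computed from the contingency table of cluster co-occurrences, with a symmetric prior pseudo-count. The function names `contingency_table` and `log_dependency_score`, and the `prior` parameter, are hypothetical.

```python
from math import lgamma


def contingency_table(labels_x, labels_y, k_x, k_y):
    """Count how often cluster i of set X co-occurs with cluster j of set Y.

    labels_x[n] and labels_y[n] are the cluster indices of the n-th paired sample.
    """
    table = [[0] * k_y for _ in range(k_x)]
    for cx, cy in zip(labels_x, labels_y):
        table[cx][cy] += 1
    return table


def log_dependency_score(table, prior=1.0):
    """Assumed log-score: larger values mean the cell counts deviate more
    from what the row and column margins alone would explain.

    Constants that do not depend on the table are dropped, so the value is
    only meaningful for comparing clusterings of the same paired samples.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    score = sum(lgamma(n + prior) for row in table for n in row)
    score -= sum(lgamma(n + prior) for n in row_sums)
    score -= sum(lgamma(n + prior) for n in col_sums)
    return score


# Eight paired samples, two clusters per data set.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y_dependent = [0, 0, 0, 0, 1, 1, 1, 1]    # cluster in Y follows cluster in X
y_independent = [0, 0, 1, 1, 0, 0, 1, 1]  # same margins, no association

dependent = log_dependency_score(contingency_table(x, y_dependent, 2, 2))
independent = log_dependency_score(contingency_table(x, y_independent, 2, 2))
print(dependent > independent)
```

In a full method the cluster assignments themselves would be optimized to maximize such a score; here the assignments are fixed by hand, and the sketch only compares two of them.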
©2005 IEEE. Personal use of this material is permitted. However,
permission to reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or redistribution
to servers or lists, or to reuse any copyrighted component of this work in
other works must be obtained from the IEEE.
This material is presented to ensure timely dissemination of scholarly
and technical work. Copyright and all rights therein are retained by
authors or by other copyright holders. All persons copying this
information are expected to adhere to the terms and constraints
invoked by each author's copyright. In most cases, these works may not
be reposted without the explicit permission of the copyright holder.