Jarkko Venna and Samuel Kaski. Comparison of visualization methods for an atlas of gene expression data sets. Information Visualization, 6:139-154, 2007.
This paper has two intertwined goals: (i) to study the feasibility
of an atlas of gene expression data sets as a visual
interface to expression databanks, and (ii) to study which
dimensionality reduction methods would be suitable for visualizing
very high-dimensional data sets. Several new methods have been
recently proposed for the estimation of data manifolds or
embeddings, but they have so far not been compared in the task of
visualization. In visualizations the dimensionality is
constrained, in addition to the data itself, by the presentation
medium. It turns out that an older method, curvilinear components
analysis, outperforms the new ones in terms of trustworthiness of
the projections. In a sample databank on gene expression, the main
sources of variation were the differences between data sets,
different labs, and different measurement methods. This hints at a
need for better methods for making the data sets commensurable, in
accordance with earlier studies. The good news is that the
visualized overview, the expression atlas, reveals many of these
subsets. Hence, we conclude that dimensionality reduction even from
1339 dimensions to 2 can produce a useful interface to gene
expression databanks.
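
The trustworthiness criterion mentioned in the abstract can be illustrated with a short example. The following is a minimal sketch, assuming the usual rank-based definition of trustworthiness: a projection is penalized whenever a point enters the k-neighborhood of another point in the projection without belonging to it in the original space, weighted by how far away it originally was. The function names (`pairwise_ranks`, `trustworthiness`) and the toy data are hypothetical and are not taken from the paper.

```python
import math


def pairwise_ranks(points):
    """For each point i, map every other point j to its distance rank (1 = nearest)."""
    n = len(points)
    ranks = []
    for i in range(n):
        # Sort the other points by distance from i; ties are broken by index.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        ranks.append({j: r for r, j in enumerate(order, start=1)})
    return ranks


def trustworthiness(high, low, k):
    """Rank-based trustworthiness of the projection `low` of the data `high`.

    Assumes the standard normalization, which is valid for 0 < k < n / 2.
    Returns a value in [0, 1]; 1 means no false neighbors were introduced.
    """
    n = len(high)
    if len(low) != n:
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    rank_high = pairwise_ranks(high)
    rank_low = pairwise_ranks(low)

    penalty = 0
    for i in range(n):
        for j, r_low in rank_low[i].items():
            # j is a neighbor of i in the projection but not in the original space.
            if r_low <= k and rank_high[i][j] > k:
                penalty += rank_high[i][j] - k

    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


# Toy data (hypothetical): six points on a line embedded in three dimensions.
original = [(float(i), 0.0, 0.0) for i in range(6)]
faithful = [(float(i),) for i in range(6)]               # keeps the ordering
scrambled = [(0.0,), (5.0,), (1.0,), (4.0,), (2.0,), (3.0,)]  # mixes neighbors

print(trustworthiness(original, faithful, k=2))   # 1.0
print(trustworthiness(original, scrambled, k=2))  # strictly below 1.0
```

The brute-force ranking above takes on the order of n^2 log n time, which is adequate for illustration but not for large databanks.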