Merja Oja, Petri Törönen, Janne Nikkilä, Eero Castrén, and Samuel Kaski. Learning metrics for SOM-based clustering and visualization of yeast gene expression data. In Bioinformatics 2002, Bergen, Norway, April 4-7, 2002. A poster.

We have studied the application of a new clustering and information visualization methodology to functional genomics. The large data sets produced by gene expression measurements under a variety of treatments potentially include valuable information about the function and co-regulation of genes. One of the key problems is that the important variation is hidden within biological and measurement noise in very high-dimensional expression spaces.

The learning metrics principle is a new approach to finding the important aspects of data. It is assumed that changes in gene expression are important only to the extent that they cause changes in certain auxiliary data, which in this study are the functional classes of the genes. The metric of the expression data space is changed in a rigorous fashion to measure only the important changes. The new metric can be used in many standard data analysis methods. Here we use the Self-Organizing Map (SOM) for clustering and visualization of gene expression.
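The idea of measuring only changes that affect the auxiliary data can be illustrated with a small sketch. It assumes a formalization that is not spelled out in this abstract: the squared local distance between x and x + dx is taken as the quadratic form dx^T J(x) dx, where J(x) is the Fisher information matrix of the class posteriors p(c | x). This agrees, to second order and up to a factor of 1/2, with the Kullback-Leibler divergence between p(c | x) and p(c | x + dx). The softmax model and the names `W`, `b`, `class_posteriors`, `fisher_information`, and `local_distance_sq` are hypothetical and serve only as illustration.

```python
import numpy as np

def class_posteriors(x, W, b):
    """Hypothetical model of p(c | x): softmax over linear scores W @ x + b."""
    scores = W @ x + b
    scores = scores - scores.max()  # stabilize the exponentials
    p = np.exp(scores)
    return p / p.sum()

def fisher_information(x, W, b):
    """J(x) = sum_c p(c|x) g_c g_c^T, where g_c = grad_x log p(c|x)."""
    p = class_posteriors(x, W, b)
    grads = W - p @ W  # row c is grad_x log p(c|x) for this softmax model
    return (grads * p[:, None]).T @ grads

def local_distance_sq(x, dx, W, b):
    """Squared local distance dx^T J(x) dx between x and x + dx."""
    return float(dx @ fisher_information(x, W, b) @ dx)

# Toy setting (assumed): two classes, two expression dimensions.
# Only the first dimension influences the class posteriors.
W = np.array([[1.0, 0.0],
              [-1.0, 0.0]])
b = np.zeros(2)
x = np.array([0.2, 0.5])

relevant = local_distance_sq(x, np.array([0.1, 0.0]), W, b)
irrelevant = local_distance_sq(x, np.array([0.0, 0.1]), W, b)
```

In this toy setting a step along the second dimension leaves the class posteriors unchanged, so `irrelevant` is zero, while an equally long step along the first dimension gives a positive `relevant`. Variation that does not affect the auxiliary data is ignored by the metric.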

We cluster the yeast genes based on their expression in a set of knock-out mutation experiments. The methodological novelty in the study is that, for the normalized gene expression, the new metric needs to be defined on a very high-dimensional hypersphere.
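A short sketch may help show why normalized expression profiles lie on a hypersphere. It assumes that normalization means centering each gene's profile and scaling it to unit length, which the abstract does not state explicitly. Under that assumption the inner product of two profiles equals their Pearson correlation. The function names `to_unit_sphere` and `tangent_component` are hypothetical, and the tangent projection only illustrates how a local step could be restricted to the sphere. It is not taken from the poster.

```python
import numpy as np

def to_unit_sphere(expr):
    """Center each gene's profile (row) and scale it to unit Euclidean length.

    Assumes no profile is constant, since that would give a zero norm.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms

def tangent_component(x, dx):
    """Remove the radial part of dx so the step is tangent to the sphere at x."""
    return dx - (dx @ x) * x

# Synthetic stand-in for an expression matrix: 5 genes, 20 experiments.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 20))

unit = to_unit_sphere(expr)
similarity = unit @ unit.T  # inner products between normalized profiles

# A small step away from the first gene, restricted to the sphere's tangent space.
step = tangent_component(unit[0], rng.normal(size=20) * 0.01)
```

Here `similarity` matches `np.corrcoef(expr)` up to floating-point error, and `step` is orthogonal to `unit[0]`. Local distances on the sphere therefore only need to be defined for tangent directions.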

The SOM computed in the new metric models the functional classes of the genes more accurately, and its visualizations are as intuitive as in the usual inner product (correlation) metric. For example, genes involved in pheromone response, ribosomal RNA transcription, and amino-acid metabolism were clearly clustered by both metrics, and genes involved in peroxisomal transport appeared to be clustered better by the new metric than by the correlation metric.

In summary, the learning metrics principle appears to be promising for focusing the analysis on more functionally meaningful variation in gene expression.


Last modified: Wed Mar 9 08:27:10 EET 2005