Nonlinear dimensionality reduction has so far been treated either as a data representation problem or as a search for a lower-dimensional manifold embedded in the data space. A main application of both is information visualization, which aims to make the neighborhood or proximity relationships in the data visible, but neither approach has been designed to optimize this task. We reconceptualize such visualization as an information retrieval problem: a projection is good if the neighbors of data points can be retrieved well from the projected points shown in the visualization. This makes it possible to quantify the goodness of a projection rigorously in terms of precision and recall. We introduce a method that optimizes retrieval quality; it turns out to be an extension of Stochastic Neighbor Embedding, one of the earlier nonlinear projection methods, for which we give a new interpretation: it optimizes recall.
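To make the retrieval view concrete, the following is a minimal sketch of how precision and recall of neighbor retrieval might be measured for a given projection. It is an illustration under stated assumptions, not the method introduced in this paper: it assumes fixed-size Euclidean k-nearest-neighbor sets in both spaces, and the names `knn_sets`, `neighbor_precision_recall`, `k_relevant`, and `k_retrieved` are hypothetical. Neighbors in the original data space play the role of relevant items, and neighbors among the projected points play the role of retrieved items.

```python
import numpy as np


def knn_sets(points, k):
    """Return, for each point, the index set of its k nearest neighbors.

    Assumption: Euclidean distance, brute-force pairwise computation.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    order = np.argsort(dists, axis=1, kind="stable")
    return [set(row[:k].tolist()) for row in order]


def neighbor_precision_recall(data, projection, k_relevant, k_retrieved):
    """Mean precision and recall of neighbor retrieval from a projection.

    Relevant items: the k_relevant nearest neighbors in the data space.
    Retrieved items: the k_retrieved nearest neighbors in the projection.
    (Hypothetical helper; hard neighborhoods are an assumption of this sketch.)
    """
    relevant = knn_sets(data, k_relevant)
    retrieved = knn_sets(projection, k_retrieved)
    hits = np.array([len(r & q) for r, q in zip(relevant, retrieved)])
    precision = hits / k_retrieved  # fraction of retrieved points that are true neighbors
    recall = hits / k_relevant      # fraction of true neighbors that were retrieved
    return precision.mean(), recall.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 10))
    projection = data[:, :2]  # crude stand-in for a 2-D projection
    print(neighbor_precision_recall(data, projection, k_relevant=10, k_retrieved=10))
```

In this sketch, a projection that preserves every neighborhood exactly would score 1.0 on both measures when the two neighborhood sizes are equal. Retrieving more points than there are relevant neighbors can only keep or raise recall while tending to lower precision, which is the usual tradeoff between the two measures.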
S. Kaski and J. Venna belong to the Adaptive Informatics Research Centre, a national centre of excellence of the Academy of Finland. This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors' views. All rights are reserved because of other commitments.