This work is part of a proactive information retrieval project that aims to
estimate relevance from implicit user feedback. The noisy
feedback signal needs to be complemented with all available
information, and textual content is one of the natural sources. Here
we take the first steps by investigating whether this source is at
all useful in the challenging setting of estimating the relevance of
a new document based on only a few samples with known relevance. It
turns out that even sophisticated unsupervised methods such as
multinomial PCA (or Latent Dirichlet Allocation) help little.
By contrast, feature extraction supervised by relevant auxiliary
data may help.

This work was supported by the Academy of Finland, decision
no. 79017, and by the IST Programme of the European
Community, under the PASCAL Network of Excellence, IST-2002-506778.