Query formulation and efficient navigation through data to reach relevant
results are undoubtedly major challenges for image or video retrieval.
Queries of good quality are typically not available, so the search
has to rely on relevance feedback given by the user, which makes the
process iterative. Giving explicit relevance feedback
is laborious, not always easy, and may even be impossible in
ubiquitous computing scenarios. A central question then is: Is it
possible to replace or complement scarce explicit feedback with
implicit feedback inferred from various sensors not specifically
designed for the task? In this paper, we present preliminary results
on inferring the relevance of images based on implicit feedback about
users' attention, measured with an eye-tracking device. We show
that, at least in reasonably controlled setups, even fairly simple
features and classifiers can detect relevance from eye movements
alone, without any explicit feedback.
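To make the claim concrete, the following is a minimal sketch of the kind of pipeline the abstract alludes to; it is not the paper's actual method. It assumes hypothetical per-image fixation summaries (fixation count, total and mean fixation duration) as features, generates synthetic data in place of real eye-tracker recordings, and trains an off-the-shelf logistic-regression classifier. All feature choices and distributions here are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def synthetic_gaze_features(n, relevant):
    """Toy per-image gaze summaries (NOT the paper's feature set).

    Assumes relevant images attract more and longer fixations.
    Columns: fixation count, total fixation duration (s),
    mean fixation duration (s).
    """
    shift = 1.0 if relevant else 0.0
    fixation_count = rng.poisson(4 + 3 * shift, n)
    total_duration = rng.gamma(2 + 2 * shift, 0.3, n)
    mean_duration = total_duration / np.maximum(fixation_count, 1)
    return np.column_stack([fixation_count, total_duration, mean_duration])

# 100 "relevant" and 100 "irrelevant" viewed images.
X = np.vstack([synthetic_gaze_features(100, True),
               synthetic_gaze_features(100, False)])
y = np.concatenate([np.ones(100), np.zeros(100)])

# A plain linear classifier, matching the claim that simple
# features and classifiers already suffice.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")

A linear baseline is a natural choice here precisely because the point being made is that relevance is detectable without elaborate features or models; richer gaze features and classifiers can be substituted without changing the overall setup.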