Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski.
Discriminative clustering of text documents.
In: Lipo Wang, Jagath C. Rajapakse, Kunihiko Fukushima, Soo-Young Lee,
Xin Yao (eds.) Proceedings of ICONIP'02, 9th International Conference
on Neural Information Processing, volume 4, pages 1956-1960. IEEE,
Piscataway, NJ, 2002.
Vector-space and distributional methods for text document clustering
are discussed. Discriminative clustering, a recently proposed method,
uses external data to find task-relevant characteristics of the
documents, yet the clustering is defined even with no external data.
We introduce a distributional version of discriminative clustering that
represents text documents as probability distributions. The methods are
tested on the task of clustering scientific document abstracts, and the
ability of the methods to predict an independent topical classification
of the abstracts is compared. The discriminative methods found topically
more meaningful clusters than the vector-space and distributional
clustering models.
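
As an illustration of what representing text documents as probability distributions can look like, the following minimal sketch turns raw text into smoothed word distributions over a shared vocabulary and compares them with the Kullback-Leibler divergence. The abstract does not specify the estimator, the smoothing, or the dissimilarity measure used in the paper, so the add-one smoothing, the KL divergence, the function names, and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, smoothing=1.0):
    """Map a document to a smoothed probability distribution over `vocabulary`.

    Additive smoothing is an illustrative assumption, not the paper's estimator.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return [(counts[w] + smoothing) / total for w in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy "abstracts": two on one topic, one on another.
docs = [
    "neural network learning neural",
    "neural network training",
    "protein gene sequence",
]

# Shared vocabulary built from all documents.
vocabulary = sorted({w for d in docs for w in d.lower().split()})
dists = [word_distribution(d, vocabulary) for d in docs]

same_topic = kl_divergence(dists[0], dists[1])
other_topic = kl_divergence(dists[0], dists[2])
print(f"D(doc0 || doc1) = {same_topic:.3f}")
print(f"D(doc0 || doc2) = {other_topic:.3f}")
```

In this toy setting the two documents that share vocabulary are closer in KL divergence than the pair on different topics. A distributional clustering method would group documents by such a dissimilarity, whereas a vector-space method would typically compare term-count vectors with a Euclidean or cosine measure.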