Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski. Discriminative clustering of text documents. In: Lipo Wang, Jagath C. Rajapakse, Kunihiko Fukushima, Soo-Young Lee, Xin Yao (eds.) Proceedings of ICONIP'02, 9th International Conference on Neural Information Processing, volume 4, pages 1956-1960. IEEE, Piscataway, NJ, 2002.

Vector-space and distributional methods for text document clustering are discussed. Discriminative clustering, a recently proposed method, uses external data to find task-relevant characteristics of the documents, yet the clustering is defined even with no external data. We introduce a distributional version of discriminative clustering that represents text documents as probability distributions. The methods are tested in the task of clustering scientific document abstracts, and the ability of the methods to predict an independent topical classification of the abstracts is compared. The discriminative methods found topically more meaningful clusters than the vector-space and distributional clustering models.
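
As a rough illustration of what representing text documents as probability distributions can mean, the sketch below turns token lists into smoothed word distributions and compares them with the Kullback-Leibler divergence. It is not the method of the paper: the toy vocabulary, the example documents, the additive smoothing constant `alpha`, and the function names `to_distribution` and `kl_divergence` are all assumptions made for illustration.

```python
import math
from collections import Counter


def to_distribution(tokens, vocabulary, alpha=1.0):
    """Map a token list to a smoothed probability distribution over vocabulary.

    alpha is a hypothetical additive smoothing constant; it keeps every
    probability strictly positive so that divergences stay finite.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return [(counts[w] + alpha) / total for w in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy data, invented for this sketch.
vocabulary = ["cluster", "document", "neural", "network"]
doc_a = "cluster document document cluster".split()
doc_b = "neural network network neural".split()
doc_c = "document cluster cluster".split()

p_a = to_distribution(doc_a, vocabulary)
p_b = to_distribution(doc_b, vocabulary)
p_c = to_distribution(doc_c, vocabulary)

# doc_a shares its words with doc_c but not with doc_b,
# so its distribution diverges less from p_c than from p_b.
print(kl_divergence(p_a, p_c), kl_divergence(p_a, p_b))
```

In a vector-space view the same documents would be compared as count or weight vectors, for example by cosine similarity; the distributional view instead normalizes each document to sum to one and compares documents with a divergence between distributions.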