RESEARCH ON STATISTICAL MACHINE LEARNING AND DATA MINING
by the
Statistical Machine Learning and Bioinformatics
group.
The group develops machine learning methods for discriminative
generative modeling, data fusion by modeling dependencies between data
sets, supervised unsupervised learning, models for defining and
extracting "relevant" signals from data (see an illustrative poster), various other probabilistic
models, dimensionality reduction, and information visualization (see an illustrative poster). Methods we have developed so far include supervised
(discriminative and associative) clustering, relevant component
analysis, principle of learning metrics (some
demos here), and discriminative EM.
PUBLICATIONS
2010
- Abhishek Tripathi, Arto Klami, Matej Orešič, Samuel Kaski. Matching samples of multiple views. Accepted for Data Mining and Knowledge Discovery.
- Ilkka Huopaniemi, Tommi Suvitaival, Matej Orešič, and Samuel Kaski. Graphical multi-way models. Accepted to ECML PKDD 2010.
- Arto Klami. Inferring Task-relevant Image Regions from Gaze Data. In IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 101-106, 2010.
- Abhishek Tripathi, Arto Klami, Sami Virpioja. Bilingual Sentence Matching Using Kernel CCA. In IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 130-135, 2010.
- Juuso Parkkinen, Kristian Nybo, Jaakko Peltonen and Samuel Kaski. Graph Visualization With Latent Variable Models.
To appear in MLG 2010.
(preprint pdf)
- Arto Klami, Seppo Virtanen, and Samuel Kaski. Bayesian Exponential Family Projections for Coupled Data Sources. In Uncertainty in Artificial Intelligence (UAI) 2010. (pdf)
- Ilkka Huopaniemi, Tommi Suvitaival, Janne Nikkilä, Matej Orešič, and Samuel Kaski. Multivariate multi-way analysis of multi-source data. Bioinformatics 2010 26: i391-i398 & ISMB 2010 (pdf) (Recommended)
- José Caldas and Samuel Kaski.
Hierarchical generative biclustering for microRNA expression analysis.
In Proceedings of the 14th International Conference on Research in Computational Molecular Biology (RECOMB), 2010.
(html)
- Jaakko Peltonen, Helena Aidos, Nils Gehlenborg, Alvis Brazma, and Samuel Kaski. An Information Retrieval
Perspective on Visualization of Gene Expression Data with Ontological Annotation. In Proceedings of the
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), accepted
for publication.
(abstract,
preprint pdf)
- Jarkko Venna, Jaakko Peltonen, Kristian Nybo, Helena Aidos, and Samuel Kaski.
Information Retrieval Perspective to Nonlinear Dimensionality Reduction for
Data Visualization. Journal of Machine Learning Research, 11:451-490, 2010.
(abstract,
preprint pdf,
final pdf at JMLR)
(Recommended)
- Juuso Parkkinen and Samuel Kaski. Searching for functional gene modules with interaction component models.
BMC Systems Biology 2010, 4:4. (html)
(Recommended)
- Simon Rogers, Arto Klami, Janne Sinkkonen, Mark Girolami, Samuel Kaski. Infinite Factorization of Multiple Non-parametric Views. Machine Learning 2010, DOI 10.1007/s10994-009-5155-1.
(online)
(Recommended)
2009
- Jaakko Peltonen, Yusuf Yaslan, and Samuel Kaski.
Relevant subtask learning by constrained mixture models.
Intelligent Data Analysis, to appear.
(abstract,
preprint pdf)
- Ilkka Huopaniemi, Tommi Suvitaival, Janne Nikkilä, Matej Orešič, and Samuel Kaski. Multi-Way, Multi-View Learning. Talk in the NIPS 2009 workshop on Learning from Multiple Sources with Application to Robotics, December 12, Whistler, Canada. (pdf)
- László Kozma, Arto Klami, and Samuel Kaski:
GaZIR: Gaze-based Zooming Interface for Image Retrieval.
In Proceedings of 11th Conference on Multimodal Interfaces and The Sixth Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI),
2009.
(abstract, pdf)
(Recommended)
- Antti Ajanki, David R. Hardoon, Samuel Kaski, Kai Puolamäki
and John Shawe-Taylor:
Can Eyes Reveal Interest? - Implicit Queries from Gaze Patterns.
User Modeling and User-Adapted Interaction, 2009.
(abstract, DOI)
- Ilkka Huopaniemi, Tommi Suvitaival, Janne Nikkilä, Matej Orešič, and Samuel Kaski. Two-way analysis of high-dimensional collinear data.
In Data Mining and Knowledge Discovery 19(2):261-276, 2009 (pdf)
(Recommended)
- Leo Lahti, Samuel Myllykangas, Sakari Knuutila, and Samuel Kaski. Dependency detection with similarity constraints.
In Proc. MLSP'09 IEEE International Workshop on Machine Learning for Signal
Processing, to appear. (preprint pdf)
- Kai Puolamäki and Samuel Kaski.
Bayesian solutions to the label switching problem.
In N. Adams, C. Robardet, A. Siebes, and J.-F. Boulicaut,
editors, Advances in Intelligent Data Analysis VIII, Proceedings of the
8th International Symposium on Intelligent Data Analysis, IDA 2009,
pages 381-392, Berlin, 2009. Springer. (pdf)
- Jaakko Peltonen, Jarkko Venna, and Samuel Kaski.
Visualizations for Assessing Convergence and Mixing of Markov
Chain Monte Carlo Simulations. Computational
Statistics and Data Analysis, 53, 4453-4470, 2009.
(abstract,
preprint pdf,
final version on publisher pages)
© Elsevier B. V.
- Juuso Parkkinen, Adam Gyenge, Janne Sinkkonen and Samuel Kaski. A block model suitable for sparse graphs. In MLG 2009, The 7th International Workshop on Mining and Learning with Graphs, Leuven, Belgium, July 2-4, 2009.
(abstract, pdf extended abstract).
- Jaakko Peltonen. Visualization by Linear Projections as Information
Retrieval. In José Príncipe and Risto Miikkulainen, editors,
Advances in Self-Organizing Maps (proceedings of WSOM 2009), pages 237-245.
Springer, Berlin Heidelberg, 2009.
(abstract,
preprint pdf,
final paper on Springer pages)
© Springer-Verlag Berlin Heidelberg 2009
- José Caldas, Nils Gehlenborg, Ali Faisal, Alvis Brazma, and Samuel
Kaski. Probabilistic retrieval and visualization of biologically relevant
microarray experiments. Bioinformatics, 25(12): i145-i153, 2009.
(html).
See also: Software,
Poster (best poster award at the 5th ISCB Student Council Symposium).
(Recommended)
- Jarkko Ylipaavalniemi, Eerika Savia, Sanna Malinen, Riitta Hari,
Ricardo Vigário, and Samuel Kaski. Dependencies between stimuli and spatially independent fMRI
sources: Towards brain correlates of natural stimuli. To appear in
NeuroImage, in press.
(abstract,
DOI)
(Recommended)
- Jaakko Peltonen, Helena Aidos, and Samuel Kaski. Supervised Nonlinear
Dimensionality Reduction by Neighbor Retrieval. In the Proceedings of the 34th IEEE
International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2009), pp. 1809-1812, 2009.
(abstract,
preprint pdf)
- Eerika Savia, Arto Klami and Samuel Kaski. Fast Dependent Components for fMRI Analysis. In the IEEE 2009
International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2009), pp. 1737-1740, 2009.
(abstract, preprint pdf)
- Abhishek Tripathi, Arto Klami and Samuel Kaski. Using Dependencies to Pair
Samples for Multi-View Learning. In the IEEE 2009
International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2009), in press.
(abstract, preprint pdf) (Runner-up Best Student Paper Award)
(Recommended)
- Eerika Savia, Kai Puolamäki and Samuel Kaski. Two-Way Grouping by One-Way Topic Models. In the Proceedings of IDA 2009, The 8th International Symposium on
Intelligent Data Analysis, to appear.
(abstract, preprint pdf)
2008
- Gayle Leen and Colin Fyfe. Learning shared and separate features of two related data sets using GPLVM's. Poster in the NIPS 2008 Learning from Multiple Sources
Workshop, December 13, Whistler, Canada.
(pdf extended abstract)
- Jaakko Peltonen, Yusuf Yaslan, and Samuel Kaski. Variational Bayes Learning from
Relevant Tasks Only. Poster in the NIPS 2008 Learning from Multiple Sources
Workshop, December 13, Whistler, Canada.
(abstract,
pdf extended abstract)
- Janne Sinkkonen, Juuso Parkkinen, Janne Aukia and Samuel Kaski. A simple infinite topic mixture for rich graphs and relational data. Poster in the NIPS 2008 Workshop on Analyzing Graphs: Theory and Applications, December 12, Whistler, Canada.
(abstract,
pdf extended abstract)
- Simon Rogers, Janne Sinkkonen, Arto Klami, Mark Girolami, and Samuel Kaski. Two-level infinite mixture for multi-domain data. In the NIPS 2008 Workshop on Learning from Multiple Sources, 2008.
(extended abstract).
- Arto Klami and Samuel Kaski. Probabilistic approach to detecting
dependencies between data sets. Neurocomputing, 72:1-3, pp. 39-46,
2008. (abstract,
DOI).
- José Caldas and Samuel Kaski.
Bayesian biclustering with the plaid model.
In proceedings of the IEEE International Workshop on Machine Learning for Signal Processing XVIII (MLSP), Cancún, Mexico, pages 291-296, 2008.
(html)
- Eerika Savia, Kai Puolamäki and Samuel Kaski. Latent Grouping Models
for User Preference Prediction. Published online in Machine Learning,
September 2008. In Press.
(abstract,
DOI).
- Arto Klami. Modeling of Mutual Dependencies. D.Sc. thesis. Dissertations in Information and Computer Science, Report D6. Espoo, Finland, 2008.
- Janne Sinkkonen, Janne Aukia and Samuel Kaski. Infinite mixtures for
multi-relational categorical data. In MLG 2008, The 6th International
Workshop on Mining and Learning with Graphs, Helsinki, July 4-5,
2008. (pdf).
- Kai Puolamäki, Antti Ajanki, and Samuel Kaski:
Learning to Learn Implicit Queries from Gaze Patterns.
International Conference on Machine Learning (ICML 2008),
Helsinki, Finland, July 5-9, 2008.
(abstract, pdf) (Recommended)
- Abhishek Tripathi, Arto Klami and Samuel Kaski. Simple integrative preprocessing preserves what is shared in data sources. BMC
Bioinformatics, 2008, 9:111. (html)
2007
- Janne Aukia, Samuel Kaski and Janne Sinkkonen. Inferring vertex properties from topology in large networks. Poster in the
NIPS 2007 Workshop on Statistical Network Models, December 8, Whistler, Canada.
(pdf extended abstract,
poster)
- Juuso Parkkinen and Samuel Kaski. Searching for functional gene
modules with interaction component models. Presentation in the
NIPS 2007 Workshop on Machine Learning in Computational Biology, December 7, Whistler, Canada.
(abstract,
pdf extended abstract)
- Samuel Kaski and Jaakko Peltonen. Learning from Relevant Tasks Only.
In Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenic, and Andrzej Skowron, editors,
Machine Learning: ECML 2007 (Proceedings of the 18th European Conference on Machine Learning), Lecture Notes in Artificial Intelligence 4701, pages 608-615. Springer-Verlag, Berlin, Germany, 2007.
(abstract, preprint pdf, final paper on Springer pages)
© 2007 Springer-Verlag. (Recommended)
- Jaakko Peltonen, Jacob Goldberger, and Samuel Kaski. Fast Semi-supervised
Discriminative Component Analysis. In Konstantinos Diamantaras, Tülay Adali,
Ioannis Pitas, Jan Larsen, Theophilos Papadimitriou, and Scott Douglas, editors,
Machine Learning for Signal Processing XVII, pages 312-317. IEEE, 2007.
(abstract, preprint pdf)
- Janne Sinkkonen, Janne Aukia and Samuel Kaski. Inferring vertex properties
from topology in large networks. In MLG'07, The 5th International
Workshop on Mining and Learning with Graphs, Firenze, Aug 1-3,
2007. (pdf).
(Recommended)
- Jarkko Venna. Dimensionality Reduction for Visual Exploration of
Similarity Structures. D.Sc. thesis. Dissertations in Computer and Information Science, Report D20. Espoo, Finland, 2007.
- Kristian Nybo, Jarkko Venna and Samuel Kaski. The self-organizing map as a visual neighbor retrieval method. In Proceedings of 6th Int. Workshop on Self-Organizing Maps (WSOM '07). Bielefeld University, Bielefeld, Germany, 2007.
(pdf)
- Arto Klami and Samuel Kaski. Local Dependent Components. In
Zoubin Ghahramani (Ed.), Proceedings of the 24th International
Conference on Machine Learning (ICML 2007), pp. 425-433. Omni
Press, 2007. (abstract, pdf) (Recommended)
- Jarkko Ylipaavalniemi, Eerika Savia, Ricardo Vigário and Samuel Kaski. Functional Elements and Networks in fMRI. Proceedings of the 15th European Symposium on Artificial Neural Networks (ESANN 2007), pages 561-566, Bruges, Belgium, April 2007. (abstract, pdf)
- Jarkko Venna and Samuel Kaski. Nonlinear Dimensionality
Reduction as Information Retrieval. In Marina Meila and
Xiaotong Shen, editors, Proceedings of AISTATS 2007, the 11th
International Conference on Artificial Intelligence and Statistics. Omnipress, 2007. JMLR Workshop and Conference
Proceedings, Volume 2: AISTATS 2007.
(abstract,
pdf) (Recommended)
- Jarkko Venna and Samuel Kaski. Comparison of visualization methods for an atlas of gene expression data sets.
Information Visualization, 6:139-154, 2007.
(abstract,
preprint pdf)
- David R. Hardoon, John Shawe-Taylor, Antti Ajanki, Kai
Puolamäki, and Samuel Kaski:
Information Retrieval by Inferring Implicit Queries from Eye Movements.
In Marina Meila and Xiaotong Shen, editors, Proceedings of AISTATS 2007, the 11th International
Conference on Artificial Intelligence and Statistics. Omnipress, 2007. JMLR Workshop and Conference
Proceedings, Volume 2: AISTATS 2007. (abstract,
pdf)
2006
- Jarkko Venna and Samuel Kaski. Nonlinear dimensionality reduction viewed as information retrieval. Poster in the
NIPS 2006 workshop on Novel Applications of Dimensionality Reduction, December 9, Whistler, Canada.
(abstract,
pdf extended abstract,
pdf poster in A0 size)
- Jaakko Peltonen and Samuel Kaski. Learning when only some of the
training data are from the same distribution as test data. Poster in the
NIPS 2006 workshop on Learning when test and training inputs have different
distributions, December 9, Whistler, Canada.
(abstract,
pdf extended abstract,
pdf poster in A0 size)
- Jaakko Peltonen, Jacob Goldberger, and Samuel Kaski. Fast
Discriminative Component Analysis for Comparing Examples. In NIPS 2006
workshop on Learning to Compare Examples, December 8, Whistler, Canada.
(abstract,
pdf)
- Arto Klami and Samuel Kaski. Generative models that discover dependencies between data sets. In S. McLoone, T. Adali, J. Larsen, M. Van Hulle, A. Rogers, S.C. Douglas, editors, Machine Learning for Signal Processing XVI, pages 123-128. IEEE, 2006. (abstract, preprint pdf) (Recommended)
- Jarkko Venna and Samuel Kaski. Local multidimensional scaling. Neural Networks, 19, pp. 889--899, 2006. (abstract, preprint pdf) (Recommended)
- Jarkko Venna and Samuel Kaski. Visualizing Gene Interaction Graphs with Local Multidimensional Scaling. In Michel Verleysen,
editor, Proceedings of the 14th European Symposium on Artificial
Neural Networks (ESANN'2006), pages 557--562, Bruges, 2006. (abstract, preprint pdf)
2005
- Samuel Kaski. From learning metrics towards dependency
exploration. In Proceedings of WSOM'05, 5th Workshop On Self-Organizing
Maps, pages 307--314. Paris, 2005. (abstract, preprint pdf; a summary of underlying
motivations)
- Jarkko Venna and Samuel Kaski. Local multidimensional scaling with controlled tradeoff between
trustworthiness and continuity. In Proceedings of the 5th Workshop on Self-Organizing Maps (WSOM'2005), pages 695--702, Paris, 2005. (abstract, preprint pdf)
- Samuel Kaski, Janne Nikkilä, Janne Sinkkonen, Leo Lahti, Juha
Knuuttila, and Christophe Roos.
Associative clustering for exploring dependencies between functional
genomics data sets.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 2:203-216, 2005. (abstract,
preprint pdf,
ps,
gzipped ps;
the most thorough description of associative clustering, including three
bioinformatics case studies)
(Recommended)
- J. Salojärvi, K. Puolamäki, S. Kaski:
Expectation Maximization Algorithms for Conditional Likelihoods.
In Luc De Raedt and Stefan Wrobel, editors,
Proceedings of the 22nd International Conference on Machine
Learning (ICML 2005), pp. 753-760. ACM press, New York, USA. 2005. (abstract, pdf). (Recommended)
- E. Savia, K. Puolamäki, J. Sinkkonen and S. Kaski:
Two-Way Latent Grouping Model for User Preference Prediction.
In: Fahiem Bacchus and Tommi Jaakkola, editors,
Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005),
pp. 518-525. AUAI press, Corvallis, Oregon, USA. 2005. (abstract, pdf). (Recommended)
- J. Salojärvi, K. Puolamäki, S. Kaski:
On Discriminative Joint Density Modeling.
In: Gama, Camacho, Brazdil, Jorge, Torgo (eds.): Machine Learning: ECML 2005. (Proceedings of 16th European Conference on Machine Learning),
Lecture Notes in Artificial Intelligence 3270, pages 341-352. Springer-Verlag, Berlin, Germany. 2005. (abstract,
pdf, DOI).
- Janne Nikkilä, Christophe Roos, and Samuel Kaski.
Integration of transcription factor binding and gene expression by associative clustering.
In Proceedings of Symposium of Knowledge Representation in
Bioinformatics. Espoo, Finland, 15.-17. June 2005.
To appear.
- Janne Sinkkonen, Samuel Kaski, Janne Nikkilä, and Leo Lahti. Associative
Clustering (AC): Technical Details.
Technical Report A84, Helsinki University of Technology,
Publications in Computer and Information Science, Espoo, Finland, April 2005.
(ps,
pdf;
accompanying report including only additional technical details and derivations of the method.)
- Jarkko Venna and Samuel Kaski.
Visualized atlas of a gene expression databank.
In Proceedings of Symposium of Knowledge Representation in
Bioinformatics. Espoo, Finland, 15.-17. June 2005.
(abstract,
pdf)
- Arto Klami and Samuel Kaski. Non-parametric dependent components.
In Proceedings of ICASSP'05, IEEE International Conference on
Acoustics, Speech, and Signal Processing,
pages V-209 - V-212, IEEE, 2005.
(abstract,
pdf; a generalization of canonical correlation analysis for non-Gaussian data)
- Samuel Kaski, Janne Sinkkonen, and Arto Klami. Discriminative
clustering. Neurocomputing, 69:18-41, 2005.
(preprint abstract,
preprint pdf; publisher's site; the most
comprehensive presentation on DC) (Recommended)
- Samuel Kaski, Janne Nikkilä, Eerika Savia, and Christophe Roos.
Discriminative clustering of yeast stress response.
In Udo Seiffert, Lakhmi Jain, and Patric Schweizer, editors,
Bioinformatics using Computational Intelligence Paradigms, pages
75-92. Springer, Berlin, 2005.
(preprint abstract,
preprint pdf; bioinformatics
application of DC, originally
submitted in 2003)
- Jaakko Peltonen and Samuel Kaski. Discriminative Components of Data.
IEEE Transactions on Neural Networks, 16:68-83, 2005.
(preprint abstract,
preprint pdf,
final paper on IEEE pages)
(A generalization of linear discriminant analysis for data visualization.) (Recommended)
2004
- Jaakko Peltonen, Arto Klami, and Samuel Kaski. Improved Learning
of Riemannian Metrics for Exploratory Analysis. Neural
Networks, vol. 17, pages 1087-1100, 2004.
(preprint abstract,
preprint gzipped
postscript,
Elsevier page linking the final paper,
erratum to final paper on Elsevier pages)
(Recommended)
(Review of theory;
better distance approximations; and application to self-organizing maps and
Sammon's mapping) © Elsevier Ltd.
- Jaakko Peltonen. Data Exploration with Learning Metrics. D.Sc. thesis.
Dissertations in Computer and
Information Science, Report D7. Espoo, Finland, 2004.
- Janne Sinkkonen, Janne Nikkilä, Leo Lahti, and Samuel
Kaski. Associative Clustering.
In:
Boulicaut, Esposito, Giannotti, Pedreschi (eds.): Machine Learning:
ECML 2004 (Proceedings of 15th European Conference on Machine
Learning), Lecture Notes in Computer Science 3201, pages 396-406,
2004.
(abstract,
pdf)
(Clustering continuous data by dependency between two sets, with
applications to gene homology detection.)
- Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski. Sequential
Information Bottleneck for Finite Data. In: Russ Greiner and Dale Schuurmans, editors,
Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004),
pp. 647-654, Omnipress, Madison, WI, 2004.
(abstract,
pdf)
(Finite-data version of Sequential Information Bottleneck, based on a Bayes factor.)
(Recommended)
- Samuel Kaski and Janne Sinkkonen. Principle of learning
metrics for data analysis. Journal of VLSI Signal
Processing, special issue on Machine Learning for Signal Processing,
vol 37, pp. 177-188.
(abstract,
postscript (draft), gzipped postscript (draft),
Kluwer page linking the PDF)
(Recommended) © Kluwer
(Overview, some new asymptotic theory, and sketches of new
directions. Note: submitted in 2002)
2003
- Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski. Finite Sequential
Information Bottleneck (fsIB). Technical Report A74, Helsinki University
of Technology, Publications in Computer and Information Science, Espoo, Finland,
December 2003. (postscript,
gzipped postscript)
(Longer preliminary version of the ICML paper, containing some proofs omitted from it for brevity.)
- Janne Sinkkonen. Learning Metrics and Discriminative Clustering.
PhD thesis. Dissertations on Computer and Information Science, report D2.
Espoo, Finland, 2003.
- Samuel Kaski, Janne Nikkilä, Merja Oja, Jarkko Venna, Petri
Törönen, and Eero Castren.
Trustworthiness and metrics in visualizing similarity of gene
expression.
BMC Bioinformatics, 4:48, 2003.
(Includes an application of learning metrics.)
- Jarkko Salojärvi, Ilpo Kojo, Jaana Simola and Samuel Kaski. Can
relevance be inferred from eye movements in information retrieval? In Proceedings
of the Workshop on Self-Organizing Maps (WSOM'03), Hibikino,
Kitakyushu, Japan, September 2003. pp. 261-266. (abstract, postscript, gzipped
postscript) (Includes an application of learning metrics.)
- Jaakko Peltonen, Arto Klami and Samuel Kaski. Learning
Metrics for Information Visualization. In Proceedings of the
Workshop on Self-Organizing Maps (WSOM'03),
Hibikino, Kitakyushu, Japan, September 2003. pp. 213-218. (abstract, postscript, gzipped
postscript) (An extension of previous work, with a new distance
computation algorithm applicable to e.g. Sammon's mapping.)
- Janne Sinkkonen, Janne Nikkilä, Leo Lahti and Samuel
Kaski. Associative Clustering by Maximizing a Bayes Factor.
Technical Report A68, Helsinki University of Technology, Publications in
Computer and Information Science, Espoo, Finland, June 2003. (postscript, gzipped
postscript) (Extension of DC to clustering of both margins of
continuous co-occurrence data.)
- Samuel Kaski. Discriminative clustering. In Bulletin
of the International Statistical Institute. Invited Paper Proceedings
of the 54th Session, volume 2, pages 270-273. International
Statistical Institute, 2003. (abstract, postscript, gzipped
postscript, pdf)
(Overview of our recent work on DC.)
- Samuel Kaski and Jaakko Peltonen. Informative
discriminant analysis. In: Tom Fawcett and Nina Mishra, editors, Proceedings
of the Twentieth International Conference on Machine Learning
(ICML-2003), pp. 329-336, AAAI Press, Menlo Park, CA, 2003. (abstract, postscript, gzipped
postscript, pdf) (A
generalization of linear discriminant analysis for data visualization.)
- Samuel Kaski, Janne Sinkkonen, and Arto Klami. Regularized
Discriminative Clustering. In C. Molina, T. Adali, J.
Larsen, M. Van Hulle, editors, Neural Networks for Signal
Processing XIII, pages 289-298. IEEE, New York, NY, 2003. (abstract, postscript, gzipped
postscript, pdf)
(Regularization and a tunable compromise between K-means and DC.)
- Jarkko Salojärvi, Samuel Kaski and Janne Sinkkonen. Discriminative
clustering in Fisher metrics. In: O. Kaynak, E. Alpaydin, E. Oja,
L. Xu, editors, Artificial Neural Networks and Neural
Information Processing - Supplementary proceedings ICANN/ICONIP
2003, Istanbul, Turkey, June, pp. 161-164. (abstract, postscript, gzipped
postscript) (A method to improve clustering results of DC,
motivated by the asymptotic connection to Fisher or learning metrics.)
- Jarkko Venna, Samuel Kaski and Jaakko Peltonen. Visualizations
for Assessing Convergence and Mixing of MCMC. In N. Lavrac, D.
Gamberger, H. Blockeel, L. Todorovski, editors, Proceedings of the
14th European Conference on Machine Learning (ECML 2003), pp.
432-443. Springer, Berlin, 2003. (abstract, postscript, gzipped
postscript) (A method to visually analyze MCMC simulations)
- Jarkko Venna and Samuel Kaski. Visualizing
high-dimensional posterior distributions in Bayesian modeling. In:
O. Kaynak, E. Alpaydin, E. Oja, L. Xu, editors, Artificial
Neural Networks and Neural Information Processing - Supplementary
proceedings ICANN/ICONIP 2003, Istanbul, Turkey, June, pp.
165-168. (abstract, postscript, gzipped
postscript) (A method to visualize high-dimensional posterior
distributions using self-organizing maps in Fisher metric.)
2002
- Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski. Discriminative
clustering of text documents. In: Lipo Wang, Jagath C. Rajapakse,
Kunihiko Fukushima, Soo-Young Lee, Xin Yao (eds.) Proceedings of
ICONIP'02, 9th International Conference on Neural Information Processing,
volume 4, pages 1956-1960. IEEE, Piscataway, NJ, 2002. (abstract, postscript, gzipped
postscript) (An extension of discriminative clustering to textual
data.)
- Jaakko Peltonen, Arto Klami, and Samuel Kaski. Learning
More Accurate Metrics for Self-Organizing Maps. In José R.
Dorronsoro, editor, Artificial Neural Networks - ICANN 2002,
International Conference, Madrid, Spain, August 2002, Proceedings, pp.
999-1004. Springer, 2002. (abstract, postscript, gzipped
postscript) ©
Springer-Verlag (Improved estimates and approximations for
Self-Organizing Maps that learn metrics, with more extensive testing.)
- Janne Sinkkonen, Samuel Kaski, and Janne Nikkilä. Discriminative
Clustering: Optimal Contingency Tables by Learning Metrics. In:
Tapio Elomaa, Heikki Mannila, Hannu Toivonen (eds.) Machine
Learning: ECML 2002 (Proceedings of the ECML'02, 13th European
Conference on Machine Learning), Lecture Notes in Artificial
Intelligence 2430, Springer, Berlin, pp. 418-430, 2002. (abstract, postscript, gzipped
postscript) ©
Springer-Verlag (Finite-data theory of DC. Also connects DC to
learning metrics and introduces a new algorithm based on the generative
interpretation.)
- Janne Sinkkonen and Samuel Kaski. Clustering based on
conditional distributions in an auxiliary space. Neural
Computation, 14:217-239, 2002. (abstract, postscript, gzipped postscript)
(Recommended) (Infinite-data theory
of DC.)
2001
- Samuel Kaski, Janne Sinkkonen, and Jaakko Peltonen. Bankruptcy
analysis with self-organizing maps in learning metrics. IEEE
Transactions on Neural Networks, 12:936-947, 2001. (preprint abstract,
preprint postscript,
preprint gzipped
postscript, final
paper on IEEE pages) (Recommended; see
also the ICANN'02 paper above.)
- Samuel Kaski, Janne Sinkkonen, and Jaakko Peltonen. Learning
metrics for self-organizing maps. In Proceedings of IJCNN'01,
International Joint Conference on Neural Networks, pages 914-919.
IEEE, Piscataway, NJ, 2001. (abstract, postscript, gzipped
postscript) (Short version of the previous paper)
- Samuel Kaski. Learning metrics for exploratory data
analysis. In David Miller, Tulay Adali, Jan Larsen, Marc Van Hulle,
and Scott Douglas, editors, Neural Networks for Signal Processing
XI, Proceedings of the 2001 IEEE Signal Processing Society Workshop,
pages 53-62. IEEE, New York, NY, 2001. (abstract, postscript, gzipped postscript) (Plenary, an overview)
- Samuel Kaski and Janne Sinkkonen. A
topography-preserving latent variable model with learning metrics. In
N. Allinson, H. Yin, L. Allinson, and J. Slack, editors, Advances in
Self-Organizing Maps, pages 224-229. Springer, London, 2001. (abstract, postscript, gzipped postscript) (New preliminary work)
- Samuel Kaski, Janne Sinkkonen, and Janne Nikkilä. Clustering
gene expression data by mutual information with gene function. In
Georg Dorffner, Horst Bischof, and Kurt Hornik, editors, Artificial
Neural Networks - ICANN 2001, pages 81-86. Springer, Berlin, 2001. (abstract, postscript, gzipped postscript) (Short version of the
application in the 2002 Neural Computation paper)
2000
- Samuel Kaski. Convergence of a stochastic
semisupervised clustering algorithm. Technical Report A62, Helsinki
University of Technology, Publications in Computer and Information
Science, Espoo, Finland, November 2000. (postscript, gzipped
postscript) (Convergence proof for the Neural Computation 2002 paper)
- Janne Sinkkonen and Samuel Kaski. Clustering by
similarity in an auxiliary space. In Proceedings of IDEAL 2000,
Second International Conference on Intelligent Data Engineering and
Automated Learning. Springer, 2000. In press. (abstract, postscript, gzipped
postscript) (Earlier version of the algorithm in the 2002 Neural
Computation paper)
- Samuel Kaski and Janne Sinkkonen. Metrics that learn
relevance. In Proceedings of IJCNN-2000, International Joint
Conference on Neural Networks, volume V, pages 547-552. IEEE Service
Center, Piscataway, NJ, 2000. (abstract, postscript, gzipped
postscript, errata)
(First presentation of the ideas)
- Janne Sinkkonen and Samuel Kaski. Semisupervised
clustering based on conditional distributions in an auxiliary space.
Technical Report A60, Helsinki University of Technology, Publications in
Computer and Information Science, Espoo, Finland, 2000. (abstract, postscript, gzipped postscript)
(Earlier version of the 2002 Neural Computation paper, with an
application to text documents)
This material is presented to ensure timely dissemination of scholarly
and technical work. Copyright and all rights therein are retained by
authors or by other copyright holders. All persons copying this
information are expected to adhere to the terms and constraints invoked
by each author's copyright. In most cases, these works may not be
reposted without the explicit permission of the copyright holder.
Page maintained by jve at cis.hut.fi,
last updated Friday, 24-Sep-2010 10:39:59 EEST