Learning to Translate

Achieving high-quality machine translation (MT) is difficult, and developing a traditional MT system requires a lot of human effort. However, the availability of large corpora makes it possible to use probabilistic and statistical methods that let the system generate the necessary resources automatically.

Within-language translation

In addition to traditional translation from one language to another, we develop methods that enable translation or interpretation within a single language.

The underlying idea is that contextual, experiential and disciplinary diversity impedes interpersonal communication and understanding. Therefore, even speakers of the same language may need translation tools that facilitate efficient communication, for example, between representatives of different professional domains.

Methodology

Our approach relies heavily on unsupervised statistical machine learning techniques, including independent component analysis, self-organizing maps, clustering, the expectation-maximization algorithm, compression and Bayesian methods. We also aim to take the underlying cognitive, linguistic and philosophical issues carefully into account in order to avoid local minima in technology development.
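
As an illustration of the compression-based methods mentioned above, the following is a minimal sketch of a normalized compression distance (NCD) between two texts. It is not the project's implementation: the choice of zlib as the compressor, the function names compressed_size and ncd, and the example sentences are all assumptions made for illustration only.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text.

    zlib is an assumed stand-in compressor; any real compressor could be used.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size and xy is the concatenation of x and y.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical example texts: a reference and two candidate "translations".
reference = "the cat sat on the mat " * 20
candidate_same = reference
candidate_other = "quantum flux capacitors hum loudly " * 20

print(ncd(reference, candidate_same))   # small value: texts share all content
print(ncd(reference, candidate_other))  # larger value: texts share little
```

A lower value suggests that the two texts share more compressible structure, which is the intuition behind using such a distance to compare a candidate translation with a reference.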

Activities

In 2006, we organized a Finnish-Swedish Machine Translation Challenge with our collaborators from the University of Helsinki. During autumn 2005 and early 2006, we organized a seminar on statistical machine translation.

Publications

Jaakko Väyrynen, Tero Tapiovaara, Kimmo Kettunen and Marcus Dobrinkat. Normalized compression distance as automatic MT evaluation metric. Submitted, 2009.
Timo Honkela, Sami Virpioja, and Jaakko Väyrynen. Adaptive translation: Finding interlingual mappings using self-organizing maps. In Vera Kurkova, Roman Neruda, and Jan Koutnik, editors, Proceedings of ICANN'08, volume 5163 of Lecture Notes in Computer Science, pages 603-612. Springer, 2008.
Marcus Dobrinkat. Domain adaptation in statistical machine translation systems via user feedback. Master's thesis, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, December 2008.
David Ellis, Mathias Creutz, Timo Honkela, and Mikko Kurimo. Speech to speech machine translation: Biblical chatter from Finnish to English. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 123-130, Hyderabad, India, January 2008. Asian Federation of Natural Language Processing.
Sami Virpioja, Jaakko J. Väyrynen, Mathias Creutz, and Markus Sadeniemi. Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner. In Proceedings of the Machine Translation Summit XI, pages 491-498, September 2007.
Kettunen, K., Sadeniemi, M., Lindh-Knuutila, T. and Honkela, T. Analysis of EU Languages Through Text Compression. Proceedings of FinTAL 2006, pp. 99-109.
Lindh-Knuutila, T., Honkela, T. and Lagus, K. Simulating Meaning Negotiation using Observational Language Games. Proceedings of the Third International Symposium on the Emergence and Evolution of Linguistic Communication. Rome, Italy, September, 2006, pp. 168-179.

Please see also the full list of publications.
