Learning to Translate
Achieving good-quality machine translation (MT) is difficult, and
developing a traditional MT system requires a lot of human
effort. However, the availability of large corpora makes it possible
to use probabilistic and statistical methods that let the system
generate the necessary resources automatically.
Within-language translation
In addition to traditional translation from one language
to another, we develop methods that enable translation
or interpretation within one language.
The underlying idea is that contextual,
experiential and/or disciplinary
diversity impedes interpersonal communication and understanding.
Therefore, even speakers of the same language may need translation tools
that facilitate efficient communication, for example between representatives
of different professional domains.
Methodology
Our approach relies extensively on unsupervised statistical machine
learning techniques, including independent component analysis,
the self-organizing map, clustering, the expectation-maximization algorithm,
compression and Bayesian methods. We also aim to take the underlying
cognitive, linguistic and philosophical issues carefully into account
in order to avoid local minima in technology development.
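As an illustration of how compression can act as a similarity measure between texts, here is a minimal Python sketch of the normalized compression distance (NCD), the quantity named in the first entry under Publications. It is only a sketch under stated assumptions, not the implementation used in our work: the choice of zlib as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Lower values mean more similar texts.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical example sentences, invented for this sketch.
    reference = "the cat sat on the mat and looked out of the window"
    close = "the cat sat on a mat and looked out of the window"
    far = "quarterly revenue figures exceeded all analyst expectations"
    print("reference vs close:", ncd(reference, close))
    print("reference vs far:  ", ncd(reference, far))
```

With a real compressor the values are only approximate, and very short strings are dominated by compressor overhead, so in practice the measure is more meaningful on longer texts such as whole documents or test sets.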
Activities
In 2006, we organized a
Finnish-Swedish Machine Translation Challenge with our collaborators from
the University of Helsinki.
During autumn 2005 and early 2006, we organized a seminar
on statistical machine translation.
Publications
- Jaakko Väyrynen, Tero Tapiovaara, Kimmo Kettunen and Marcus
Dobrinkat. Normalized compression distance as automatic MT
evaluation metric. Submitted, 2009.
- Timo Honkela, Sami Virpioja, and Jaakko Väyrynen. Adaptive translation: Finding interlingual mappings using self-organizing maps. In Vera Kurkova, Roman Neruda, and Jan Koutnik, editors, Proceedings of ICANN'08, volume 5163 of Lecture Notes in Computer Science, pages 603-612. Springer, 2008.
- Marcus Dobrinkat. Domain adaptation in statistical machine translation systems via user feedback. Master's thesis, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, December 2008.
- David Ellis, Mathias Creutz, Timo Honkela, and Mikko Kurimo. Speech to speech machine translation: Biblical chatter from Finnish to English. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 123-130, Hyderabad, India, January 2008. Asian Federation of Natural Language Processing.
- Sami Virpioja, Jaakko J. Väyrynen, Mathias Creutz, and Markus Sadeniemi. Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner. In Proceedings of the Machine Translation Summit XI, pages 491-498, September 2007.
- Kettunen, K., Sadeniemi, M., Lindh-Knuutila, T. and Honkela, T. Analysis of EU Languages Through Text Compression. Proceedings of FinTAL 2006, pp. 99-109.
- Lindh-Knuutila, T., Honkela, T. and Lagus, K. Simulating Meaning Negotiation using Observational Language Games. Proceedings of the Third International Symposium on the Emergence and Evolution of Linguistic Communication. Rome, Italy, September 2006, pp. 168-179.
Please see also the full list of publications.
Page maintained by timo.honkela at hut.fi,
last updated Friday, 28-Aug-2009 14:01:34 EEST