Course 4

Data-driven and Hybrid Approaches to Machine Translation
4 - 5.30pm


Martin Forst (Powerset) and Alexander Fraser (IMS)

Abstract:

In a globalized world, machine translation is a hot topic. Manufacturers and software companies need translations of technical manuals into the languages of their international customers, armies and intelligence services need translations of the foreign documents that they find, and users of the World Wide Web would often like to know more about documents in a foreign language.
Recently, statistical machine translation (SMT) has become the dominant approach to machine translation (MT). In SMT, strings in a source language are mapped directly to strings in a target language with the aid of statistical models and without the use of deep linguistic analysis. Earlier approaches to MT now play a less important role than in the past. One such approach is the transfer-based approach, in which a source text is analyzed into an abstract representation, this representation is transferred into a target-language abstract representation, and target-language text is then generated. Lately, however, researchers from the SMT community have acknowledged the importance of morphosyntax (and other levels of analysis) for translation and have begun to integrate linguistic analysis into their systems. Conversely, researchers coming from the transfer-based tradition have integrated ideas from SMT into their systems.
The course will present the topic of machine translation from the two perspectives of SMT and transfer-based MT. In the first week, SMT will be introduced, and we will provide enough theoretical and practical background for students to build their own statistical machine translation systems based on the freely available Moses toolkit. In the second week, we will present the ideas and technology underlying "Grammatical Machine Translation", which extends a transfer-based machine translation approach using ideas from statistical machine translation. Because the grammar development and processing platform XLE plays a central role in this approach, it is recommended that students who take Course 4 also take Course 1.

Literature:

Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. (Meta-) evaluation of machine translation. In Proceedings of the ACL 2007 Second Workshop on Statistical Machine Translation, Prague, Czech Republic, 2007.

Jaime Carbonell, Steve Klein, David Miller, Michael Steinbaum, Tomer Grassiany, and Jochen Frei. Context-based Machine Translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, "Visions for the Future of Machine Translation", Cambridge, MA, 2006.

Michael Collins, Philipp Koehn, and Ivona Kucerova. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 531-540, Ann Arbor, MI, 2005.

Alexander Fraser and Daniel Marcu. Measuring word alignment quality for statistical machine translation. In Computational Linguistics, 33(3):293-303, 2007.

Kevin Knight. A statistical machine translation tutorial workbook, 1999.

Philipp Koehn, Franz J. Och, and Daniel Marcu. Statistical phrase-based translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference, pages 127-133, Edmonton, Canada, 2003.

Adam Lopez. Statistical Machine Translation. In ACM Computing Surveys 40(3), Article 8, pages 1-49, 2008.

Kishore A. Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY, 2001.

Stefan Riezler and John Maxwell. Grammatical Machine Translation. In Proceedings of Human Language Technology conference - North American chapter of the Association for Computational Linguistics annual meeting (HLT-NAACL'06), New York, NY, 2006.