
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

Abstract

We propose a simple, elegant solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language.

The rest of the model, which includes the encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables multilingual NMT with a single model and without any increase in parameters, which is significantly simpler than previous proposals for multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German.

Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT'14 and WMT'15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation.

Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.

1 Introduction

Neural Machine Translation (NMT) [22, 2, 5] is an end-to-end approach to Machine Translation that has rapidly gained adoption in many large-scale settings [24]. Almost all such systems are built for a single language pair. So far there has not been a sufficiently simple and efficient way to handle multiple language pairs using a single model without making significant changes to the basic NMT architecture. In this paper we introduce a simple method to translate between multiple languages using a single model, taking advantage of multilingual data to improve NMT for all languages involved.

Our method requires no change to the traditional NMT model architecture. Instead, we add an artificial token to the input sequence to indicate the required target language. All other parts of the system as described in [24] (encoder, decoder, attention, and shared wordpiece vocabulary) stay exactly the same. We call our system Multilingual GNMT since it is an extension of [24].
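To make the token trick concrete, here is a minimal preprocessing sketch. It is not the authors' code: the token spelling <2es>, the helper name add_target_token, and the example sentences are assumptions made only for illustration.

```python
# Hypothetical sketch: the only change to the data pipeline is prepending an
# artificial token that names the desired target language; the model itself
# remains an ordinary single-pair NMT architecture.

def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial target-language token to the source sentence."""
    return f"<2{target_lang}> {source_sentence}"

# A mixed multilingual training set: (source, target, target language) triples.
raw_examples = [
    ("How are you?", "¿Cómo estás?", "es"),     # English -> Spanish
    ("Como você está?", "How are you?", "en"),  # Portuguese -> English
]

training_pairs = [(add_target_token(src, lang), tgt) for src, tgt, lang in raw_examples]

for src, tgt in training_pairs:
    print(src, "=>", tgt)
# <2es> How are you? => ¿Cómo estás?
# <2en> Como você está? => How are you?
```

Because the target-language signal lives in the training data rather than in the architecture, adding a new language only means adding (and possibly over- or under-sampling) token-prefixed sentence pairs.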

This method has several attractive benefits:

Simplicity: Since no changes are made to the architecture of the model, scaling to more languages is trivial: any new data is simply added, possibly with over- or under-sampling such that all languages are appropriately represented, and used with a new token if the target language changes. This also simplifies production deployment, since it can cut down the total number of models necessary when dealing with multiple languages. Note that at Google, we support a total of over 100 languages as source and target, so theoretically 100^2 models would be necessary for the best possible translations between all pairs, if each model could only support a single language pair. Clearly this would be problematic in a production environment.

Low-resource language improvements: In a multilingual NMT model, all parameters are implicitly shared by all the language pairs being modeled. This forces the model to generalize across language boundaries during training. It is observed that when language pairs with little available data and language pairs with abundant data are mixed into a single model, translation quality on the low-resource language pair is significantly improved.

Zero-shot translation: A surprising benefit of modeling several language pairs in a single model is that the model implicitly learns to translate between language pairs it has never seen (zero-shot translation), a working example of transfer learning within neural translation models. For example, a multilingual NMT model trained with Portuguese→English and English→Spanish examples can generate reasonable translations for Portuguese→Spanish although it has not seen any data for that language pair. We show that the quality of zero-shot language pairs can easily be improved with little additional data of the language pair in question.
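To make the zero-shot case concrete, the following sketch shows how such a request would be issued; it reuses the assumed <2xx> token convention from the earlier sketch, and model.translate is a hypothetical stand-in for a trained multilingual model, not a real API.

```python
# Hypothetical zero-shot request. During training the model only ever saw
# Portuguese->English and English->Spanish sentence pairs.
trained_directions = {("pt", "en"), ("en", "es")}

source = "Como você está?"            # Portuguese input
zero_shot_input = "<2es> " + source   # ask for Spanish via the target token

# The Portuguese->Spanish direction itself was never observed in training:
assert ("pt", "es") not in trained_directions

# translation = model.translate(zero_shot_input)
# If transfer succeeds, the output should resemble: "¿Cómo estás?"
```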

In the remaining sections of this paper we first discuss related work and explain our multilingual system architecture in more detail. Then, we go through the different ways of merging languages on the source and target side in increasing difficulty (many-to-one, one-to-many, many-to-many), and discuss the results of a number of experiments on WMT benchmarks, as well as on some of Google's production datasets. We present results from transfer-learning experiments and show how implicitly learned bridging (zero-shot translation) performs in comparison to explicit bridging (i.e., first translating to a common language like English and then translating from that common language into the desired target language) as typically used in machine translation systems. We describe visualizations of the new system in action, which provide early evidence of shared semantic representations (an interlingua) between languages. Finally, we also show some interesting applications of mixing languages with examples, namely code-switching on the source side and weighted target-language mixing, and suggest possible avenues for further exploration.
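The explicit-bridging baseline mentioned above can be sketched in the same assumed interface (again hypothetical, not the authors' code): the pivot path decodes twice through English, whereas the implicit zero-shot path is the single token-prefixed call shown earlier.

```python
# Hypothetical comparison of the two bridging strategies for Portuguese->Spanish.
# `translate` stands in for a trained multilingual NMT model that accepts a
# token-prefixed source sentence and returns a translation string.

def explicit_bridge(pt_sentence: str, translate) -> str:
    """Explicit bridging: two decoding steps, pivoting through English."""
    english = translate("<2en> " + pt_sentence)
    return translate("<2es> " + english)

def implicit_bridge(pt_sentence: str, translate) -> str:
    """Implicit (zero-shot) bridging: one decoding step, direct to Spanish."""
    return translate("<2es> " + pt_sentence)
```

The two functions differ only in whether an intermediate English translation is produced before the final Spanish request is made.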

2 Related Work

Interlingual translation is a classic method in Machine Translation [16, 10]. Despite its distinguished history, most practical applications of Machine Translation have focused on individual language pairs because it was simply too difficult to build a single system that translates reliably from and to several languages. Neural Machine Translation [22, 2, 5] is a promising end-to-end learning approach to Machine Translation which was quickly extended to multilingual Machine Translation in various ways. One early attempt is the work in [12], which proposed multilingual training in a multitask learning setting. Their model is a basic encoder-decoder network for multilingual NMT, in this case without an attention mechanism.

