Transcription of Attention Is All You Need - arXiv
{{id}} {{{paragraph}}}
Attention Is All You NeedAshish Vaswani Google Shazeer Google Parmar Google Uszkoreit Google Jones Google N. Gomez University of ukasz Kaiser Google Polosukhin dominant sequence transduction models are based on complex recurrent orconvolutional neural networks that include an encoder and a decoder. The bestperforming models also connect the encoder and decoder through an attentionmechanism. We propose a new simple network architecture, the Transformer,based solely on Attention mechanisms, dispensing with recurrence and convolutionsentirely. Experiments on two machine translation tasks show these models tobe superior in quality while being more parallelizable and requiring significantlyless time to train. Our model achieves BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, includingensembles, by over 2 BLEU.
Attention Is All You Need Ashish Vaswani Google Brain avaswani@google.com Noam Shazeer Google Brain noam@google.com Niki Parmar Google Research nikip@google.com
Domain:
Source:
Link to this page:
Please notify us if you found a problem with this document:
{{id}} {{{paragraph}}}