
arXiv:1901.02860v3 [cs.LG] 2 Jun 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai 1,2, Zhilin Yang 1,2, Yiming Yang 1, Jaime Carbonell 1, Quoc V. Le 2, Ruslan Salakhutdinov 1
1 Carnegie Mellon University, 2 Google

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation.
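The abstract names a segment-level recurrence mechanism without spelling it out. The following is a minimal sketch of the general idea only, under loud assumptions: a single attention layer, no learned query/key/value projections, and no positional encoding at all (so the paper's novel positional scheme is not shown). The function names `causal_attention` and `run_segments` and the parameters `seg_len` and `mem_len` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def causal_attention(segment, memory):
    """Attend from each position of `segment` to the cached `memory`
    plus the segment positions up to and including itself.

    Assumption: queries, keys and values are the raw hidden vectors;
    a real layer would apply learned projections first.
    """
    context = memory + segment  # cached states first, then current ones
    mem_len = len(memory)
    d = len(segment[0])
    outputs = []
    for i, query in enumerate(segment):
        # Causal view: all of the memory, and nothing after position i.
        visible = context[: mem_len + i + 1]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in visible
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, visible)) / total
            for j in range(d)
        ])
    return outputs


def run_segments(states, seg_len, mem_len):
    """Process `states` one segment at a time, carrying the most recent
    `mem_len` hidden states from each segment over to the next."""
    memory = []
    outputs = []
    for start in range(0, len(states), seg_len):
        segment = states[start:start + seg_len]
        outputs.extend(causal_attention(segment, memory))
        combined = memory + segment
        # Keep only the last `mem_len` states as the cache (none if mem_len is 0).
        memory = combined[max(0, len(combined) - mem_len):]
    return outputs
```

With `mem_len` set to 0, each segment sees only itself, which is the fixed-length setting whose context fragmentation the abstract refers to. With a cache that covers all earlier states, this single-layer toy produces the same outputs as attending over the whole sequence at once, while only computing one segment at a time.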

Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained
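The results above mix two metrics: bits per character (bpc) for the character-level datasets and perplexity for the word-level ones. As a small illustration of how both relate to a model's mean negative log-likelihood, here is a sketch using the standard definitions; the helper names are hypothetical, and the loss is assumed to be measured in nats (natural log) per predicted symbol.

```python
import math


def bits_per_character(mean_nll_nats):
    """Convert a mean per-character loss in nats to bits per character."""
    return mean_nll_nats / math.log(2)


def perplexity(mean_nll_nats):
    """Convert a mean per-token loss in nats to perplexity."""
    return math.exp(mean_nll_nats)


def nats_from_bpc(bpc):
    """Inverse of bits_per_character."""
    return bpc * math.log(2)


def nats_from_perplexity(ppl):
    """Inverse of perplexity."""
    return math.log(ppl)
```

Under these definitions, a bpc of 0.99 corresponds to a loss of about 0.686 nats per character, and a perplexity of 18.3 corresponds to about 2.907 nats per word. The two kinds of figures are not directly comparable, because one is averaged per character and the other per word.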
