
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
(Bidirectional Encoder Representations from Transformers)
Jacob Devlin, Google AI Language

Pre-training in NLP

Word embeddings are the basis of deep learning for NLP. Word embeddings (word2vec, GloVe) are often pre-trained on a text corpus from co-occurrence statistics.

[Figure: the vectors for "king" and "queen" are each scored against "the king wore a crown" and "the queen wore a crown" via inner product.]

Contextual Representations

Problem: Word embeddings are applied in a context-free manner. [Figure: "open a bank account" and "on the river bank" both map to the same vector for "bank".]

Solution: Train contextual representations on a text corpus. [Figure: "open a bank account" mapped to a context-dependent vector.]
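To make the context-free limitation concrete, here is a minimal Python sketch. It is not code from the slides: the 4-dimensional vectors and the score function are invented for illustration, standing in for pre-trained word2vec/GloVe embeddings and the inner-product scoring shown in the figure.

```python
# Minimal sketch of context-free word embeddings (illustrative values only;
# real word2vec/GloVe vectors have hundreds of dimensions).
import numpy as np

# Hypothetical pre-trained, context-free embedding table.
embeddings = {
    "king":  np.array([0.8, 0.1, 0.6, 0.2]),
    "queen": np.array([0.7, 0.2, 0.6, 0.3]),
    "crown": np.array([0.6, 0.1, 0.5, 0.4]),
    "bank":  np.array([0.1, 0.9, 0.2, 0.5]),
}

def score(word, context_word):
    """Inner product between two context-free word vectors."""
    return float(np.dot(embeddings[word], embeddings[context_word]))

print(score("king", "crown"))   # similarity of "king" to "crown"
print(score("queen", "crown"))  # similarity of "queen" to "crown"

# The limitation the slides point out: "bank" has one vector regardless of
# context, so "open a bank account" and "on the river bank" reuse it.
vec_financial = embeddings["bank"]  # from "open a bank account"
vec_river     = embeddings["bank"]  # from "on the river bank"
print(np.array_equal(vec_financial, vec_river))  # True: context-free
```

A contextual model (the point of the "Solution" bullet) would instead produce a different vector for "bank" in each sentence.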

For example, a 10x-100x bigger model trained for 100x-1,000x as many steps. Imagine it's 2013: a well-tuned 2-layer, 512-dim LSTM gets 80% accuracy on sentiment analysis after training for 8 hours. Pre-training an LM with the same architecture for a week gets 80.5%. Conference reviewers: "Who would do something so expensive for such a small gain?"
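Reading those numbers literally (my own back-of-the-envelope arithmetic, assuming compute scales linearly with model size and step count):

```python
# Back-of-the-envelope compute comparison; assumes cost scales linearly with
# model size and number of training steps (an illustrative assumption).
baseline_hours = 8        # well-tuned 2-layer, 512-dim LSTM from the 2013 example
pretrain_hours = 7 * 24   # pre-training the same architecture for a week
print(pretrain_hours / baseline_hours)  # ~21x the compute, for +0.5% accuracy

# The scale the slide describes: 10x-100x bigger models,
# trained for 100x-1,000x as many steps.
print(10 * 100, "to", 100 * 1000)       # roughly 1,000x to 100,000x more compute
```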

Transcription of BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
