Transcription of Self-supervised Learning
Self-supervised Learning
Hung-yi Lee

ELMo (Embeddings from Language Models)
BERT (Bidirectional Encoder Representations from Transformers)
ERNIE (Enhanced Representation through Knowledge Integration)
Big Bird: Transformers for Longer Sequences
(Source of image: Hoover)

The models become larger and larger: ELMo (94M parameters), BERT (340M), GPT-2 (1542M), Megatron (8B), T5 (11B), Turing NLG (17B), GPT-3 (175B). GPT-3 (175B) is 10 times larger than Turing NLG (17B).

Outline: the BERT series and the GPT series.

In supervised learning, a model is trained with labeled data. In self-supervised learning, there are no external labels: part of the input itself serves as the supervision, so the model learns by predicting one portion of the input from another.

Masking input: BERT randomly masks some of the input tokens. Each masked token is replaced either with a special MASK token or with a randomly chosen token from the vocabulary. The corrupted sequence is fed into a Transformer encoder, and a linear layer followed by a softmax over all characters predicts the original token at each masked position.
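The masking procedure described above can be sketched in plain Python. This is a minimal illustration, not BERT's actual implementation: the function name `mask_tokens`, the 50/50 split between the MASK token and a random token, and the 15% default masking rate are assumptions for the sketch, matching the slide's description that a masked position gets either a special token or a random token.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, rng=None):
    """BERT-style input corruption for masked language modeling (sketch).

    Each position is selected for masking with probability mask_rate.
    A selected token is replaced either with the special "[MASK]" token
    or with a random token from the vocabulary; the original token is
    kept as the prediction target for that position.
    """
    rng = rng or random.Random()
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            if rng.random() < 0.5:
                corrupted.append("[MASK]")        # special MASK token
            else:
                corrupted.append(rng.choice(vocab))  # random replacement
            targets.append(tok)   # the encoder must reconstruct this token
        else:
            corrupted.append(tok)
            targets.append(None)  # no loss computed at unmasked positions
    return corrupted, targets
```

In training, the Transformer encoder's output at each masked position would go through a linear layer and a softmax, and the cross-entropy loss against `targets` drives learning.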
Downstream tasks include:
• Corpus of Linguistic Acceptability (CoLA)
• Stanford Sentiment Treebank (SST-2)
• Microsoft Research Paraphrase Corpus (MRPC)
• Quora Question Pairs (QQP)
...
Sentiment analysis example: given an input sentence such as "this is good", the model predicts its sentiment class. The linear classifier on top is randomly initialized, while the rest of the network is initialized from the pre-trained model; together, this is the model to be learned during fine-tuning.
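The two initialization schemes contrasted on the slide (random initialization vs. init by pre-train) can be sketched as follows. This is a toy illustration in plain Python; the function name `init_model` and the tiny weight matrices are assumptions made for the sketch, standing in for a real encoder and classification head.

```python
import random

def init_model(pretrained_encoder=None, hidden=4, n_classes=2, rng=None):
    """Build (encoder, head) weight matrices for a downstream task (sketch).

    The classification head is always randomly initialized. The encoder
    is copied from pre-trained weights when available ("init by
    pre-train"); otherwise it is also random ("random initialization").
    """
    rng = rng or random.Random()
    if pretrained_encoder is not None:
        # Copy the pre-trained weights so fine-tuning starts from them.
        encoder = [row[:] for row in pretrained_encoder]
    else:
        encoder = [[rng.gauss(0, 0.02) for _ in range(hidden)]
                   for _ in range(hidden)]
    # The task-specific linear head has no pre-trained counterpart.
    head = [[rng.gauss(0, 0.02) for _ in range(hidden)]
            for _ in range(n_classes)]
    return encoder, head
```

Fine-tuning then updates both parts end to end; starting the encoder from pre-trained weights is what makes the downstream task train faster and reach better accuracy than random initialization.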