Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks


…derive sentence embeddings from BERT. To bypass this limitation, researchers passed single sentences through BERT and then derived a fixed-sized vector by either averaging the outputs (similar to average word embeddings) or by using the output of the special CLS token (for example: May et al. (2019); Zhang et al.; Qiao et al.).

Transcription of Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

1 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 3982-3992, Hong Kong, China, November 3-7, 2019. Association for Computational Linguistics. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Nils Reimers and Iryna Gurevych, Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt. Abstract: BERT (Devlin et al., 2018) and RoBERTa (Liu et al.

2 , 2019) has set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity.

3 This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embedding methods. 1 Introduction. In this publication, we present Sentence-BERT (SBERT), a modification of the BERT network using siamese and triplet networks that is able to derive semantically meaningful sentence embeddings.

4 This enables BERT to be used for certain new tasks which, up to now, were not applicable for BERT. These tasks include large-scale semantic similarity comparison, clustering, and information retrieval via semantic search. (Footnote 1: Code available. Footnote 2: By semantically meaningful we mean that semantically similar sentences are close in vector space.) BERT set new state-of-the-art performance on various sentence classification and sentence-pair regression tasks. BERT uses a cross-encoder: two sentences are passed to the transformer network and the target value is predicted.
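
To make the cross-encoder vs. bi-encoder distinction above concrete, here is a minimal Python sketch; the callables score_pair, encode, and similarity are illustrative stand-ins, not part of the paper's released code. A cross-encoder needs one full forward pass per sentence pair, while an SBERT-style bi-encoder needs one forward pass per sentence and then only cheap vector comparisons.

    from itertools import combinations

    def most_similar_cross_encoder(sentences, score_pair):
        # Cross-encoder: every candidate pair requires its own transformer
        # forward pass, i.e. n*(n-1)/2 passes for n sentences.
        return max(combinations(sentences, 2), key=lambda pair: score_pair(*pair))

    def most_similar_bi_encoder(sentences, encode, similarity):
        # Bi-encoder (SBERT-style): one forward pass per sentence, then the
        # pairwise comparison is a cheap operation on fixed-size vectors.
        vectors = [encode(s) for s in sentences]
        i, j = max(combinations(range(len(sentences)), 2),
                   key=lambda ij: similarity(vectors[ij[0]], vectors[ij[1]]))
        return sentences[i], sentences[j]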

5 However, this setup is unsuitable for various pair regression tasks due to too many possible combinations. Finding the pair with the highest similarity in a collection of n = 10,000 sentences requires with BERT n·(n-1)/2 = 49,995,000 inference computations. On a modern V100 GPU, this requires about 65 hours. Similarly, finding which of the over 40 million existing questions on Quora is the most similar to a new question could be modeled as a pair-wise comparison with BERT; however, answering a single query would require over 50 hours. A common method to address clustering and semantic search is to map each sentence to a vector space such that semantically similar sentences are close.
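
The quoted figures follow directly from the pairwise combination count; a quick sanity check using only the numbers given above (the implied throughput is derived here, not stated in the paper):

    n = 10_000
    pairs = n * (n - 1) // 2              # 49,995,000 sentence pairs
    hours = 65                            # runtime quoted for BERT on a V100 GPU
    print(pairs)                          # 49995000
    print(round(pairs / (hours * 3600)))  # ~214 cross-encoder inferences per second implied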

6 Researchers have started to input individual sentences into BERT and to derive fixed-size sentence embeddings. The most commonly used approach is to average the BERT output layer (known as BERT embeddings) or to use the output of the first token (the [CLS] token). As we will show, this common practice yields rather bad sentence embeddings, often worse than averaging GloVe embeddings (Pennington et al., 2014). To alleviate this issue, we developed SBERT. The siamese network architecture enables fixed-sized vectors for input sentences to be derived.
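
A minimal sketch of the two pooling strategies described here (averaging the BERT output layer vs. taking the [CLS] token output), assuming the Hugging Face transformers package; the checkpoint name is an illustrative choice, not the paper's exact setup.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def embed(sentences, pooling="mean"):
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            tokens = model(**batch).last_hidden_state        # (batch, seq_len, hidden)
        if pooling == "cls":
            return tokens[:, 0]                              # output of the first ([CLS]) token
        mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding positions
        return (tokens * mask).sum(dim=1) / mask.sum(dim=1)  # mean of the output layer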

7 Using a similarity measure like cosine-similarity or Manhattan / Euclidean distance, semantically similar sentences can be found. These similarity measures can be performed extremely efficiently on modern hardware, allowing SBERT to be used for semantic similarity search as well as for clustering. The complexity for finding the most similar sentence pair in a collection of 10,000 sentences is reduced from 65 hours with BERT to the computation of 10,000 sentence embeddings (~5 seconds with SBERT) and computing cosine-similarity (~0.01 seconds).
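
The cheap comparison step can be sketched as a single matrix multiplication over the precomputed embeddings; this is a NumPy illustration, and the embedding matrix is assumed to come from any bi-encoder such as SBERT.

    import numpy as np

    def most_similar_pair(embeddings):
        # Normalize rows so that the dot product equals cosine similarity.
        X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = X @ X.T                      # (n, n) cosine-similarity matrix
        np.fill_diagonal(sims, -np.inf)     # exclude trivial self-matches
        i, j = np.unravel_index(np.argmax(sims), sims.shape)
        return i, j, sims[i, j]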

8 By using optimized index structures, finding the most similar Quora question can be reduced from 50 hours to a few milliseconds (Johnson et al., 2017). We fine-tune SBERT on NLI data, which creates sentence embeddings that significantly outperform other state-of-the-art sentence embedding methods like InferSent (Conneau et al., 2017) and Universal Sentence Encoder (Cer et al., 2018). On seven Semantic Textual Similarity (STS) tasks, SBERT achieves an improvement of … points compared to InferSent and … points compared to Universal Sentence Encoder.
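
The "optimized index structures" reference (Johnson et al., 2017) corresponds to the FAISS library; below is a minimal sketch of an exact inner-product index over normalized embeddings. The index type and parameters are illustrative assumptions, and very large collections would typically use an approximate index instead.

    import faiss                      # Johnson et al., 2017
    import numpy as np

    def build_index(embeddings):
        X = np.ascontiguousarray(embeddings, dtype="float32")
        faiss.normalize_L2(X)         # inner product on unit vectors = cosine similarity
        index = faiss.IndexFlatIP(X.shape[1])
        index.add(X)
        return index

    # Usage sketch: query_embedding is a float32 array of shape (1, d), normalized the same way.
    # scores, ids = build_index(question_embeddings).search(query_embedding, 5)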

9 On SentEval (Conneau and Kiela, 2018), an evaluation toolkit for sentence embeddings, we achieve an improvement of … and … points. SBERT can be adapted to a specific task. It sets new state-of-the-art performance on a challenging argument similarity dataset (Misra et al., 2016) and on a triplet dataset to distinguish sentences from different sections of a Wikipedia article (Dor et al., 2018). The paper is structured in the following way: Section 3 presents SBERT, section 4 evaluates SBERT on common STS tasks and on the challenging Argument Facet Similarity (AFS) corpus (Misra et al.

10 , 2016). Section 5 evaluates SBERT on SentEval. In section 6, we perform an ablation study to test some design aspects of SBERT. In section 7, we compare the computational efficiency of SBERT sentence embeddings in contrast to other state-of-the-art sentence embedding methods. 2 Related Work. We first introduce BERT; then, we discuss state-of-the-art sentence embedding methods. BERT (Devlin et al., 2018) is a pre-trained transformer network (Vaswani et al., 2017), which set new state-of-the-art results for various NLP tasks, including question answering, sentence classification, and sentence-pair regression.

