arXiv:1704.00051v2 [cs.CL] 28 Apr 2017

Reading Wikipedia to Answer Open-Domain Questions

Danqi Chen*
Computer Science, Stanford University
Stanford, CA 94305, USA

Adam Fisch, Jason Weston & Antoine Bordes
Facebook AI Research
770 Broadway, New York, NY 10003, USA

* Most of this work was done while DC was with Facebook AI Research.

Abstract

This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.

1 Introduction

This paper considers the problem of answering factoid questions in an open-domain setting using Wikipedia as the unique knowledge source, such as one does when looking for answers in an encyclopedia. Wikipedia is a constantly evolving source of detailed information that could facilitate intelligent machines if they are able to leverage its power. Unlike knowledge bases (KBs) such as Freebase (Bollacker et al., 2008) or DBPedia (Auer et al., 2007), which are easier for computers to process but too sparsely populated for open-domain question answering (Miller et al., 2016), Wikipedia contains up-to-date knowledge that humans are interested in. It is designed, however, for humans, not machines, to read.

Using Wikipedia articles as the knowledge source causes the task of question answering (QA) to combine the challenges of both large-scale open-domain QA and of machine comprehension of text. In order to answer any question, one must first retrieve the few relevant articles among more than 5 million items, and then scan them carefully to identify the answer. We term this setting machine reading at scale (MRS). Our work treats Wikipedia as a collection of articles and does not rely on its internal graph structure. As a result, our approach is generic and could be switched to other collections of documents, books, or even daily updated newspapers.

Large-scale QA systems like IBM's DeepQA (Ferrucci et al., 2010) rely on multiple sources to answer: besides Wikipedia, DeepQA is also paired with KBs, dictionaries, and even news articles, books, etc. As a result, such systems rely heavily on information redundancy among the sources to answer correctly. Having a single knowledge source forces the model to be very precise while searching for an answer, as the evidence might appear only once. This challenge thus encourages research in the ability of a machine to read, a key motivation for the machine comprehension subfield and the creation of datasets such as SQuAD (Rajpurkar et al., 2016), CNN/Daily Mail (Hermann et al., 2015) and CBT (Hill et al., 2016).

However, those machine comprehension resources typically assume that a short piece of relevant text is already identified and given to the model, which is not realistic for building an open-domain QA system. In sharp contrast, methods that use KBs or information retrieval over documents have to employ search as an integral part of the solution. Instead, MRS is focused on simultaneously maintaining the challenge of machine comprehension, which requires the deep understanding of text, while keeping the realistic constraint of searching over a large open resource.

In this paper, we show how multiple existing QA datasets can be used to evaluate MRS by requiring an open-domain system to perform well on all of them at once. We develop DrQA, a strong system for question answering from Wikipedia composed of: (1) Document Retriever, a module using bigram hashing and TF-IDF matching designed to, given a question, efficiently return a subset of relevant articles and (2) Document Reader, a multi-layer recurrent neural network machine comprehension model trained to detect answer spans in those few returned documents. Figure 1 gives an illustration of DrQA.

Our experiments show that Document Retriever outperforms the built-in Wikipedia search engine and that Document Reader reaches state-of-the-art results on the very competitive SQuAD benchmark (Rajpurkar et al., 2016). Finally, our full system is evaluated using multiple benchmarks. In particular, we show that performance is improved across all datasets through the use of multitask learning and distant supervision compared to single task training.

2 Related Work

Open-domain QA was originally defined as finding answers in collections of unstructured documents, following the setting of the annual TREC competitions. With the development of KBs, many recent innovations have occurred in the context of QA from KBs with the creation of resources like WebQuestions (Berant et al., 2013) and SimpleQuestions (Bordes et al., 2015) based on the Freebase KB (Bollacker et al., 2008), or on automatically extracted KBs, e.g., OpenIE triples and NELL (Fader et al., 2014). However, KBs have inherent limitations (incompleteness, fixed schemas) that motivated researchers to return to the original setting of answering from raw text.

A second motivation to cast a fresh look at this problem is that of machine comprehension of text, i.e., answering questions after reading a short text or story. That subfield has made considerable progress recently thanks to new deep learning architectures like attention-based and memory-augmented neural networks (Bahdanau et al., 2015; Weston et al., 2015; Graves et al., 2014) and the release of new training and evaluation datasets like QuizBowl (Iyyer et al., 2014), CNN/Daily Mail based on news articles (Hermann et al., 2015), CBT based on children's books (Hill et al., 2016), or SQuAD (Rajpurkar et al., 2016) and WikiReading (Hewlett et al., 2016), both based on Wikipedia. An objective of this paper is to test how such new methods can perform in an open-domain QA framework.

QA using Wikipedia as a resource has been explored previously. Ryu et al. (2014) perform open-domain QA using a Wikipedia-based knowledge model. They combine article content with multiple other answer matching modules based on different types of semi-structured knowledge such as infoboxes, article structure, category structure, and definitions. Similarly, Ahn et al. (2004) also combine Wikipedia as a text resource with other resources, in this case with information retrieval over other documents. Buscaldi and Rosso (2006) also mine knowledge from Wikipedia for QA. Instead of using it as a resource for seeking answers to questions, they focus on validating answers returned by their QA system, and use Wikipedia categories for determining a set of patterns that should fit with the expected answer. In our work, we consider the comprehension of text only, and use Wikipedia text documents as the sole resource in order to emphasize the task of machine reading at scale, as described in the introduction.

There are a number of highly developed full pipeline QA approaches using either the Web, as does QuASE (Sun et al., 2015), or Wikipedia as a resource, as do Microsoft's AskMSR (Brill et al., 2002), IBM's DeepQA (Ferrucci et al., 2010) and YodaQA (Baudiš, 2015; Baudiš and Šedivý, 2015), the latter of which is open source and hence reproducible for comparison purposes. AskMSR is a search-engine based QA system that relies on data redundancy rather than sophisticated linguistic analyses of either questions or candidate answers, i.e., it does not focus on machine comprehension, as we do. DeepQA is a very sophisticated system that relies on both unstructured information including text documents as well as structured data such as KBs, databases and ontologies to generate candidate answers or vote over evidence. YodaQA is an open source system modeled after DeepQA, similarly combining websites, information extraction, databases and Wikipedia in particular. Our comprehension task is made more challenging by only using a single resource. Comparing against these methods provides a useful datapoint for an upper bound benchmark on performance.

[Figure 1: An overview of our question answering system DrQA. An open-domain question drawn from SQuAD, TREC, WebQuestions or WikiMovies (example: "How many of Warsaw's inhabitants spoke Polish in 1933?") is passed to Document Retriever and then to Document Reader, which returns the answer (example: 833,500).]

Multitask learning (Caruana, 1998) and task transfer have a rich history in machine learning (e.g., using ImageNet in the computer vision community (Huh et al., 2016)), as well as in NLP in particular (Collobert and Weston, 2008). Several works have attempted to combine multiple QA training datasets via multitask learning to (i) achieve improvement across the datasets via task transfer; and (ii) provide a single general system capable of answering different kinds of questions due to the inevitably different data distributions across [...] rather than using a KB, with positive results.

3 Our System: DrQA

In the following we describe our system DrQA for MRS, which consists of two components: (1) the Document Retriever module for finding relevant articles and (2) a machine comprehension model, Document Reader, for extracting answers from a single document or a small collection of documents.

Document Retriever

Following classical QA systems, we use an efficient (non-machine learning) document retrieval system to first narrow our search space and focus on reading only articles that are likely to be relevant.
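To make the idea of a retriever built on bigram hashing and TF-IDF matching concrete, the following is a minimal sketch and not the system's actual implementation. It assumes that unigrams and bigrams are hashed into a fixed number of buckets and that articles are ranked by a TF-IDF weighted dot product with the question. The names `hashed_ngrams`, `build_index`, `retrieve` and `NUM_BUCKETS`, the use of `zlib.crc32` as the hash, and the particular log-based weighting are all assumptions made for illustration.

```python
import math
import re
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 16  # assumed bucket count, chosen only for this sketch


def hashed_ngrams(text):
    """Map each unigram and bigram of `text` to a hash bucket id."""
    tokens = re.findall(r"\w+", text.lower())
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    # crc32 is a stand-in hash function that is stable across runs.
    return [zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS for g in grams]


def build_index(articles):
    """Return per-article bucket counts and an IDF weight per bucket."""
    counts = {title: Counter(hashed_ngrams(body)) for title, body in articles.items()}
    doc_freq = Counter(bucket for c in counts.values() for bucket in c)
    n_docs = len(articles)
    # Smoothed IDF; it stays positive because doc_freq never exceeds n_docs.
    idf = {b: math.log((n_docs + 1) / (df + 0.5)) for b, df in doc_freq.items()}
    return counts, idf


def retrieve(question, counts, idf, k=5):
    """Rank articles by a TF-IDF weighted dot product with the question."""
    q = Counter(hashed_ngrams(question))
    scores = {}
    for title, c in counts.items():
        # Only buckets shared by question and article contribute to the score.
        scores[title] = sum(
            math.log1p(q[b]) * math.log1p(c[b]) * idf[b] ** 2
            for b in q
            if b in c
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy corpus standing in for Wikipedia articles (hypothetical data).
articles = {
    "Warsaw": "Warsaw is the capital of Poland. Many of Warsaw's inhabitants spoke Polish.",
    "Paris": "Paris is the capital of France.",
    "Python": "Python is a programming language.",
}
counts, idf = build_index(articles)
top = retrieve("How many of Warsaw's inhabitants spoke Polish?", counts, idf, k=2)
print(top)
```

Hashing bigrams into a fixed number of buckets keeps memory bounded regardless of vocabulary size, at the cost of occasional collisions. Including bigrams lets the ranking take some local word order into account, which plain bag-of-words matching ignores.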
