Learning Phrase Representations using RNN Encoder- …

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724 1734,October 25-29, 2014, Doha, 2014 Association for Computational LinguisticsLearning Phrase Representations using RNN Encoder Decoderfor statistical machine TranslationKyunghyun ChoBart van Merri enboer Caglar GulcehreUniversit e de Montr BahdanauJacobs University, Bougares Holger SchwenkUniversit e du Maine, BengioUniversit e de Montr eal, CIFAR Senior this paper, we propose a novel neu-ral network model called RNN Encoder Decoder that consists of two recurrentneural networks (RNN).

One RNN en-codes a sequence of symbols into a fixed-length vector representation, and the otherdecodes the representation into another se-quence of symbols. The encoder and de-coder of the proposed model are jointlytrained to maximize the conditional prob-ability of a target sequence given a sourcesequence. The performance of a statisti-cal machine translation system is empiri-cally found to improve by using the con-ditional probabilities of Phrase pairs com-puted by the RNN Encoder Decoder as anadditional feature in the existing log-linearmodel.

Qualitatively, we show that theproposed model learns a semantically andsyntactically meaningful representation oflinguistic IntroductionDeep neural networks have shown great success invarious applications such as objection recognition(see, , (Krizhevsky et al., 2012)) and speechrecognition (see, , (Dahl et al., 2012)). Fur-thermore, many recent works showed that neu-ral networks can be successfully used in a num-ber of tasks in natural language processing (NLP).These include, but are not limited to, languagemodeling (Bengio et al.)

, 2003), paraphrase detec-tion (Socher et al., 2011) and word embedding ex-traction (Mikolov et al., 2013). In the field of sta-tistical machine translation (SMT), deep neuralnetworks have begun to show promising results.(Schwenk, 2012) summarizes a successful usageof feedforward neural networks in the frameworkof Phrase -based SMT this line of research on using neural net-works for SMT, this paper focuses on a novel neu-ral network architecture that can be used as a partof the conventional Phrase -based SMT proposed neural network architecture, whichwe will refer to as anRNN Encoder Decoder, con-sists of two recurrent neural networks (RNN) thatact as an encoder and a decoder pair.

The en-coder maps a variable-length source sequence to afixed-length vector, and the decoder maps the vec-tor representation back to a variable-length targetsequence. The two networks are trained jointly tomaximize the conditional probability of the targetsequence given a source sequence. Additionally,we propose to use a rather sophisticated hiddenunit in order to improve both the memory capacityand the ease of proposed RNN Encoder Decoder with anovel hidden unit is empirically evaluated on thetask of translating from English to French.

Wetrain the model to learn the translation probabil-ity of an English Phrase to a corresponding Frenchphrase. The model is then used as a part of a stan-dard Phrase -based SMT system by scoring eachphrase pair in the Phrase table. The empirical eval-uation reveals that this approach of scoring phrasepairs with an RNN Encoder Decoder improvesthe translation qualitatively analyze the trained RNNE ncoder Decoder by comparing its Phrase scoreswith those given by the existing translation qualitative analysis shows that the RNNE ncoder Decoder is better at capturing the lin-guistic regularities in the Phrase table, indirectlyexplaining the quantitative improvements in theoverall translation performance.

The further anal-ysis of the model reveals that the RNN Encoder Decoder learns a continuous space representationof a Phrase that preserves both the semantic andsyntactic structure of the RNN Encoder Preliminary: Recurrent Neural NetworksA recurrent neural network (RNN) is a neural net-work that consists of a hidden statehand anoptional outputywhich operates on a variable-length sequencex= (x1,..,xT). At each timestept, the hidden stateh t of the RNN is updatedbyh t =f(h t 1 ,xt),(1)wherefis a non-linear activation be as simple as an element-wise logistic sigmoid function and as com-plex as a long short-term memory (LSTM)unit (Hochreiter and Schmidhuber, 1997).

An RNN can learn a probability distributionover a sequence by being trained to predict thenext symbol in a sequence. In that case, the outputat each timesteptis the conditional distributionp(xt|xt 1,..,x1). For example, a multinomialdistribution (1-of-Kcoding) can be output using asoftmax activation functionp(xt,j= 1|xt 1,..,x1) =exp(wjh t ) Kj =1exp(wj h t ),(2)for all possible symbolsj= 1,..,K, wherewjare the rows of a weight matrixW. By combiningthese probabilities, we can compute the probabil-ity of the sequencexusingp(x) =T t=1p(xt|xt 1.)

,x1).(3)From this learned distribution, it is straightfor-ward to sample a new sequence by iteratively sam-pling a symbol at each time RNN Encoder DecoderIn this paper, we propose a novel neural networkarchitecture that learns toencodea variable-lengthsequence into a fixed-length vector representationand todecodea given fixed-length vector rep-resentation back into a variable-length a probabilistic perspective, this new modelis a general method to learn the conditional dis-tribution over a variable-length sequence condi-tioned on yet another variable-length sequence, (y1.

,yT |x1,..,xT), where onex1x2xTyT'y2y1cDecoderEncoderFigure 1: An illustration of the proposed RNNE ncoder note that the input and output sequencelengthsTandT may encoder is an RNN that reads each symbolof an input sequencexsequentially. As it readseach symbol, the hidden state of the RNN changesaccording to Eq. (1). After reading the end ofthe sequence (marked by an end-of-sequence sym-bol), the hidden state of the RNN is a summarycof the whole input decoder of the proposed model is anotherRNN which is trained togeneratethe output se-quence by predicting the next symbolytgiven thehidden stateh t.

Learning Phrase Representations using RNN Encoder- …

Tags:

Information

Advertisement

Transcription of Learning Phrase Representations using RNN Encoder- …

Related search queries

Learning Phrase Representations using RNN Encoder- …

Tags:

Information

Advertisement

Documents from same domain

Related documents

Related search queries