Abstract - arXiv

A Convolutional Neural Network for Modelling Sentences

Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom
Department of Computer Science, University of Oxford

Abstract

The ability to accurately represent sentences is central to language understanding. We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. The network uses Dynamic k-Max Pooling, a global pooling operation over linear sequences. The network handles input sentences of varying length and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations. The network does not rely on a parse tree and is easily applicable to any language. We test the DCNN in four experiments: small scale binary and multi-class sentiment prediction, six-way question classification and Twitter sentiment prediction by distant supervision. The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.

Introduction

The aim of a sentence model is to analyse and represent the semantic content of a sentence for purposes of classification or generation. The sentence modelling problem is at the core of many tasks involving a degree of natural language comprehension. These tasks include sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning and image retrieval. Since individual sentences are rarely observed or not observed at all, one must represent a sentence in terms of features that depend on the words and short n-grams in the sentence that are frequently observed. The core of a sentence model involves a feature function that defines the process by which the features of the sentence are extracted from the features of the words or n-grams.

Figure 1: Subgraph of a feature graph induced over an input sentence ("The cat sat on the red mat") in a Dynamic Convolutional Neural Network. The full induced graph has multiple subgraphs of this kind with a distinct set of edges; subgraphs may merge at different layers. The left diagram emphasises the pooled nodes. The width of the convolutional filters is 3 and 2 respectively. With dynamic pooling, a filter with small width at the higher layers can relate phrases far apart in the input sentence.

Various types of models of meaning have been proposed. Composition based methods have been applied to vector representations of word meaning obtained from co-occurrence statistics to obtain vectors for longer phrases. In some cases, composition is defined by algebraic operations over word meaning vectors to produce sentence meaning vectors (Erk and Padó, 2008; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010; Turney, 2012; Erk, 2012; Clarke, 2012).

In other cases, a composition function is learned and either tied to particular syntactic relations (Guevara, 2010; Zanzotto et al., 2010) or to particular word types (Baroni and Zamparelli, 2010; Coecke et al., 2010; Grefenstette and Sadrzadeh, 2011; Kartsaklis and Sadrzadeh, 2013; Grefenstette, 2013). Another approach represents the meaning of sentences by way of automatically extracted logical forms (Zettlemoyer and Collins, 2005).

[arXiv version dated 8 Apr 2014]

A central class of models are those based on neural networks. These range from basic neural bag-of-words or bag-of-n-grams models to the more structured recursive neural networks and to time-delay neural networks based on convolutional operations (Collobert and Weston, 2008; Socher et al., 2011; Kalchbrenner and Blunsom, 2013b). Neural sentence models have a number of advantages.

They can be trained to obtain generic vectors for words and phrases by predicting, for instance, the contexts in which the words and phrases occur. Through supervised training, neural sentence models can fine-tune these vectors to information that is specific to a certain task. Besides comprising powerful classifiers as part of their architecture, neural sentence models can be used to condition a neural language model to generate sentences word by word (Schwenk, 2012; Mikolov and Zweig, 2012; Kalchbrenner and Blunsom, 2013a).

We define a convolutional neural network architecture and apply it to the semantic modelling of sentences. The network handles input sequences of varying length. The layers in the network interleave one-dimensional convolutional layers and dynamic k-max pooling layers. Dynamic k-max pooling is a generalisation of the max pooling operator.

The max pooling operator is a non-linear subsampling function that returns the maximum of a set of values (LeCun et al., 1998). The operator is generalised in two respects. First, k-max pooling over a linear sequence of values returns the subsequence of k maximum values in the sequence, instead of the single maximum value. Secondly, the pooling parameter k can be dynamically chosen by making k a function of other aspects of the network or the input.

The convolutional layers apply one-dimensional filters across each row of features in the sentence matrix. Convolving the same filter with the n-gram at every position in the sentence allows the features to be extracted independently of their position in the sentence. A convolutional layer followed by a dynamic pooling layer and a non-linearity form a feature map. Like in the convolutional networks for object recognition (LeCun et al., 1998), we enrich the representation in the first layer by computing multiple feature maps with different filters applied to the input sentence.
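The k-max pooling operation described above can be illustrated with a short NumPy sketch. This is a minimal illustration rather than the authors' implementation: the function name `k_max_pooling` is hypothetical, and k is passed in as a fixed argument, whereas in the dynamic variant it would be computed from other aspects of the network or the input.

```python
import numpy as np

def k_max_pooling(values, k):
    """Return the k largest entries of a 1-D sequence in their original order.

    Hypothetical helper for illustration; k is fixed here, not dynamic.
    """
    values = np.asarray(values, dtype=float)
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, values.shape[0])
    # Indices of the k largest values (stable, so ties keep the earliest entry).
    top = np.argsort(-values, kind="stable")[:k]
    # Sorting the indices restores the order the values had in the sequence,
    # so the result is a subsequence rather than a sorted list.
    return values[np.sort(top)]

pooled = k_max_pooling([1, 5, 2, 8, 3], k=3)  # keeps 5, 8, 3 in sequence order
```

With k = 1 the function reduces to ordinary max pooling over the sequence. Because the surviving values keep their relative order, later layers can still make use of where features occurred relative to one another.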

Subsequent layers also have multiple feature maps computed by convolving filters with all the maps from the layer below. The weights at these layers form an order-4 tensor. The resulting architecture is dubbed a Dynamic Convolutional Neural Network.

Multiple layers of convolutional and dynamic pooling operations induce a structured feature graph over the input sentence. Figure 1 illustrates such a graph. Small filters at higher layers can capture syntactic or semantic relations between non-continuous phrases that are far apart in the input sentence. The feature graph induces a hierarchical structure somewhat akin to that in a syntactic parse tree. The structure is not tied to purely syntactic relations and is internal to the neural network.

We experiment with the network in four settings. The first two experiments involve predicting the sentiment of movie reviews (Socher et al., 2013b). The network outperforms other approaches in both the binary and the multi-class experiments. The third experiment involves the categorisation of questions in six question types in the TREC dataset (Li and Roth, 2002). The network matches the accuracy of other state-of-the-art methods that are based on large sets of engineered features and hand-coded knowledge resources. The fourth experiment involves predicting the sentiment of Twitter posts using distant supervision (Go et al., 2009). The network is trained on a million-scale corpus of tweets labelled automatically according to the emoticon that occurs in them. On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al. (2009).

The outline of the paper is as follows. Section 2 describes the background to the DCNN including central concepts and related neural sentence models. Section 3 defines the relevant operators and the layers of the network. Section 4 treats of the induced feature graph and other properties of the network. Section 5 discusses the experiments and inspects the learnt feature detectors.

Footnote 1: Code available.

Background

The layers of the DCNN are formed by a convolution operation followed by a pooling operation. We begin with a review of related neural sentence models. Then we describe the operation of one-dimensional convolution and the classical Time-Delay Neural Network (TDNN) (Hinton, 1989; Waibel et al., 1990). By adding a max pooling layer to the network, the TDNN can be adopted as a sentence model (Collobert and Weston, 2008).

Related Neural Sentence Models

Various neural sentence models have been described. A general class of basic sentence models is that of Neural Bag-of-Words (NBoW) models. These generally consist of a projection layer that maps words, sub-word units or n-grams to high dimensional embeddings; the latter are then combined component-wise with an operation such as summation.

The resulting combined vector is classified through one or more fully connected layers.

A model that adopts a more general structure provided by an external parse tree is the Recursive Neural Network (RecNN) (Pollack, 1990; Küchler and Goller, 1996; Socher et al., 2011; Hermann and Blunsom, 2013). At every node in the tree the contexts at the left and right children of the node are combined by a classical layer. The weights of the layer are shared across all nodes in the tree. The layer computed at the top node gives a representation for the sentence. The Recurrent Neural Network (RNN) is a special case of the recursive network where the structure that is followed is a simple linear chain (Gers and Schmidhuber, 2001; Mikolov et al., 2011). The RNN is primarily used as a language model, but may also be viewed as a sentence model with a linear structure. The layer computed at the last word represents the sentence.

Finally, a further class of neural sentence models is based on the convolution operation and the TDNN architecture (Collobert and Weston, 2008; Kalchbrenner and Blunsom, 2013b).
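To make the first two model classes above concrete, the following sketch contrasts NBoW composition (component-wise summation of word embeddings) with RecNN composition (one shared layer applied at every node of a binary tree). Everything here is an assumption for illustration: the toy `EMBEDDINGS` table, the function names, the tuple encoding of the parse tree, and the choice of tanh as the non-linearity of the "classical layer" are not taken from the paper.

```python
import numpy as np

# Hypothetical toy embeddings of dimension 3; real models learn these.
EMBEDDINGS = {
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.5, 0.3, 0.0]),
    "sat": np.array([0.0, 0.4, 0.1]),
}

def nbow_vector(tokens, embeddings=EMBEDDINGS):
    """NBoW: look up each word's embedding and sum component-wise."""
    return np.sum([embeddings[t] for t in tokens], axis=0)

def recnn_vector(tree, W, b, embeddings=EMBEDDINGS):
    """RecNN: combine left and right children with one shared layer.

    `tree` is either a word (leaf) or a (left, right) tuple, an assumed
    encoding of an externally provided binary parse tree.
    tanh is an assumed non-linearity.
    """
    if isinstance(tree, str):
        return embeddings[tree]
    left = recnn_vector(tree[0], W, b, embeddings)
    right = recnn_vector(tree[1], W, b, embeddings)
    # The same W and b are reused at every node of the tree.
    return np.tanh(W @ np.concatenate([left, right]) + b)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 6))  # maps two child vectors to one parent
b = np.zeros(3)

bag = nbow_vector(["the", "cat", "sat"])
root = recnn_vector((("the", "cat"), "sat"), W, b)
```

The summation in `nbow_vector` discards word order, whereas `recnn_vector` depends on the tree it is given, which is the dependence on an external parse that the DCNN is designed to avoid.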
