Recurrent Attention Network on Memory for Aspect Sentiment Analysis

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 452-461, Copenhagen, Denmark, September 7-11, 2017. Association for Computational Linguistics.

Recurrent Attention Network on Memory for Aspect Sentiment Analysis

Peng Chen, Zhongqian Sun, Lidong Bing, Wei Yang
AI Lab, Tencent Inc.
{patchen, sallensun, lyndonbing, …}

Abstract

We propose a novel framework based on neural networks to identify the sentiment of opinion targets in a comment/review. Our framework adopts a multiple-attention mechanism to capture sentiment features separated by a long distance, so that it is more robust against irrelevant information. The results of multiple attentions are non-linearly combined with a recurrent neural network, which strengthens the expressive power of our model for handling more complications. The weighted-memory mechanism not only helps us avoid the labor-intensive feature engineering work, but also provides a tailor-made memory for different opinion targets of a sentence.

We examine the merit of our model on four datasets: two are from SemEval 2014, containing reviews of restaurants and laptops; a twitter dataset, for testing its performance on social media data; and a Chinese news comment dataset, for testing its language sensitivity. The experimental results show that our model consistently outperforms the state-of-the-art methods on different types of data.

1 Introduction

The goal of aspect sentiment analysis is to identify the sentiment polarity (e.g., negative, neutral, or positive) of a specific opinion target expressed in a comment/review by a reviewer. For example, in "I bought a mobile phone, its camera is wonderful but the battery life is short", there are three opinion targets, "camera", "battery life", and "mobile phone". The reviewer has a positive sentiment on the "camera", a negative sentiment on the "battery life", and a mixed sentiment on the "mobile phone".
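To make the task's input and output concrete, here is a minimal, hypothetical sketch of how such instances could be represented; the tuple layout and variable names are ours, not the paper's:

```python
# Hypothetical illustration of aspect sentiment analysis: each instance
# pairs a sentence with one opinion target and its gold polarity.
examples = [
    ("I bought a mobile phone, its camera is wonderful "
     "but the battery life is short", "camera",       "positive"),
    ("I bought a mobile phone, its camera is wonderful "
     "but the battery life is short", "battery life", "negative"),
]
for sentence, target, polarity in examples:
    print(f"{target!r} -> {polarity}")
```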

Sentence-oriented sentiment analysis methods (Socher et al., 2011; Appel et al., 2016) are not capable of capturing such fine-grained sentiments on opinion targets. In order to identify the sentiment of an individual opinion target, one critical task is to model appropriate context features for the target in its original sentence. In simple cases, the sentiment of a target is identifiable with a syntactically nearby opinion word, e.g., "wonderful" for "camera". However, there are many cases in which opinion words are enclosed in more complicated contexts. E.g., "Its camera is not wonderful enough" might express a neutral sentiment on "camera", but not a negative one. Such complications usually hinder conventional approaches to aspect sentiment analysis. To model the sentiment of the above phrase-like word sequence ("not wonderful enough"), LSTM-based methods have been proposed, such as target-dependent LSTM (TD-LSTM) (Tang et al., 2015).

TD-LSTM might suffer from the problem that after it captures a sentiment feature far from the target, it needs to propagate the feature word by word to the target, in which case the feature is likely to be lost, such as the feature "cost-effective" for "the phone" in "My overall feeling is that the phone, after using it for three months and considering its price, is really cost-effective".¹ The attention mechanism, which has been used successfully in machine translation (Bahdanau et al., 2014), can enforce a model to pay more attention to the important parts of a sentence. There are already some works using attention in sentiment analysis to exploit this advantage (Wang et al., 2016; Tang et al., 2016). Another observation is that some types of structures are particularly challenging for target sentiment analysis.

¹ Although LSTM could keep information over a long distance by preventing the vanishing gradient problem, it usually requires a large training corpus to capture the flexible usage of …
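As a concrete illustration of how attention lets a model reach a distant feature in one step instead of propagating it word by word, here is a minimal numpy sketch of additive (Bahdanau-style) scoring over LSTM states; the parameter shapes and names are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 9, 4                               # toy sizes: 9 words, hidden size 4
states = rng.normal(size=(T, d))          # LSTM hidden states over the sentence
target = rng.normal(size=d)               # representation of the opinion target

# Additive (Bahdanau-style) scoring; W, U, v are illustrative parameters.
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)
scores = np.tanh(states @ W.T + target @ U.T) @ v   # one score per word
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax over time steps
context = weights @ states                # a distant feature is one hop away
print(weights.round(3))
```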

For example, in "Except Patrick, all other actors don't play well", the word "except" and the phrase "don't play well" produce a positive sentiment on "Patrick". It is hard to synthesize these features just by LSTM, since their positions are dispersed. Single-attention based methods (e.g., (Wang et al., 2016)) are also not capable of overcoming such difficulty, because attending to multiple words with one attention may hide the characteristics of each attended word. In this paper, we propose a novel framework to solve the above problems in target sentiment analysis. Specifically, our framework first adopts a bidirectional LSTM (BLSTM) to produce the memory (i.e., the states of time steps generated by LSTM) from the input, as bidirectional recurrent neural networks (RNNs) were found effective for a similar purpose in machine translation (Bahdanau et al., 2014). The memory slices are then weighted according to their relative positions to the target, so that different targets from the same sentence have their own tailor-made memories.
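A minimal sketch of such position weighting, assuming a simple linear decay with distance to the target; the paper defines its own weighting scheme, and the one below is only illustrative:

```python
import numpy as np

def position_weighted_memory(memory, target_index):
    """Down-weight memory slices by their distance to the target so each
    target gets a tailor-made memory. The linear decay is an assumption
    for illustration, not the paper's exact formula."""
    T = memory.shape[0]
    distances = np.abs(np.arange(T) - target_index)
    weights = 1.0 - distances / T            # closer words weigh more
    return memory * weights[:, None]

memory = np.random.randn(7, 4)               # e.g. BLSTM states for 7 words
print(position_weighted_memory(memory, target_index=2)[0])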

After that, we pay multiple attentions on the position-weighted memory and non-linearly combine the attention results with a recurrent network, i.e., GRUs. Finally, we apply softmax on the output of the GRU network to predict the sentiment on the target. Our framework introduces a novel way of applying the multiple-attention mechanism to synthesize important features in difficult sentence structures. It is sort of analogous to the cognition procedure of a person, who might first notice part of the important information at the beginning, then notice more as she reads through, and finally combine the information from multiple attentions to draw a conclusion. For the above sentence, our model may attend to the word "except" first, then attend to the phrase "don't play well", and finally combine them to generate a positive feature for "Patrick". Tang et al. (2016) also adopted the idea of multiple attentions, but they used the result of a previous attention to help the next attention attend to more accurate information.
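The overall flow described above, multiple attentions whose results are folded into a GRU state and then classified with softmax, might be sketched as follows; the gate equations are the standard GRU ones, while the attention scoring and all parameter shapes are simplified assumptions rather than the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gru_step(e, x, W, U):
    """Fold one attention result x into the episode state e (standard GRU)."""
    z = 1 / (1 + np.exp(-(W['z'] @ x + U['z'] @ e)))   # update gate
    r = 1 / (1 + np.exp(-(W['r'] @ x + U['r'] @ e)))   # reset gate
    h = np.tanh(W['h'] @ x + U['h'] @ (r * e))         # candidate state
    return (1 - z) * e + z * h

rng = np.random.default_rng(0)
T, d, n_attentions, n_classes = 7, 4, 3, 3             # toy sizes
memory = rng.normal(size=(T, d))                       # position-weighted memory
W = {k: rng.normal(size=(d, d)) for k in 'zrh'}        # random placeholder weights
U = {k: rng.normal(size=(d, d)) for k in 'zrh'}
A = rng.normal(size=(d, d))                            # attention scoring matrix

episode = np.zeros(d)
for _ in range(n_attentions):                          # multiple attentions
    weights = softmax(memory @ (A @ episode))          # scores depend on episode
    attended = weights @ memory                        # one attention result
    episode = gru_step(episode, attended, W, U)        # non-linear combination

W_out = rng.normal(size=(n_classes, d))
print(softmax(W_out @ episode))                        # sentiment distribution
```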

Their vector fed to softmax for classification is only from the final attention, which is essentially a linear combination of the input embeddings (they did not have a memory component). Thus, the above limitation of single-attention based methods also holds for (Tang et al., 2016). In contrast, our model combines the results of multiple attentions with a GRU network, which has different behaviors inherited from RNNs, such as forgetting, maintaining, and non-linearly transforming, and thus allows a better prediction. We evaluate our approach on four datasets: the first two come from SemEval 2014 (Pontiki et al., 2014), containing reviews of the restaurant domain and the laptop domain; the third one is a collection of tweets, collected by (Dong et al., 2014); to examine whether our framework is language-insensitive (since languages show differences in quite a few aspects in expressing sentiments), we prepared a dataset of Chinese news comments with people mentions as opinion targets.

The experimental results show that our model performs well for different types of data, and consistently outperforms the state-of-the-art methods.

2 Related Work

The task of aspect sentiment classification belongs to entity-level sentiment analysis. Conventional representative methods for this task include rule-based methods (Ding et al., 2008) and statistic-based methods (Jiang et al., 2011; Zhao et al., 2010). Ganapathibhotla and Liu (2008) extracted 2-tuples of (opinion target, opinion word) from comments and then identified the sentiment of opinion targets. Deng and Wiebe (2015) adopted Probabilistic Soft Logic to handle the task. There are also statistic-based approaches which employ SVM (Jiang et al., 2011) or MaxEnt-LDA (Zhao et al., 2010). These methods need either laborious feature engineering work or massive extralinguistic resources. Neural networks (NNs) have the capability of fusing original features to generate new representations through multiple hidden layers.

Recursive NN (Rec-NN) can conduct semantic compositions on tree structures, which has been used for syntactic analysis (Socher et al., 2010) and sentence sentiment analysis (Socher et al., 2013). (Dong et al., 2014; Nguyen and Shirai, 2015) adopted Rec-NN for aspect sentiment classification, by converting the opinion target into the tree root and propagating the sentiment of targets depending on the context and syntactic relationships between them. However, Rec-NN needs dependency parsing, which is likely to be ineffective on nonstandard texts such as news comments and tweets. (Chen et al., 2016) employed convolutional NNs to identify the sentiment of a clause, which is then used to infer the sentiment of the target.

[Figure 1: Model architecture. Labels include forward/backward LSTM, location-weighted memory, attention layers, episodes 0 through n, and recurrent attention on memory. Caption: the dotted lines on the right indicate a layer may or may not be …]

The method has an assumption that an opinion word and its target lie in the same clause. TD-LSTM (Tang et al., 2015) utilizes LSTM to model the context information of a target by placing the target in the middle and propagating the state word by word from the beginning and the tail to the target, respectively, to capture the information before and after it. Nevertheless, TD-LSTM might not work well when the opinion word is far from the target, because the captured feature is likely to be lost ((Cho et al., 2014) reported similar problems of LSTM-based models in machine translation). (Graves et al., 2014) introduced the concept of memory for NNs and proposed a differentiable process to read and write memory, which is called the Neural Turing Machine (NTM). The attention mechanism, which has been used successfully in many areas (Bahdanau et al., 2014; Rush et al., 2015), can be treated as a simplified version of NTM, because the size of the memory is unlimited and we only need to read from it.
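To make that comparison concrete, here is a small numpy sketch of a content-based memory read, which is all that attention keeps from the NTM machinery; the erase/add write step mentioned in the comment is a generic NTM-style operation, not something the attention-based sentiment models here perform:

```python
import numpy as np

rng = np.random.default_rng(1)
memory = rng.normal(size=(10, 4))     # 10 slots here; unbounded in principle

def content_read(memory, key):
    """Content addressing reduced to what attention keeps from NTM:
    similarity -> softmax -> weighted read, with no write-back."""
    scores = memory @ key
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memory

read_vec = content_read(memory, rng.normal(size=4))
# A full NTM would additionally write through differentiable erase/add steps,
# e.g. memory = memory * (1 - np.outer(w, erase)) + np.outer(w, add);
# attention-based models skip writing and only read.
print(read_vec.shape)
```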

