
Predicting Croatian Phrase Sentiment Using a Deep Matrix-Vector Model

Siniša Biđin, Jan Šnajder, Goran Glavaš
University of Zagreb, Faculty of Electrical Engineering and Computing
Text Analysis and Knowledge Engineering Lab
Unska 3, 10000 Zagreb

Abstract. Sentiment analysis tasks rely on the existence of a sentiment lexicon. Such lexicons, however, typically contain single words annotated with prior sentiment. Problems arise when trying to model the sentiment of multiword phrases such as "very good" or "not bad". In this paper, we use a recently proposed deep neural network model to classify the sentiment of phrases in Croatian. The experimental results suggest that reasonable classification of phrase-level sentiment for Croatian is achievable with such a model, reaching a performance comparable to that of an analogous model for English.



1. Introduction

The sentiment of a word, a phrase, or a document refers to its subjective attitude, polarity, or expression of emotion. The phrase "nicely done" has a positive, whereas "horribly wrong" has a negative sentiment. Sentiment analysis explores the ways of identifying or extracting sentiment from text. Applying methods of sentiment analysis on larger amounts of text, nowadays widely available on the web, allows us to do things such as attempt to judge the popularity of a product or predict the outcome of an election. In this paper, we focus on classifying the sentiment of Croatian phrases consisting of two words.

Given sentiment-labeled phrases such as "very bad", "not bad", and "very good", we aim to train a model to correctly learn that "bad" bears a negative sentiment, and "good" a positive one. Also, the model should learn that "very" is an intensifier: it amplifies the sentiment of a word it is paired with. Likewise, "not" should be recognized as a negator, a word that inverts the sentiment of the word or phrase it appears next to.

To learn the sentiment of Croatian bigrams, we employ a deep neural network model proposed by Socher et al. (2012). This model has been shown to achieve good results when applied to the English language, which is something we aim to replicate for Croatian. We train and evaluate the deep neural model on two datasets of phrases, achieving performance comparable to the results obtained for English.

2. Related work

This work is most closely related to two prominent areas of natural language processing: sentiment analysis and compositionality in vector spaces. Compositionality in vector spaces refers to the problem of learning a useful representation of a composition of multiple vectors.

Building on compositionality, the model we use (Socher et al., 2012) is a generalization of earlier models. One model proposes vector composition through additive and multiplicative functions (Mitchell and Lapata, 2010), while another captures the compositionality of words by linear combinations of nouns represented as vectors and adjectives as matrices (Baroni and Zamparelli, 2010). Finally, a general approach for sentiment analysis of phrases was laid out by Yessenalina and Cardie (2011), interesting also in that it introduces a model that uses matrices to represent words and matrix multiplication to compose them.

Other related work focusing on sentiment analysis is that of Socher et al. (2011), where predictions of sentence-level sentiment distributions are made using a recursive model that attempts to model sentiment via compositional semantics. Later models improve on this and achieve state-of-the-art results for the task of sentence-level sentiment classification (Socher et al., 2012; Socher et al., 2013), the first of which is the very model we are using.

3. Training the matrix-vector model

To classify phrase sentiment, we use the MV-RNN model proposed by Socher et al. (2012). This model can be applied recursively to any n-gram, but we simplify it to the point where it only handles bigrams. The MV recursive neural network model derives its name from the matrix-vector representation of words. In essence, this means that each word w of a lexicon is modeled using two separate pieces of data: an n-dimensional vector x representing some semantic property of the word (such as sentiment) and an n-by-n matrix X representing the way the word influences the same semantic property of other words with which it constitutes a phrase:

w = (x, X),   x ∈ R^n,   X ∈ R^(n×n)

Given an initial set of word MV-representations and some initial shared weights W, all initialized to continuous values, in addition to a non-linear function g (e.g., a sigmoid), we can use a combining function f to determine the vector representation p of an entire phrase. This is depicted in Fig. 1.

9. konferenca Jezikovne tehnologije, Informacijska družba - IS 2014 / 9th Language Technologies Conference, Information Society - IS 2014

[Figure 1: Two words, "very" and "good", having MV-representations (a, A) and (b, B) respectively, affect each other's meaning (via Ba and Ab) and combine using f to form a basis for phrase sentiment.]
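As a concrete illustration, the matrix-vector representation and the combining function f can be sketched in a few lines of numpy. This is a minimal sketch, not the authors' implementation: the dimensionality, the initialization scales, and the choice of the logistic sigmoid for g are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # word vector dimensionality (an assumed value)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_word(n, rng):
    """An MV-representation (x, X): an n-dim vector and an n-by-n matrix."""
    x = rng.normal(scale=0.1, size=n)                    # semantic vector
    X = np.eye(n) + rng.normal(scale=0.01, size=(n, n))  # near-identity matrix
    return x, X

W = rng.normal(scale=0.1, size=(n, 2 * n))  # shared combination weights

def compose(a, A, b, B, W, g=sigmoid):
    """The combining function f: p = g(W [Ba; Ab])."""
    return g(W @ np.concatenate([B @ a, A @ b]))

a, A = init_word(n, rng)  # e.g. "very"
b, B = init_word(n, rng)  # e.g. "good"
p = compose(a, A, b, B, W)
```

Note how each word's matrix multiplies the other word's vector (Ba and Ab) before the shared weights W and the non-linearity produce the phrase vector p.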

The function f represents the possible effects the two words have on each other's sentiment by multiplying each word's matrix with the other word's vector:

p = f(Ba, Ab) = g(W [Ba; Ab]),   W ∈ R^(n×2n)

We can then use the vector p to determine the sentiment of the phrase it represents. Instead of focusing on only two classes of sentiment (negative and positive), the model can predict a sentiment distribution over K classes. Applying the softmax function to p in combination with some weights W_class gives us an estimate d of the membership probability for each of the K sentiment classes:

d = softmax(W_class p),   W_class ∈ R^(K×n),   d ∈ R^K
softmax_i(z) = e^(z_i) / Σ_j e^(z_j)

To determine the amount of error between the reference and predicted sentiment probability distributions, y and d respectively, we compute the binary cross-entropy errors for each of the K classes. The loss function J is simply the mean error across all N training instances:

E(y, d) = -(1/K) Σ_{i=1}^{K} ( y_i ln(d_i) + (1 - y_i) ln(1 - d_i) )
J = (1/N) Σ_{i=1}^{N} E(y^(i), d^(i))

While the initial vector components x of all the word MV-representations could be initialized to random values, we can also pretrain them, which has been shown to be beneficial for many tasks (Erhan et al., 2010). Following these insights, we initialize the vectors to word embeddings produced by word2vec,¹ an implementation of the skip-gram model by Mikolov et al. (2013), trained on the fHrWaC² corpus (Šnajder et al., 2013; Ljubešić and Erjavec, 2011).

Similarly, we set all the initial word matrix components X to the identity matrix, adding a small amount of noise. When a word's matrix equals the identity I, it ceases to have an effect on the sentiment of a word when multiplied with that word's vector, as in the definition of function f. This ensures that words by default do not function as operators; they neither intensify, attenuate, nor flip the sentiment of the words they are paired with.

The model's total number of parameters equals 2n² + Kn + L(n + n²), corresponding to the sizes of W, W_class, and the MV-representations of all L words in the lexicon. We optimize these parameters by minimizing J with stochastic gradient descent, using a starting learning rate that is diminished linearly towards zero. Due to the large space complexity (O(Ln²)), there are practical restrictions on the value of n.
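The prediction and loss computations above can be sketched directly from their definitions. The random weights, the example reference distribution, and the concrete sizes below are illustrative assumptions (L = 208 matches the lexicon size of the movie-review dataset described later), not the trained model:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(y, d, eps=1e-12):
    """E(y, d): mean binary cross-entropy over the K classes."""
    d = np.clip(d, eps, 1 - eps)
    return -np.mean(y * np.log(d) + (1 - y) * np.log(1 - d))

K, n, L = 10, 8, 208          # classes, vector size, lexicon size
rng = np.random.default_rng(1)
W_class = rng.normal(scale=0.1, size=(K, n))
p = rng.normal(size=n)        # a phrase vector from the combine step

d = softmax(W_class @ p)      # predicted sentiment distribution over K classes
y = np.zeros(K); y[7] = 1.0   # reference: all mass on one rating

loss = cross_entropy(y, d)

# The parameter count matches the formula 2n^2 + Kn + L(n + n^2):
n_params = (n * 2 * n) + W_class.size + L * (n + n * n)
```

The loss J over a dataset is then just the mean of `cross_entropy` over all training pairs (y, d).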

However, it has been shown that setting n to larger values (larger than 11) does not improve the performance (Socher et al., 2012).

4. Evaluation

We evaluate the model on two different datasets of phrases:³ (1) a synthetic dataset where phrases have been assembled and their sentiment distributions labeled manually, and (2) a dataset of manually translated common phrases extracted from movie reviews in English. As movies are commonly rated on a scale of 1 to 10, and indeed our source for the second dataset uses that very same rating scheme, we will be classifying phrases into K=10 sentiment classes that each correspond to a particular rating ranging from 1 (the worst) to 10 (the best). Additionally, we will use the same model trained for K=10 classes and apply it to classification of sentiment into K=3 classes.

4.1. Datasets

The datasets consist of unique two-word phrases paired with their sentiment distributions over a certain number K of classes. It should be noted that a reference sentiment distribution is never assigned to an individual word but exclusively to phrases.
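The collapse from K=10 rating classes to K=3 sentiment classes amounts to summing probability mass over rating ranges. The exact rating-to-class mapping is not spelled out in the text; the sketch below assumes ratings 1-4 are negative, 5-6 neutral, and 7-10 positive, purely for illustration:

```python
import numpy as np

def to_three_classes(d10):
    """Collapse a 10-class rating distribution into (negative, neutral, positive).
    The rating ranges used here (1-4 / 5-6 / 7-10) are an assumption."""
    d10 = np.asarray(d10, dtype=float)
    return np.array([d10[0:4].sum(), d10[4:6].sum(), d10[6:10].sum()])

d10 = np.full(10, 0.1)        # a uniform rating distribution
d3 = to_three_classes(d10)    # its 3-class counterpart
```

Any other contiguous partition of the ratings would work the same way; only the slice boundaries change.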

Each phrase occurs only once in a dataset, but an individual word may occur multiple times, as a part of different phrases (e.g., "good").

Synthetic phrases. The first set consists of 1500 different phrases composed of Croatian words, assembled by pairing each of 25 different adverbs with each of 60 different adjectives. The set is divided into 1200 training phrases and 300 test phrases. Each of the phrases is manually labeled with a probability distribution over the K=10 sentiment classes, determined subjectively by a single author considering the phrase outside of context. None of the phrases have been labeled with ambiguous sentiment, meaning their sentiment probability distributions contain only a single peak.

Movie reviews. The second dataset is based on a publicly available dataset of bigrams extracted from movie reviews written in English. Each of the phrases is associated with its frequency of occurrence within reviews with each of 10 different possible ratings. Note that here we assume a correlation between a review's rating and the sentiment of phrases expressed within it, and so use the frequencies of occurrence to construct for each unique phrase a probability distribution over K=10 sentiment classes.

³ Datasets are available online.
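Constructing a reference distribution from per-rating frequencies amounts to simple normalization of the counts. A sketch, using hypothetical counts for one phrase:

```python
import numpy as np

def rating_counts_to_distribution(counts):
    """Turn per-rating occurrence counts of a phrase (ratings 1..10)
    into a probability distribution over the K=10 sentiment classes."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# hypothetical occurrence counts of one phrase across reviews rated 1..10
counts = [2, 1, 0, 0, 3, 5, 40, 120, 90, 39]
y = rating_counts_to_distribution(counts)
```

A phrase that appears mostly in highly rated reviews, as above, thus receives most of its probability mass in the high-rating classes.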

Such a simplistic assumption might not hold in all cases (e.g., a positive phrase might, for whatever reason, appear often in negatively scored reviews and vice versa). Each phrase that occurred in total at least 300 times was manually translated into Croatian by a single annotator using his subjective judgment. The translated phrases were then compiled into a dataset consisting of 1026 different phrases containing 208 unique words. The dataset is divided into a training set consisting of 821 and a test set consisting of 205 phrases.

4.2. Results and discussion

We evaluate the MV-RNN model for several different sizes of the word vector (n = 8, 10, 13, and 15). We present the results using two different measures: (1) the F1-score and (2) the mean Kullback-Leibler divergence (KL-divergence). The KL-divergence measures the (dis)similarity between the reference and predicted probability distributions y and d, respectively:

KL(y, d) = Σ_i y_i ln(y_i / d_i)

We compute two F1-scores: (1) for K=10 classes and (2) for K=3 classes (the positive, negative, and neutral classes).
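The KL-divergence measure can be computed directly from its definition. This sketch adopts the usual conventions, which the text does not state explicitly: terms with y_i = 0 contribute zero, and predicted probabilities are clipped away from zero to keep the logarithm finite:

```python
import numpy as np

def kl_divergence(y, d, eps=1e-12):
    """KL(y, d) = sum_i y_i ln(y_i / d_i); terms with y_i = 0 contribute 0."""
    y = np.asarray(y, dtype=float)
    d = np.clip(np.asarray(d, dtype=float), eps, None)
    mask = y > 0
    return float(np.sum(y[mask] * np.log(y[mask] / d[mask])))

y = np.array([0.0, 0.2, 0.8])  # reference distribution
d = np.array([0.1, 0.3, 0.6])  # predicted distribution
```

KL-divergence is zero exactly when the two distributions agree and grows as the prediction drifts from the reference, which is what makes its mean over the test set a natural companion to the F1-score here.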

