
Learning Word Vectors for Sentiment Analysis


Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts
Stanford University
Stanford, CA 94305
[amaas, rdaly, ptpham, yuze, ang, ...]

Abstract

Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term-document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations.

We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g., star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it outperforms several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.

1 Introduction

Word representations are a critical component of many natural language processing systems. It is common to represent words as indices in a vocabulary, but this fails to capture the rich relational structure of the lexicon.

Vector-based models do much better in this regard. They encode continuous similarities between words as distance or angle between word vectors in a high-dimensional space. The general approach has proven useful in tasks such as word sense disambiguation, named entity recognition, part of speech tagging, and document retrieval (Turney and Pantel, 2010; Collobert and Weston, 2008; Turian et al., 2010).

In this paper, we present a model to capture both semantic and sentiment similarities among words. The semantic component of our model learns word vectors via an unsupervised probabilistic model of documents. However, in keeping with linguistic and cognitive research arguing that expressive content and descriptive semantic content are distinct (Kaplan, 1999; Jay, 2000; Potts, 2007), we find that this basic model misses crucial sentiment information.

For example, while it learns that wonderful and amazing are semantically close, it doesn't capture the fact that these are both very strong positive sentiment words, at the opposite end of the sentiment spectrum. Thus, we extend the model with a supervised sentiment component that is capable of embracing many social and attitudinal aspects of meaning (Wilson et al., 2004; Alm et al., 2005; Andreevskaia and Bergler, 2006; Pang and Lee, 2005; Goldberg and Zhu, 2006; Snyder and Barzilay, 2007). This component of the model uses the vector representation of words to predict the sentiment annotations on contexts in which the words appear. This causes words expressing similar sentiment to have similar vector representations.
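
To make this mechanism concrete, here is a minimal sketch, not the paper's actual objective (which is developed in Section 3): every word occurrence in a labeled document predicts the document's 0/1 sentiment label from that word's vector through a logistic link, so maximizing the likelihood pulls words that appear in similarly labeled contexts toward similar vectors. The names R, psi, and b are illustrative assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sentiment_log_likelihood(doc_word_ids, label, R, psi, b):
        # Each word's vector (a column of R) predicts the document's
        # sentiment label; summing per-word log-likelihoods ties word
        # vectors to the annotations on the contexts they appear in.
        ll = 0.0
        for w in doc_word_ids:
            p = sigmoid(psi @ R[:, w] + b)  # P(label = 1 | word w)
            ll += label * np.log(p) + (1 - label) * np.log(1 - p)
        return ll

    # Toy setup: 5-dimensional vectors for a 10-word vocabulary.
    rng = np.random.default_rng(0)
    R = rng.normal(size=(5, 10))   # word vector matrix, one column per word
    psi = rng.normal(size=5)       # logistic regression weights
    print(sentiment_log_likelihood([1, 4, 7], label=1, R=R, psi=psi, b=0.0))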

The full objective function of the model thus learns semantic vectors that are imbued with nuanced sentiment information. In our experiments, we show how the model can leverage document-level sentiment annotations of a sort that are abundant online in the form of consumer reviews for movies, products, etc. The technique is sufficiently general to work also with continuous and multi-dimensional notions of sentiment as well as non-sentiment annotations (e.g., political affiliation, speaker commitment).

After presenting the model in detail, we provide illustrative examples of the vectors it learns, and then we systematically evaluate the approach on document-level and sentence-level classification tasks.

Our experiments involve the small, widely used sentiment and subjectivity corpora of Pang and Lee (2004), which permits us to make comparisons with a number of related approaches and published results. We also show that this dataset contains many correlations between examples in the training and testing sets. This leads us to evaluate on, and make publicly available, a large dataset of informal movie reviews from the Internet Movie Database (IMDB).

2 Related work

The model we present in the next section draws inspiration from prior work on both probabilistic topic modeling and vector-spaced models for word meanings. Latent Dirichlet Allocation (LDA; Blei et al., 2003) is a probabilistic document model that assumes each document is a mixture of latent topics.

For each latent topic T, the model learns a conditional distribution p(w|T) for the probability that word w occurs in T. One can obtain a k-dimensional vector representation of words by first training a k-topic model and then filling the matrix with the p(w|T) values (normalized to unit length). The result is a word-topic matrix in which the rows are taken to represent word meanings. However, because the emphasis in LDA is on modeling topics, not word meanings, there is no guarantee that the row (word) vectors are sensible as points in a k-dimensional space. Indeed, we show in section 4 that using LDA in this way does not deliver robust word vectors. The semantic component of our model shares its probabilistic foundation with LDA, but is factored in a manner designed to discover word vectors rather than latent topics.
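
As a sketch of this construction (the topic-word matrix below is toy data; in practice the p(w|T) values would come from a trained k-topic model, e.g. gensim's LdaModel): each word's vector is its column of p(w|T) values, normalized to unit length.

    import numpy as np

    def lda_word_vectors(topic_word_probs):
        # topic_word_probs: (k, V) matrix whose row t holds p(w | T = t).
        # A word's vector is its column of p(w | T) values, normalized
        # to unit length, giving one k-dimensional row per word.
        vecs = topic_word_probs.T.astype(float)
        norms = np.linalg.norm(vecs, axis=1, keepdims=True)
        return vecs / np.maximum(norms, 1e-12)  # guard against zero rows

    # Toy example: 3 topics over a 6-word vocabulary.
    rng = np.random.default_rng(0)
    p_w_given_t = rng.dirichlet(np.ones(6), size=3)  # each row sums to 1
    word_vecs = lda_word_vectors(p_w_given_t)        # (6, 3), unit-norm rows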

Some recent work introduces extensions of LDA to capture sentiment in addition to topical information (Li et al., 2010; Lin and He, 2009; Boyd-Graber and Resnik, 2010). Like LDA, these methods focus on modeling sentiment-imbued topics rather than embedding words in a vector space.

Vector space models (VSMs) seek to model words directly (Turney and Pantel, 2010). Latent Semantic Analysis (LSA), perhaps the best known VSM, explicitly learns semantic word vectors by applying singular value decomposition (SVD) to factor a term-document co-occurrence matrix. It is typical to weight and normalize the matrix values prior to SVD. To obtain a k-dimensional representation for a given word, only the entries corresponding to the k largest singular values are taken from the word's basis in the factored matrix.
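
A minimal sketch of the LSA step just described (the counts below are toy data, and the weighting/normalization that would normally precede the SVD is omitted): factor the term-document matrix and keep the entries for the k largest singular values.

    import numpy as np

    def lsa_word_vectors(term_doc, k):
        # term_doc: (V, D) weighted term-document co-occurrence matrix.
        # The SVD factors it as U S Vt; a word's k-dimensional
        # representation is its row of U scaled by the k largest
        # singular values.
        U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
        return U[:, :k] * S[:k]

    # Toy matrix: 5 terms across 4 documents.
    X = np.array([[2., 0., 1., 0.],
                  [1., 1., 0., 0.],
                  [0., 3., 0., 1.],
                  [0., 0., 2., 2.],
                  [1., 0., 0., 3.]])
    word_vecs = lsa_word_vectors(X, k=2)   # (5, 2) word representations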

Such matrix factorization-based approaches are extremely successful in practice, but they force the researcher to make a number of design choices (weighting, normalization, dimensionality reduction algorithm) with little theoretical guidance to suggest which to prefer.

Using term frequency (tf) and inverse document frequency (idf) weighting to transform the values in a VSM often increases the performance of retrieval and categorization systems. Delta idf weighting (Martineau and Finin, 2009) is a supervised variant of idf weighting in which the idf calculation is done for each document class and then one value is subtracted from the other. Martineau and Finin present evidence that this weighting helps with sentiment classification, and Paltoglou and Thelwall (2010) systematically explore a number of weighting schemes in the context of sentiment analysis. The success of delta idf weighting in previous work suggests that incorporating sentiment information into VSM values via supervised methods is helpful for sentiment analysis.
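
A sketch of delta idf on toy counts (the additive smoothing constant is an assumption here, not part of the cited definition): compute idf separately within the positive and negative document classes and subtract one value from the other.

    import numpy as np

    def delta_idf(df_pos, df_neg, n_pos, n_neg, smooth=0.5):
        # idf computed separately within each document class, then one
        # value subtracted from the other (Martineau and Finin, 2009).
        # `smooth` guards against zero document frequencies.
        idf_pos = np.log((n_pos + smooth) / (df_pos + smooth))
        idf_neg = np.log((n_neg + smooth) / (df_neg + smooth))
        return idf_pos - idf_neg

    # Toy counts: document frequency of 3 terms across 100 positive
    # and 100 negative reviews.
    df_pos = np.array([40, 5, 25])
    df_neg = np.array([5, 45, 25])
    print(delta_idf(df_pos, df_neg, n_pos=100, n_neg=100))
    # Large magnitudes mark sentiment-bearing terms; values near 0
    # mark terms that are equally common in both classes.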

We adopt this insight, but we are able to incorporate it directly into our model's objective function. (Section 4 compares our approach with a representative sample of such weighting schemes.)

3 Our Model

To capture semantic similarities among words, we derive a probabilistic model of documents which learns word representations. This component does not require labeled data, and shares its foundation with probabilistic topic models such as LDA. The sentiment component of our model uses sentiment annotations to constrain words expressing similar sentiment to have similar representations. We can efficiently learn parameters for the joint objective function using alternating maximization.

3.1 Capturing Semantic Similarities

We build a probabilistic model of a document using a continuous mixture distribution over words indexed by a multi-dimensional random variable.
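
As an illustration of what such a continuous mixture can look like, here is one natural formalization consistent with the description above (the symbols theta, psi, R, b, and phi_w are introduced here for exposition, not quoted from the remainder of the paper): the document probability integrates over a continuous latent variable theta, and each word is drawn from a log-linear distribution determined by theta and the word vectors.

    p(d) = \int p(d \mid \theta)\, p(\theta; \psi)\, d\theta,
    \qquad
    p(d \mid \theta) = \prod_{i=1}^{N} p(w_i \mid \theta),

    p(w \mid \theta; R, b) =
      \frac{\exp(\theta^{\top} \phi_w + b_w)}
           {\sum_{w' \in V} \exp(\theta^{\top} \phi_{w'} + b_{w'})}

Here theta is the multi-dimensional random variable indexing the mixture, phi_w (a column of the matrix R) is the vector for word w, and b_w is a per-word bias; integrating theta out yields the document probability.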

