
SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining

Andrea Esuli∗ and Fabrizio Sebastiani†

∗Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi 1, 56124 Pisa, Italy. E-mail: andrea.esuli@isti.cnr.it

†Dipartimento di Matematica Pura e Applicata, Università di Padova, Via Giovan Battista Belzoni 7, 35131 Padova, Italy

Abstract

Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users' opinions about products or about political candidates as expressed in online forums, to customer relationship management.



In order to aid the extraction of opinions from text, recent research has tried to automatically determine the PN-polarity of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been, instead, much more scarce. In this work we describe SENTIWORDNET, a lexical resource in which each WORDNET synset s is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.

1. Introduction

Opinion mining (OM, also known as "sentiment classification") is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a text is about, but with the opinion it expresses. Opinion-driven content management has several important applications, such as determining critics' opinions about a given product by classifying online product reviews, or tracking the shifting attitudes of the general public towards a political candidate by mining online forums or blogs. Within OM, several subtasks can be identified, all of them having to do with tagging a given text according to expressed opinion:

1. Determining text SO-polarity, as in deciding whether a given text has a factual nature (i.e. describes a given situation or event, without expressing a positive or a negative opinion on it) or expresses an opinion on its subject matter. This amounts to performing binary text categorization under categories Subjective and Objective (Pang and Lee, 2004; Yu and Hatzivassiloglou, 2003);

2. Determining text PN-polarity, as in deciding if a given Subjective text expresses a Positive or a Negative opinion on its subject matter (Pang and Lee, 2004; Turney, 2002);

3. Determining the strength of text PN-polarity, as in deciding whether the Positive opinion expressed by a text on its subject matter is Weakly Positive, Mildly Positive, or Strongly Positive (Pang and Lee, 2005; Wilson et al., 2004).

To aid these tasks, several researchers have attempted to automatically determine whether a term that is a marker of opinionated content has a Positive or a Negative connotation (Esuli and Sebastiani, 2005; Hatzivassiloglou and McKeown, 1997; Kamps et al., 2004; Kim and Hovy, 2004; Takamura et al., 2005; Turney and Littman, 2003), since it is by considering the combined contribution of these terms that one may hope to solve Tasks 1, 2 and 3. The conceptually simplest approach to this latter problem is probably Turney's (Turney, 2002), who has obtained interesting results on Task 2 by considering the algebraic sum of the orientations of terms as representative of the orientation of the document they belong to; but more sophisticated approaches are also possible (Hatzivassiloglou and Wiebe, 2000; Riloff et al., 2003; Whitelaw et al., 2005; Wilson et al., 2004).

The task of determining whether a term is indeed a marker of opinionated content (i.e. is Subjective or Objective) has instead received much less attention (Esuli and Sebastiani, 2006; Riloff et al., 2003; Vegnaduzzo, 2004). Note that in these works no distinction between different senses of a word is attempted, so that the term, and not its senses, are classified (although some such works (Hatzivassiloglou and McKeown, 1997; Kamps et al., 2004) distinguish between different POSs of a word).

In this paper we describe SENTIWORDNET (version [...]), a lexical resource in which each synset of WORDNET (version [...]) is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how Objective, Positive, and Negative the terms contained in the synset are. The assumption that underlies our switch from terms to synsets is that different senses of the same term may have different opinion-related properties. Each of the three scores ranges from [...] to [...], and their sum is [...] for each synset. This means that a synset may have nonzero scores for all the three categories, which would indicate that the corresponding terms have, in the sense indicated by the synset, each of the three opinion-related properties only to a certain degree.

For example, the synset [estimable(3)], corresponding to the sense "may be computed or estimated" of the adjective "estimable", has an Obj score of [...] (and Pos and Neg scores of [...]), while the synset [estimable(1)], corresponding to the sense "deserving of respect or high regard", has a Pos score of [...], a Neg score of [...], and an Obj score of [...].

A similar intuition had previously been presented in (Kim and Hovy, 2004), whereby a term could have both a Positive and a Negative PN-polarity, each to a certain degree. A similar point has also recently been made in (Andreevskaia and Bergler, 2006), in which terms that possess a given opinion-related property to a higher degree are claimed to be also the ones on which human annotators asked to assign this property agree more. Non-binary scores are attached to opinion-related properties also in (Turney and Littman, 2003), but the interpretation here is related to the confidence in the correctness of the labelling, rather than in how strong the term is deemed to possess the property.

We believe that a graded (as opposed to "hard") evaluation of opinion-related properties of terms can be helpful in the development of opinion mining applications. A hard classification method will probably label as Objective any term that has no strong SO-polarity, e.g. terms such as "short" or "alone".

[...] semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, each of which has demonstrated, in our previous tests, similar accuracy but different characteristics in terms of classification behaviour.

SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.

2. Building SENTIWORDNET

The method we have used to develop SENTIWORDNET is an adaptation to synset classification of our method for deciding the PN-polarity (Esuli and Sebastiani, 2005) and SO-polarity (Esuli and Sebastiani, 2006) of terms. The method relies on training a set of ternary classifiers, each of them capable of deciding whether a synset is Positive, or Negative, or Objective. Each ternary classifier differs from the other in the training set used to train it and in the learning device used to train it, thus producing different classification results of the WORDNET synsets. Opinion-related scores for a synset are determined by the (normalized) proportion of ternary classifiers that have assigned the corresponding label to it. If all the ternary classifiers agree in assigning the same label to a synset, that label will have the maximum score for that synset, otherwise each label will have a score proportional to the number of classifiers that have assigned it.

Training a classifier

Each ternary classifier is generated using the semi-supervised method described in (Esuli and Sebastiani,
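The committee-based scoring described in Section 2 can be illustrated with a minimal sketch. It assumes that each ternary classifier casts exactly one label per synset and that "normalized" means dividing each label's vote count by the committee size. The function name `synset_scores`, the label strings, and the example votes are hypothetical and are not taken from the SENTIWORDNET distribution.

```python
from collections import Counter

# Label names are an assumption made for this illustration.
LABELS = ("Positive", "Negative", "Objective")


def synset_scores(votes):
    """Return, for each label, the share of committee votes it received.

    `votes` holds one label per ternary classifier. The returned scores
    are proportions of the committee, so they sum to 1.
    """
    if not votes:
        raise ValueError("at least one classifier vote is required")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(votes)  # labels with no votes count as 0
    return {label: counts[label] / len(votes) for label in LABELS}


# Hypothetical committee of eight classifiers voting on a single synset.
votes = ["Positive"] * 6 + ["Objective"] * 2
scores = synset_scores(votes)
print(scores)  # {'Positive': 0.75, 'Negative': 0.0, 'Objective': 0.25}
```

When all eight classifiers agree, the agreed label receives the whole vote share and the other two labels receive none, which corresponds to the "maximum score" case in the text. A split vote yields graded scores, as in the example.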

