Alec Go, Huang, Bhayani - Final Project Report
June 6, 2009, 5:00PM (3 Late Days)

Twitter Sentiment Analysis

Introduction

Twitter is a popular microblogging service where users create status messages (called "tweets"). These tweets sometimes express opinions about different topics. The purpose of this project is to build an algorithm that can accurately classify Twitter messages as positive or negative, with respect to a query. Our hypothesis is that we can obtain high accuracy on classifying sentiment in Twitter messages using machine learning. This type of sentiment analysis is useful for consumers who are trying to research a product or service, or marketers researching public opinion of their [...].

Defining Sentiment

For the purposes of our research,
we define sentiment to be "a personal positive or negative feeling." Here are some examples:

Sentiment   Query           Tweet
Positive    jquery          dcostalis: Jquery is my new best [...]
[...]       San Francisco   schuyler: just landed at San Francisco
Negative    exam            jvici0us: History exam studying [...]

For tweets that were not clear-cut, we use the following litmus test: if the tweet could ever appear as a newspaper headline or as a sentence in Wikipedia, then it belongs in the neutral class. For example, the following tweet would be marked as neutral because it is a fact from a newspaper headline, even though it projects an overall negative feeling about GM:

ThomasQuinlin:
RT @Finance_Info Bankruptcy filing could put GM on road to profits (AP) #Finance

Related Work

There have been many papers written on sentiment analysis for the domain of blogs and product reviews. (Pang and Lee 2008) gives a survey of sentiment analysis. Researchers have also analyzed the brand impact of microblogging (Jansen). We could not find any papers that analyze machine learning techniques in the specific domain of microblogs, probably because the popularity of Twitter is very recent.

Text classification using machine learning is a well studied field (Manning and Schuetze 1999). (Pang and Lee 2002) researched the effects of various machine learning techniques (Naive Bayes (NB), Maximum Entropy (ME), and Support Vector Machines (SVM)) in the specific domain of movie reviews. They were able to achieve an accuracy of [...] using SVM and a unigram model.

Researchers have also worked on detecting sentiment in text.
(Turney 2002) presents a simple algorithm, called semantic orientation, for detecting sentiment. (Pang and Lee 2004) present a hierarchical scheme in which text is first classified as containing sentiment, and then classified as positive or negative.

Work (Read, 2005) has been done in using emoticons as labels for positive and negative sentiment. This is very relevant to Twitter because many users have emoticons in their messages.

Twitter messages have many unique attributes, which differentiate our research from previous research:

1. Length. The maximum length of a Twitter message is 140 characters. From our training set, we calculated that the average length of a tweet is 14 words, and the average length of a sentence is 78 characters. This is very different from the domains of previous research, which mostly focused on reviews that consisted of multiple sentences.
2. Available data. Another difference is the sheer magnitude of data available. In (Pang and Lee 2002), the corpus size was [...]. With the Twitter API, it is much easier to collect millions of tweets for training.
3. Language. Twitter users post messages from many different mediums, including their cell phones.
The frequency of misspellings and slang in tweets is much higher than in other domains.

Data Collection

There are not any existing data sets of Twitter sentiment messages. We collected our own set of data. For the training data, we collected messages that contained the emoticons :) and :( via the Twitter API. The test data was a set of 75 negative tweets and 108 positive tweets that were manually classified. A web interface tool was built to aid in the manual classification task. See Appendix A for more details about the [...].

Different classifiers were used. A Naive Bayes classifier was built from scratch. Libraries were used for Maximum Entropy and Support Vector Machines. The following table summarizes the results.

Table 1.
Accuracy results from various classifiers

Training size also has an effect on accuracy. Figure 1 shows the effect of training size on accuracy.

Figure 1. Effect of training size on different classifiers

Naive Bayes

Naive Bayes is a simple model for classification that works well on text categorization. We adopt multinomial Naive Bayes in our project. It assumes each feature is conditionally independent of the other features given the class. That is,

$$P(c \mid t) = \frac{P(c)\, P(t \mid c)}{P(t)}$$

where c is a specific class and t is the text we want to classify. P(c) and P(t) are the prior probabilities of this class and this text, and P(t | c) is the probability that the text appears given this class.
In our case, the value of class c might be POSITIVE or NEGATIVE, and t is just a sentence. The goal is to choose the value of c that maximizes P(c | t):

$$P(c \mid t) \propto P(c) \prod_i P(w_i \mid c)$$

where P(w_i | c) is the probability that the i-th feature in text t appears given class c. We need to train the parameters P(c) and P(w_i | c). Obtaining these parameters is simple in the Naive Bayes model: each one is just its maximum likelihood estimate (MLE). When making a prediction for a new sentence t, we calculate the log likelihood $\log P(c) + \sum_i \log P(w_i \mid c)$ for each class, and take the class with the highest log likelihood as the prediction. In practice, it needs smoothing to avoid zero probabilities.
Otherwise, the likelihood will be 0 if there is an unseen word when making a prediction. We simply use add-1 smoothing in our project and it works well.

Feature selection

For unigram features, there are usually 260,000 different features. This is a very large number. It gives the model higher variance (since a more complicated model has higher variance), so it needs much more training data to avoid overfitting. Our training set contains hundreds of thousands of sentences, but the number of features is still large for a training set of that size. It is helpful to discard some useless features.
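As an aside on the classifier itself, the scoring rule log P(c) + sum_i log P(w_i | c) with add-1 smoothing can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the project's actual implementation: the class name NaiveBayesSketch, its methods, the whitespace tokenizer, and the toy training messages are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of multinomial Naive Bayes with add-1 smoothing.
// All names here are illustrative; none come from the project code.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumption: features are lowercase unigrams split on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Accumulate the counts needed for the MLE of P(c) and P(w | c).
    void train(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // log P(c) + sum_i log P(w_i | c), with add-1 smoothing on P(w_i | c)
    // so that an unseen word never contributes log(0).
    double logLikelihood(String label, String text) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        double denominator = tokensPerClass.get(label) + vocabulary.size();
        for (String w : tokenize(text)) {
            score += Math.log((counts.getOrDefault(w, 0) + 1) / denominator);
        }
        return score;
    }

    // Pick the class with the highest log likelihood.
    String predict(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = logLikelihood(label, text);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("POSITIVE", "i love this :)");
        nb.train("NEGATIVE", "i hate this :(");
        System.out.println(nb.predict("love it"));
    }
}
```

Working in log space avoids numeric underflow from multiplying many small probabilities, and the smoothed denominator (token count of the class plus vocabulary size) keeps every P(w_i | c) strictly positive.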
We try 3 different feature selection algorithms.

Frequency-based feature selection

This is the simplest way to do feature selection. For each class, we just pick the features (unigram words in our case) that occur with high frequency in that class. In practice, if the number of occurrences of a feature is larger than some threshold (3 or 100 in our experiments), the feature is considered a good one for that class. As seen in the result table, this simple algorithm increases accuracy by about [...].

Mutual Information

The idea of mutual information is that, for each class C and each feature F, there is a score that measures how much F could contribute to making the correct decision on class C.
The formula of the MI score is

$$MI(C, F) = \sum_{e_c \in \{0,1\}} \sum_{e_f \in \{0,1\}} P(C = e_c, F = e_f) \log \frac{P(C = e_c, F = e_f)}{P(C = e_c)\, P(F = e_f)}$$

In practice, we also use add-1 smoothing for each Count(C = e_c, F = e_f) to avoid division by zero. The code is

    n = [...]() + 4;
    for (String feature : [...]()) {
        for (int polarity : [...]()) {
            double n11 = [...](polarity, feature) + 1;
            double n01 = [...](polarity) - [...](polarity, feature) + 1;
            double n10 = [...](feature) - [...](polarity, feature) + 1;
            double n00 = n - (n11 + n01 + n10);
            double n1dot = n11 + n10;
            double n0dot = n - n1dot;
            double ndot1 = n11 + n01;
            double ndot0 = n - ndot1;
            double miScore = (n11 / n) * Math.log((n * n11) / (n1dot * ndot1))
                           + (n01 / n) * Math.log((n * n01) / (n0dot * ndot1))
                           + (n10 / n) * Math.log((n * n10) / (n1dot * ndot0))
                           + (n00 / n) * Math.log((n * n00) / (n0dot * ndot0));
        }
    }
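The same computation can be written as a self-contained function for experimentation. This is a sketch under stated assumptions: the class name MutualInformationSketch, the helper term, and the four count parameters are hypothetical stand-ins for the count lookups in the listing, and the natural logarithm is assumed (a different base only rescales the scores).

```java
// Hypothetical, self-contained version of the MI score with add-1 smoothing.
// The count arguments stand in for the project's count lookups.
class MutualInformationSketch {

    // total:        number of training messages
    // classCount:   messages labeled with the given polarity
    // featureCount: messages containing the feature
    // jointCount:   messages with that polarity that contain the feature
    static double miScore(double total, double classCount,
                          double featureCount, double jointCount) {
        double n = total + 4;                        // add-1 on each of 4 cells
        double n11 = jointCount + 1;                 // class yes, feature yes
        double n01 = classCount - jointCount + 1;    // class yes, feature no
        double n10 = featureCount - jointCount + 1;  // class no, feature yes
        double n00 = n - (n11 + n01 + n10);          // class no, feature no
        double n1dot = n11 + n10;                    // feature present
        double n0dot = n - n1dot;                    // feature absent
        double ndot1 = n11 + n01;                    // class present
        double ndot0 = n - ndot1;                    // class absent
        return term(n, n11, n1dot, ndot1)
             + term(n, n01, n0dot, ndot1)
             + term(n, n10, n1dot, ndot0)
             + term(n, n00, n0dot, ndot0);
    }

    // One summand of the MI formula: P(cell) * log(P(cell) / (P(row) * P(col))).
    static double term(double n, double cell, double rowTotal, double colTotal) {
        return (cell / n) * Math.log((n * cell) / (rowTotal * colTotal));
    }

    public static void main(String[] args) {
        // Feature independent of the class versus feature that tracks the class.
        System.out.println(miScore(96, 48, 48, 24));
        System.out.println(miScore(96, 48, 48, 48));
    }
}
```

A feature that occurs equally often inside and outside a class scores near zero, while a feature concentrated in one class scores higher, which is what makes the score usable for ranking features.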