
When Is a Liability Not a Liability? Textual Analysis ...

THE JOURNAL OF FINANCE, VOL. LXVI, NO. 1, FEBRUARY 2011. When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. TIM LOUGHRAN and BILL MCDONALD. ABSTRACT: Previous research uses negative word counts to measure the tone of a text. We show that word lists developed for other disciplines misclassify common words in financial text. In a large sample of 10-Ks during 1994 to 2008, almost three-fourths of the words identified as negative by the widely used Harvard Dictionary are words typically not considered negative in financial contexts. We develop an alternative negative word list, along with five other word lists, that better reflect tone in financial text.

... Modal verbs are used to express possibility (weak) and necessity (strong). We extend this categorization to create our more general classification of modal words. ... approach over a word categorization one, arguing that categorization might ...


Transcription of When Is a Liability Not a Liability? Textual Analysis ...

THE JOURNAL OF FINANCE, VOL. LXVI, NO. 1, FEBRUARY 2011

When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks

TIM LOUGHRAN and BILL MCDONALD

ABSTRACT

Previous research uses negative word counts to measure the tone of a text. We show that word lists developed for other disciplines misclassify common words in financial text. In a large sample of 10-Ks during 1994 to 2008, almost three-fourths of the words identified as negative by the widely used Harvard Dictionary are words typically not considered negative in financial contexts. We develop an alternative negative word list, along with five other word lists, that better reflect tone in financial text.

We link the word lists to 10-K filing returns, trading volume, return volatility, fraud, material weakness, and unexpected earnings.

A growing body of finance and accounting research uses textual analysis to examine the tone and sentiment of corporate 10-K reports, newspaper articles, press releases, and investor message boards. Examples are Antweiler and Frank (2004), Tetlock (2007), Engelberg (2008), Li (2008), and Tetlock, Saar-Tsechansky, and Macskassy (2008). The results to date indicate that negative word classifications can be effective in measuring tone, as reflected by significant correlations with other financial variables.

A commonly used source for word classifications is the Harvard Psychosociological Dictionary, specifically, the Harvard-IV-4 TagNeg (H4N) file.

One positive feature of this list for research is that its composition is beyond the control of the researcher. That is, the researcher cannot pick and choose which words have negative implications. Yet English words have many meanings, and a word categorization scheme derived for one discipline might not translate effectively into a discipline with its own dialect.

In a survey of textual analysis, Berelson (1952) notes that: "Content analysis stands or falls by its categories. Particular studies have been productive to the extent that the categories were clearly formulated and well adapted to the problem" (p. 92). In some contexts, the H4N list of negative words may effectively capture the tone of a text. The question we address in this paper is whether a word list developed for psychology and sociology translates well into the realm of business.

Loughran and McDonald are with the University of Notre Dame. We are indebted to Paul Tetlock for comments on a previous draft. We also thank Robert Battalio, Peter Easton, James Fuehrmeyer, Paul Gao, Campbell Harvey (Editor), Nicholas Hirschey, Jennifer Marietta-Westberg, Paul Schultz, an anonymous referee, an anonymous associate editor, and seminar participants at the 2009 FMA meeting, University of Notre Dame, and York University for helpful comments.

We thank Hang Li for research assistance.

While measuring document tone using any word classification scheme is inherently imprecise, we provide evidence based on 50,115 firm-year 10-Ks between 1994 and 2008 that the H4N list substantially misclassifies words when gauging tone in financial applications. Misclassified words that are not likely correlated with the variables under consideration (for example, taxes or liabilities) simply add noise to the measurement of tone and thus attenuate the estimated regression coefficients. However, we also find evidence that some high frequency misclassifications in the Harvard list, such as mine or cancer, could introduce type I errors into the analysis to the extent that they proxy for industry segments or firm-specific attributes.

We make several contributions to the literature on textual analysis.

Most notably, we find that almost three-fourths of the negative word counts according to the Harvard list are attributable to words that are typically not negative in a financial context. Words such as tax, cost, capital, board, liability, foreign, and vice are on the Harvard list. These words also appear with great frequency in the vast majority of 10-Ks, yet often do no more than name a board of directors or a company's vice-presidents. Other words on the Harvard list, such as mine, cancer, crude (oil), tire, or capital, are more likely to identify a specific industry segment than reveal a negative financial event.

We create a list of 2,337 words that typically have negative implications in a financial sense.
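As a rough illustration of the misclassification problem described above, the following sketch computes the proportion of "negative" words in a passage under two tiny word lists. The sets `H4N_SAMPLE` and `FIN_NEG_SAMPLE`, the function `negative_proportion`, and the sample sentence are all hypothetical: the sets contain only the example words named in the text, not the actual Harvard or Fin-Neg lists, and inflections are ignored.

```python
import re

# Hypothetical subsets built only from the example words named in the text.
# They are NOT the real H4N or Fin-Neg lists.
H4N_SAMPLE = {"tax", "cost", "capital", "board", "liability", "foreign",
              "vice", "mine", "cancer", "crude", "tire"}
FIN_NEG_SAMPLE = {"felony", "litigation", "restated", "misstatement",
                  "unanticipated"}


def negative_proportion(text, word_list):
    """Share of tokens in `text` that appear in `word_list` (exact match)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


# A made-up, neutral-sounding passage of the kind found in a 10-K.
sample = ("The board of directors approved the capital budget. "
          "Tax liability and foreign cost increased. "
          "The vice president noted no litigation.")

h4n_share = negative_proportion(sample, H4N_SAMPLE)
fin_neg_share = negative_proportion(sample, FIN_NEG_SAMPLE)
print(f"H4N sample share:     {h4n_share:.2f}")
print(f"Fin-Neg sample share: {fin_neg_share:.2f}")
```

On this passage the Harvard-style subset flags routine terms such as board, capital, and tax, so its share is several times larger than the share under the finance-specific subset, even though the passage is not obviously negative.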

The prevalence of polysemes in English (words that have multiple meanings) makes an absolute mapping of specific words into financial sentiment impossible. We can, however, develop lists based on actual usage frequency that are most likely associated with a target construct. We use the term Fin-Neg to describe our list of negative financial words. Some of these words also appear on the H4N list, but others, such as felony, litigation, restated, misstatement, and unanticipated do not.

In testing the 10-K sample, whether tone should be gauged by the entire document or just the Management Discussion and Analysis (MD&A) section is an empirical question.

We show that the MD&A section does not produce tone measures that have a more discernable impact on 10-K file date excess returns. Thus, the MD&A section does not allow us to assess tone through a clearer lens.

In our results, we find that dividing firms into quintiles according to the proportion of H4N words (with inflections) in their 10-Ks produces no discernable pattern. That is, the proportion of H4N words does not systematically increase as 10-K filing returns decrease. However, when we use our financial negative list to sort firms, we observe a strong pattern. Regressions with multiple control variables confirm the univariate findings of no effect for the proportional counts from the Harvard list versus a significant impact for the Fin-Neg list.

We also show that the attenuation bias introduced by misclassifications, especially by high frequency words (which may be overweighted based on simple proportional measures), can be substantially mitigated by using term weighting.
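The quintile sort described above can be sketched in a few lines. This is only an illustration of the mechanics: the function `quintile_means` is a hypothetical helper, and the proportions and returns below are made-up numbers chosen to show a clean pattern, not values from the paper.

```python
def quintile_means(proportions, returns):
    """Sort firms by negative-word proportion, split them into five
    equal-sized groups, and return the mean return of each group
    (lowest-proportion quintile first)."""
    n = len(proportions)
    if n != len(returns):
        raise ValueError("proportions and returns must have equal length")
    if n < 5:
        raise ValueError("need at least five firms to form quintiles")
    order = sorted(range(n), key=lambda i: proportions[i])
    means = []
    for q in range(5):
        group = order[q * n // 5:(q + 1) * n // 5]
        means.append(sum(returns[i] for i in group) / len(group))
    return means


# Made-up data for ten firms: negative-word proportion and filing-period
# excess return (in percent). Purely illustrative.
toy_proportions = [0.05, 0.01, 0.09, 0.03, 0.07, 0.02, 0.10, 0.04, 0.08, 0.06]
toy_returns = [0.1, 0.9, -0.7, 0.5, -0.3, 0.7, -0.9, 0.3, -0.5, -0.1]

toy_means = quintile_means(toy_proportions, toy_returns)
for rank, mean_return in enumerate(toy_means, start=1):
    print(f"Quintile {rank}: mean return {mean_return:+.2f}%")
```

A word list that captures tone would show mean returns falling steadily from the first quintile to the fifth, as in this toy data; a noisy list would show no systematic ordering.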

Most textual analysis uses a "bag of words" method where a document is summarized in a vector of word counts, and then combined across documents into a term-document matrix. In other disciplines, term weighting is typically used in any vector space representation of documents. With term weighting, where the enormous differences in frequencies are dampened through a log transformation and common words are weighted less, both the Harvard list and our Fin-Neg list generally produce similar results.

To expand the word classification categories, we create five additional word lists.
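To make the bag-of-words and term-weighting ideas concrete, here is a minimal sketch that builds a term-document matrix and applies one common weighting, a log-dampened term frequency multiplied by an inverse document frequency. The exact scheme is an assumption chosen for illustration and is not necessarily the formula used in the paper; the function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def term_document_counts(docs):
    """Return (vocab, matrix), where matrix[i][j] is the raw count of
    vocab[i] in docs[j]. Tokenization is a plain whitespace split."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def log_tf_idf(matrix):
    """Weight each count as (1 + ln tf) * ln(N / df), or 0 when tf is 0.
    The log dampens large counts; the idf factor shrinks the weight of
    terms that appear in many documents."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for tf in row if tf > 0)  # documents containing the term
        idf = math.log(n_docs / df)
        weighted.append([(1 + math.log(tf)) * idf if tf > 0 else 0.0
                         for tf in row])
    return weighted


# Three made-up "documents" using words mentioned in the text.
toy_docs = [
    "cost cost cost cost litigation",
    "cost cost board",
    "cost board board",
]
toy_vocab, toy_matrix = term_document_counts(toy_docs)
toy_weights = log_tf_idf(toy_matrix)
for term, row in zip(toy_vocab, toy_weights):
    print(term, [round(w, 3) for w in row])
```

In this toy example the word cost appears in every document, so its weight is zero everywhere, while the single occurrence of litigation receives the largest weight. That is the sense in which weighting can reduce the influence of high frequency words that a simple proportional count would overweight.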

Specifically, in addition to the negative word lists, we consider positive, uncertainty, litigious, strong modal, and weak modal word categories.

When we assess whether these word lists actually gauge tone, we find significant relations between our word lists and file date returns, trading volume, subsequent return volatility, standardized unexpected earnings, and two separate samples of fraud and material weakness. We also examine whether negative tone classifications are related to future returns in terms of a trading strategy, and find no evidence of return predictability based on the competing measures.

The nature of word usage in firm-related news is not identical across media.

