
Understanding Interobserver Agreement: The Kappa Statistic

Anthony J. Viera, MD; Joanne M. Garrett, PhD
From the Robert Wood Johnson Clinical Scholars Program, University of North Carolina

Items such as physical exam findings, radiographic interpretations, or other diagnostic tests often rely on some degree of subjective interpretation by observers. Studies that measure the agreement between two or more observers should include a statistic that takes into account the fact that observers will sometimes agree or disagree simply by chance. The kappa statistic (or kappa coefficient) is the most commonly used statistic for this purpose. A kappa of 1 indicates perfect agreement, whereas a kappa of 0 indicates agreement equivalent to chance. A limitation of kappa is that it is affected by the prevalence of the finding under observation. Methods to overcome this limitation have been described. (Fam Med 2005;37(5):360-3.)

In reading medical literature on diagnosis and interpretation of diagnostic tests, our attention is generally focused on items such as sensitivity, specificity, predictive values, and likelihood ratios. These items address the validity of the test. But if the people who actually interpret the test cannot agree on the interpretation, the test results will be of little use.

Let us suppose that you are preparing to give a lecture on community-acquired pneumonia. As you prepare for the lecture, you read an article titled "Diagnosing Pneumonia by History and Physical Examination," published in the Journal of the American Medical Association. You come across a table in the article that shows agreement on physical examination findings of the chest.

You see that there was 79% agreement on the presence of wheezing and 85% agreement on the presence of tactile fremitus, each reported with a corresponding kappa statistic. How do you interpret these levels of agreement, taking into account the kappa statistic?

Accuracy Versus Precision

When assessing the ability of a test (radiograph, physical finding, etc) to be helpful to clinicians, it is important that its interpretation is not a product of guesswork. This concept is often referred to as precision (though some incorrectly use the term accuracy). Recall the analogy of a target and how close we get to the bull's-eye (Figure 1).

If we actually hit the bull's-eye (representing agreement with the gold standard), we are accurate. If all our shots land together, we have good precision (good reliability). If all our shots land together and we hit the bull's-eye, we are accurate as well as precise. It is possible, however, to hit the bull's-eye purely by chance. Referring to Figure 1, only the center black dot in target A is accurate, and there is little precision (poor reliability about where the shots land). In B, there is precision but not accuracy. C demonstrates neither accuracy nor precision.

In D, the black dots are both accurate and precise. The lack of precision in A and C could be due to chance, in which case the bull's-eye shot in A was just lucky. In B and D, the groupings are unlikely to be due to chance.

Precision, as it pertains to agreement between observers (interobserver agreement), is often reported as a kappa statistic. Kappa is intended to give the reader a quantitative measure of the magnitude of agreement between observers. It applies not only to tests such as radiographs but also to items like physical exam findings, eg, presence of wheezes on lung examination as noted earlier.

Comparing the presence of wheezes on lung examination to the presence of an infiltrate on a chest radiograph assesses the validity of the exam finding to diagnose pneumonia. Assessing whether the examiners agree on the presence or absence of wheezes (regardless of validity) assesses precision (reliability).


The Kappa Statistic

Interobserver variation can be measured in any situation in which two or more independent observers are evaluating the same thing. For example, let us imagine a study in which two family medicine residents are evaluating the usefulness of a series of 100 noon lectures. Resident 1 and Resident 2 agree that the lectures are useful 15% of the time and not useful 70% of the time (Table 1). If the two residents randomly assign their ratings, however, they would sometimes agree just by chance. Kappa gives us a numerical rating of the degree to which this occurs. The calculation is based on the difference between how much agreement is actually present (observed agreement) compared to how much agreement would be expected to be present by chance alone (expected agreement).
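For a two-observer, yes/no rating laid out as in Table 1 (cells a and d on the agreement diagonal, b and c the disagreements, n the total), the quantities described above can be written explicitly. This is a sketch of the standard Cohen's kappa formulation using the table's cell labels, not a reproduction of the article's own equations.

```latex
p_o = \frac{a + d}{n}, \qquad
p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{n^{2}}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}
```

Here p_o is the observed agreement, p_e the agreement expected by chance from the row and column totals, and kappa the chance-corrected agreement.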

The data layout is shown in Table 1. The observed agreement is simply the percentage of all lectures for which the two residents' evaluations agree, which is the sum of a + d divided by the total n in Table 1. In our example, this is (15 + 70)/100, or 85%. We may also want to know how different the observed agreement (85%) is from the expected agreement (65%). Kappa is a measure of this difference, standardized to lie on a -1 to 1 scale, where 1 is perfect agreement, 0 is exactly what would be expected by chance, and negative values indicate agreement less than chance, ie, potential systematic disagreement between the observers.
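As a concrete check on these numbers, the short sketch below plugs the Table 1 counts into the definitions above. The function name and layout are illustrative choices rather than anything specified in the article.

```python
# Worked kappa calculation for the Table 1 noon-lecture example.
def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 agreement table.

    a = both observers say yes, d = both say no,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance-expected agreement, built from the row and column totals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


a, b, c, d = 15, 5, 10, 70                              # counts from Table 1
n = a + b + c + d
print((a + d) / n)                                      # observed agreement: 0.85
print(((a + b) * (a + c) + (c + d) * (b + d)) / n**2)   # expected agreement: 0.65
print(round(cohen_kappa_2x2(a, b, c, d), 2))            # kappa: 0.57
```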

In this example, the kappa is 0.57. (For calculations, see Table 1.)

Interpretation of Kappa

What does a specific kappa value mean? We can use the value of 0.57 from the example above. Not everyone would agree about whether 0.57 constitutes good agreement. However, a commonly cited scale is represented in Table 2. It turns out that, using this scale, a kappa of 0.57 is in the moderate agreement range between our two observers. Remember that perfect agreement would equate to a kappa of 1, and chance agreement would equate to 0. Table 2 may help you visualize the interpretation of kappa.
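Table 2 itself is not reproduced in this excerpt, so the helper below encodes the commonly cited Landis and Koch-style cutoffs that such tables usually present; the exact category boundaries here are an assumption, not a quotation of the article's Table 2.

```python
# Map a kappa value to a commonly cited qualitative scale (assumed Landis-Koch-style cutoffs).
def interpret_kappa(kappa: float) -> str:
    if kappa < 0:
        return "less than chance agreement"
    if kappa <= 0.20:
        return "slight agreement"
    if kappa <= 0.40:
        return "fair agreement"
    if kappa <= 0.60:
        return "moderate agreement"
    if kappa <= 0.80:
        return "substantial agreement"
    return "almost perfect agreement"


print(interpret_kappa(0.57))  # moderate agreement, as in the noon-lecture example
```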

So, residents in this hypothetical study seem to be in moderate agreement that noon lectures are not that useful. When interpreting kappa, it is also important to keep in mind that the estimated kappa itself could be due to chance. To report a P value of a kappa requires further calculation.

Figure 1: Accuracy and Precision

Table 1: Interobserver Variation (Usefulness of Noon Lectures)

                            Resident 1: Lectures Helpful?
                            Yes     No      Total
Resident 2:        Yes      15      5       20
Lectures Helpful?  No       10      70      80
                   Total    25      75      100
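The abstract notes that kappa is affected by the prevalence of the finding under observation. The sketch below uses two hypothetical 2x2 tables (the counts are invented for illustration and do not come from the article) with the same 85% observed agreement, to show how a skewed prevalence can pull kappa toward zero.

```python
# Prevalence effect: identical observed agreement, very different kappa (hypothetical counts).
def agreement_and_kappa(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return p_o, (p_o - p_e) / (1 - p_e)


tables = {
    "balanced prevalence": (45, 10, 5, 40),  # finding present about half the time
    "skewed prevalence": (2, 10, 5, 83),     # finding present only rarely
}

for label, counts in tables.items():
    p_o, k = agreement_and_kappa(*counts)
    print(f"{label}: observed agreement = {p_o:.2f}, kappa = {k:.2f}")
# balanced prevalence: observed agreement = 0.85, kappa = 0.70
# skewed prevalence: observed agreement = 0.85, kappa = 0.13
```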

