
Assessing Examiner Agreement: Why Reporting Kappa is NOT Enough




Transcription of Assessing Examiner Agreement: Why Reporting Kappa is NOT Enough

1 Assessing Examiner Agreement: Why Reporting Kappa is NOT Enough (Suggested Reporting Criteria)

Background About Kappa
The number of journal articles including some mention of the validity and reliability of new diagnostic methods, or of the investigators employed in clinical studies, is increasing, most notably in the past decade. In particular, the use of kappa statistics to assess examiner agreement for categorical outcomes has grown almost exponentially. A Medline search using "kappa AND statistic" generated the following.

The Use of Kappa
[Figure: frequency of citations of kappa and weighted kappa (2179 citations); number of articles per year, 1980-2010.]

2 Reporting of Examiner Agreement Using Kappa
The primary purpose of this talk is to demonstrate why reporting only kappa values does not provide the minimum information needed to assess examiner proficiency in scoring categorical responses, or examiner agreement more generally. The secondary purpose is to suggest criteria that should be included in reports of examiner proficiency or examiner agreement. Issues regarding the study design (sample size, types of subjects, etc.) of the calibration and evaluation of examiner agreement are not addressed in this presentation due to time constraints. Alternative models to assess examiner agreement, such as log-linear models, latent class models, Bayesian methods, and some newer graphical approaches, are not addressed here either.

3 Examples of Reporting Kappa
Incomplete reporting of examiner agreement using kappa: the reliabilities of three examiners (weighted kappa) classifying the type of caries treatment needed for occlusal tooth surfaces [none, non-invasive only, or invasive], based on visual examination versus visual + DIAGNOdent + QLF, ranged from ... to ... (Pereira, 2009).
A good example of reporting kappa involves 10 examiners scoring fluorosis using the TFI: observed agreement and marginal distributions are presented as well as pairwise kappas (Tavener, 2007).

Is the Focus on Agreement or Validity?
"Even when the experts all agree, they may well be mistaken." (Russell)

4 Today we will focus on agreement!

Kappa Statistics: Chance-Corrected Agreement
Simple (exact) kappa statistic (Cohen, 1960): K = (pO - pE) / (1 - pE), where pO and pE represent the proportions of observed and expected agreement (expected under independence, as in the usual X2 method). Its range is -pE/(1 - pE) ≤ K ≤ 1.

Simple Kappa
  A\B      No   Yes   totals
  No       40     9     49
  Yes       6    45     51
  totals   46    54    100
pO = 0.85, pE = 0.50, K = 0.70 (recomputed in the sketch at the end of this page). How should K = 0.70 be interpreted?

Kappa Statistic: Strength of Agreement
Landis and Koch, Biometrics 1977 (cited 8557 times): < 0.00 poor; 0.00-0.20 slight; 0.21-0.40 fair; 0.41-0.60 moderate; 0.61-0.80 substantial; 0.81-1.00 almost perfect. (Landis & Koch assumed equal marginals for the examiners.)
Altman, DG, 1991 textbook: < 0.20 poor; 0.21-0.40 fair; 0.41-0.60 moderate; 0.61-0.80 good; 0.81-1.00 very good.
Fleiss et al., 2003 textbook: < 0.40 poor; 0.40-0.75 fair to good; > 0.75 excellent.

Statisticians Confuse the Issue
For the same kappa of 0.70, the strength of agreement is "substantial" (Landis-Koch), "good" (Altman), and "fair to good" (Fleiss et al.).
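As a check on the 2 x 2 example above, here is a minimal Python sketch (NumPy assumed; the code and variable names are illustrative and not part of the original talk) that recomputes pO, pE, and the simple kappa:

```python
import numpy as np

# Joint classification of 100 subjects by examiners A and B (table above).
table = np.array([[40,  9],
                  [ 6, 45]])

p = table / table.sum()                        # cell proportions
p_o = np.trace(p)                              # observed agreement
p_e = float(p.sum(axis=1) @ p.sum(axis=0))     # expected agreement under independence
kappa = (p_o - p_e) / (1 - p_e)                # Cohen's simple kappa
kappa_min = -p_e / (1 - p_e)                   # lower bound of kappa for these marginals

print(f"pO = {p_o:.2f}, pE = {p_e:.2f}, K = {kappa:.2f}, lower bound = {kappa_min:.2f}")
# Expected output: pO = 0.85, pE = 0.50, K = 0.70
```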

5
1. The kappa statistic is sufficient to demonstrate examiner agreement.
2. Kappa values found to be substantial are generally considered acceptable.
3. Examiner training ceased when kappa values attained a substantial level.

More Confusion Regarding Kappa
1. The kappa value is strongly influenced by the prevalence of the outcome.
2. Kappa values can be counter-intuitive.
3. Kappa values can depend on the number of categories.

Controversies Regarding Kappa
  A\B      No   Yes   totals
  No       40     9     49
  Yes       6    45     51
  totals   46    54    100
pO = 0.85, K = 0.70

Controversy 1 (Feinstein, 1990): the prevalence effect
  A\B      No   Yes   totals
  No       80    10     90
  Yes       5     5     10
  totals   85    15    100
pO = 0.85, K = 0.32. The observed agreement is the same as above, yet the skewed prevalence pulls kappa down.

Controversy 2 (Feinstein, 1990): bias can produce a larger kappa than non-bias
  A\B      No   Yes   totals
  No       45    15     60
  Yes      25    15     40
  totals   70    30    100
pO = 0.60, K = 0.13 (no bias)

  A\B      No   Yes   totals
  No       25    35     60
  Yes       5    35     40
  totals   30    70    100
pO = 0.60, K = 0.26 (bias: the examiners' marginal distributions differ sharply, yet kappa is larger). These four tables are recomputed in the sketch at the end of this page.
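The prevalence and bias effects quoted above can be reproduced with a short sketch (NumPy assumed; the table labels are mine, and the kappa values are simply recomputed from the counts on the slide):

```python
import numpy as np

def simple_kappa(table):
    """Observed agreement and Cohen's kappa for a square contingency table."""
    p = np.asarray(table, dtype=float) / np.sum(table)
    p_o = np.trace(p)
    p_e = float(p.sum(axis=1) @ p.sum(axis=0))
    return p_o, (p_o - p_e) / (1 - p_e)

tables = {
    "baseline (balanced prevalence)":    [[40,  9], [ 6, 45]],
    "controversy 1 (skewed prevalence)": [[80, 10], [ 5,  5]],
    "controversy 2 (no bias)":           [[45, 15], [25, 15]],
    "controversy 2 (bias)":              [[25, 35], [ 5, 35]],
}

for name, t in tables.items():
    p_o, k = simple_kappa(t)
    print(f"{name:35s}  pO = {p_o:.2f}  K = {k:.2f}")
# The same pO = 0.85 yields K = 0.70 vs 0.32 (prevalence effect), and the
# same pO = 0.60 yields K = 0.13 (no bias) vs 0.26 (bias).
```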

6 Sources of Examiner Disagreement
In addition to the prevalence effect, there are two sources of examiner disagreement:
- marginal heterogeneity: group-level disagreement (the marginals), and
- bivariate disagreement: individual-level disagreement (the cells), which affects precision.

Consequences of Disagreement
Disagreement can produce biased estimates of disease prevalence and/or decrease power or precision, which increases the cost of conducting the study.

Bias Detection
- Pearson's correlation coefficient (r) cannot detect bias for continuous variables.
- Kappa cannot detect bias (marginal heterogeneity) for categorical variables.

[Figure: DMFS scores for pairs of examiners; DMFS score by examiners Y and Z plotted against DMFS score by examiner X, scales 0-25.]
[Figure: Bland-Altman differences plot of the DMFS score differences (Bland & Altman, Lancet, 1986; cited 18468 times); a construction sketch appears at the end of this page.]
[Figure: Puerto Rico Caries Clinical Trial, 24-month D2MFS increments for examiners A and B in the 500 ppm F, 1100 ppm F, and 2800 ppm F groups, n = 170 per group (Stookey, 2004).]
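Because the bias-detection argument leans on the Bland-Altman differences plot, here is a minimal matplotlib sketch of how such a plot is usually constructed; the paired scores below are synthetic placeholders, not the DMFS data shown in the figures:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic paired scores for two examiners (placeholder data only).
rng = np.random.default_rng(0)
x = rng.integers(0, 26, size=60)           # examiner X scores
y = x + rng.integers(-3, 4, size=60)       # examiner Y scores with small disagreement

mean_xy = (x + y) / 2
diff_xy = y - x
bias = diff_xy.mean()                      # mean difference (systematic bias)
loa = 1.96 * diff_xy.std(ddof=1)           # 95% limits of agreement

plt.scatter(mean_xy, diff_xy)
plt.axhline(bias, color="k")
plt.axhline(bias + loa, color="k", linestyle="--")
plt.axhline(bias - loa, color="k", linestyle="--")
plt.xlabel("Mean of examiner X and Y scores")
plt.ylabel("Difference (Y - X)")
plt.title("Bland-Altman differences plot")
plt.show()
```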

7 Marginal TFI Distributions for 4 Examiners (Bias Detection for an Ordinal Scale)
[Figure: differences plot for fluorosis; difference in prevalence (%) for examiners E1-E4 relative to examiner Z (Tavener, 2007).]

General Kappa (r x r)
K = (pO - pE) / (1 - pE), where pO = Σi pii and pE = Σi pi+ p+i. For r ≥ 3 the range of kappa is -1 < -pE/(1 - pE) ≤ K ≤ 1.

Moderate Agreement (K = 0.60), No Bias
  X1\Y1     0   1-2   3-6   Tot X1
  0       158    20     7     185
  1-2      18    45     7      70
  3-6       5     9    31      45
  Tot Y1  181    74    45     300
pO = 0.78, pE = 0.45, K = 0.60

Moderate Agreement (K = 0.60), Bias
  X2\Y2     0   1-2   3-6   Tot X2
  0       145    40    15     200
  1-2       6    50     4      60
  3-6       4     0    36      40
  Tot Y2  155    90    55     300
pO = 0.77, pE = 0.43, K = 0.60 (both tables are recomputed in the sketch at the end of this page)

Marginal Homogeneity (Bias at the Group Level)
- Perform a formal X2 statistical test.
- Use Bland-Altman charts (a graphic approach).
- Use maximum kappa (a heuristic approach).
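A small sketch (NumPy assumed, function name mine) recomputes the r x r kappa for the two 3 x 3 tables above and shows that both land near 0.60, even though the second table has clearly heterogeneous marginals:

```python
import numpy as np

def simple_kappa(table):
    """pO, pE and simple (unweighted) kappa for an r x r contingency table."""
    p = np.asarray(table, dtype=float) / np.sum(table)
    p_o = np.trace(p)
    p_e = float(p.sum(axis=1) @ p.sum(axis=0))
    return p_o, p_e, (p_o - p_e) / (1 - p_e)

# Categories 0, 1-2, 3-6 as on the slide.
no_bias = [[158, 20,  7],
           [ 18, 45,  7],
           [  5,  9, 31]]
bias    = [[145, 40, 15],
           [  6, 50,  4],
           [  4,  0, 36]]

for name, t in [("no bias", no_bias), ("bias", bias)]:
    p_o, p_e, k = simple_kappa(t)
    print(f"{name:8s} pO = {p_o:.2f}  pE = {p_e:.2f}  K = {k:.2f}")
# Both kappas are about 0.60 ("moderate"), so kappa alone cannot
# distinguish the biased table from the unbiased one.
```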

8 Marginal Homogeneity?
[Figure: Bland-Altman marginal differences plot; X - Y difference (%) by ICDAS category (3 categories), no-bias versus bias examples.]

Calculating Maximum Kappa (r x r)
To maximize kappa we need to compute the maximum value of the observed agreement, pO, given fixed marginals. It is obtained as max(pO) = Σi min(pi+, p+i), and the maximum kappa is KM = (max(pO) - pE) / (1 - pE).

Maximum Kappa Approach (no bias)
  Category    X1    Y1    Min
  0          185   181    181
  1-2         70    74     70
  3-6         45    45     45
  sum        300   300    296
K = 0.60, KM = 0.98

Maximum Kappa Approach (bias)
  Category    X2    Y2    Min
  0          200   155    155
  1-2         60    90     60
  3-6         40    55     40
  sum        300   300    255
K = 0.60, KM = 0.74

Marginal Homogeneity Tests
Each test is a quadratic form X2 = d'V^(-1)d, where d is the vector of differences between the row and column marginal proportions and V is its estimated covariance matrix.
1. Stuart-Maxwell X2 test
2. Bhapkar X2 test
For r = 2, each reduces to the McNemar X2 test (Sun, 2008). For the no-bias table: X2 = ... (df = 2, p = ...); for the bias table: X2 = ... (df = 2, p = ...).

Tests for Symmetry (r x r)
Bowker's X2 test (available in SAS); for r = 2 it reduces to the McNemar X2 test (Bowker, 1948). For the X1\Y1 (no-bias) table: X2 = ... (df = 3, p = ...); for the X2\Y2 (bias) table: X2 = ... (df = 3, p = ...).
Symmetry implies marginal homogeneity, but marginal homogeneity does not imply symmetry.
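The maximum-kappa heuristic and the formal tests can be recomputed with the sketch below. It assumes statsmodels' SquareTable interface (which exposes Bowker's symmetry test and the Stuart-Maxwell/Bhapkar marginal-homogeneity tests); the exact statistics and p-values were lost in the transcription above, so treat the output as a recomputation rather than the original slide values:

```python
import numpy as np
from statsmodels.stats.contingency_tables import SquareTable

def kappa_and_kappa_max(table):
    """Simple kappa and the maximum kappa attainable with the observed marginals."""
    p = np.asarray(table, dtype=float) / np.sum(table)
    row, col = p.sum(axis=1), p.sum(axis=0)
    p_o, p_e = np.trace(p), float(row @ col)
    p_o_max = np.minimum(row, col).sum()        # max observed agreement, fixed marginals
    return (p_o - p_e) / (1 - p_e), (p_o_max - p_e) / (1 - p_e)

no_bias = [[158, 20, 7], [18, 45, 7], [5, 9, 31]]
bias    = [[145, 40, 15], [6, 50, 4], [4, 0, 36]]

for name, t in [("no bias", no_bias), ("bias", bias)]:
    k, k_max = kappa_and_kappa_max(t)
    st = SquareTable(np.asarray(t), shift_zeros=False)   # keep the raw counts
    sym = st.symmetry(method="bowker")                   # Bowker's symmetry test
    mh = st.homogeneity(method="bhapkar")                # Bhapkar marginal homogeneity test
    print(f"{name:8s} K = {k:.2f}  Kmax = {k_max:.2f}  "
          f"Bowker X2 = {sym.statistic:.2f} (p = {sym.pvalue:.3f})  "
          f"Bhapkar X2 = {mh.statistic:.2f} (p = {mh.pvalue:.3f})")
# The biased table has a much lower maximum kappa (about 0.74 vs 0.98) and
# should trip the marginal-homogeneity test, even though both simple kappas
# are about 0.60.
```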

9 Symmetry vs Marginal Homogeneity (K = 0.25)
  X\Y       0   1-2   3-6   Tot X
  0        50    50     0     100
  1-2      40    30    30     100
  3-6      10    20    70     100
  Tot Y   100   100   100     300
Bowker X2 = ... (df = 3, p = ...); Bhapkar X2 = ... (df = 2, p = ...). The row and column marginals are identical, yet the table is clearly asymmetric (see the sketch at the end of this page).

Suggestion: Test for Bias First
If marginal homogeneity is rejected, that is, bias is detected, examine the table to determine the source of the problem (which categories are involved) and provide additional training for the examiners.

Weighted Kappas

Why Weighted Kappa (r x r)?
Simple kappa treats all disagreements in an r x r table the same. That is appropriate for a nominally scaled variable (race, ethnic group, college major, opinions, values, mental disorders). However, for an ordinally scaled variable like ICDAS, a disagreement between scores of 1 vs 2 (wet vs dry NC) is not as severe as one between 1 vs 5 (NC vs an obvious frank lesion).
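Before moving on to the weighted statistics, here is the sketch referenced above for the symmetry example. It computes Bowker's statistic by hand (SciPy assumed for the chi-square tail probability); because the row and column totals are identical, any marginal-homogeneity statistic is zero by construction, while symmetry is still rejected:

```python
import numpy as np
from scipy.stats import chi2

# Table from the slide: identical marginals (100, 100, 100) but asymmetric cells.
t = np.array([[50, 50,  0],
              [40, 30, 30],
              [10, 20, 70]])

# Marginal homogeneity holds exactly, so a Bhapkar/Stuart-Maxwell statistic is 0.
print("row minus column marginals:", t.sum(axis=1) - t.sum(axis=0))

# Bowker's symmetry test: X2 = sum over i<j of (n_ij - n_ji)^2 / (n_ij + n_ji).
r = t.shape[0]
stat = sum((t[i, j] - t[j, i]) ** 2 / (t[i, j] + t[j, i])
           for i in range(r) for j in range(i + 1, r)
           if t[i, j] + t[j, i] > 0)
df = r * (r - 1) // 2
print(f"Bowker X2 = {stat:.2f}, df = {df}, p = {chi2.sf(stat, df):.4f}")
# Symmetry is rejected even though marginal homogeneity holds, which is
# exactly the "marginal homogeneity does not imply symmetry" point above.
```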

10 Weighted Kappa Statistic (r x r)
Assign different weights wij to the off-diagonal cells, selected so that 0 ≤ wij ≤ 1 (think partial credit):
Kw = (pOw - pEw) / (1 - pEw), where pOw = Σij wij pij and pEw = Σij wij pi+ p+j.

Common Weighted Kappas
1. Linear weights: wij = 1 - |i - j| / (r - 1).
2. Fleiss-Cohen (intraclass, or squared-error) weights: wij = 1 - (i - j)^2 / (r - 1)^2.

Common Weights for a (3 x 3) Table
Linear:
  X\Y    S    NC    C
  S      1    1/2   0
  NC    1/2    1   1/2
  C      0    1/2   1
Intraclass (Fleiss-Cohen):
  X\Y    S    NC    C
  S      1    3/4   0
  NC    3/4    1   3/4
  C      0    3/4   1

Linear Weighted Kappa, No Bias
  X1\Y1     0   1-2   3-6   Tot X1
  0       158    20     7     185
  1-2      18    45     7      70
  3-6       5     9    31      45
  Tot Y1  181    74    45     300
pO = 0.78, pE = 0.45, K = 0.60, KL = 0.64

Fleiss-Cohen Weighted Kappa, No Bias
For the same table: K = 0.60, KFC = 0.69.
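A closing sketch (NumPy assumed; the helper name and the "kind" flag are mine) recomputes the linear and Fleiss-Cohen weighted kappas for the no-bias table above:

```python
import numpy as np

def weighted_kappa(table, kind="linear"):
    """Weighted kappa for an r x r table with linear or Fleiss-Cohen weights."""
    p = np.asarray(table, dtype=float) / np.sum(table)
    r = p.shape[0]
    i, j = np.indices((r, r))
    if kind == "linear":
        w = 1 - np.abs(i - j) / (r - 1)            # linear (partial credit) weights
    else:
        w = 1 - (i - j) ** 2 / (r - 1) ** 2        # Fleiss-Cohen (squared-error) weights
    p_ow = float((w * p).sum())
    p_ew = float((w * np.outer(p.sum(axis=1), p.sum(axis=0))).sum())
    return (p_ow - p_ew) / (1 - p_ew)

no_bias = [[158, 20, 7], [18, 45, 7], [5, 9, 31]]
print(f"KL  (linear)       = {weighted_kappa(no_bias, 'linear'):.2f}")
print(f"KFC (Fleiss-Cohen) = {weighted_kappa(no_bias, 'fc'):.2f}")
# Roughly KL = 0.64 and KFC = 0.69, versus a simple kappa of about 0.60
# for the same table: the ordinal weighting gives credit for near-misses.
```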

