
Research in Nursing & Health, 2006, 29, 489–497 The Content Validity Index: Are You Sure You Know What’s Being Reported? Critique and Recommendations


Denise F. Polit,1,2 Cheryl Tatano Beck3
1 Humanalysis, Inc., Saratoga Springs, NY; 2 Griffith University School of Nursing, Gold Coast, Australia; 3 University of Connecticut School of Nursing, Storrs, CT
Accepted 16 May 2006

Abstract: Scale developers often provide evidence of content validity by computing a content validity index (CVI), using ratings of item relevance by content experts. We analyzed how nurse researchers have defined and calculated the CVI, and found considerable consistency for item-level CVIs (I-CVIs). However, there are two alternative, but unacknowledged, methods of computing the scale-level index (S-CVI). One method requires universal agreement among experts, but a less conservative method averages the item-level CVIs.

Using backward inference with a purposive sample of scale development studies, we found that both methods are being used by nurse researchers, although it was not always possible to infer the calculation method. The two approaches can lead to different values, making it risky to draw conclusions about content validity. Scale developers should indicate which method was used to provide readers with interpretable content validity information. © 2006 Wiley Periodicals, Inc. Res Nurs Health 29:489–497, 2006

Keywords: instrument development and validation; methodological research; scaling; content validity

When a new scale is developed, researchers following rigorous scale development procedures are expected to provide extensive information about the scale's reliability and validity. Although the criterion-related and construct validity of a new instrument are considered especially important, information about the content validity of the measure is also viewed as necessary in drawing conclusions about the scale's quality.

Content validity has been defined as follows: (1) "the degree to which an instrument has an appropriate sample of items for the construct being measured" (Polit & Beck, 2004, p. 423); (2) "whether or not the items sampled for inclusion on the tool adequately represent the domain of content addressed by the instrument" (Waltz, Strickland, & Lenz, 2005, p. 155); and (3) "the extent to which an instrument adequately samples the research domain of interest when attempting to measure phenomena" (Wynd, Schmidt, & Schaefer, 2003, p. 509).

There is general agreement in these definitions that content validity concerns the degree to which a sample of items, taken together, constitutes an adequate operational definition of a construct. There is also agreement in the methodologic literature that content validity is largely a matter of judgment, involving two distinct phases: a priori efforts by the scale developer to enhance content validity through careful conceptualization and domain analysis prior to item generation, and a posteriori efforts to evaluate the relevance of the scale's content through expert assessment (e.g., Beck & Gable, 2001; Lynn, 1986; Mastaglia, Toye, & Kristjanson, 2003).

This article focuses on the second part of this process.

BACKGROUND ON CONTENT VALIDITY APPROACHES

Numerous methods of quantifying experts' degree of agreement regarding the content relevance of an instrument have been proposed. These include, for example, averaging experts' ratings of item relevance and using a pre-established criterion of acceptability (e.g., Beck & Gable, 2001); using coefficient alpha to quantify agreement of item relevance by three or more experts (Waltz et al., 2005, p. 157); and computing a multirater kappa coefficient (Wynd et al., 2003). A variety of other indexes that capture interrater agreement have been proposed and are used mainly in the field of personnel psychology (Lindell & Brandt, 1999).

One approach, recommended several decades ago, has special relevance in this article. This approach involves having a team of experts indicate whether each item on a scale is congruent with (or relevant to) the construct, computing the percentage of items deemed to be relevant for each expert, and then taking an average of the percentages across experts.

As an example with two experts, if Expert 1 rated 100% of a set of items to be congruent with the construct, and Expert 2 rated 80% of the items to be congruent, the value of this index would be 90%. This has been referred to as the average congruency percentage (ACP) and is attributed to Popham (1978). Waltz et al. (2005, p. 178) advise that an ACP of 90 percent or higher would be acceptable.
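The ACP calculation is simple enough to sketch in code. The following Python snippet is a minimal illustration of Popham's approach as described above, not code from the article; the ratings data, function name, and 90% criterion check are illustrative assumptions.

```python
def average_congruency_percentage(ratings):
    """Compute the ACP from a dict mapping each expert to a list of
    0/1 judgments (1 = item congruent with the construct).

    For each expert, take the percentage of items judged congruent,
    then average those percentages across experts.
    """
    per_expert = [100.0 * sum(items) / len(items) for items in ratings.values()]
    return sum(per_expert) / len(per_expert)

# Two experts rating a 10-item scale, mirroring the example in the text:
# Expert 1 judges all 10 items congruent (100%), Expert 2 judges 8 (80%).
ratings = {
    "expert_1": [1] * 10,
    "expert_2": [1] * 8 + [0] * 2,
}
acp = average_congruency_percentage(ratings)
print(f"ACP = {acp:.0f}%")                # ACP = 90%
print("Meets 90% criterion:", acp >= 90)  # the Waltz et al. (2005) guideline
```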

Among nurse researchers, the most widely reported measure of content validity is the content validity index, or CVI. The CVI (which we define and describe at length later in this article) has been used for many years, and is most often attributed to Martuza (1977), an education specialist. However, researchers who use the CVI to assess the content validity of their scales, regardless of their own disciplinary backgrounds, often cite methodologic work in the nursing literature, most often Davis (1992), Grant and Davis (1997), Lynn (1986), Waltz et al. (2005), or Waltz and Bausell (1981). Lynn's seminal study has been especially influential.

The CVI has had its share of critics, however, even among nurse researchers. For example, Wynd and her colleagues (2003) used both the CVI and a multirater kappa coefficient in their content validation of the Osteoporosis Risk Assessment Tool. They argued that the kappa statistic was an important supplement to (if not substitute for) the CVI because the formula for kappa yields an index of degree of agreement beyond chance agreement, unlike the CVI, which does not adjust for chance agreement. Other concerns are that the CVI throws away information by collapsing experts' multipoint ordinal ratings into two categories (i.e., into relevant/not relevant categories, a common practice), and that the CVI focuses on the relevance of the items reviewed but does not capture whether a scale includes a comprehensive set of items to adequately measure the construct of interest.

Our purpose in this article is not to advocate for or against using the CVI as the standard index of content validity. Rather, because the CVI is used so widely in nursing, our purpose is to clarify what this index is actually capturing and to demonstrate that researchers are not always clear in articulating how they have computed it.
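Because chance-corrected agreement comes up again below, a brief sketch of a multirater kappa may be useful before turning to the I-CVI. The article does not reproduce the formula Wynd et al. used; the version below is Fleiss' kappa, one standard multirater agreement statistic that corrects for chance, applied to relevance ratings dichotomized as relevant/not relevant. The counts and all names are illustrative assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for an n_items x n_categories table, where
    counts[i][j] = number of raters assigning item i to category j.
    Assumes every item is rated by the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Per-item agreement: proportion of rater pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Expected chance agreement, from the marginal category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Five experts rate six items; each row is [n_relevant, n_not_relevant]
# for one item (illustrative data only).
counts = [[5, 0], [5, 0], [4, 1], [5, 0], [3, 2], [5, 0]]
print(f"Fleiss' kappa = {fleiss_kappa(counts):.2f}")
```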

THE CONTENT VALIDITY INDEX FOR ITEMS (I-CVI)

As noted by Lynn (1986), researchers compute two types of CVIs. The first type involves the content validity of individual items and the second involves the content validity of the overall scale. There is considerable agreement about how to compute the item-level CVI, which we refer to for the purpose of clarity as the I-CVI. A panel of content experts is asked to rate each scale item in terms of its relevance to the underlying construct. Lynn (1986) advised a minimum of three experts, but indicated that more than 10 was probably unnecessary. By tradition, and based on the advice of early writers such as Lynn, as well as Waltz and Bausell (1981), these item ratings are typically on a 4-point ordinal scale.

Lynn acknowledged that 3- or 5-point rating scales might be considered, but she advocated using a 4-point scale to avoid having a neutral and ambivalent midpoint. Various labels for the four points along the item-rating continuum have appeared in the literature, but the one that was advocated by Davis (1992) appears to be in frequent use: 1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, 4 = highly relevant. Then, for each item, the I-CVI is computed as the number of experts giving a rating of either 3 or 4 (thus dichotomizing the ordinal scale into relevant and not relevant), divided by the total number of experts. For example, an item that was rated as quite or highly relevant by four out of five judges would have an I-CVI of .80.

One concern that has been raised about the CVI is that it is an index of interrater agreement that simply expresses the proportion of agreement, and agreement can be inflated by chance factors. For example, if two judges rated the relevance versus irrelevance of an item, by chance alone the two judges would be expected to agree on relevance 25 percent of the time.
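A short sketch may make the I-CVI arithmetic concrete. This is an illustration of the computation just described, not code from the article; the rating data and names are assumptions.

```python
RELEVANT = {3, 4}  # ratings of "quite relevant" or "highly relevant"

def i_cvi(item_ratings):
    """I-CVI: proportion of experts rating the item 3 or 4 on the
    4-point relevance scale (i.e., relevant after dichotomizing)."""
    return sum(r in RELEVANT for r in item_ratings) / len(item_ratings)

# Five judges rate one item; four give a 3 or 4, so I-CVI = 4/5 = .80,
# matching the example in the text.
print(f"I-CVI = {i_cvi([4, 3, 4, 2, 3]):.2f}")

# Chance agreement for two judges: if each independently judged the item
# relevant with probability .5, both would agree on "relevant"
# .5 * .5 = 25% of the time, as noted above.
```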

In recognition of this problem, Lynn (1986) developed criteria for item acceptability that incorporated the standard error of the proportion. She recommended that with a panel of five or fewer experts, all must agree on the content validity for their rating to be considered "a reasonable representation of the universe of possible ratings" (p. 383). In other words, the I-CVI should be 1.00 when there are five or fewer judges. When there are six or more judges, the standard can be relaxed, but Lynn recommended I-CVIs no lower than .78. For example, with six raters, there could be one not relevant rating (I-CVI = .83), and with nine raters there could be two not relevant ratings (I-CVI = .78).
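Lynn's criteria translate into a simple decision rule: universal agreement with five or fewer experts, and an I-CVI of at least .78 with six or more. The sketch below encodes that rule; the function name is illustrative, and rounding to two decimals is an assumption made so that borderline values such as 7/9 = .78 pass, consistent with the examples above.

```python
def i_cvi_acceptable(n_agree, n_experts):
    """Apply Lynn's (1986) acceptability criteria to one item:
    with <= 5 experts, all must rate the item relevant (I-CVI = 1.00);
    with >= 6 experts, the I-CVI must be at least .78."""
    icvi = n_agree / n_experts
    if n_experts <= 5:
        return icvi == 1.0
    # I-CVIs are conventionally reported to two decimals, so 7/9 -> .78.
    return round(icvi, 2) >= 0.78

print(i_cvi_acceptable(5, 5))  # True  (1.00 with five experts)
print(i_cvi_acceptable(4, 5))  # False (.80 falls short with five experts)
print(i_cvi_acceptable(5, 6))  # True  (.83 with six experts)
print(i_cvi_acceptable(7, 9))  # True  (.78 with nine experts)
```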

Researchers use I-CVI information to guide them in revising, deleting, or substituting items. In research reports, however, researchers do not usually provide information about I-CVI values. They tend only to be reported in methodological studies that focus on descriptions of the content validation process. What is most often reported in scale development studies is the CVI for the entire scale, and that is where the problems begin.

THE CONTENT VALIDITY INDEX FOR SCALES (S-CVI)

Computational procedures for the scale-level CVI, which we refer to for the sake of clarity as the S-CVI, have been fully explicated in terms of ratings by two experts. Here are two frequently cited definitions: the S-CVI is defined as "the proportion of items given a rating of quite/very relevant by both raters involved" (Waltz et al., 2005, p. 155) and as "the proportion of items given a rating of 3 or 4 by both raters involved" (Waltz & Bausell, 1981, p. 71). Both references present tables to illustrate how to compute the S-CVI with two raters using 4-point scales of item relevance. An example similar to that shown in Waltz et al. (p. 155) is presented in Table 1. In this example, 8 out of 10 items were judged to be quite or highly relevant (i.e., a rating of 3 or 4) by both experts, and so the S-CVI is computed to be .80.
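The two-rater definition just quoted is the universal agreement method flagged in the abstract: an item counts only if both experts rate it 3 or 4. The sketch below computes the S-CVI both ways, by universal agreement and by averaging the I-CVIs, to show how the two methods can diverge on the same ratings. The data are illustrative, not the Table 1 values, and the function names are assumptions.

```python
RELEVANT = {3, 4}

def s_cvi_universal(ratings_by_expert):
    """Proportion of items rated relevant (3 or 4) by ALL experts."""
    flags = [all(r in RELEVANT for r in item)
             for item in zip(*ratings_by_expert)]
    return sum(flags) / len(flags)

def s_cvi_average(ratings_by_expert):
    """Mean of the item-level CVIs across all items."""
    items = list(zip(*ratings_by_expert))
    icvis = [sum(r in RELEVANT for r in item) / len(item) for item in items]
    return sum(icvis) / len(icvis)

# Two experts rating a 10-item scale; both rate 8 items relevant, and
# each rates one further item relevant that the other does not.
expert_1 = [4, 3, 4, 4, 3, 4, 3, 4, 2, 3]
expert_2 = [3, 4, 4, 3, 4, 3, 4, 4, 3, 2]
print(f"Universal agreement: {s_cvi_universal([expert_1, expert_2]):.2f}")  # 0.80
print(f"Averaged I-CVIs:     {s_cvi_average([expert_1, expert_2]):.2f}")    # 0.90
```

On these ratings the universal agreement method yields .80 while the averaging method yields .90, which is exactly the kind of discrepancy the article argues makes it essential to report which computation was used.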

