
Item Analysis for Criterion-Referenced Tests

By Richard J. McCowan and Sheila C. McCowan

Center for Development of Human Services
Research Foundation of SUNY
State University College at Buffalo
1695 Elmwood Avenue
Buffalo, New York 14207-2407

The Center for Development of Human Services is a continuing education enterprise of the Research Foundation of the State University of New York and a unit of the Graduate Studies and Research Division at Buffalo State College (SUNY). Funding is provided by the New York State Office of Children and Family Services.

1999 Research Foundation of SUNY/Center for Development of Human Services. All rights reserved.

Center for Development of Human Services
Robert N. Spaner - Chief Administrative
Richard J. McCowan - Director, Research & Evaluation

Item analysis uses statistics and expert judgment to evaluate tests based on the quality of individual items, item sets, and entire sets of items, as well as the relationship of each item to other items. It investigates the performance of items considered individually either in relation to some external criterion or in relation to the remaining items on the test (Thompson & Levitov, 1985, p. 163).
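Examining an item "in relation to the remaining items on the test" can be made concrete with a small sketch. The example below is illustrative only: the response data are invented, the function names are hypothetical, and the two statistics shown (proportion correct and an item-rest correlation) are common choices assumed here, not ones prescribed by the text above.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a variable has no variance
    return cov / sqrt(var_x * var_y)

def item_statistics(responses):
    """responses: one row per examinee, each a list of 0/1 item scores.
    Returns one (proportion_correct, item_rest_correlation) pair per item."""
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Score on the remaining items, excluding the item under study
        rest = [sum(row) - row[j] for row in responses]
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats

# Hypothetical data: 6 examinees x 4 items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]

for j, (p, r) in enumerate(item_statistics(responses), start=1):
    print(f"item {j}: proportion correct = {p:.2f}, item-rest r = {r:.2f}")
```

An item whose item-rest correlation is negative, as the fourth item is in this invented data set, tends to be answered correctly by examinees who do poorly on the rest of the test, which is usually a signal to review the item.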


Item analysis uses this information to improve item and test quality. Item analysis concepts are similar for norm-referenced and criterion-referenced tests, but they differ in specific, significant ways. For criterion-referenced tests, use norm-referenced statistics for pretest data and criterion-referenced statistics for posttest data. This suggestion assumes that untrained persons will know relatively little about pretest material, so the assumptions on which norm-referenced statistics are based are applicable. Once people are trained, a test is criterion-referenced, and criterion-referenced statistics must be used.

Validity

Validity is the extent to which a test measures what it is supposed to measure. It is the most critical dimension of a test. Simply stated, validity is what a test measures and how well it does this (Anastasi, 1954; Anastasi & Urbina, 1997). Validity is a crucial consideration in evaluating tests.

Since new commercial tests cannot be published without validation studies, it is reasonable to expect similar evidence of validity for tests that screen individuals for high-stakes decisions such as promotion or graduation.

With minor modifications, Cronbach's (1949) concept of validity has remained consistent over the last 50 years. Cronbach (1949, p. 48) said that validity was the extent to which a test measures what it purports to measure and that a test is valid to the degree that what it measures or predicts is known. He identified two basic categories of validity: logical and empirical. Logical validity is a set of loosely organized, broadly defined approaches based on content analysis that includes examination of operational issues and test-taking processes. Content validation requires that test makers study a test to determine what the test scores truly mean.

In 1954 the American Psychological Association (APA) defined four categories of validity: content, predictive, concurrent, and construct.

In 1966, the association combined predictive and concurrent validity into a single grouping called criterion validity (American Psychological Association, 1966), which remains the current classification (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1985). These aspects of validity are often mistakenly considered as three types of validity rather than a concept about how a score can be interpreted.

Types of Validity

Face validity estimates whether a test measures what it claims to measure. It is the extent to which a test seems relevant, important, and interesting. It is the least rigorous measure of validity.

Content validity is the degree to which a test matches a curriculum and accurately measures the specific training objectives on which a program is based. Typically it relies on the judgment of qualified experts to determine if a test is accurate and appropriate.

Criterion validity measures how well a test compares with an external criterion.

It includes:

Predictive validity is the correlation between a predictor and a criterion obtained at a later time (e.g., test score on a specific competence and caseworker performance of job-related tasks).

Concurrent validity is the correlation between a predictor and a criterion at the same point in time (e.g., performance on a cognitive test related to training and scores on a Civil Service examination).

Construct validity is the extent to which a test measures a theoretical construct (e.g., a researcher examines a personality test to determine if the personality typologies account for actual results).

The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1985) stated: "Validity is the most important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. Test validation is a process of accumulating evidence to support such inferences. A variety of inferences may be made from scores produced by a given test, and there are many ways of accumulating evidence to support any particular inference. Validity, however, is a unitary concept. Although evidence may be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself" (p. 9).

They noted that professional judgment guides decisions about forms of evidence that are necessary and feasible regarding potential uses of test scores.

In 1955 Cronbach and Meehl amplified the concept of construct validity by introducing the concept of a nomological net. This net included the interrelated laws that support a construct. In 1971, Cronbach said that "Narrowly considered, validation is the process of examining the accuracy of a specific prediction or inference made from a test score" (p. 443). In 1989 Cronbach moderated this concept by acknowledging that it was impossible to attain the level of proof demanded in the harder sciences with most social science constructs.

A concept is an abstraction formed by generalizing from particulars, while a construct is a concept deliberately invented for a specific scientific purpose (Kerlinger, p. 28). The constructs on which a test is based relate specifically to the domain of competencies that are tested by items included on the test. Construct validity is the extent to which a test is based on relevant theory and research related to a defined domain.

A 1951 definition, similar to Cronbach's, held that the essential question of test validity was how well a test did what it was employed to do. Validity, therefore, was the correlation between an actual test score and the "true" criterion score.
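Treating validity as a correlation between a test score and a criterion can be illustrated with a short sketch. The function name, the training-test scores, and the later performance ratings below are all hypothetical, and the Pearson product-moment coefficient is assumed as the correlation; the text above does not name a specific coefficient.

```python
from math import sqrt

def validity_coefficient(predictor, criterion):
    """Pearson correlation between predictor scores and criterion scores."""
    if len(predictor) != len(criterion):
        raise ValueError("predictor and criterion must have the same length")
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_c = sum(criterion) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(predictor, criterion))
    var_p = sum((p - mean_p) ** 2 for p in predictor)
    var_c = sum((c - mean_c) ** 2 for c in criterion)
    if var_p == 0 or var_c == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_p * var_c)

# Hypothetical data: scores on a training test, and supervisor ratings
# of the same six caseworkers collected several months later.
test_scores = [55, 62, 70, 74, 81, 90]
later_ratings = [2.1, 2.8, 3.0, 3.4, 3.9, 4.5]

r = validity_coefficient(test_scores, later_ratings)
print(f"predictive validity coefficient r = {r:.2f}")
```

The same computation serves for concurrent validity; the only difference is that the criterion scores are collected at the same time as the test scores.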

By the early 1950s, other types of validity had been identified (e.g., factorial, intrinsic, empirical, logical) (Anastasi, 1954). Messick (1989) expanded the definition by stating "Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (p. 18).

Empirical validity emphasized factor analysis based on correlations between test scores and criterion measures (Anastasi, 1950). However, test makers must interpret correlational studies cautiously because spurious correlations may be misleading (e.g., high positive correlations between children's foot size and reading achievement). In 1957 Campbell introduced the notion of falsification in the validation process due to spurious correlations, and he discussed the importance of testing plausible, rival hypotheses.
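One way to test a rival hypothesis for the foot size example is to ask whether the correlation survives once a third variable is held constant. The sketch below assumes that age is the third variable and uses a first-order partial correlation; both are assumptions made for illustration, since the text above names neither. The data are invented and deliberately constructed so that foot size and reading score are related only through age.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_correlation(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data for ten children (age in years, foot length in cm,
# reading achievement score). Both measures rise with age.
age = [6, 6, 7, 7, 8, 8, 9, 9, 10, 10]
foot_cm = [18.5, 19.5, 20.0, 21.0, 21.5, 22.5, 23.0, 24.0, 24.5, 25.5]
reading = [62, 62, 68, 68, 82, 82, 88, 88, 100, 100]

raw = pearson(foot_cm, reading)
controlled = partial_correlation(foot_cm, reading, age)
print(f"raw correlation, foot size vs. reading:  {raw:.2f}")
print(f"partial correlation, controlling for age: {controlled:.2f}")
```

In this constructed data set the raw correlation is high while the partial correlation is essentially zero, which is the pattern expected when a third variable accounts for the relationship. Real data would rarely separate this cleanly.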

Campbell and Fiske (1959) expanded this concept by introducing the multitrait-multimethod approach and convergent and divergent (or discriminant) validity.

Recently, Messick (1989) discussed the importance of considering the consequences of test use in drawing inferences about validity and added the term consequential validity to this list. He noted: "Validity is an overall evaluative judgment, founded on empirical evidence and theoretical rationales, of the adequacy and appropriateness of inferences and actions based on test scores. As such, validity is an inductive summary of both the adequacy of existing evidence for and the appropriateness of potential consequences of test interpretation and use" (Messick, 1988, pp. 33-34).

Improving Test Validity

Anastasi (1986) described validation as a process built into tests from the planning stage onward: "Validity is thus built into the test from the outset rather than being limited to the last stages of test development. The validation process begins with the formulation of detailed trait or construct definitions derived from psychological theory, prior research, or systematic observation and analyses of the relevant behavior domain. Test items are then prepared to fit the construct definitions. Empirical item analyses follow with the selection of the most effective (i.e., valid) items from the initial item pools. Other appropriate internal analyses may then be carried out, including factor analyses of item clusters or subtests. The final stage includes validation and cross-validation of various scores and interpretive combinations of scores through statistical analyses against external, real-life criteria" (p. 3).

Many tests are flawed because they focus on insignificant or unrelated information, disproportionately test one segment of curriculum and ignore other sections, or include poorly written, confusing test items. The following test development procedures will increase test validity.

