International Journal of Educational and Psychological Assessment, April 2009, Vol. 1, Issue 1, pp. 1-11
© 2009 Time Taylor Academic Journals

Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data

Carlo Magno
De La Salle University, Manila

Abstract

The present report demonstrates the difference between the Classical Test Theory (CTT) and Item Response Theory (IRT) approaches using actual test data from junior high school chemistry students. The CTT and IRT approaches were compared across two samples and two forms of the test on their item difficulty, internal consistency, and measurement errors.

The specific IRT approach used is the one-parameter Rasch model. Two equivalent samples were drawn from a private school in the Philippines, and the two data sets were compared on the tests' item difficulty, split-half coefficient, Cronbach's alpha, item difficulty using the Rasch model, person and item reliability (using the Rasch model), and measurement error estimates. The results demonstrate certain limitations of Classical Test Theory and advantages of using IRT. The study found that (1) IRT estimates of item difficulty did not change across samples, whereas CTT estimates were inconsistent; (2) IRT difficulty indices were also more stable across test forms than CTT indices; (3) IRT internal consistencies were very stable across samples, while CTT internal consistencies were not; and (4) IRT yielded significantly smaller measurement errors than the CTT approach.

From the perspective of stakeholders in testing and measurement, test developers are primarily concerned with the quality of test items and with how examinees respond to them when constructing tests. A psychometrician generally uses psychometric techniques to determine validity and reliability. Psychometric theory offers two approaches to analyzing test data: Classical Test Theory (CTT) and Item Response Theory (IRT). Both theories make it possible to predict outcomes of psychological tests by identifying parameters of item difficulty and the ability of test takers, and both aim to improve the reliability and validity of psychological tests.

Both of these approaches provide measures of validity and reliability. Some issues have been identified in Classical Test Theory concerning the calibration of item difficulty, the sample dependence of coefficient measures, and the estimation of measurement error; these issues are addressed by Item Response Theory. The purpose of this article is to demonstrate the advantages and disadvantages of using both approaches in analyzing a given chemistry test.

Classical Test Theory

Classical Test Theory is regarded as the true score theory. The theory starts from the assumption that systematic differences between the responses of examinees are due only to variation in the ability of interest.

All other potential sources of variation in the testing materials, such as external conditions or internal conditions of the examinees, are assumed either to be held constant through rigorous standardization or to have an effect that is nonsystematic, or random, by nature (Van der Linden & Hambleton, 2004). The central model of Classical Test Theory is that an observed test score ($T_O$) is composed of a true score ($T$) and an error score ($E$), where the true and error scores are independent. The model, established by Spearman (1904) and Novick (1966), is best illustrated by the formula

$$T_O = T + E$$

Classical theory assumes that each individual has a true score, which would be obtained if there were no errors in measurement.
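To make the decomposition of an observed score into independent true and error components concrete, here is a minimal simulation sketch. The score scale, the normal distributions, and the function name `simulate_ctt` are illustrative assumptions, not taken from the study. Because true and error scores are generated independently, the observed-score variance comes out close to the sum of the two component variances, and the ratio of true-score to observed-score variance plays the role of reliability.

```python
import random
import statistics

def simulate_ctt(n_examinees=5000, true_mean=50.0, true_sd=10.0,
                 error_sd=5.0, seed=1):
    """Simulate observed scores under the CTT decomposition T_O = T + E.

    Illustrative assumptions: true scores and errors are normally
    distributed, errors have mean 0, and errors are drawn independently
    of the true scores.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed

true_scores, errors, observed = simulate_ctt()
var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_o = statistics.pvariance(observed)

# With independent T and E, Var(T_O) is approximately Var(T) + Var(E).
print(round(var_o, 1), round(var_t + var_e, 1))
# Reliability: the share of observed-score variance due to true scores
# (about 100 / 125 = 0.8 for these settings).
print(round(var_t / var_o, 3))
```

The two printed variances agree only approximately, because a finite sample of independent draws still has a small nonzero covariance between $T$ and $E$.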

However, because measuring instruments are imperfect, the score observed for each person may differ from that individual's true ability. The difference between the true score and the observed test score results from measurement error. Using a variety of justifications, error is often assumed to be a random variable having a normal distribution. The implication of Classical Test Theory for test takers is that tests are fallible, imprecise tools: the score achieved by an individual is rarely the individual's true score.

The true score for an individual is assumed not to change with repeated applications of the same test. The observed score, by contrast, is almost always the true score influenced by some degree of error, which makes the observed score higher or lower than the true score. Theoretically, the standard deviation of the distribution of random errors for each individual indicates the magnitude of measurement error. It is usually assumed that the distribution of random errors is the same for all individuals. Classical Test Theory therefore uses the standard deviation of errors as its basic measure of error; this is usually called the standard error of measurement.
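The idea of a fixed true score surrounded by a distribution of random errors can be illustrated with a hypothetical thought experiment: give the same test to one person many times. In the sketch below, the true score of 70, the error standard deviation of 3, and the function name `repeated_administrations` are all assumptions chosen for illustration. The mean of the simulated observed scores settles near the true score, and their standard deviation settles near the error standard deviation, which is what CTT calls the standard error of measurement.

```python
import random
import statistics

def repeated_administrations(true_score, error_sd, n_repeats, seed=7):
    """Hypothetical: one examinee takes the same test n_repeats times.

    The true score stays fixed; each administration adds an independent
    normal error with mean 0 and standard deviation error_sd (assumed).
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_repeats)]

scores = repeated_administrations(true_score=70.0, error_sd=3.0, n_repeats=10000)

print(round(statistics.mean(scores), 2))   # close to the true score of 70
print(round(statistics.stdev(scores), 2))  # close to error_sd, i.e. the SEM
# Errors push the observed score up or down: roughly half land above 70.
print(sum(s > 70.0 for s in scores) / len(scores))
```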

In practice, the standard deviation of the observed scores and the reliability of the test are used to estimate the standard error of measurement (Kaplan & Saccuzzo, 1997). The larger the standard error of measurement, the less certain the accuracy with which an attribute is measured. Conversely, a small standard error of measurement indicates that an individual's score is probably close to the true score. The standard error of measurement is calculated with the formula

$$S_m = S\sqrt{1 - r}$$

where $S$ is the standard deviation of the observed test scores and $r$ is the reliability of the test. Standard errors of measurement are used to create confidence intervals around specific observed scores (Kaplan & Saccuzzo, 1997).
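As a worked example of the standard error of measurement and the confidence interval built from it, the sketch below uses made-up values: an observed-score standard deviation of 10, a reliability of 0.91, and an observed score of 75. None of these numbers come from the study. The multiplier 1.96 gives an approximately 95% band under the usual assumption of normally distributed errors.

```python
import math

def standard_error_of_measurement(sd_observed, reliability):
    """S_m = S * sqrt(1 - r): CTT standard error of measurement."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_observed * math.sqrt(1.0 - reliability)

def confidence_interval(observed_score, sem, z=1.96):
    """Observed score +/- z standard errors (z = 1.96 is about 95%)."""
    return observed_score - z * sem, observed_score + z * sem

sem = standard_error_of_measurement(sd_observed=10.0, reliability=0.91)
low, high = confidence_interval(75.0, sem)

print(round(sem, 2))                  # 3.0, since 10 * sqrt(0.09) = 3
print(round(low, 2), round(high, 2))  # 69.12 80.88
```

A perfectly reliable test ($r = 1$) would give $S_m = 0$ and collapse the interval onto the observed score; lower reliability widens it.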

The lower and upper bounds of the confidence interval approximate the value of the true score. Traditionally, methods of analysis based on Classical Test Theory have been used to evaluate tests. The analysis focuses on the total test score; the frequency of correct responses (to indicate question difficulty); the frequency of responses (to examine distracters); the reliability of the test; and the item-total correlation (to evaluate discrimination at the item level) (Impara & Plake, 1997). Although these statistics have been widely used, one limitation is that they relate to the sample under scrutiny, so all the statistics that describe items and questions are sample dependent (Hambleton, 2000).
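The classical item statistics listed above, together with the two internal-consistency coefficients named in the abstract, can be sketched on a small response matrix. The 0/1 data below are hypothetical, and two choices are assumptions rather than the study's documented procedure: discrimination is computed as a *corrected* item-total correlation (each item against the total of the other items), and the split-half coefficient uses an odd-even split stepped up with the Spearman-Brown formula.

```python
import statistics

# Hypothetical 0/1 scored responses: rows are examinees, columns are items.
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def column(responses, j):
    return [row[j] for row in responses]

def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_difficulty(responses):
    """CTT difficulty (p-value): proportion answering each item correctly."""
    n = len(responses)
    return [sum(column(responses, j)) / n for j in range(len(responses[0]))]

def corrected_item_total(responses):
    """Correlate each item with the total of the other items (discrimination)."""
    totals = [sum(row) for row in responses]
    result = []
    for j in range(len(responses[0])):
        item = column(responses, j)
        rest = [t - i for t, i in zip(totals, item)]
        result.append(pearson(item, rest))
    return result

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(responses[0])
    item_var = sum(statistics.pvariance(column(responses, j)) for j in range(k))
    total_var = statistics.pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)

def split_half(responses):
    """Odd-even split-half correlation stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

print(item_difficulty(RESPONSES))
print([round(r, 2) for r in corrected_item_total(RESPONSES)])
print(round(cronbach_alpha(RESPONSES), 3))  # 0.697 (exactly 136/195)
print(round(split_half(RESPONSES), 3))
```

Every number printed here depends on these six examinees; a different sample answering the same four items would generally produce different p-values and coefficients, which is the sample dependence the paragraph describes.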

This critique may not be particularly relevant where successive samples are reasonably representative and do not vary across time, but this needs to be confirmed, and complex strategies have been proposed to overcome this limitation.

Item Response Theory

Another branch of psychometric theory is Item Response Theory (IRT). IRT may be regarded as roughly synonymous with latent trait theory. It is sometimes referred to as strong true score theory or modern mental test theory, because IRT is a more recent body of theory and makes stronger assumptions than Classical Test Theory. This approach to testing, based on item analysis, considers the chance of getting particular items right or wrong.
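The "chance of getting particular items right or wrong" can be made concrete with the standard logistic form of the one-parameter Rasch model, in which the probability of a correct answer depends only on the difference between examinee ability $\theta$ and item difficulty $b$. The sketch below is a minimal illustration under stated assumptions, not the paper's estimation procedure: it fixes one item's Rasch difficulty at $b = 0$, draws two hypothetical samples whose abilities are normal with means $-1$ and $+1$, and computes the CTT p-value of that same item in each sample. The function names and sample settings are hypothetical.

```python
import math
import random

def rasch_probability(theta, b):
    """One-parameter Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_p_value(b, ability_mean, n=20000, seed=3):
    """CTT p-value of one item with fixed Rasch difficulty b in a simulated sample.

    Assumption: abilities are normal with the given mean and SD 1.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        theta = rng.gauss(ability_mean, 1.0)
        if rng.random() < rasch_probability(theta, b):
            correct += 1
    return correct / n

b = 0.0
print(rasch_probability(0.0, b))  # 0.5: ability equal to difficulty
p_low = simulate_p_value(b, ability_mean=-1.0)
p_high = simulate_p_value(b, ability_mean=1.0)
# Same item, same b, but the CTT p-value shifts with the sample's ability.
print(round(p_low, 3), round(p_high, 3))
```

In this simulation the item parameter $b$ is identical for both groups, while the classical proportion-correct differs markedly between the low- and high-ability samples, mirroring the contrast the abstract reports between stable IRT difficulty estimates and sample-dependent CTT indices.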

