Goodness-of-Fit Testing - UB

A Maydeu-Olivares and C García-Forero, University of Barcelona, Barcelona, Spain
© 2010 Elsevier Ltd. All rights reserved.

Glossary
Absolute goodness of fit: The discrepancy between a statistical model and the data at hand.
Goodness-of-fit index: A numerical summary of the discrepancy between the observed values and the values expected under a statistical model.
Goodness-of-fit statistic: A goodness-of-fit index with known sampling distribution that may be used in statistical hypothesis testing.
Relative goodness of fit: The discrepancy between two statistical models.

The goodness of fit (GOF) of a statistical model describes how well it fits a set of observations. GOF indices summarize the discrepancy between the observed values and the values expected under a statistical model. GOF statistics are GOF indices with known sampling distributions, usually obtained using asymptotic methods, that are used in statistical hypothesis testing.

As large-sample approximations may behave poorly in small samples, a great deal of research using simulation studies has been devoted to investigating the conditions under which the asymptotic p-values of GOF statistics are accurate (i.e., how large the sample size must be for models of different sizes).

Assessing absolute model fit (i.e., the discrepancy between a model and the data) is critical in applications, as inferences drawn from poorly fitting models may be badly misleading. Applied researchers must examine not only the overall fit of their models but also perform a piecewise assessment. It may well be that a model fits well overall but fits some parts of the data poorly, suggesting the use of an alternative model. The piecewise GOF assessment may also reveal the source of misfit in poorly fitting models.

When more than one substantive model is under consideration, researchers are also interested in relative model fit (i.e., the discrepancy between two models; see Yuan and Bentler, 2004; Maydeu-Olivares and Cai, 2006).

Thus, we can classify GOF assessment using two useful dichotomies: GOF indices versus GOF statistics, and absolute fit versus relative fit. In turn, GOF indices and statistics can be classified as overall or piecewise. A third useful dichotomy to classify GOF assessment is based on the nature of the observed data, discrete versus continuous. Historically, GOF assessment for multivariate discrete data and that for multivariate continuous data have been presented as being completely different. However, new developments in limited information GOF assessment for discrete data reveal that there are strong similarities between the two, and here we shall highlight the similarities in GOF assessment for discrete and continuous data.

GOF Testing with Discrete Observed Data

Consider modeling $N$ observations on $n$ discrete random variables, each with $K$ categories, such as the responses to $n$ test items.

The observed responses can then be gathered in an $n$-dimensional contingency table with $C = K^n$ cells. In this setting, assessing the GOF of a model involves assessing the discrepancy between the observed proportions and the probabilities expected under the model across all cells $c = 1, \ldots, C$ of the contingency table. More formally, let $\pi_c$ be the probability of one such cell and let $p_c$ be the observed proportion. Let $\boldsymbol{\pi}(\boldsymbol{\theta})$ be the $C$-dimensional vector of model probabilities expressed as a function of, say, $q$ model parameters to be estimated from the data. Then, the null hypothesis to be tested is $H_0: \boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$, that is, the model holds, against $H_1: \boldsymbol{\pi} \neq \boldsymbol{\pi}(\boldsymbol{\theta})$.

International Encyclopedia of Education (2010), vol. 7, pp. 190-196

GOF Statistics for Assessing Overall Fit

The two standard GOF statistics for discrete data are Pearson's statistic

$$X^2 = N \sum_{c=1}^{C} \frac{(p_c - \hat{\pi}_c)^2}{\hat{\pi}_c}, \tag{1}$$

and the likelihood ratio statistic

$$G^2 = 2N \sum_{c=1}^{C} p_c \ln\!\left(\frac{p_c}{\hat{\pi}_c}\right), \tag{2}$$

where $\hat{\pi}_c = \pi_c(\hat{\boldsymbol{\theta}})$ denotes the probability of cell $c$ under the model.

Asymptotic p-values for both statistics can be obtained using a chi-square distribution with $C - q - 1$ degrees of freedom when maximum likelihood estimation is used. However, these asymptotic p-values are only correct when all expected frequencies $N\hat{\pi}_c$ are large (>5 is the usual rule of thumb). A practical way to evaluate whether the asymptotic p-values for $X^2$ and $G^2$ are valid is to compare them. If the p-values are similar, then both are likely to be correct. If they are very different, it is most likely that both p-values are incorrect.

Unfortunately, as the number of cells in the table increases, the expected frequencies become small (as the sum of all $C$ probabilities must be equal to 1). As a result, in multivariate discrete data analysis, the p-values for these statistics most often cannot be trusted. In fact, when the number of categories is large (say $K > 4$), the asymptotic p-values almost invariably become inaccurate as soon as $n > 5$. To overcome the inaccuracy of the asymptotic p-values for these statistics, two general methods have been proposed: resampling methods (e.g., bootstrap) and pooling cells.
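As an illustration of equations (1) and (2), the following is a minimal sketch that computes $X^2$, $G^2$, their degrees of freedom $C - q - 1$, and the asymptotic p-values. The function name `overall_gof`, the example counts, and the choice of an independence model for a 2 × 3 table (whose maximum likelihood estimates are the products of the observed margins, with $q = 3$ free parameters) are assumptions made for this example and are not taken from the article.

```python
import numpy as np
from scipy.stats import chi2


def overall_gof(observed_counts, model_probs, n_params):
    """Pearson X^2 and likelihood-ratio G^2 for a contingency table.

    observed_counts: cell frequencies (any shape, flattened internally)
    model_probs:     fitted cell probabilities pi_hat_c, same shape
    n_params:        q, the number of estimated model parameters
    """
    counts = np.asarray(observed_counts, dtype=float).ravel()
    pi_hat = np.asarray(model_probs, dtype=float).ravel()
    n_obs = counts.sum()
    p = counts / n_obs  # observed proportions p_c

    x2 = n_obs * np.sum((p - pi_hat) ** 2 / pi_hat)

    # Empty cells contribute 0 to G^2 (limit of p * ln p as p -> 0).
    nonzero = p > 0
    g2 = 2.0 * n_obs * np.sum(p[nonzero] * np.log(p[nonzero] / pi_hat[nonzero]))

    df = counts.size - n_params - 1
    return {
        "X2": x2,
        "G2": g2,
        "df": df,
        "p_X2": chi2.sf(x2, df),
        "p_G2": chi2.sf(g2, df),
        # Smallest expected frequency, for the ">5" rule of thumb.
        "min_expected": n_obs * pi_hat.min(),
    }


# Hypothetical 2 x 3 table, fitted with an independence model.
counts = np.array([[30, 20, 10],
                   [20, 30, 40]])
n_total = counts.sum()
row_margin = counts.sum(axis=1) / n_total
col_margin = counts.sum(axis=0) / n_total
pi_hat = np.outer(row_margin, col_margin)  # ML estimates under independence

result = overall_gof(counts, pi_hat, n_params=3)  # q = (2 - 1) + (3 - 1)
print(result)
```

Comparing `p_X2` with `p_G2` and checking `min_expected` mirrors the practical advice in the text: similar p-values and large expected frequencies suggest that the chi-square approximation can be trusted.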

Unfortunately, existing evidence suggests that resampling methods do not yield accurate p-values for the $X^2$ and $G^2$ statistics (Tollenaar and Mooijaart, 2003). Pooling cells may be a viable alternative to obtain accurate p-values in some instances. For instance, rating items with five categories can be pooled into three categories to reduce sparseness. However, if the number of variables is large, the resulting table may still yield some small expected frequencies. Moreover, pooling may distort the purpose of the analysis. Finally, pooling must be performed before the analysis is made to obtain a statistic with the appropriate asymptotic distribution.

Due to the difficulties posed by small expected probabilities in obtaining accurate p-values for GOF statistics assessing absolute model fit, some researchers have resorted to examining only the relative fit of the models under consideration, without assessing the absolute model fit. Other researchers simply use GOF indices.

GOF Indices

With $L$ denoting the loglikelihood, two popular GOF indices are Akaike's information criterion (AIC) and Schwarz's Bayesian information criterion (BIC),

$$\mathrm{AIC} = -2L + 2q, \qquad \mathrm{BIC} = -2L + q \ln N. \tag{3}$$

The AIC and BIC are not used to test the model in the sense of hypothesis testing, but for model selection.
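The two indices in equation (3) are simple enough to compute by hand. The sketch below does so for two hypothetical candidate models; the model names, loglikelihoods, parameter counts, and sample size are invented for illustration only.

```python
import math


def aic(loglik, q):
    """Akaike's information criterion: -2L + 2q."""
    return -2.0 * loglik + 2.0 * q


def bic(loglik, q, n_obs):
    """Schwarz's Bayesian information criterion: -2L + q ln(N)."""
    return -2.0 * loglik + q * math.log(n_obs)


# Hypothetical candidates: name -> (loglikelihood L, number of parameters q).
candidates = {
    "one_factor": (-1520.0, 10),
    "two_factor": (-1505.0, 19),
}
n_obs = 500  # hypothetical sample size N

aic_values = {name: aic(L, q) for name, (L, q) in candidates.items()}
bic_values = {name: bic(L, q, n_obs) for name, (L, q) in candidates.items()}

# The model with the lowest index is the one selected.
best_aic = min(aic_values, key=aic_values.get)
best_bic = min(bic_values, key=bic_values.get)
print(aic_values, "->", best_aic)
print(bic_values, "->", best_bic)
```

With these made-up numbers the two indices disagree: the per-parameter penalty is 2 for the AIC and $\ln N$ for the BIC, and $\ln N$ exceeds 2 whenever $N \geq 8$, so the BIC favors the smaller model more readily.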

Given a data set, a researcher chooses either the AIC or BIC and computes it for all models under consideration. Then, the model with the lowest index is selected. Notice that both the AIC and BIC combine absolute fit with model parsimony. That is, they penalize adding parameters to the model, but they do so differently. Of the two, the BIC penalizes adding parameters to the model more strongly than the AIC.

GOF Statistics for Piecewise Assessment of Fit

In closing this section, the standard method for assessing the source of misfit is the use of z-scores for cell residuals

$$\frac{p_c - \hat{\pi}_c}{\mathrm{SE}(p_c - \hat{\pi}_c)}, \tag{4}$$

where SE denotes standard error. In large samples, their distribution can be approximated using a standard normal distribution. Unfortunately, the use of these residuals, even in moderately large contingency tables, is challenging. It is difficult to find trends when inspecting these residuals, and the number of residuals to be inspected is easily too large.
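A rough sketch of the z-scores in equation (4) follows. It rests on a simplifying assumption that is not in the article: the model probabilities are treated as known, so that $\mathrm{SE}(p_c - \pi_c) = \sqrt{\pi_c (1 - \pi_c)/N}$. When parameters are estimated from the same data, the correct standard error is smaller, so these values understate the true z-scores. The function name and the example table are hypothetical.

```python
import numpy as np


def cell_residual_z(observed_counts, model_probs):
    """z-scores for cell residuals, treating model probabilities as fixed.

    ASSUMPTION: SE = sqrt(pi_c * (1 - pi_c) / N), i.e. no correction for
    parameter estimation; this is a simplification for illustration.
    """
    counts = np.asarray(observed_counts, dtype=float).ravel()
    pi = np.asarray(model_probs, dtype=float).ravel()
    n_obs = counts.sum()
    p = counts / n_obs
    se = np.sqrt(pi * (1.0 - pi) / n_obs)
    return (p - pi) / se


# Hypothetical 2 x 3 table (flattened row by row) and model probabilities.
counts = [30, 20, 10, 20, 30, 40]
probs = [2 / 15, 2 / 15, 2 / 15, 0.2, 0.2, 0.2]

z = cell_residual_z(counts, probs)
flagged = np.flatnonzero(np.abs(z) > 1.96)  # cells with large residuals
print(np.round(z, 2), flagged)
```

Even in this six-cell example several residuals must be read together to see a pattern, which hints at why the approach becomes unwieldy as the table grows.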

Most importantly, for large $C$, because the cell frequencies are integers and the expected frequencies must be very small, the resulting residuals will be either very small or very large.

New Developments in GOF with Discrete Observed Data: Limited Information Methods

In standard GOF methods for discrete data, contingency tables are characterized using cell probabilities. However, they can be equivalently characterized using marginal probabilities. To see this, consider the following 2 × 3 contingency table:

$$\begin{array}{c|ccc}
 & X_2 = 0 & X_2 = 1 & X_2 = 2 \\ \hline
X_1 = 0 & \pi_{00} & \pi_{01} & \pi_{02} \\
X_1 = 1 & \pi_{10} & \pi_{11} & \pi_{12}
\end{array}$$

This table can be characterized using the cell probabilities $\boldsymbol{\pi}' = (\pi_{00}, \ldots, \pi_{12})$. Alternatively, it can be characterized using the univariate $\dot{\boldsymbol{\pi}}_1' = (\pi_1^{(1)}, \pi_2^{(1)}, \pi_2^{(2)})$ and bivariate $\dot{\boldsymbol{\pi}}_2' = (\pi_{12}^{(1)(1)}, \pi_{12}^{(1)(2)})$ probabilities, where $\pi_i^{(k)} = \Pr(X_i = k)$, $\pi_{ij}^{(k)(l)} = \Pr(X_i = k, X_j = l)$, and

$$\begin{array}{c|ccc|c}
 & X_2 = 0 & X_2 = 1 & X_2 = 2 & \\ \hline
X_1 = 0 & & & & \\
X_1 = 1 & & \pi_{12}^{(1)(1)} & \pi_{12}^{(1)(2)} & \pi_1^{(1)} \\ \hline
 & & \pi_2^{(1)} & \pi_2^{(2)} &
\end{array}$$

Both characterizations are equivalent, and the equivalence extends to contingency tables of any dimension.

Limited information GOF methods disregard information contained in the higher-order marginals of the table. Thus, quadratic forms in, say, univariate and bivariate residuals are used instead of using all marginal residuals up to order $n$.

GOF Statistics for Assessing Overall Fit

Maydeu-Olivares and Joe (2005, 2006) proposed a family of GOF statistics, $M_r$, that provides a unified framework for limited information and full information GOF statistics. This family can be written as

$$M_r = N \hat{\mathbf{e}}_r' \hat{\mathbf{C}}_r \hat{\mathbf{e}}_r, \tag{5}$$

where $\hat{\mathbf{e}}_r$ are the residual proportions up to order $r$ and

$$\mathbf{C}_r = \boldsymbol{\Gamma}_r^{-1} - \boldsymbol{\Gamma}_r^{-1} \boldsymbol{\Delta}_r \left( \boldsymbol{\Delta}_r' \boldsymbol{\Gamma}_r^{-1} \boldsymbol{\Delta}_r \right)^{-1} \boldsymbol{\Delta}_r' \boldsymbol{\Gamma}_r^{-1}. \tag{6}$$

Here, $\boldsymbol{\Gamma}_r$ denotes the asymptotic covariance matrix of the residual proportions up to order $r$ and $\boldsymbol{\Delta}_r$ is a matrix of derivatives of the marginal probabilities up to order $r$ with respect to the model parameters.

Two members of this family are, for instance, $M_2$ and $M_n$. In $M_2$, only univariate and bivariate residuals are used. In $M_n$, all residuals up to order $n$, the number of variables, are used. When ML estimation is used, $M_n$ is algebraically equal to Pearson's $X^2$.

The asymptotic distribution of any statistic of the $M_r$ family is chi-square with degrees of freedom (df) = number of residuals used − $q$. For the chi-square approximation to $M_r$ to be accurate, the expected frequencies of min(2r, n) marginals need to be large. Thus, for $M_n$, expected cell frequencies need to be large, but for $M_2$, where $r = 2$, only expected frequencies for sets of min(2r, n) = 4 variables need to be large (provided $n > 4$). As a result, when only low-order margins are used, the asymptotic p-values are accurate even in very large models and small samples. Furthermore, often more power is obtained than when all the information available in the data is used. Consequently, Maydeu-Olivares and Joe suggest testing at the highest level of margins for which a model is identified, discarding higher-order margins.
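The quadratic form in equations (5) and (6) can be sketched numerically as follows. This is only an illustration of the matrix algebra: the function name `m_r_statistic` and the small matrices standing in for $\boldsymbol{\Gamma}_r$, $\boldsymbol{\Delta}_r$, and the residuals are hypothetical and do not come from any fitted model. In practice both matrices would be evaluated at the parameter estimates.

```python
import numpy as np


def m_r_statistic(residuals, gamma, delta, n_obs):
    """M_r = N e' C e with C = G^-1 - G^-1 D (D' G^-1 D)^-1 D' G^-1.

    residuals: marginal residual proportions up to order r (length s)
    gamma:     s x s asymptotic covariance matrix of those residuals
    delta:     s x q matrix of derivatives of the marginal probabilities
               with respect to the q model parameters
    Returns the statistic, its degrees of freedom (s - q), and C.
    """
    e = np.asarray(residuals, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)

    gamma_inv = np.linalg.inv(gamma)
    gi_delta = gamma_inv @ delta                  # G^-1 D
    middle = np.linalg.inv(delta.T @ gi_delta)    # (D' G^-1 D)^-1
    # gi_delta.T equals D' G^-1 because gamma is symmetric.
    c_matrix = gamma_inv - gi_delta @ middle @ gi_delta.T

    statistic = n_obs * (e @ c_matrix @ e)
    df = e.size - delta.shape[1]
    return statistic, df, c_matrix


# Hypothetical inputs: s = 3 residuals, q = 1 parameter, N = 200.
gamma = np.array([[0.25, 0.05, 0.02],
                  [0.05, 0.21, 0.03],
                  [0.02, 0.03, 0.16]])
delta = np.array([[0.20], [0.10], [0.05]])
residuals = np.array([0.03, -0.02, 0.01])

statistic, df, c_matrix = m_r_statistic(residuals, gamma, delta, n_obs=200)
print(statistic, df)
```

A useful check on the construction is that `c_matrix @ delta` is numerically zero: the weight matrix removes the directions in which the residuals can be absorbed by the model parameters, which is why $q$ is subtracted from the number of residuals to obtain the degrees of freedom. An asymptotic p-value would then follow from a chi-square reference distribution with `df` degrees of freedom.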
