JSS Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue

MICE: Multivariate Imputation by Chained Equations in R

Stef van Buuren
1 TNO Quality of Life, Leiden
2 University of Utrecht

Karin Groothuis-Oudshoorn
1 Roessingh RD, Enschede
2 University Twente

Abstract

Multivariate Imputation by Chained Equations (MICE) is the name of software for imputing incomplete multivariate data by Fully Conditional Specification (FCS). MICE appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. That version introduced predictor selection, passive imputation and automatic pooling.
This article presents a new version of MICE, which extends the functionality of the earlier version in several ways. In the new version, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. MICE adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling and model selection. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix.
MICE is freely available from CRAN as an R package mice. This article provides a hands-on, stepwise approach to using mice for solving incomplete data problems in real data.

Keywords: multiple imputation, chained equations, fully conditional specification, Gibbs sampler, predictor selection, passive imputation.

Introduction

Multiple imputation (Rubin 1987, 1996) is the method of choice for complex incomplete data problems. Missing data that occur in more than one variable present a special challenge. Two general approaches for imputing multivariate data have emerged: joint modeling (JM) and fully conditional specification (FCS) (van Buuren 2007).
Schafer (1997) developed various JM techniques for imputation under the multivariate normal, the log-linear, and the general location model. JM involves specifying a multivariate distribution for the missing data, and drawing imputations from their conditional distributions by Markov Chain Monte Carlo (MCMC) techniques. This methodology is attractive if the multivariate distribution is a reasonable description of the data. FCS specifies the multivariate imputation model on a variable-by-variable basis by a set of conditional densities, one for each incomplete variable. Starting from an initial imputation, FCS draws imputations by iterating over the conditional densities.
A low number of iterations (say 10-20) is often sufficient. FCS is attractive as an alternative to JM in cases where no suitable multivariate distribution can be found. The basic idea of FCS is already quite old, and has been proposed using a variety of names: stochastic relaxation (Kennickell 1991), variable-by-variable imputation (Brand 1999), regression switching (van Buuren et al.), sequential regressions (Raghunathan et al.), ordered pseudo-Gibbs sampler (Heckerman et al.), partially incompatible MCMC (Rubin 2003), iterated univariate imputation (Gelman 2004), chained equations (van Buuren and Oudshoorn 2000) and fully conditional specification (van Buuren 2007).
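
The FCS iteration described above can be illustrated with a small sketch. The code below is written in Python for illustration only and is not the mice implementation. It assumes just two numeric variables, uses a simple linear regression with normal noise as the conditional model for each, and, as a simplification, does not draw the regression parameters from their posterior. The names `fit_line` and `fcs_impute` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual sd for y ~ x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sd = (rss / max(n - 2, 1)) ** 0.5
    return intercept, slope, sd


def fcs_impute(data, n_iter=10, seed=0):
    """Return a completed copy of `data` (rows of two values, None = missing)."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in data]
    completed = [list(row) for row in data]

    # Initial imputation: a random draw from the observed values of the column.
    for j in range(2):
        observed = [row[j] for row in data if row[j] is not None]
        for i, row in enumerate(completed):
            if missing[i][j]:
                row[j] = rng.choice(observed)

    # Iterate over the conditional models, one per incomplete variable.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # the other variable acts as predictor
            # Fit on rows where variable j was actually observed.
            rows = [r for i, r in enumerate(completed) if not missing[i][j]]
            a, b, sd = fit_line([r[k] for r in rows], [r[j] for r in rows])
            # Redraw the missing entries from the fitted conditional model.
            for i, row in enumerate(completed):
                if missing[i][j]:
                    row[j] = a + b * row[k] + rng.gauss(0, sd)
    return completed


# Made-up example data with missing entries in both variables.
incomplete = [
    [1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.1],
    [5.0, 9.9], [6.0, None], [None, 14.2], [8.0, 16.1],
]
completed = fcs_impute(incomplete, n_iter=10, seed=1)
```

Running `fcs_impute` several times with different seeds would give several completed data sets, which is the starting point for multiple imputation.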

Software implementations

Several authors have implemented fully conditionally specified models for imputation. MICE (van Buuren and Oudshoorn 2000) was released as an S-PLUS library in 2000, and was converted by several users [...]. [...] (Raghunathan et al.) is a SAS-based procedure that was independently developed by Raghunathan and colleagues. The function aRegImpute in R and S-PLUS is part of the Hmisc package (Harrell 2001). The ice software (Royston 2004, 2005) is a widely used implementation [...]. [...] (Solutions 2001) is also based on conditional specification, but does not [...]. [...] (Jacobusse 2005) is a Windows stand-alone program for generating imputations under the hierarchical linear model.
A recent addition is the R package mi (Su et al.). Furthermore, FCS is now widely available through the multiple imputation procedure part of the SPSS V17 module MVA.

Applications of chained equations

Applications of imputation by chained equations have now appeared in quite diverse fields: addiction (Schnoll et al.; Macleod et al.; Adamczyk and Palmer 2008; Caria et al.; Morgenstern et al.), arthritis and rheumatology (Wolfe et al.; Rahman et al.; Van Den Hout et al.), atherosclerosis (Tiemeier et al.; Van Oijen et al.; McClelland et al.), cardiovascular system (Ambler et al.; van Buuren et al.; Chase et al.; Byrne et al.; Klein et al.), cancer (Clark et al. [...], 2003; Clark and Altman 2003; Royston et al.; Barosi et al.;
Fernandes et al.; Sharma et al.; McCaul et al.; Huo et al.; Gerestein et al.), epidemiology (Cummings et al.; Hindorff et al.; Mueller et al.; Ton et al.), endocrinology (Rouxel et al.; Prompers et al.), infectious diseases (Cottrell et al.; Walker et al.; Cottrell et al.; Kekitiinwa et al.; Nash et al.; Sabin et al.; Thein et al.; Garabed et al.; Michel et al.), genetics (Souverein et al.), health economics (Briggs et al.; Burton et al.; Klein et al.; Marshall et al.), obesity and physical activity (Orsini et al.; Wiles et al.; Orsini et al.; Van Vlierberghe et al.), pediatrics and child development (Hill et al.; Mumtaz et al.; Deave et al.; Samant IV et al.; Butler and Heron 2008; Ramchandani et al.; Van Wouwe et al.), rehabilitation (van der Hulst et al.), behavior (Veenstra et al.; Melhem et al.; Horwood et al.; Rubin et al.), quality of care (Sisk et al.; Roudsari et al.; Ward and Franks 2007; Grote et al.; Roudsari et al.; Grote et al.; Sommer et al.), human reproduction (Smith et al. [...],b; Hille et al.; Alati et al.;
O'Callaghan et al.; Hille et al.; Den Hartog et al.), management sciences (Jensen and Roy 2008), occupational health (Heymans et al.; Brunner et al.; Chamberlain et al.), politics (Tanasoiu and Colonescu 2008), psychology (Sundell et al.) and sociology (Finke and Adamczyk 2008). All authors use some form of chained equations to handle the missing data, but the details vary considerably. The interested reader could check out articles from a familiar application area to see how multiple imputation is done and reported.

This paper describes the R package mice.
The package contains functions for three phases of multiple imputation: generating multiple imputations, analyzing imputed data, and pooling analysis results. Specific features of the software are:

- columnwise specification of the imputation model
- arbitrary patterns of missing data
- passive imputation
- subset selection of predictors
- support of arbitrary complete-data methods
- support pooling various types of statistics
- diagnostics of imputations
- callable user-written imputation functions

The package replaces version [...], but is fully compatible with previous versions. This document replaces the original manual (van Buuren and Oudshoorn 2000).
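
To make the third phase concrete, the sketch below pools a scalar estimate over m imputed data sets using the standard combining rules of Rubin (1987). It is a Python illustration under stated assumptions, not the pooling code of the mice package: the function name `pool_rubin` is hypothetical, and the input numbers are made up.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    estimates: one estimate per imputed data set (m >= 2)
    variances: the corresponding within-imputation variances
    """
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # average within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Made-up results from m = 3 analyses of imputed data sets.
q_bar, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what distinguishes this from simply averaging the standard errors: it carries the extra uncertainty caused by the missing data.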