
To Explain or to Predict?

Statistical Science, 2010, Vol. 25, No. 3, 289-310. Institute of Mathematical Statistics, 2010.

Galit Shmueli

Abstract. Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge.




While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

Key words and phrases: Explanatory modeling, causality, predictive modeling, predictive power, statistical strategy, data mining, scientific research.

INTRODUCTION

Looking at how statistical models are used in different scientific disciplines for the purpose of theory building and testing, one finds a range of perceptions regarding the relationship between causal explanation and empirical prediction.

In many scientific fields such as economics, psychology, education, and environmental science, statistical models are used almost exclusively for causal explanation, and models that possess high explanatory power are often assumed to inherently possess predictive power. In fields such as natural language processing and bioinformatics, the focus is on empirical prediction with only a slight and indirect relation to causal explanation. And yet in other research fields, such as epidemiology, the emphasis on causal explanation versus empirical prediction is more mixed. Statistical modeling for description, where the purpose is to capture the data structure parsimoniously, and which is the most commonly developed within the field of statistics, is not commonly used for theory building and testing in other disciplines.

Hence, in this article I focus on the use of statistical modeling for causal explanation and for prediction. My main premise is that the two are often conflated, yet the causal versus predictive distinction has a large impact on each step of the statistical modeling process and on its consequences. Although not explicitly stated in the statistics methodology literature, applied statisticians instinctively sense that predicting and explaining are different.

Galit Shmueli is Associate Professor of Statistics, Department of Decision, Operations and Information Technologies, Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742, USA.

This article aims to fill a critical void: to tackle the distinction between explanatory modeling and predictive modeling. Clearing the current ambiguity between the two is critical not only for proper statistical modeling, but more importantly, for proper scientific usage. Both explanation and prediction are necessary for generating and testing theories, yet each plays a different role in doing so. The lack of a clear distinction within statistics has created a lack of understanding in many disciplines of the difference between building sound explanatory models versus creating powerful predictive models, as well as confusing explanatory power with predictive power.
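The difference between the two kinds of power can be made concrete with a small numerical sketch. The one below is an illustration under stated assumptions, not a procedure taken from the article at this point: it simulates data, fits a simple linear regression, and reports R-squared twice, once on the data used for fitting (one common reading of explanatory power) and once on held-out data (one common reading of predictive power). The helper names `fit_line` and `r_squared`, the simulated coefficients, and the 150/50 split are all hypothetical choices.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs, ys, a, b):
    """1 - SSE/SST for the line (a, b), evaluated on the data passed in.

    The baseline (SST) uses the mean of the ys passed in, which is one of
    several conventions for out-of-sample R-squared.
    """
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


# Simulated data (assumption): y depends linearly on x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 2.0) for x in xs]

# Fit on the first 150 points, keep the last 50 as a holdout set.
train_x, train_y = xs[:150], ys[:150]
hold_x, hold_y = xs[150:], ys[150:]

a, b = fit_line(train_x, train_y)
in_sample_r2 = r_squared(train_x, train_y, a, b)  # fit to the data used for estimation
holdout_r2 = r_squared(hold_x, hold_y, a, b)      # accuracy on data not used for estimation

print(f"slope estimate: {b:.3f}")
print(f"in-sample R^2:  {in_sample_r2:.3f}")
print(f"holdout R^2:    {holdout_r2:.3f}")
```

The two numbers come from the same fitted line but answer different questions, so neither can be read off from the other without checking.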

The implications of this omission and the lack of clear guidelines on how to model for explanatory versus predictive goals are considerable for both scientific research and practice and have also contributed to the gap between academia and practice.

I start by defining what I term explaining and predicting. These definitions are chosen to reflect the distinct scientific goals that they are aimed at: causal explanation and empirical prediction, respectively. Explanatory modeling and predictive modeling reflect the process of using data and statistical (or data mining) methods for explaining or predicting, respectively.

The term modeling is intentionally chosen over models to highlight the entire process involved, from goal definition, study design, and data collection to scientific use.

Explanatory Modeling

In many scientific fields, and especially the social sciences, statistical methods are used nearly exclusively for testing causal theory. Given a causal theoretical model, statistical models are applied to data in order to test causal hypotheses. In such models, a set of underlying factors that are measured by variables X are assumed to cause an underlying effect, measured by variable Y.
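As a minimal sketch of the setup just described, assume the statistical model is a simple linear regression of Y on a single X and the causal hypothesis is that X has a positive effect on Y. The code below estimates the slope, its standard error, and the resulting t-statistic. The function name `slope_t_statistic`, the simulated data, and the rough large-sample cutoff of 1.96 are assumptions made for illustration; the test itself only measures association, and any causal reading comes from the theory behind the hypothesis.

```python
import math
import random


def slope_t_statistic(xs, ys):
    """Fit y = a + b*x by least squares; return (b, se_b, t) for H0: b = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, se_b, b / se_b


# Simulated observational data (assumption): X measures the hypothesized
# cause, Y the hypothesized effect.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [1.0 + 0.8 * x + rng.gauss(0, 1) for x in xs]

b, se_b, t_stat = slope_t_statistic(xs, ys)
print(f"estimated slope: {b:.3f} (standard error {se_b:.3f})")
print(f"t-statistic:     {t_stat:.2f}")

# Rough large-sample cutoff for a two-sided 5% test (assumption).
supported = abs(t_stat) > 1.96
print("hypothesis supported by the data:", supported)
```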

Based on collaborative work with social scientists and economists, on an examination of some of their literature, and on conversations with a diverse group of researchers, I conjecture that, whether statisticians like it or not, the type of statistical models used for testing causal hypotheses in the social sciences are almost always association-based models applied to observational data. Regression models are the most common example. The justification for this practice is that the theory itself provides the causality.

In other words, the role of the theory is very strong and the reliance on data and statistical modeling are strictly through the lens of the theoretical model. The theory-data relationship varies in different fields. While the social sciences are very theory-heavy, in areas such as bioinformatics and natural language processing the emphasis on a causal theory is much weaker. Hence, given this reality, I define explaining as causal explanation and explanatory modeling as the use of statistical models for testing causal explanations.

To illustrate how explanatory modeling is typically done, I describe the structure of a typical article in a highly regarded journal in the field of Information Systems (IS).

Researchers in the field of IS usually have training in economics and/or the behavioral sciences. The structure of articles reflects the way empirical research is conducted in IS and related fields. The example used is an article by Gefen, Karahanna and Straub (2003), which studies technology acceptance. The article starts with a presentation of the prevailing relevant theory(ies):

"Online purchase intensions should be explained in part by the technology acceptance model (TAM). This theoretical model is at present a preeminent theory of technology acceptance in IS."

The authors then proceed to state multiple causal hypotheses (denoted H1, H2, ...)

