### Transcription of Time Series Analysis in Python with statsmodels - SciPy

**Time Series Analysis in Python with statsmodels**

Wes McKinney (1), Josef Perktold (2), Skipper Seabold (3)

- (1) Department of Statistical Science, Duke University
- (2) Department of Economics, University of North Carolina at Chapel Hill
- (3) Department of Economics, American University

10th Python in Science Conference, 13 July 2011

McKinney, Perktold, Seabold (statsmodels), Python Time Series Analysis, SciPy Conference 2011

**What is statsmodels?**

- A library for statistical modeling, implementing standard statistical models in Python using NumPy and SciPy
- Includes:
  - Linear (regression) models of many forms
  - Descriptive statistics
  - Statistical tests
  - Time series
  - Much more

**What is time series analysis?**

- Statistical modeling of time-ordered data observations
- Inferring structure, forecasting and simulation, and testing distributional assumptions about the data
- Modeling dynamic relationships among multiple time series
- Broad applications in economics, finance, neuroscience, …

**Talk Overview**

- Brief update on statsmodels development
- Aside: user interface and data structures
- Descriptive statistics and tests
- Auto-regressive moving average models (ARMA)
- Vector autoregression (VAR) models
- Filtering tools (Hodrick-Prescott and others)
- Near future: Bayesian dynamic linear models (DLMs), ARCH / GARCH volatility models and beyond

**statsmodels development update**

- We're now on GitHub!

- Join us: …
- Check out the slick Sphinx docs: …
- Focus has been largely computational, writing correct, tested implementations of all the common classes of statistical models

**statsmodels development update**

- Major work to be done on providing a nice integrated user interface
- We must work together to close the gap between R and Python!
- Some important areas:
  - Formula framework, for specifying model design matrices
  - Need integrated rich statistical data structures (pandas)
  - Data visualization of results should always be a few keystrokes away
  - Write a "statsmodels for R users" guide

**Aside: statistical data structures and user interface**

- While I have a captive audience … fact: pandas is the only Python library currently providing data structures matching (and in many places exceeding) the richness of R's data structures (for statistics)
- Let's have a BoF session so I can justify this statement
- Feedback I hear is that end users find the fragmented, incohesive set of Python tools for data analysis and statistics to be confusing, frustrating, and certainly not compelling them to use Python
- (Not to mention the packaging headaches)

**Aside: statistical data structures and user interface**

- We need to commit ASAP (not 12 months from now) to a high-level data structure(s) as the primary data structure(s) for statistical data analysis and communicate that clearly to end users
- Or we might as well all start programming in R

**Example data: EEG trace data**

[Figure: EEG trace data]

**Example data: Macroeconomic data**

[Figure: macroeconomic data]

**Example data: Stock data**

[Figure: stock data for AAPL, GOOG, MSFT and YHOO, 2001 to 2009]

**Descriptive statistics**

- Autocorrelation, partial autocorrelation plots
- Commonly used for identification in ARMA(p,q) and ARIMA(p,d,q) models

`acf = …(eeg, 50)`

`pacf = …(eeg, 50)`

[Figure: Autocorrelation]

**Statistical tests**

- Ljung-Box test for zero autocorrelation
- Unit root test for cointegration (Augmented Dickey-Fuller test)
- Granger-causality
- Whiteness (iid-ness) and normality
- See our conference paper (when the proceedings get published!)
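As a rough illustration of the autocorrelation and Ljung-Box items listed on these slides, the sketch below computes a sample autocorrelation function and the Ljung-Box Q statistic in plain Python. It is not the statsmodels implementation: the helper names `sample_acf` and `ljung_box_q`, the simulated AR(1) series, and its 0.8 coefficient are all assumptions made for the example.

```python
import random


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(nlags + 1)
    ]


def ljung_box_q(x, nlags):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k), k = 1..nlags."""
    n = len(x)
    r = sample_acf(x, nlags)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, nlags + 1))


# Hypothetical data: an AR(1) series y_t = 0.8 * y_{t-1} + e_t.
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

acf = sample_acf(series, 10)
q_stat = ljung_box_q(series, 10)
print([round(r, 2) for r in acf[:4]])
print(round(q_stat, 1))
```

Under the null hypothesis of no autocorrelation, Q is approximately chi-square distributed with `nlags` degrees of freedom, so a Q far above the corresponding critical value argues against whiteness.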

**Autoregressive moving average (ARMA) models**

- One of the most common univariate time series models:

  $$y_t = \mu + a_1 y_{t-1} + \dots + a_p y_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + \dots + b_q \epsilon_{t-q}$$

  where $E(\epsilon_t, \epsilon_s) = 0$ for $t \neq s$ and $\epsilon_t \sim N(0, \sigma^2)$
- Exact log-likelihood can be evaluated via the Kalman filter, but the conditional likelihood is easier and commonly used
- statsmodels has tools for simulating ARMA processes with known coefficients $a_i$, $b_i$ and also estimation given specified lag orders

`import … as ap`

`ar_coef = [1, .75, …]; ma_coef = [1, …]`

`nobs = 100`

`y = …(ar_coef, ma_coef, nobs)`

`y += 4  # add in constant`

**ARMA Estimation**

- Several likelihood-based estimators implemented (see docs)

`model = …(y)`

`result = …(order=(2, 1), trend='c', method='css-mle', disp=-1)`

`…  # array([…, …, …, …])`

- Standard model diagnostics, standard errors, information criteria (AIC, BIC, etc.) available in the returned `ARMAResults` object

**Vector Autoregression (VAR) models**

- Widely used model for modeling multiple (K-variate) time series, especially in macroeconomics:

  $$Y_t = A_1 Y_{t-1} + \dots + A_p Y_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \Sigma)$$

- Matrices $A_i$ are $K \times K$
- $Y_t$ must be a stationary process (sometimes achieved by differencing). Related class of models (VECM) for modeling nonstationary (including cointegrated) processes

**Vector Autoregression (VAR) models**

`>>> model = VAR(data)`
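To make the VAR(p) definition concrete, here is a minimal sketch for the simplest case, a two-variable VAR(1) without intercept: it simulates $Y_t = A_1 Y_{t-1} + \epsilon_t$ and recovers $A_1$ by least squares. The function names, the coefficient matrix `A_TRUE`, and the identity shock covariance are assumptions for illustration. This is plain Python and does not reflect how the `VAR` class is implemented.

```python
import random


def simulate_var1(A, nobs, seed=0, burn=200):
    """Simulate Y_t = A Y_{t-1} + e_t with e_t ~ N(0, I); drop a burn-in."""
    rng = random.Random(seed)
    k = len(A)
    y = [0.0] * k
    rows = []
    for t in range(nobs + burn):
        shocks = [rng.gauss(0.0, 1.0) for _ in range(k)]
        y = [sum(A[i][j] * y[j] for j in range(k)) + shocks[i] for i in range(k)]
        if t >= burn:
            rows.append(y)
    return rows


def fit_var1_bivariate(rows):
    """Least-squares estimate of A for a 2-variable VAR(1) without intercept.

    A_hat = (sum Y_t Y_{t-1}') (sum Y_{t-1} Y_{t-1}')^{-1}
    """
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    syx = [[0.0, 0.0], [0.0, 0.0]]
    for prev, cur in zip(rows, rows[1:]):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += prev[i] * prev[j]
                syx[i][j] += cur[i] * prev[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [
        [sxx[1][1] / det, -sxx[0][1] / det],
        [-sxx[1][0] / det, sxx[0][0] / det],
    ]
    return [
        [sum(syx[i][m] * inv[m][j] for m in range(2)) for j in range(2)]
        for i in range(2)
    ]


# Hypothetical stationary coefficient matrix (eigenvalue modulus below 1).
A_TRUE = [[0.5, 0.2], [-0.1, 0.4]]
data = simulate_var1(A_TRUE, 5000, seed=1)
A_hat = fit_var1_bivariate(data)
for row in A_hat:
    print([round(v, 3) for v in row])
```

With a few thousand observations the estimate should land close to `A_TRUE`; each equation of a VAR can be estimated separately by ordinary least squares, which is why the normal equations above are enough for this small case.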

`>>> …(8)`

[Output: VAR Order Selection table with columns aic, bic, fpe, hqic; `*` marks the minimum]

**Vector Autoregression (VAR) models**

`>>> result = …(2)`

`>>> …()  # print summary for each variable`

`<snip>`

[Output: Results for equation m1, with columns coefficient, std. error, t-stat, prob; first row const]

`<snip>`

[Output: Correlation matrix of residuals for m1, realgdp, cpi]

**VAR: Impulse Response analysis**

- Analyze systematic impact of unit shock to a single variable

`irf = …(10)`

`…()`

[Figure: Impulse responses]
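The impulse responses of a VAR can be sketched directly from its coefficient matrices: with $\Phi_0 = I$, the response at horizon $i$ is $\Phi_i = \sum_{j=1}^{\min(i,p)} \Phi_{i-j} A_j$, and entry $(r, c)$ of $\Phi_i$ is the response of variable $r$ to a unit shock in variable $c$ made $i$ periods earlier. The code below covers only these non-orthogonalized responses; the function names and the example matrix `A1` are assumptions, not the statsmodels implementation.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def impulse_responses(coefs, periods):
    """Return [Phi_0, ..., Phi_periods] for lag matrices coefs = [A_1, ..., A_p]."""
    k = len(coefs[0])
    # Phi_0 is the identity: a unit shock moves only its own variable on impact.
    phis = [[[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]]
    for i in range(1, periods + 1):
        acc = [[0.0] * k for _ in range(k)]
        for j in range(1, min(i, len(coefs)) + 1):
            term = matmul(phis[i - j], coefs[j - 1])
            acc = [[acc[r][c] + term[r][c] for c in range(k)] for r in range(k)]
        phis.append(acc)
    return phis


# Hypothetical VAR(1) coefficient matrix for two variables.
A1 = [[0.5, 0.1], [0.0, 0.3]]
phis = impulse_responses([A1], 10)
for horizon in range(4):
    # Response of variable 0 to a unit shock in variable 1.
    print(horizon, phis[horizon][0][1])
```

For a VAR(1) the recursion reduces to $\Phi_i = A_1^i$, so the responses die out geometrically when the process is stationary.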

**VAR: Forecast Error Variance Decomposition**

- Analyze contribution of each variable to forecasting error

`fevd = …(20)`

`…()`

[Figure: Forecast error variance decomposition (FEVD) for m1, realgdp, cpi]

**VAR: Statistical tests**

`In [137]: …('m1', ['cpi', 'realgdp'])`

[Output: Granger causality f-test table with columns Test statistic, Critical Value, p-value; degrees of freedom (4, 579)]

H_0: ['cpi', 'realgdp'] do not Granger-cause m1

Conclusion: fail to reject H_0 at … significance level

**Filtering**

- Hodrick-Prescott (HP) filter separates a time series $y_t$ into a trend $\tau_t$ and a cyclical component $\zeta_t$, so that $y_t = \tau_t + \zeta_t$

[Figure: Cyclical component and trend component]

**Filtering**

In addition to t