
# Time Series Analysis in Python with statsmodels - SciPy



### Transcription of Time Series Analysis in Python with statsmodels - SciPy

#### Time Series Analysis in Python with statsmodels

Wes McKinney (1), Josef Perktold (2), Skipper Seabold (3)

1. Department of Statistical Science, Duke University
2. Department of Economics, University of North Carolina at Chapel Hill
3. Department of Economics, American University

10th Python in Science Conference, 13 July 2011

McKinney, Perktold, Seabold (statsmodels) · Python Time Series Analysis · SciPy Conference 2011 · 1 / 29

#### What is statsmodels?

A library for statistical modeling, implementing standard statistical models in Python using NumPy and SciPy. Includes:

- Linear (regression) models of many forms
- Descriptive statistics
- Statistical tests
- Time series
- Much more

#### What is time series analysis?

- Statistical modeling of time-ordered data observations
- Inferring structure, forecasting and simulation, and testing distributional assumptions about the data
- Modeling dynamic relationships among multiple time series
- Broad applications in economics, finance, neuroscience, ...

#### Talk overview

- Brief update on statsmodels development
- Aside: user interface and data structures
- Descriptive statistics and tests
- Autoregressive moving average (ARMA) models
- Vector autoregression (VAR) models
- Filtering tools (Hodrick-Prescott and others)
- Near future: Bayesian dynamic linear models (DLMs), ARCH/GARCH volatility models, and beyond

#### statsmodels development update

- We're now on GitHub!

- Join us:
- Check out the slick Sphinx docs:
- Focus has been largely computational: writing correct, tested implementations of all the common classes of statistical models

#### statsmodels development update

- Major work to be done on providing a nice integrated user interface
- We must work together to close the gap between R and Python!
- Some important areas:
  - Formula framework, for specifying model design matrices
  - Need integrated rich statistical data structures (pandas)
  - Data visualization of results should always be a few keystrokes away
  - Write a "statsmodels for R users" guide

#### Aside: statistical data structures and user interface

While I have a captive fact.

- pandas is the only Python library currently providing data structures matching (and in many places exceeding) the richness of R's data structures (for statistics)
- Let's have a BoF session so I can justify this statement
- Feedback I hear is that end users find the fragmented, incohesive set of Python tools for data analysis and statistics to be confusing, frustrating, and certainly not compelling them to use ...
- (Not to mention the packaging headaches)

#### Aside: statistical data structures and user interface

- We need to commit ASAP (not 12 months from now) to a high-level data structure(s) as the primary data structure(s) for statistical data analysis and communicate that clearly to end users
- Or we might as well all start programming in ...

#### Example data: EEG trace data

[Figure: EEG trace plot]

#### Example data: Macroeconomic data

[Figure: macroeconomic data plot]

#### Example data: Stock data

[Figure: stock price series for AAPL, GOOG, MSFT, YHOO, 2001 to 2009]

#### Descriptive statistics

- Autocorrelation, partial autocorrelation plots
- Commonly used for identification in ARMA(p,q) and ARIMA(p,d,q) models
- `acf = (eeg, 50)`
- `pacf = (eeg, 50)`

[Figure: autocorrelation plot]

#### Statistical tests

- Ljung-Box test for zero autocorrelation
- Unit root test for cointegration (Augmented Dickey-Fuller test)
- Granger-causality
- Whiteness (iid-ness) and normality
- See our conference paper (when the proceedings get published!)

#### Autoregressive moving average (ARMA) models

One of most common univariate time series models:

$$y_t = \mu + a_1 y_{t-1} + \dots + a_p y_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + \dots + b_q \epsilon_{t-q}$$

where $E(\epsilon_t, \epsilon_s) = 0$ for $t \neq s$ and $\epsilon_t \sim N(0, \sigma^2)$.

- Exact log-likelihood can be evaluated via the Kalman filter, but the conditional likelihood is easier and commonly used
- statsmodels has tools for simulating ARMA processes with known coefficients $a_i$, $b_i$ and also estimation given specified lag orders
- `import as ap`
- `ar_coef = [1, .75, ]; ma_coef = [1, ]`
- `nobs = 100`
- `y = (ar_coef, ma_coef, nobs)`
- `y += 4  # add in constant`

#### ARMA estimation

- Several likelihood-based estimators implemented (see docs)
- `model = (y)`
- `result = (order=(2, 1), trend='c', method='css-mle', disp=-1)`
- `# array([ , , , ])`
- Standard model diagnostics, standard errors, information criteria (AIC, BIC)

These diagnostics and more are available in the returned `ARMAResults` object.

#### Vector autoregression (VAR) models

Widely used model for modeling multiple (K-variate) time series, especially in macroeconomics:

$$Y_t = A_1 Y_{t-1} + \dots + A_p Y_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \Sigma)$$

- Matrices $A_i$ are $K \times K$.
- $Y_t$ must be a stationary process (sometimes achieved by differencing).
- Related class of models (VECM) for modeling nonstationary (including cointegrated) processes

#### Vector autoregression (VAR) models

`>>> model = VAR(data)`

`(8)`

[Output: VAR Order Selection table with columns aic, bic, fpe, hqic; `*` marks the minimum]

#### Vector autoregression (VAR) models

- `>>> result = (2)`
- `>>> () # print summary for each variable`
- `<snip>`

Results for equation m1

coefficient std.

Error t-stat prob

const

`<snip>`

#### Vector autoregression (VAR) models

- `>>> result = (2)`
- `>>> () # print summary for each variable`
- `<snip>`

[Output: correlation matrix of residuals for m1, realgdp, cpi]

#### VAR: Impulse response analysis

- Analyze systematic impact of unit shock to a single variable
- `irf = (10)`
- `()`

[Figure: impulse responses (panel label: cpi)]

#### VAR: Forecast error variance decomposition

- Analyze contribution of each variable to forecasting error
- `fevd = (20)`
- `()`

[Figure: forecast error variance decomposition (FEVD) for m1, realgdp, cpi]

#### VAR: Statistical tests

`In [137]: ('m1', ['cpi', 'realgdp'])`

[Output: Granger causality F-test table with columns Test statistic, Critical Value, p-value, df; df = (4, 579)]

- H_0: ['cpi', 'realgdp'] do not Granger-cause m1
- Conclusion: fail to reject H_0 at significance level

#### Filtering

Hodrick-Prescott (HP) filter separates a time series $y_t$ into a trend $\tau_t$ and a cyclical component $\zeta_t$, so that $y_t = \tau_t + \zeta_t$.

[Figure: cyclical component and trend component]

#### Filtering

In addition to t