### Transcription of "Maximum Likelihood Estimation of an ARMA(p,q) Model"

**Maximum Likelihood Estimation of an ARMA(p,q) Model**

Constantino Hevia. The World Bank, DECRG. October 2008.

This note describes the Matlab function that computes the Maximum Likelihood estimates of a stationary ARMA(p,q) model.

Problem: to fit an ARMA(p,q) model to a vector of time series $\{y_1, y_2, \ldots, y_T\}$ with zero unconditional mean. An ARMA(p,q) process is given by

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$$

where $\varepsilon_t$ is a normally distributed shock with mean zero and variance $\sigma^2$. If the original series does not have zero mean, we first construct $\tilde{y}_t = y_t - \frac{1}{T}\sum_{s=1}^{T} y_s$ and then fit the ARMA model to $\tilde{y}_t$.

Usage: `results = arma_mle(y,p,q,[info])`

Arguments:
- `y` = vector of observed time series with mean zero.
- `p` = length of the autoregressive (AR) part of the ARMA model (integer).
- `q` = length of the moving average (MA) part of the ARMA model (integer).
- `info` = [optional] if `info` is not zero, the program prints information about the convergence of the optimization algorithm.
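Since the estimation is defined for mean-zero data, the preprocessing step can be illustrated with a short Python/NumPy sketch (the note's actual code is Matlab; the function `simulate_arma` and the parameter values below are illustrative, not part of the note):

```python
import numpy as np

def simulate_arma(phi, theta, sigma, T, burn=500, seed=0):
    """Simulate T draws of y_t = sum_i phi_i y_{t-i} + eps_t + sum_j theta_j eps_{t-j}."""
    rng = np.random.default_rng(seed)
    n = T + burn
    eps = rng.normal(0.0, sigma, n)
    y = np.zeros(n)
    for t in range(n):
        y[t] = eps[t]
        y[t] += sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        y[t] += sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
    return y[burn:]  # discard the burn-in portion

# De-meaned series y_tilde_t = y_t - (1/T) * sum_s y_s, as in the note.
y = simulate_arma(phi=[0.7], theta=[0.3], sigma=1.0, T=200)
y_tilde = y - y.mean()
```

The ARMA model would then be fitted to `y_tilde` rather than `y`.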

The default value is zero.

Output: a structure with the following elements:
- $\hat{\phi} = \left[\hat{\phi}_1, \hat{\phi}_2, \ldots, \hat{\phi}_p\right]$: estimated coefficients of the AR part.
- $\hat{\theta} = \left[\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q\right]$: estimated coefficients of the MA part.
- $\hat{\sigma}$: estimated standard deviation of $\varepsilon_t$.

A companion file performs a Monte Carlo experiment using the function. The user inputs a theoretical ARMA model; the program runs a large number of simulations and then estimates the parameters for each simulation. Finally, histograms of the estimates are shown.

**Algorithm**

In this section I describe the algorithm used to compute the Maximum Likelihood estimates of the ARMA(p,q) process. Suppose that we want to fit the (mean zero) time series $\{y_t\}_{t=0}^{T}$ to the following ARMA(p,q) model

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \qquad (1)$$

where $\varepsilon_t$ is a normally distributed shock with mean zero and variance $\sigma^2$. Let $r = \max(p, q+1)$, and rewrite the model as

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_r y_{t-r} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_{r-1} \varepsilon_{t-r+1}. \qquad (2)$$

We interpret $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$. The estimation procedure is based on the Kalman filter (see Hamilton (1994) for the derivation of the filter). To use the Kalman filter we need to write the model in the following (state-space) form

$$x_{t+1} = A x_t + R \varepsilon_{t+1} \qquad (3)$$

$$y_t = Z' x_t \qquad (4)$$

where $x_t$ is an $r \times 1$ state vector, $A$ is an $r \times r$ matrix, and $R$ and $Z$ are $r \times 1$ vectors. These matrices and vectors are defined as follows:

$$A = \begin{bmatrix} \phi_1 & 1 & 0 & \cdots & 0 \\ \phi_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \phi_{r-1} & 0 & 0 & \cdots & 1 \\ \phi_r & 0 & 0 & \cdots & 0 \end{bmatrix}, \qquad R = \begin{bmatrix} 1 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{r-1} \end{bmatrix}, \qquad Z = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

To see that the system (3) and (4) is equivalent to (2), write the last row of (3) as

$$x_{r,t+1} = \phi_r x_{1,t} + \theta_{r-1} \varepsilon_{t+1}.$$

Lagging this equation $r-1$ periods we find

$$x_{r,t-r+2} = \phi_r L^{r-1} x_{1,t} + \theta_{r-1} L^{r-1} \varepsilon_{t+1}, \qquad (5)$$

where we define $L^r x_t = x_{t-r}$ as the $r$-fold lag operator, for any integer $r \geq 0$.
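The construction of $A$, $R$, and $Z$, and a simulation check that the system (3)-(4) reproduces the ARMA recursion (2), can be sketched in Python/NumPy (the note's code is Matlab; `state_space` and the parameter values are illustrative):

```python
import numpy as np

def state_space(phi, theta):
    """Build A (r x r), R and Z (r x 1) as defined above,
    with phi_j = 0 for j > p and theta_j = 0 for j > q."""
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    A = np.zeros((r, r))
    A[:p, 0] = phi                  # first column: phi_1, ..., phi_r (zeros past p)
    A[:-1, 1:] = np.eye(r - 1)      # ones on the superdiagonal
    R = np.zeros(r)
    R[0] = 1.0
    R[1:q + 1] = theta              # R = (1, theta_1, ..., theta_{r-1})'
    Z = np.zeros(r)
    Z[0] = 1.0                      # Z = (1, 0, ..., 0)'
    return A, R, Z

# Simulate (3)-(4) and the ARMA recursion (2) with identical shocks.
phi, theta = [0.5, -0.2], [0.4]
A, R, Z = state_space(phi, theta)
rng = np.random.default_rng(1)
eps = rng.normal(size=50)

x = R * eps[0]                      # x_0 = R * eps_0, so that y_0 = eps_0
y_ss = [Z @ x]
for t in range(1, len(eps)):
    x = A @ x + R * eps[t]          # state equation (3)
    y_ss.append(Z @ x)              # observation equation (4)
y_ss = np.array(y_ss)

y_arma = np.zeros(len(eps))
for t in range(len(eps)):
    y_arma[t] = eps[t]
    y_arma[t] += sum(phi[i] * y_arma[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
    y_arma[t] += sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)

assert np.allclose(y_ss, y_arma)    # the two representations coincide
```

Here $p = 2$, $q = 1$, so $r = \max(2, 2) = 2$ and the two sample paths agree exactly, path by path.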

The second-to-last row implies

$$x_{r-1,t+1} = \phi_{r-1} x_{1,t} + x_{r,t} + \theta_{r-2} \varepsilon_{t+1}.$$

Lagging $r-2$ periods we obtain

$$x_{r-1,t-r+3} = \phi_{r-1} L^{r-2} x_{1,t} + x_{r,t-r+2} + \theta_{r-2} L^{r-2} \varepsilon_{t+1}.$$

Introducing (5) into the previous equation we find

$$x_{r-1,t-r+3} = \phi_{r-1} L^{r-2} x_{1,t} + \phi_r L^{r-1} x_{1,t} + \theta_{r-1} L^{r-1} \varepsilon_{t+1} + \theta_{r-2} L^{r-2} \varepsilon_{t+1}$$

or

$$x_{r-1,t-r+3} = \left(\phi_{r-1} L^{r-2} + \phi_r L^{r-1}\right) x_{1,t} + \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2}\right) \varepsilon_{t+1}. \qquad (6)$$

Take now row $r-2$,

$$x_{r-2,t+1} = \phi_{r-2} x_{1,t} + x_{r-1,t} + \theta_{r-3} \varepsilon_{t+1}.$$

Lagging $r-3$ periods we find

$$x_{r-2,t-r+4} = \phi_{r-2} L^{r-3} x_{1,t} + x_{r-1,t-r+3} + \theta_{r-3} L^{r-3} \varepsilon_{t+1}.$$

Plugging (6) into the previous equation we obtain

$$x_{r-2,t-r+4} = \left(\phi_{r-2} L^{r-3} + \phi_{r-1} L^{r-2} + \phi_r L^{r-1}\right) x_{1,t} + \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3}\right) \varepsilon_{t+1}.$$

Following this iterative procedure up to the first row we find

$$x_{1,t+1} = \left(\phi_1 + \cdots + \phi_{r-2} L^{r-3} + \phi_{r-1} L^{r-2} + \phi_r L^{r-1}\right) x_{1,t} + \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3} + \cdots + 1\right) \varepsilon_{t+1}$$

or

$$\left(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_r L^r\right) x_{1,t+1} = \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3} + \cdots + 1\right) \varepsilon_{t+1}. \qquad (7)$$

Now, the observation equation (4) and the definition of $Z$ imply $y_t = x_{1,t}$. Using (7) evaluated at $t$ we arrive at the ARMA representation (2),

$$\left(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_r L^r\right) y_t = \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \cdots + 1\right) \varepsilon_t,$$

which proves that the system (3), (4) is equivalent to (2).

Denote by $\hat{x}_{t+1|t} = E\left[x_{t+1} \mid y_0, \ldots, y_t, x_0\right]$ the expected value of $x_{t+1}$ conditional on the history of observations $(y_0, \ldots, y_t)$. The Kalman filter provides an algorithm for computing $\hat{x}_{t+1|t}$ recursively given an initial value $\hat{x}_{1|0} = 0$. (Note that $0$ is the unconditional mean of $x_t$.) Associated with each of these forecasts is a mean squared error matrix, defined as

$$P_{t+1|t} = E\left[\left(x_{t+1} - \hat{x}_{t+1|t}\right)\left(x_{t+1} - \hat{x}_{t+1|t}\right)'\right].$$

Given the estimate $\hat{x}_{t|t-1}$, we use (4) to compute the innovations

$$a_t = y_t - E\left[y_t \mid y_0, \ldots, y_{t-1}, x_0\right] = y_t - Z' \hat{x}_{t|t-1}.$$

The innovation variance, denoted by $\omega_t$, satisfies

$$\omega_t = E\left[\left(y_t - Z'\hat{x}_{t|t-1}\right)\left(y_t - Z'\hat{x}_{t|t-1}\right)'\right] = E\left[\left(Z' x_t - Z'\hat{x}_{t|t-1}\right)\left(Z' x_t - Z'\hat{x}_{t|t-1}\right)'\right] = Z' P_{t|t-1} Z. \qquad (8)$$

In addition to the estimates $\hat{x}_{t+1|t}$, the Kalman filter equations imply the following evolution of the matrices $P_{t+1|t}$:

$$P_{t+1|t} = A \left( P_{t|t-1} - P_{t|t-1} Z Z' P_{t|t-1} / \omega_t \right) A' + R R' \sigma^2. \qquad (9)$$

Given an initial matrix $P_{1|0} = E\left(x_t x_t'\right)$ and the initial value $\hat{x}_{1|0} = 0$, the likelihood function of the observation vector $\{y_0, y_1, \ldots, y_T\}$ is given by

$$L = \prod_{t=1}^{T} \left(2\pi \omega_t\right)^{-1/2} \exp\left(-\frac{a_t^2}{2\omega_t}\right).$$

Taking logarithms, dropping the constant $2\pi$, and multiplying by $-2$ we obtain

$$l = \sum_{t=1}^{T} \left[\ln(\omega_t) + a_t^2/\omega_t\right]. \qquad (10)$$

In principle, to find the ML estimates we minimize (10) with respect to the parameters $\phi_j$, $\theta_j$, and $\sigma^2$. However, the following trick allows us to 'concentrate out' the term $\sigma^2$ and optimize only with respect to the parameters $\phi_j$, $\theta_j$. Suppose we initialize the filter with the matrix $\tilde{P}_{1|0} = P_{1|0}/\sigma^2$ and run the recursion (9) with $\sigma^2$ set to one. Then, from (9) it follows that each $P_{t+1|t}$ is proportional to $\sigma^2$, and from (8) it follows that the innovation variance is also proportional to $\sigma^2$: $\omega_t = \sigma^2 \tilde{\omega}_t$. This implies that we can first optimize with respect to $\sigma^2$ 'by hand', replace the result into the objective function, and then optimize the resulting expression (called the 'concentrated log-likelihood') with respect to the parameters $\phi_j$, $\theta_j$. To see this, note that (10) becomes

$$l = \sum_{t=1}^{T} \left[\ln\left(\sigma^2 \tilde{\omega}_t\right) + \frac{a_t^2}{\sigma^2 \tilde{\omega}_t}\right] \qquad (11)$$

and $\sigma^2$ cancels out of the evolution equations of $P_{t+1|t}$ and of the projections $\hat{x}_{t+1|t}$. So we can directly optimize (11) with respect to $\sigma^2$ to obtain

$$\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} a_t^2 / \tilde{\omega}_t.$$

Replacing this result into (11) we obtain the concentrated log-likelihood function

$$l^c = \sum_{t=1}^{T} \left[\ln \hat{\sigma}^2 + \ln \tilde{\omega}_t + \frac{a_t^2}{\hat{\sigma}^2 \tilde{\omega}_t}\right] = T \ln\left(\frac{1}{T}\sum_{t=1}^{T} a_t^2/\tilde{\omega}_t\right) + \sum_{t=1}^{T} \ln \tilde{\omega}_t + T = T \ln(1/T) + T + T \ln\left(\sum_{t=1}^{T} a_t^2/\tilde{\omega}_t\right) + \sum_{t=1}^{T} \ln \tilde{\omega}_t$$

or, dropping irrelevant constants,

$$l^c = T \ln\left(\sum_{t=1}^{T} a_t^2/\tilde{\omega}_t\right) + \sum_{t=1}^{T} \ln \tilde{\omega}_t. \qquad (12)$$

Because the innovations $a_t$ and the variances $\tilde{\omega}_t$ are nonlinear functions of the parameters $[\phi; \theta]$, we use numerical methods to minimize (12). The Matlab function performs this task using an optimization routine from the Matlab optimization package. The initial conditions for the parameters are based on the two-step regression procedure described in Hannan and McDougall (1988).
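The filter run with $\sigma^2 = 1$, the concentrated objective (12), and its numerical minimization can be sketched in Python/NumPy as follows. This is not the note's implementation: `scipy.optimize.minimize` with Nelder-Mead stands in for the unspecified Matlab routine, the explosive-root penalty is my addition to keep the search in the stationary region, and a simple zero starting value replaces the Hannan-McDougall two-step initializer described below.

```python
import numpy as np
from scipy.optimize import minimize

def _filter_stats(params, y, p, q):
    """Kalman recursion (9) run with sigma^2 = 1; returns
    (sum a_t^2/w_t, sum log w_t), or (inf, inf) if A is explosive."""
    phi, theta = params[:p], params[p:]
    r = max(p, q + 1)
    A = np.zeros((r, r))
    A[:p, 0] = phi                        # first column: phi_1..phi_p, zeros past p
    A[:-1, 1:] = np.eye(r - 1)            # ones on the superdiagonal
    R = np.zeros(r)
    R[0] = 1.0
    R[1:q + 1] = theta                    # R = (1, theta_1, ..., theta_{r-1})'
    Z = np.zeros(r)
    Z[0] = 1.0
    if np.max(np.abs(np.linalg.eigvals(A))) >= 0.999:
        return np.inf, np.inf             # penalize nonstationary parameters
    RR = np.outer(R, R)
    # P_{1|0}: unconditional variance of x_t, solving P = A P A' + RR'.
    P = np.linalg.solve(np.eye(r * r) - np.kron(A, A), RR.ravel()).reshape(r, r)
    x = np.zeros(r)                       # x_hat_{1|0} = 0
    ssq = logw = 0.0
    for yt in y:
        a = yt - Z @ x                    # innovation a_t
        w = Z @ P @ Z                     # innovation variance w_t, equation (8)
        ssq += a * a / w
        logw += np.log(w)
        x = A @ x + (A @ P @ Z) * (a / w)                    # x_hat_{t+1|t}
        P = A @ (P - np.outer(P @ Z, Z @ P) / w) @ A.T + RR  # equation (9)
    return ssq, logw

def _objective(params, y, p, q):
    ssq, logw = _filter_stats(params, y, p, q)
    if not np.isfinite(ssq):
        return 1e10
    return len(y) * np.log(ssq) + logw    # concentrated log-likelihood (12)

def arma_mle(y, p, q):
    res = minimize(_objective, np.zeros(p + q), args=(y, p, q),
                   method="Nelder-Mead")
    ssq, _ = _filter_stats(res.x, y, p, q)
    sigma = np.sqrt(ssq / len(y))         # sigma^2 = (1/T) sum a_t^2 / w_t
    return res.x[:p], res.x[p:], sigma

# Example: fit an AR(1) (phi = 0.7, sigma = 1) to simulated data.
rng = np.random.default_rng(42)
n = 500
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + eps[t]
y = y[100:] - y[100:].mean()              # drop burn-in and de-mean
phi_hat, theta_hat, sigma_hat = arma_mle(y, p=1, q=0)
```

The estimates should land close to the true values ($\phi \approx 0.7$, $\sigma \approx 1$) up to sampling error.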

The first step consists in running a (relatively) long autoregression and computing the fitted residuals. The second step computes an OLS regression of $y_t$ on its $p$ lagged values and on $q$ lagged values of the fitted residuals obtained in the first step.

REFERENCES

[1] James D. Hamilton, *Time Series Analysis*, 1994. Princeton University Press.

[2] Hannan and McDougall, 'Regression Procedures for ARMA Estimation', Journal of the American Statistical Association, Vol. 83, No. 409, June 1988.