Transcription of Maximum Likelihood Programming in R
1 Maximum Likelihood Programming in RMarco R. SteenbergenDepartment of Political ScienceUniversity of North Carolina, Chapel HillJanuary 2006 Contents1 Introduction22 Syntactic Declaring the Log- Likelihood Function .. Optimizing the Log- Likelihood .. 43 Output54 Obtaining Standard Errors55 Test Statistics and Output Control711 IntroductionThe Programming language R is rapidly gaining ground among political method-ologists. A major reason is that R is a flexible and versatile language, whichmakes it easy to program new routines. In addition, R algorithms are generallyvery is well-suited for Programming your own Maximum Likelihood , there are several procedures for optimizing Likelihood functions. Here Ishall focus on theoptimcommand, which implements the BFGS and L-BFGS-Balgorithms, among throughoptimis relatively straight-forward, since it is usually not necessary to provide analytic first and secondderivatives.
2 The command is also flexible, as Likelihood functions can be de-clared in general terms instead of being defined in terms of a specific data Syntactic StructureEstimating Likelihood functions entails a two-step process. First, one declaresthe log- Likelihood function, which is done in general terms. Then one optimizesthe log- Likelihood function, which is done in terms of a particular data set. Thelog- Likelihood function and optimization command may be typed interactivelyinto the R command window or they may be contained in a text file. I wouldrecommend saving log- Likelihood functions into a text file, especially if you planon using them Declaring the Log- Likelihood FunctionThe log- Likelihood function is declared as an Rfunction. In R, functions takeat least two arguments. First, they require a vector of parameters. Second, theyrequire at least one data object. Note that other arguments can be added tothis if they are necessary.
3 The data object is a generic placeholder for data. Intheoptimcommand, specific data are substituted for this the arguments are declared, the actual log- Likelihood is expressed anddemarcated by{}. Thus, we have the following syntax:name<-function(pars,object){decla rationslogl<-loglikelihood functionreturn(-logl)}Herenameis the name of the log- Likelihood function,parsis the name of theparameter vector, andobjectis the name of the generic data object. The in-structions placed between brackets define the log- Likelihood function. At a mini-mum, there should be two elements here: (1) the declaration of the log-likelihood1 Theoptimcommand also includes Nelder-Mead, conjugate gradients, and simulated an-nealing algorithms. Other optimization routines includeoptimize,nlm, procedures are not discussed , which is namedlogl, and (2) the return of negative one times the addition, it may be necessary to make may include partitioning a parameter vector or declaring temporary vari-ables that figure in the log- Likelihood function.
4 The application of this syntaxwill be clarified using several 1:Consider the Poisson log- Likelihood function, which is given byl= iyiln( ) n iln(yi!)Since the last term does not include the parameter, , it can be safely , the kernel of the log- Likelihood function isl= iyiln( ) n We can program this function using the following <-function(mu,y){n<-nrow(y)logl<-sum(y)* log(mu)-n*mureturn(-logl)} the name of the log- Likelihood function; this name will beused in theoptimcommand. The vector of parameters is calledmu; this isnot really a vector since there is only one parameter that needs to be ,yis the placeholder for the data. Since the log- Likelihood functionrequires knowledge of the sample size, we obtain this usingn<-nrow(y). Theexpression forloglcontains the kernel of the log- Likelihood function. Finally,we ask R to return -1 times the log- Likelihood 2:Imagine that we have a sample that was drawn from a normaldistribution with unknown mean, , and variance, 2.
5 The objective is toestimate these parameters. The normal log- Likelihood function is given byl= .5nln(2 ) .5nln( 2) 12 2 i(yi )2We can program this function in the following <-function(theta,y){mu<-theta[1]sigma2<- theta[2]n<-nrow(y)2We ask for 1 lbecause theoptimcommand minimizes a function by default. Mini-mization of lis the same as maximization ofl, which is what we <- *n*log(2*pi) *n*log(sigma2) -(1/(2*sigma2))*sum((y-mu)**2)return(-lo gl)}Herethetais a vector containing the two parameters of interest. We declare theelements of this vector in the first two lines of the bracketed part of the , the first element (theta[1]) is equal to , while the second element(theta[2]) is equal to 2. The remainder of the program sets the sample size,specifies the log- Likelihood function, and asks R to return the negative of that the normal log- Likelihood function may also be written asl= nln( ) + iln[ (zi)]wherezi= (yi )/ . This can be programmed <-function(theta,y){mu<-theta[1]sigma<-t heta[2]n<-nrow(y)z<-(y-mu)/sigmalogl<- -n*log(sigma) - sum(log(dnorm(z)))return(-logl)}wheredno rmis R s standard normal density function.
6 Here we estimate ratherthan 2, but it is easy to move back and forth between these Optimizing the Log-LikelihoodOnce the log- Likelihood function has been declared, then theoptimcommandcan be invoked. The minimal specification of this command isoptim(starting values, log- Likelihood , data)Herestarting valuesis a vector of starting values,log-likelihoodis thename of the log- Likelihood function that you seek to maximize, anddatade-clares the data for the estimation. This specification causes R to use the Nelder-Mead algorithm. If you want to use the BFGS algorithm you should includethemethod="BFGS"option. For the L-BFGS-B algorithm you should declaremethod="L-BFGS-B". The current specification does not produce standard er-rors. A procedure for obtaining standard errors will be discussed later in are many other options for theoptimcommand. For a detailed description see, forexample, 3:Imagine that we have a vectordatathat consists of draws froma Poisson distribution with unknown.
7 We seek to estimate this parameter andhave already declared the log- Likelihood function Estimationusing the BFGS algorithm now commences as followsoptim(1, ,y=data,method="BFGS")Here 1 is the starting value for the algorithm. Since the log- Likelihood functionrefers to generic data objects asy, it is important that the vectordatais 4:Given a vector of data,y, the parameters of the normal distrib-ution can be estimated usingoptim(c(0,1), ,y=y,method="BFGS")This is similar to Example 3 with the exception of the starting values. Sincethe normal distribution contains two parameters, two starting values need to bedeclared. Here we set the starting value for to 0 and the starting value for 2to 1. These two values are bundled using thecor concatenation OutputTheoptimspecifications discussed so far will produce several pieces of come under various headings:1.$par: This shows the MLEs of the $value: This shows the value of the log- Likelihood function at the you asked R to return -1 times the log- Likelihood function, then this isthe value reported $counts: A vector that reports the number of calls to the log-likelihoodfunction and the $convergence: A value of 0 indicates normal convergence.
8 If you see a 1reported, this means that the iteration limit was exceeded. This limit isset to 10000 by $message: This shows warnings of any problems that occurred duringoptimization. Ideally, one would like to seeNULL here, since this indicatesthat there are no Obtaining Standard ErrorsTheoptimcommand allows one to compute standard errors based on the ob-served Fisher information requires that we obtain the Hessian4 Unlike Stata, standard errors, test statistics, and confidence intervals are not computedby default in , which can be done by addinghessian=Torhessian=TRUEto the com-mand. Since we will have to perform operations on the Hessian, it is alsoimportant that we store the results from the estimation into an object. Thefollowing linear regression example illustrates how to do 5:Imagine that we are interested in estimating a simple linearregression for some simulated data. First, we create the data matrix for thepredictorsX<-cbin(1,runif(100))Here we draw 100 observations from a uniform distribution with limits 0 and data are bound together with the constant (1).
9 Next, we postulate a setof values for the true <-c(2,3,1)Here, the first element is 0, the second element is 1, and the last element is 2. We can now create the dependent variable:y<-X%*% [1:2] + rnorm(100)wherernorm(100)generates the disturbance by drawing 100 values from thestandard normal distribution. We now have the data on the dependent variableand next step is to declare the log- Likelihood function. The following syntaxshows one way to do <-function(theta,y,X){n<-nrow(X)k<-ncol( X)beta<-theta[1:k]sigma2<-theta[k+1]e<-y -X%*%betalogl<- *n*log(2*pi) *n*log(sigma2)-((t(e)%*%e)/(2*sigma2))re turn(-logl)}Herethetacontains both the elements of and 2. The program declares thefirstkelements ofthetato be and thek+ 1st element to be 2. The vectorecontains the residuals andt(e)%*%ein the log- Likelihood function causes Rto compute the sum of squared can now start the optimization of the log- Likelihood function and store theresults in an object namedp(any other name would have worked just as well):p<-optim(c(1,1,1), ,method="BFGS",hessian=T,y=y,X=X)6wherec (1,1,1)sets the starting values for 0, 1, and 2equal to 1.
10 We cannow invert the Hessian to obtain the observed Fisher information is stored asp$hessianand it can be inverted usingOI<-solve(p$hessian)The square root of the diagonal elements are then the standard errors, corre-sponding to 0, 1, and 2, respectively. These can be obtained by typingse<-sqrt(diag(OI))5 Test Statistics and Output ControlWith the standard errors in hand, Wald test statistics and their associatedp-values can be computed. The following syntax will accomplish this task for theregression model of Example 5 Cont d:The Wald test statistic is given by the ratio of theestimates and their standard errors. The associatedp-value can be computedby referring to a Student s t-distribution with degrees of freedom equal to thenumber of rows minus the number of columns <-p$par/sepval<-2*(1-pt(abs(t),nrow(X)-n col(X)))results<-cbind(p$par,se,t,pval)r esults(colnames)<-c("b","se","t","p")res ults(rownames)<-c("Const","X1","Sigma2") print(results,digits=3)The first line generates the test statistics, the second line computes the asso-ciatedp-values, while the third line brings together the estimates, estimatedstandard errors, test statistics, andp-values.