Transcription of Syntax - Stata
Obtain predictions, residuals, etc., after estimation

Contents: Syntax, Menu for predict, Description, Options, Remarks and examples, Methods and formulas, Also see

Syntax

After single-equation (SE) models

    predict [type] newvar [if] [in] [, single_options]

After multiple-equation (ME) models

    predict [type] newvar [if] [in] [, multiple_options]
    predict [type] {stub* | } [if] [in], scores

single_options            Description
Main
  xb                      calculate linear prediction
  stdp                    calculate standard error of the prediction
  score                   calculate first derivative of the log likelihood with respect to $x_j\beta$
Options
  nooffset                ignore any offset() or exposure() variable
  other_options           command-specific options

multiple_options          Description
Main
  equation(eqno[, eqno])  specify equations
  xb                      calculate linear prediction
  stdp                    calculate standard error of the prediction
  stddp                   calculate the standard error of the difference in linear predictions
Options
  nooffset                ignore any offset() or exposure() variable
  other_options           command-specific options

Menu for predict

Statistics > Postestimation > Predictions, residuals, etc.
Description

predict calculates predictions, residuals, influence statistics, and the like after estimation. Exactly what predict can do is determined by the previous estimation command; command-specific options are documented with each estimation command. Regardless of command-specific options, the actions of predict share certain similarities across estimation commands:

1. predict newvar creates newvar containing "predicted values", that is, numbers related to $E(y_j \mid x_j)$. For instance, after linear regression, predict newvar creates $x_j b$ and, after probit, creates the probability $\Phi(x_j b)$.
2. predict newvar, xb creates newvar containing $x_j b$. This may be the same result as option 1 (for example, linear regression) or different (for example, probit), but regardless, option xb is allowed.
3. predict newvar, stdp creates newvar containing the standard error of the linear prediction $x_j b$.
4. predict newvar, other_options may create newvar containing other useful quantities; see help or the reference manual entry for the particular estimation command to find out about other available options.
5. nooffset added to any of the above commands requests that the calculation ignore any offset or exposure variable specified by including the offset(varname_o) or exposure(varname_e) option when you fit the model.

predict can be used to make in-sample or out-of-sample predictions:

6. predict calculates the requested statistic for all possible observations, whether they were used in fitting the model or not. predict does this for standard options 1 through 3 and generally does this for estimator-specific options (item 4).
7. predict newvar if e(sample), ... restricts the prediction to the estimation subsample.
8. Some statistics make sense only with respect to the estimation subsample. In such cases, the calculation is automatically restricted to the estimation subsample, and the documentation for the specific option states this. Even so, you can still specify if e(sample) if you are uncertain.
9. predict can make out-of-sample predictions even using other datasets. In particular, you can type

    . use ds1
    (fit a model)
    . use two                /* another dataset */
    . predict yhat, ...      /* fill in the predictions */

Options

Main

xb calculates the linear prediction from the fitted model. That is, all models can be thought of as estimating a set of parameters $b_1, b_2, \ldots, b_k$, and the linear prediction is $\hat{y}_j = b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj}$, often written in matrix notation as $\hat{y}_j = x_j b$. For linear regression, the values $\hat{y}_j$ are called the predicted values or, for out-of-sample predictions, the forecast.
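The definition of the linear prediction can be checked by hand. The sketch below is illustrative only: it assumes the auto dataset that ships with Stata (loaded with sysuse rather than the dataset path used in this entry), and the variable names xb_hat and xb_manual are hypothetical. It compares the result of predict, xb with the sum $b_1 x_{1j} + \cdots + b_k x_{kj}$ assembled from the stored coefficients.

```stata
* Assumption: the shipped auto dataset stands in for the data in the text
sysuse auto, clear
regress mpg weight foreign

* Linear prediction as computed by predict
predict double xb_hat, xb

* The same quantity built from the stored coefficients _b[]
* (xb_manual is a hypothetical name used only for this comparison)
generate double xb_manual = _b[_cons] + _b[weight]*weight + _b[foreign]*foreign

* The two variables should agree up to rounding error
assert reldif(xb_hat, xb_manual) < 1e-10
```

Because the coefficients come from the stored estimates and the covariates come from the data in memory, the same two lines would also work after loading a different dataset that contains weight and foreign.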
For logit and probit, for example, $\hat{y}_j$ is called the logit or probit index.

$x_{1j}, x_{2j}, \ldots, x_{kj}$ are obtained from the data currently in memory and do not necessarily correspond to the data on the independent variables used to fit the model (obtaining $b_1, b_2, \ldots, b_k$).

stdp calculates the standard error of the linear prediction. Here the prediction means the same thing as the "index", namely, $x_j b$. The statistic produced by stdp can be thought of as the standard error of the predicted expected value, or mean index, for the observation's covariate pattern. The standard error of the prediction is also commonly referred to as the standard error of the fitted value. The calculation can be made in or out of sample.

stddp is allowed only after you have previously fit a multiple-equation model. The standard error of the difference in linear predictions $(x_{1j} b - x_{2j} b)$ between equations 1 and 2 is calculated.
This option requires that equation(eqno1,eqno2) be specified.

score calculates the equation-level score, $\partial \ln L / \partial (x_j \beta)$. Here $\ln L$ refers to the log-likelihood function.

scores is the ME model equivalent of the score option, resulting in multiple equation-level score variables. An equation-level score variable is created for each equation in the model; ancillary parameters, such as $\ln\sigma$ and $\operatorname{atanh}\rho$, make up separate equations.

equation(eqno[,eqno]), synonym outcome(), is relevant only when you have previously fit a multiple-equation model. It specifies the equation to which you are referring.

equation() is typically filled in with one eqno; it would be filled in that way with options xb and stdp, for instance. equation(#1) would mean the calculation is to be made for the first equation, equation(#2) would mean the second, and so on. You could also refer to the equations by their names: equation(income) would refer to the equation named income and equation(hours) to the equation named hours.

If you do not specify equation(), results are the same as if you specified equation(#1).
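The ways of naming an equation can be compared directly. The following is a minimal sketch, not an example from this entry: it assumes the auto dataset that ships with Stata and an arbitrary two-equation sureg model chosen only so that the equations are named price and mpg; all new variable names are hypothetical.

```stata
* Assumption: an arbitrary two-equation model fit with sureg on the shipped
* auto dataset; sureg names each equation after its dependent variable
sysuse auto, clear
sureg (price = weight length) (mpg = weight displacement)

* Three ways to request the linear prediction for the first equation
predict double xb_price,   equation(price) xb
predict double xb_first,   equation(#1) xb
predict double xb_default, xb

* Referring to the equation by name, by number, or not at all should agree
assert xb_price == xb_first
assert xb_default == xb_first

* A between-equation statistic needs two equations in equation()
predict double se_diff, equation(price, mpg) stddp
```

The difference between a price equation and an mpg equation has no substantive meaning here; the last command only shows the form equation() takes when a statistic such as stddp refers to two equations.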
Other statistics, such as stddp, refer to between-equation concepts. In those cases, you might specify equation(#1,#2) or equation(income,hours). When two equations must be specified, equation() is required.

Options

nooffset may be combined with most statistics and specifies that the calculation should be made, ignoring any offset or exposure variable specified when the model was fit.

This option is available, even if it is not documented for predict after a specific command. If neither the offset(varname_o) option nor the exposure(varname_e) option was specified when the model was fit, specifying nooffset does nothing.

other_options refers to command-specific options that are documented with each command.

Remarks and examples

Remarks are presented under the following headings:

    Estimation-sample predictions
    Out-of-sample predictions
    Residuals
    Single-equation (SE) models
    SE model scores
    Multiple-equation (ME) models
    ME model scores

Most of the examples are presented using linear regression, but the general syntax is applicable to all estimators.
You can think of any estimation command as estimating a set of coefficients $b_1, b_2, \ldots, b_k$ corresponding to the variables $x_1, x_2, \ldots, x_k$, along with a (possibly empty) set of ancillary statistics $\gamma_1, \gamma_2, \ldots, \gamma_m$. All estimation commands store the $b_i$ and $\gamma_i$. predict accesses that stored information and combines it with the data currently in memory to make various calculations. For instance, predict can calculate the linear prediction, $\hat{y}_j = b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj}$. The data on which predict makes the calculation can be the same data used to fit the model or a different dataset; it does not matter. predict uses the stored parameter estimates from the model, obtains the corresponding values of x for each observation in the data, and then combines them to produce the desired result.

Estimation-sample predictions

Example 1

We have a 74-observation dataset on automobiles, including the mileage rating (mpg), the car's weight (weight), and whether the car is foreign (foreign).
We fit the model

    . use
    (1978 Automobile Data)
    . regress mpg weight if foreign
    (regression output abbreviated: Number of obs = 22, F(1, 20))

If we were to type predict pmpg now, we would obtain the linear predictions for all 74 observations. To obtain the predictions just for the sample on which we fit the model, we could type

    . predict pmpg if e(sample)
    (option xb assumed; fitted values)
    (52 missing values generated)

Here e(sample) is true only for foreign cars because we typed if foreign when we fit the model and because there are no missing values among the relevant variables. If there had been missing values, e(sample) would also account for those.

By the way, the if e(sample) restriction can be used with any Stata command, so we could obtain summary statistics on the estimation sample by typing
    . summarize if e(sample)
    (output omitted)

Out-of-sample predictions

By out-of-sample predictions, we mean predictions extending beyond the estimation sample. In the example above, typing predict pmpg would generate linear predictions using all 74 observations.

predict will work on other datasets, too. You can use a new dataset and type predict to obtain results for that sample.

Example 2

Using the same auto dataset, assume that we wish to fit the model

$$\text{mpg} = \beta_1\,\text{weight} + \beta_2 \ln(\text{weight}) + \beta_3\,\text{foreign} + \beta_4$$

We first create the ln(weight) variable, and then type the regress command:

    . use , clear
    (1978 Automobile Data)
    . generate lnweight = ln(weight)
    . regress mpg weight lnweight foreign
    (regression output abbreviated: Number of obs = 74, F(3, 70))

If we typed predict pmpg now, we would obtain predictions for all 74 cars in the current data. Instead, we are going to use a new dataset. It contains the make, weight, and place of manufacture of two cars, the Pontiac Sunbird and the Volvo 260. Let's use the dataset and create the predictions:

    . use , clear
    (New Automobile Models)
    . list

         make               weight   foreign
      1. Pontiac Sunbird      2690   Domestic
      2. Volvo 260            3170   Foreign

    . predict mpg
    (option xb assumed; fitted values)
    variable lnweight not found
    r(111);

Things did not work. We typed predict mpg, and Stata responded with the message "variable lnweight not found". predict can calculate predicted values on a different dataset only if that dataset contains the variables that went into the model. Here our dataset does not contain a variable called lnweight. lnweight is just the log of weight, so we can create it and try again: