Linear Regression Models with Logarithmic Transformations

Kenneth Benoit
Methodology Institute
London School of [...] 17, 2011

1 Logarithmic transformations of variables

Considering the simple bivariate linear model $Y_i = \alpha + \beta X_i + \varepsilon_i$ [1], there are four possible combinations of transformations involving logarithms: the linear case with no transformations, the linear-log model, the log-linear model [2], and the log-log model.

             X                                             log X
    Y        linear: $Y_i = \alpha + \beta X_i$             linear-log: $Y_i = \alpha + \beta \log X_i$
    log Y    log-linear: $\log Y_i = \alpha + \beta X_i$    log-log: $\log Y_i = \alpha + \beta \log X_i$

Table 1: Four varieties of logarithmic transformations

Remember that we are using natural logarithms, where the base is $e \approx 2.718$. Logarithms may have other bases, for instance the decimal logarithm of base 10. (The base 10 logarithm is used in the definition of the Richter scale, for instance, measuring the intensity of earthquakes as Richter = $\log_{10}(\text{intensity})$. This is why an earthquake of magnitude 9 is 100 times more powerful than an earthquake of magnitude 7: because $10^9/10^7 = 10^2$ and $\log_{10}(10^2) = 2$.)
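The four specifications in Table 1 differ only in which variables are logged before an ordinary least-squares line is fitted. The following is a minimal sketch of that idea, not the handout's own code: the helper name `fit_variety`, the use of NumPy's `polyfit`, and the noise-free data are all assumptions made for illustration.

```python
import numpy as np

def fit_variety(x, y, log_x=False, log_y=False):
    """Least-squares fit of one of the four varieties; returns (alpha, beta).

    Hypothetical helper: natural logs are applied to x and/or y before fitting,
    so any variable that is logged must be strictly positive.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_used = np.log(x) if log_x else x
    y_used = np.log(y) if log_y else y
    # polyfit returns coefficients from the highest power down: slope, then intercept
    beta, alpha = np.polyfit(x_used, y_used, deg=1)
    return alpha, beta

# Made-up, noise-free data generated from a linear-log relationship
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 2.0 + 3.0 * np.log(x)

alpha, beta = fit_variety(x, y, log_x=True)            # linear-log fit
alpha_ll, beta_ll = fit_variety(x, x ** 2, True, True)  # log-log fit of y = x^2
print(alpha, beta)        # close to 2 and 3
print(alpha_ll, beta_ll)  # close to 0 and 2
```

Because the data are generated without noise, each fit recovers the generating coefficients up to floating-point error; with real data the choice among the four forms is a modelling decision.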

Some properties of logarithms and exponential functions that you may find useful include:

1. $\log(e) = 1$
2. $\log(1) = 0$
3. $\log(x^r) = r \log(x)$
4. $\log e^A = A$
5. $e^{\log A} = A$
6. $\log(AB) = \log A + \log B$
7. $\log(A/B) = \log A - \log B$
8. $e^{AB} = (e^A)^B$
9. $e^{A+B} = e^A e^B$
10. $e^{A-B} = e^A / e^B$

With valuable input and edits from Jouni [...].
[1] The bivariate case is used here for simplicity only, as the results generalize directly to models involving more than one X variable, although we would need to add the caveat that all other variables are held constant.
[2] Note that the term log-linear model is also used in other contexts, to refer to some types of models for other kinds of response variables Y. These are different from the log-linear models discussed here.

2 Why use logarithmic transformations of variables

Logarithmically transforming variables in a regression model is a very common way to handle situations where a non-linear relationship exists between the independent and dependent variables. [3] Using the logarithm of one or more variables instead of the un-logged form makes the effective relationship non-linear, while still preserving the linear model.

Logarithmic transformations are also a convenient means of transforming a highly skewed variable into one that is more approximately normal.
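The skew-reducing effect of logging can be illustrated on simulated data. This is only a sketch under stated assumptions: the sample is drawn from NumPy's `lognormal` generator rather than from the course's expenses data, and `sample_skewness` is a hypothetical helper using the simple moment-based definition of skewness.

```python
import numpy as np

def sample_skewness(values):
    """Moment-based sample skewness: mean of the cubed standardized deviations."""
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # simulated right-skewed variable

raw_skew = sample_skewness(raw)          # strongly positive: long right tail
log_skew = sample_skewness(np.log(raw))  # near zero: roughly symmetric
print(f"skewness before logging: {raw_skew:.2f}")
print(f"skewness after logging:  {log_skew:.2f}")
```

A skewness near zero after logging is what one would expect here by construction; for real variables the log transformation typically reduces right skew without necessarily removing it.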

(In fact, there is a distribution called the log-normal distribution, defined as a distribution whose logarithm is normally distributed but whose untransformed scale is skewed.)

For instance, if we plot the histogram of expenses (from the MI452 course pack example), we see a significant right skew in this data, meaning the mass of cases are bunched at lower values:

[Figure: histogram of Expenses]

If we plot the histogram of the logarithm of expenses, however, we see a distribution that looks much more like a normal distribution:

[Figure: histogram of Log(Expenses)]

[3] The other transformation we have learned is the quadratic form, which involves adding the term $X^2$ to the model. This produces curvature that, unlike the logarithmic transformation, can reverse the direction of the relationship. The logarithmic transformation is what is known as a monotone transformation: it preserves the ordering between $x$ and $f(x)$.

3 Interpreting coefficients in models with logarithmic transformations

Linear model: $Y_i = \alpha + \beta X_i + \varepsilon_i$

Recall that in the linear regression model, $Y_i = \alpha + \beta X_i + \varepsilon_i$, the coefficient $\beta$ gives us directly the change in Y for a one-unit change in X. No additional interpretation is required beyond the estimate $\hat\beta$ of the coefficient itself.

This literal interpretation will still hold when variables have been logarithmically transformed, but it usually makes sense to interpret the changes not in log-units but rather in percentage changes. Each logarithmically transformed model is discussed in turn below.

Linear-log model: $Y_i = \alpha + \beta \log X_i + \varepsilon_i$

In the linear-log model, the literal interpretation of the estimated coefficient $\hat\beta$ is that a one-unit increase in log X will produce an expected increase in Y of $\hat\beta$ units. To see what this means in terms of changes in X, we can use the result that

$\log X + 1 = \log X + \log e = \log(eX)$

which is obtained using properties 1 and 6 of logarithms and exponential functions listed above.
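The identity $\log X + 1 = \log(eX)$ can be checked numerically, along with its consequence for a linear-log model. The sketch below uses only Python's standard `math` module; the function names and the coefficient values `alpha = 1.5` and `beta = 4.0` are hypothetical and chosen purely for illustration.

```python
import math

def identity_holds(x):
    """Check log(x) + 1 == log(e * x) for a positive x, up to floating-point error."""
    return math.isclose(math.log(x) + 1.0, math.log(math.e * x),
                        rel_tol=1e-12, abs_tol=1e-12)

# Hypothetical linear-log coefficients, not estimates from any data set
alpha, beta = 1.5, 4.0

def predict(x):
    """Expected Y under the linear-log model Y = alpha + beta * log(X)."""
    return alpha + beta * math.log(x)

checks = [identity_holds(x) for x in (0.5, 3.0, 1000.0)]
# Multiplying X by e adds exactly 1 to log X, so expected Y changes by beta
diff = predict(math.e * 10.0) - predict(10.0)
print(checks, diff)
```

The starting value of X (10 in the sketch) does not matter: the change in expected Y depends only on the factor by which X is multiplied.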

In other words, adding 1 to log X means multiplying X itself by $e \approx 2.72$.

A proportional change like this can be converted to a percentage change by subtracting 1 and multiplying by 100. So another way of stating "multiplying X by 2.72" is to say that X increases by 172% (since $100 \times (2.72 - 1) = 172$).

So in terms of a change in X (unlogged):

- $\hat\beta$ is the expected change in Y when X is multiplied by e.
- $\hat\beta$ is the expected change in Y when X increases by 172%.
- For other percentage changes in X we can use the following result: the expected change in Y associated with a p% increase in X can be calculated as $\hat\beta \cdot \log([100 + p]/100)$. So to work out the expected change associated with a 10% increase in X, multiply $\hat\beta$ by $\log(110/100) = \log(1.1) = .095$. In other words, $.095\hat\beta$ is the expected change in Y when X is multiplied by 1.1, i.e. increases by 10%.
- For small p, approximately $\log([100 + p]/100) \approx p/100$.
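The percentage-change rule for the linear-log model can be written as a one-line function. This is a minimal sketch: the name `linear_log_change` is hypothetical, and a coefficient of 1.0 is passed in so that the returned values are simply the multipliers to apply to any estimated coefficient.

```python
import math

def linear_log_change(beta, p):
    """Expected change in Y for a p% increase in X under Y = alpha + beta * log(X)."""
    return beta * math.log((100.0 + p) / 100.0)

# With beta = 1.0 the results are the factors by which a coefficient is multiplied
factor_10 = linear_log_change(1.0, 10)                    # log(1.1), about .095
factor_1 = linear_log_change(1.0, 1)                      # close to p/100 = .01
factor_e = linear_log_change(1.0, 100 * (math.e - 1.0))   # X multiplied by e
print(factor_10, factor_1, factor_e)
```

The last call corresponds to the roughly 172% increase discussed in the text, for which the factor is 1, so the expected change in Y is the coefficient itself.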

For $p = 1$, the approximation $\log([100 + p]/100) \approx p/100$ means that $\hat\beta/100$ can be interpreted approximately as the expected increase in Y from a 1% increase in X.

Log-linear model: $\log Y_i = \alpha + \beta X_i + \varepsilon_i$

In the log-linear model, the literal interpretation of the estimated coefficient $\hat\beta$ is that a one-unit increase in X will produce an expected increase in log Y of $\hat\beta$ units. In terms of Y itself, this means that the expected value of Y is multiplied by $e^{\hat\beta}$. So in terms of effects of changes in X on Y (unlogged):

- Each 1-unit increase in X multiplies the expected value of Y by $e^{\hat\beta}$.
- To compute the effects on Y of a change in X other than an increase of one unit, call this change c, we need to include c in the exponent. The effect of a c-unit increase in X is to multiply the expected value of Y by $e^{c\hat\beta}$. So the effect for a 5-unit increase in X would be $e^{5\hat\beta}$.
- For small values of $\hat\beta$, approximately $e^{\hat\beta} \approx 1 + \hat\beta$. We can use this for the following approximation for a quick interpretation of the coefficients: $100 \cdot \hat\beta$ is the expected percentage change in Y for a unit increase in X.
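The multiplicative reading of a log-linear coefficient, and the quality of the small-coefficient approximation, can be sketched as follows. The function names are hypothetical and the coefficient value 0.06 is purely illustrative, not an estimate from any data set.

```python
import math

def log_linear_multiplier(beta, c=1.0):
    """Factor by which expected Y is multiplied for a c-unit increase in X."""
    return math.exp(c * beta)

def log_linear_percent_change(beta, c=1.0):
    """The same effect expressed as a percentage change in expected Y."""
    return 100.0 * (log_linear_multiplier(beta, c) - 1.0)

beta = 0.06  # illustrative value only
one_unit = log_linear_multiplier(beta)         # e^0.06
five_units = log_linear_multiplier(beta, c=5)  # e^(5 * 0.06) = e^0.3
exact_pct = log_linear_percent_change(beta)    # exact percentage change
approx_pct = 100.0 * beta                      # quick approximation: 100 * beta
print(one_unit, five_units, exact_pct, approx_pct)
```

For a coefficient this small the exact and approximate percentage changes differ by less than a fifth of a percentage point; the gap widens as the coefficient grows, which is why the shortcut is recommended only for small values.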

For instance, for $\hat\beta = .06$, $e^{.06} \approx 1.06$, so a 1-unit change in X corresponds to (approximately) an expected increase in Y of 6%.

Log-log model: $\log Y_i = \alpha + \beta \log X_i + \varepsilon_i$

In instances where both the dependent variable and independent variable(s) are log-transformed variables, the interpretation is a combination of the linear-log and log-linear cases above. In other words, the interpretation is given as an expected percentage change in Y when X increases by some percentage. Such relationships, where both Y and X are log-transformed, are commonly referred to as elastic in econometrics, and the coefficient of log X is referred to as an elasticity.

So in terms of effects of changes in X on Y (both unlogged):

- Multiplying X by e will multiply the expected value of Y by $e^{\hat\beta}$.
- To get the proportional change in Y associated with a p percent increase in X, calculate $a = \log([100 + p]/100)$ and take $e^{a\hat\beta}$.

4 Examples

Consider the regression of % urban population (1995) on per capita GNP:

[Figure: scatterplot of % urban 95 (World Bank) against United Nations per capita GDP]

Let's consider the relationship between the percentage urban and per capita GNP. This doesn't look too good. Let's try transforming the per capita GNP by logging it.

The distribution of per capita GDP is badly skewed, creating a non-linear relationship between X and Y. To control the skew and counter problems of heteroskedasticity, we transform GNP/capita by taking its logarithm. This produces the following plot:

[Figure: scatterplot of % urban 95 (World Bank) against the log of per capita GDP]

and the regression with the following results. That looked pretty good. Now let's quantify the association between percentage urban and the logged per capita income:

    . regress urb95 lPcGDP95

      Source |       SS       df       MS              Number of obs =     132
    ---------+------------------------------           F(  1,   130) =   [...]
       Model |    [...]        1    [...]              Prob > F      =   [...]
    Residual |    [...]      130    [...]              R-squared     =   [...]
    ---------+------------------------------           Adj R-squared =   [...]
       Total |    [...]      131    [...]              Root MSE      =   [...]
    ------------------------------------------------------------------------------
       urb95 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
    lPcGDP95 |   [...]  .8278521  [...]
       _cons |   [...]
    ------------------------------------------------------------------------------

The implication of this coefficient is that multiplying per capita income by e, roughly 2.72, 'increases' the percentage urban by [...] percentage points. Increasing per capita income by 10% 'increases' the percentage urban by [...] * [...] = [...].

To interpret the coefficient of [...] on the log of the GNP/capita variable, we can make the following statements:

Directly from the coefficient: An increase of 1 in the log of GNP/capita will increase Y by [...]. (This is not extremely interesting, however, since few people are sure how to interpret the natural logarithms of GDP/capita.)

Multiplicative changes in e: Multiplying GNP/cap by e will increase Y by [...].

A 1% increase in X: A 1% increase in GNP/cap will increase Y by [...].

A 10% increase in X: A 10% increase in GNP/cap will increase Y by [...] $\times \log(1.10)$ = [...] $\times$ .09531.

What if we reverse X and Y from the above example, so that we regress the log of GNP/capita on the % urban? In this case, the logarithmically transformed variable is the dependent variable. This leads to the following plot (which is just the transpose of the previous one; this is only an example!):

[Figure: scatterplot of lPcGDP95 against % urban 95 (World Bank)]

What about the situation where the dependent variable is logged? We could just as easily have considered the 'effect' on logged per capita income of increasing urbanization:

    . regress lPcGDP95 urb95

      Source |       SS       df       MS              Number of obs =     132
    ---------+------------------------------           F(  1,   130) =   [...]
       Model |    [...]        1    [...]              Prob > F      =   [...]
    Residual |    [...]      130    [...]              R-squared     =   [...]
    ---------+------------------------------           Adj R-squared =   [...]
       Total |    [...]      131    [...]              Root MSE      =   [...]
    ------------------------------------------------------------------------------
    lPcGDP95 |      Coef.
