ECONOMETRIC ANALYSIS USING PANEL DATA
Ranjit Kumar Paul, Library Avenue, New Delhi-110012

Introduction

Different types of data are generally available for empirical analysis, namely time series, cross section, and panel. A data set containing observations on a single phenomenon observed over multiple time periods is called a time series (e.g., GDP for several quarters or years). In time series data, both the values and the ordering of the data points have meaning. In cross-section data, values of one or more variables are collected for several sample units, or entities, at the same point in time (e.g., crime rates for 50 states in the United States for a given year).
Panel data sets combine time series and cross-section data. This expands the number of observations available: for instance, with 10 years of data across 10 countries, we have 100 observations. So although there might not be enough observations to estimate the model as a time series or as a cross section alone, there would be enough to estimate it as a panel. Panel data can be balanced or unbalanced: a balanced panel has an observation for every unit in every time period, whereas an unbalanced panel has some observations missing.
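The distinction between balanced and unbalanced panels can be checked mechanically. The following is a minimal sketch in plain Python, assuming each observation is recorded as a (unit, period) pair; the function name `panel_summary` and the country and year labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def panel_summary(rows):
    """Summarise a panel given as (unit, period) pairs, one per observation."""
    rows = list(rows)
    units = {u for u, _ in rows}
    periods = {t for _, t in rows}
    # Count the distinct periods observed for each unit.
    per_unit = Counter(u for u, _ in set(rows))
    # Balanced: every unit is observed in every period that occurs in the data.
    balanced = all(per_unit[u] == len(periods) for u in units)
    return {
        "n_units": len(units),
        "n_periods": len(periods),
        "n_obs": len(rows),
        "balanced": balanced,
    }

# Hypothetical panel: 10 countries observed over 10 years (100 observations).
full = [(country, year) for country in range(10) for year in range(2001, 2011)]
summary_full = panel_summary(full)

# Dropping a single country-year observation makes the panel unbalanced.
partial = [row for row in full if row != (3, 2005)]
summary_partial = panel_summary(partial)

print(summary_full)
print(summary_partial)
```

Here a panel is treated as balanced only if every unit appears in every period found anywhere in the data, which is one common working definition.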
A further benefit is that panel data can overcome the problem of unobserved heterogeneity in a cross-section data set. This occurs where there are unobserved variations in the characteristics of the respondents in a survey-based database. Let us consider a data set on eggs produced and their prices for 50 districts in India for the years 1990 and 1991. For any given year, the data on eggs and their prices represent a cross-sectional sample. For any given district, there are two time series observations on eggs and their prices. Thus, we have in all 50 × 2 = 100 (panel) observations on eggs produced and their prices.
There are other names for panel data, such as pooled data (pooling of time series and cross-sectional observations), combination of time series and cross-section data, micro panel data, and longitudinal data (a study over time of a variable or group of subjects). The regression models based on such panel data are known as panel data regression models.

Advantages of panel data:
1. Since panel data relate to individuals, firms, states, countries, etc., over time, the presence of heterogeneity in these units is a natural phenomenon.
The techniques of panel data estimation can take such heterogeneity explicitly into account by allowing for individual-specific variables.
2. By combining time series of cross-section observations, panel data give more informative data, more variability, less collinearity among variables, more degrees of freedom, and more efficiency.
3. By studying repeated cross sections of observations, panel data are better suited to studying the dynamics of change. Spells of unemployment, job turnover, and labour mobility are better studied with panel data.
4. Panel data can better detect and measure effects that simply cannot be observed in pure cross-section or time series data. For example, the effect of minimum wage laws on employment and earnings can be better studied if we include successive waves of minimum wage increases in the federal and/or state minimum wages.
5. Panel data enable us to study more complicated behavioural models. For example, phenomena such as economies of scale and technological change can be better handled by panel data than by pure cross-section or time series data.
6. By making data available for several thousand units, panel data can minimize the bias that might result if we aggregate individuals or firms into broad aggregates.

Panel data: An illustrative example

The following analysis is based on data taken from a famous study of investment theory by Y. Grunfeld. Grunfeld was interested in finding out how real gross investment ($Y$) depends on the real value of the firm ($X_2$) and real capital stock ($X_3$). The data cover four companies: General Electric (GE), General Motors (GM), U.S. Steel (US), and Westinghouse.
Data for each company on the preceding three variables are available for the period 1935-54. Thus, there are four cross-sectional units and 20 time periods. In all, therefore, we have 80 observations. A priori, $Y$ is expected to be positively related to $X_2$ and $X_3$. Pooling, or combining, all 80 observations, the Grunfeld investment function can be written as:

$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_{it}, \qquad i = 1, 2, 3, 4; \quad t = 1, 2, \ldots, 20 \qquad (1)$$

where $i$ stands for the $i$th cross-sectional unit and $t$ for the $t$th time period. It is assumed that the $X$'s are nonstochastic and that the error term follows the classical assumptions, namely $\varepsilon_{it} \sim N(0, \sigma^2)$.
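In stacked form, and assuming the 80 observations are ordered firm by firm (an ordering chosen here only for illustration), model (1) can be sketched in matrix notation. The symbols $\mathbf{y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, and $\boldsymbol{\varepsilon}$ are notation introduced for this sketch: $\boldsymbol{\beta}$ collects the intercept and the two slope coefficients, and $\boldsymbol{\varepsilon}$ collects the error terms.

```latex
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\underbrace{\mathbf{y}}_{80 \times 1} =
\begin{bmatrix}
Y_{1,1} \\ \vdots \\ Y_{1,20} \\ Y_{2,1} \\ \vdots \\ Y_{4,20}
\end{bmatrix},
\quad
\underbrace{\mathbf{X}}_{80 \times 3} =
\begin{bmatrix}
1 & X_{2,1,1} & X_{3,1,1} \\
\vdots & \vdots & \vdots \\
1 & X_{2,1,20} & X_{3,1,20} \\
1 & X_{2,2,1} & X_{3,2,1} \\
\vdots & \vdots & \vdots \\
1 & X_{2,4,20} & X_{3,4,20}
\end{bmatrix},
\quad
\underbrace{\boldsymbol{\beta}}_{3 \times 1} =
\begin{bmatrix}
\beta_1 \\ \beta_2 \\ \beta_3
\end{bmatrix}
\]
```

Each row of $\mathbf{X}$ corresponds to one firm-year pair $(i, t)$, so the four blocks of 20 rows give the 80 pooled observations.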
Estimation of panel data regression models:
1. The fixed effects approach.

Estimation of (1) depends on the assumptions we make about the intercept, the slope coefficients, and the error term. There are several possibilities:
1. The intercept and slope coefficients are constant across time and space, and the error term captures differences over time and individuals.
2. The slope coefficients are constant but the intercept varies over individuals.
3. The slope coefficients are constant but the intercept varies over individuals and time.
4. All coefficients (the intercept as well as slope coefficients) vary over individuals.
5. The intercept as well as slope coefficients vary over individuals and time.

(i) All coefficients constant across time and individuals

The simplest, and possibly naïve, approach is to disregard the space and time dimensions of the pooled data and just estimate the usual OLS regression. That is, stack the 20 observations for each company one on top of the other, thus giving in all 80 observations for each of the variables in the model.

The OLS results are as follows:

se = ( ) ( ) ( )
t = ( ) ( ) ( )    (2)
Durbin-Watson =      n = 80      df = 77

Here all the coefficients are individually statistically significant and the $R^2$ value is reasonably high.
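The stacking step and the degrees-of-freedom count can be illustrated with a small sketch in plain Python. It does not use the Grunfeld data: the observations are synthetic, generated from made-up coefficients, and the helper names `solve` and `pooled_ols` are hypothetical. The sketch fits pooled OLS by solving the normal equations for four units and 20 periods.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def pooled_ols(X, y):
    """Pooled OLS via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    df = len(y) - k  # observations minus estimated coefficients
    return beta, resid, df

random.seed(0)
N, T = 4, 20                    # four units, 20 periods, as in the example
true_beta = [10.0, 0.5, 2.0]    # made-up values, not Grunfeld estimates

# Stack the T observations of each unit one on top of the other.
X, y = [], []
for i in range(N):
    for t in range(T):
        x2 = random.uniform(0, 100)
        x3 = random.uniform(0, 50)
        X.append([1.0, x2, x3])  # intercept, X2, X3
        y.append(true_beta[0] + true_beta[1] * x2 + true_beta[2] * x3
                 + random.gauss(0, 1))

beta_hat, resid, df = pooled_ols(X, y)
print(len(y), df)  # 80 77
print([round(b, 3) for b in beta_hat])
```

Because the unit and time indices play no role once the rows are stacked, this estimator treats the panel as a single sample of 80 observations, which is exactly the simplification that case (i) makes.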