
Poisson Models for Count Data



Chapter 4: Poisson Models for Count Data

In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the analysis of counts of events, but also in the context of models for contingency tables and the analysis of survival data.

4.1 Introduction to Poisson Regression

As usual, we start by introducing an example that will serve to illustrate regression models for count data. We then introduce the Poisson distribution and discuss the rationale for modeling the logarithm of the mean as a linear function of observed covariates. The result is a generalized linear model with Poisson response and link log.

The Children Ever Born Data

The table below, adapted from Little (1978), comes from the Fiji Fertility Survey and is typical of the sort of table published in the reports of the World Fertility Survey.

The table shows data on the number of children ever born to married women of the Indian race classified by duration since their first marriage (grouped in six categories), type of place of residence (Suva, other urban and rural), and educational level (classified in four categories: none, lower primary, upper primary, and secondary or higher). Each cell in the table shows the mean, the variance and the number of observations.

G. Rodríguez. Revised September, 2007

[Table: Number of Children Ever Born to Women of Indian Race, by Marital Duration, Type of Place of Residence and Educational Level. Each cell shows the mean, variance and sample size; the education columns are N, LP, UP and S+.]

In our analysis of these data we will treat the number of children ever born to each woman as the response, and her marriage duration, type of place of residence and level of education as three discrete predictors or factors.

The Poisson Distribution

A random variable $Y$ is said to have a Poisson distribution with parameter $\mu$ if it takes integer values $y = 0, 1, 2, \ldots$ with probability

$$\Pr\{Y = y\} = \frac{e^{-\mu}\mu^{y}}{y!}$$

for $\mu > 0$. The mean and variance of this distribution can be shown to be

$$E(Y) = \operatorname{var}(Y) = \mu.$$

Since the mean is equal to the variance, any factor that affects one will also affect the other. Thus, the usual assumption of homoscedasticity would not be appropriate for Poisson data.

The classic text on probability theory by Feller (1957) includes a number of examples of observations fitting the Poisson distribution, including data on the number of flying-bomb hits in the south of London during World War II. The city was divided into 576 small areas of one-quarter square kilometers each, and the number of areas hit exactly $k$ times was counted. There were a total of 537 hits, so the average number of hits per area was $537/576 \approx 0.9323$. The observed frequencies in the table below are remarkably close to a Poisson distribution with mean $\mu = 0.9323$. Other examples of events that fit this distribution are radioactive disintegrations, chromosome interchanges in cells, the number of telephone connections to a wrong number, and the number of bacteria in different areas of a Petri dish.
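The flying-bomb figures can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `poisson_pmf` is our own, and the only inputs are the 576 areas and 537 hits quoted in the text. It evaluates the Poisson probability function, confirms numerically that the mean and variance coincide, and computes the number of areas one would expect to be hit 0, 1, 2, 3, 4, and 5 or more times.

```python
import math


def poisson_pmf(y, mu):
    """Pr{Y = y} = exp(-mu) * mu**y / y!  for y = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


# Figures quoted in the text: 537 hits spread over 576 areas.
n_areas, n_hits = 576, 537
mu = n_hits / n_areas  # average number of hits per area

# Mean and variance from a truncated sum; the tail beyond y = 99
# is negligible for a mean this small.
mean = sum(y * poisson_pmf(y, mu) for y in range(100))
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in range(100))

# Expected number of areas hit exactly k times for k = 0..4,
# with the "5 or more" class obtained by subtraction.
expected = [n_areas * poisson_pmf(k, mu) for k in range(5)]
expected.append(n_areas - sum(expected))

print(f"mu = {mu:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
for label, e in zip(["0", "1", "2", "3", "4", "5+"], expected):
    print(f"{label:>2} hits: {e:6.1f} areas expected")
```

Obtaining the last class by subtraction keeps the expected frequencies summing to the 576 areas, which is the convention such tables usually follow.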

Flying-bomb Hits on London During World War II

Hits        0      1      2     3     4    5+
Observed    229    211    93    35    7    1
Expected    226.7  211.4  98.5  30.6  7.1  1.6

The Poisson distribution can be derived as a limiting form of the binomial distribution if you consider the distribution of the number of successes in a very large number of Bernoulli trials with a small probability of success in each trial. Specifically, if $Y \sim B(n, \pi)$ then the distribution of $Y$ as $n \to \infty$ and $\pi \to 0$ with $\mu = n\pi$ remaining fixed approaches a Poisson distribution with mean $\mu$. Thus, the Poisson distribution provides an approximation to the binomial for the analysis of rare events, where $\pi$ is small and $n$ is large.

In the flying-bomb example, we can think of each day as one of a large number of trials where each specific area has only a small probability of being hit. Assuming independence across days would lead to a binomial distribution which is well approximated by the Poisson.

An alternative derivation of the Poisson distribution is in terms of a stochastic process described somewhat informally as follows. Suppose events occur randomly in time in such a way that the following conditions obtain:

- The probability of at least one occurrence of the event in a given time interval is proportional to the length of the interval.
- The probability of two or more occurrences of the event in a very small time interval is negligible.
- The numbers of occurrences of the event in disjoint time intervals are mutually independent.

Then the probability distribution of the number of occurrences of the event in a fixed time interval is Poisson with mean $\mu = \lambda t$, where $\lambda$ is the rate of occurrence of the event per unit of time and $t$ is the length of the time interval. A process satisfying the three assumptions listed above is called a Poisson process.

In the flying-bomb example these conditions are not unreasonable. The longer the war lasts, the greater the chance that a given area will be hit at least once.
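The limiting-binomial derivation described above can be checked numerically. The sketch below is illustrative only: the mean $\mu = 2$ and the values of $n$ are arbitrary choices of ours, not figures from the text. It holds $\mu = n\pi$ fixed, lets $n$ grow, and records the largest gap between the binomial and Poisson probabilities over $y = 0, \ldots, 10$.

```python
import math


def binomial_pmf(y, n, pi):
    """Probability of y successes in n Bernoulli trials with success probability pi."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)


def poisson_pmf(y, mu):
    """Pr{Y = y} for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


mu = 2.0  # illustrative fixed mean, so that pi = mu / n shrinks as n grows


def max_gap(n):
    """Largest absolute difference between the two pmfs over y = 0..10."""
    pi = mu / n
    return max(abs(binomial_pmf(y, n, pi) - poisson_pmf(y, mu)) for y in range(11))


gaps = {n: max_gap(n) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}, pi = {mu / n:.4f}, largest pmf difference = {gap:.2e}")
```

The gap should shrink roughly in proportion to $1/n$, which is the sense in which the Poisson distribution approximates the binomial for rare events.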

In the same example, the probability that the same area will be hit twice on the same day is, fortunately, very small. Perhaps less obviously, whether an area is hit on any given day is independent of what happens in neighboring areas, contradicting a common belief that bomb hits tend to cluster.

The most important motivation for the Poisson distribution from the point of view of statistical estimation, however, lies in the relationship between the mean and the variance. We will stress this point when we discuss our example, where the assumptions of a limiting binomial or a Poisson process are not particularly realistic, but the Poisson model captures very well the fact that, as is often the case with count data, the variance tends to increase with the mean.

A useful property of the Poisson distribution is that the sum of independent Poisson random variables is also Poisson.

Specifically, if $Y_1$ and $Y_2$ are independent with $Y_i \sim P(\mu_i)$ for $i = 1, 2$ then

$$Y_1 + Y_2 \sim P(\mu_1 + \mu_2).$$

This result generalizes in an obvious way to the sum of more than two Poisson observations.

An important practical consequence of this result is that we can analyze individual or grouped data with equivalent results. Specifically, suppose we have a group of $n_i$ individuals with identical covariate values. Let $Y_{ij}$ denote the number of events experienced by the $j$-th unit in the $i$-th group, and let $Y_i$ denote the total number of events in group $i$. Then, under the usual assumption of independence, if $Y_{ij} \sim P(\mu_i)$ for $j = 1, 2, \ldots, n_i$, then $Y_i \sim P(n_i\mu_i)$. In words, if the individual counts $Y_{ij}$ are Poisson with mean $\mu_i$, the group total $Y_i$ is Poisson with mean $n_i\mu_i$. In terms of estimation, we obtain exactly the same likelihood function if we work with the individual counts $Y_{ij}$ or the group totals $Y_i$.

Log-Linear Models

Suppose that we have a sample of $n$ observations $y_1, y_2, \ldots, y_n$ which can be treated as realizations of independent Poisson random variables, with $Y_i \sim P(\mu_i)$, and suppose that we want to let the mean $\mu_i$ (and therefore the variance!) depend on a vector of explanatory variables $x_i$.

We could entertain a simple linear model of the form

$$\mu_i = x_i'\beta,$$

but this model has the disadvantage that the linear predictor on the right hand side can assume any real value, whereas the Poisson mean on the left hand side, which represents an expected count, has to be non-negative.

A straightforward solution to this problem is to model instead the logarithm of the mean using a linear model. Thus, we take logs calculating $\eta_i = \log(\mu_i)$ and assume that the transformed mean follows a linear model $\eta_i = x_i'\beta$. Thus, we consider a generalized linear model with link log. Combining these two steps in one we can write the log-linear model as

$$\log(\mu_i) = x_i'\beta.$$
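A minimal sketch of the log-linear model just written down may help fix ideas. The coefficients and predictor values are hypothetical numbers chosen by us for illustration. The linear predictor $\eta_i = x_i'\beta$ takes negative values here, so it could not serve directly as a Poisson mean, whereas $\mu_i = \exp(\eta_i)$ is always positive. The last line also shows that each one-unit step in the predictor multiplies the mean by the same factor, $\exp(\beta_1)$.

```python
import math

# Hypothetical intercept and slope, chosen only for illustration.
beta0, beta1 = 0.5, -0.8
xs = [0.0, 1.0, 2.0, 3.0]

# Linear predictor eta_i = beta0 + beta1 * x_i: can be any real number.
eta = [beta0 + beta1 * x for x in xs]

# Log link: log(mu_i) = eta_i, so mu_i = exp(eta_i) is always positive.
mu = [math.exp(e) for e in eta]

# Ratio of successive means for a one-unit increase in x.
ratios = [mu[i + 1] / mu[i] for i in range(len(mu) - 1)]

for x, e, m in zip(xs, eta, mu):
    print(f"x = {x:.0f}: eta = {e:5.2f}, mu = exp(eta) = {m:.4f}")
print("ratios of successive means:", [round(r, 4) for r in ratios])
print("exp(beta1) =", round(math.exp(beta1), 4))
```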

In this model the regression coefficient $\beta_j$ represents the expected change in the log of the mean per unit change in the predictor $x_j$. In other words, increasing $x_j$ by one unit is associated with an increase of $\beta_j$ in the log of the mean.

Exponentiating the log-linear model $\log(\mu_i) = x_i'\beta$ we obtain a multiplicative model for the mean itself:

$$\mu_i = \exp\{x_i'\beta\}.$$

In this model, an exponentiated regression coefficient $\exp\{\beta_j\}$ represents a multiplicative effect of the $j$-th predictor on the mean. Increasing $x_j$ by one unit multiplies the mean by a factor $\exp\{\beta_j\}$.

A further advantage of using the log link stems from the empirical observation that with count data the effects of predictors are often multiplicative rather than additive. That is, one typically observes small effects for small counts, and large effects for large counts. If the effect is in fact proportional to the count, working in the log scale leads to a much simpler model.

4.2 Estimation and Testing

The log-linear Poisson model described in the previous section is a generalized linear model with Poisson error and link log.

Maximum likelihood estimation and testing follows immediately from the general results in Appendix B. In this section we review a few key results.

Maximum Likelihood Estimation

The likelihood function for $n$ independent Poisson observations is a product of probabilities, each given by the Poisson probability function $\Pr\{Y = y\} = e^{-\mu}\mu^{y}/y!$. Taking logs and ignoring a constant involving $\log(y_i!)$, we find that the log-likelihood function is

$$\log L(\beta) = \sum_{i=1}^{n} \{ y_i \log(\mu_i) - \mu_i \},$$

where $\mu_i$ depends on the covariates $x_i$ and a vector of $p$ parameters $\beta$ through the log link $\log(\mu_i) = x_i'\beta$.

It is interesting to note that the log is the canonical link for the Poisson distribution. Taking derivatives of the log-likelihood function with respect to the elements of $\beta$, and setting the derivatives to zero, it can be shown that the maximum likelihood estimates in log-linear Poisson models satisfy the estimating equations

$$X'y = X'\hat{\mu},$$

where $X$ is the model matrix, with one row for each observation and one column for each predictor, including the constant (if any), $y$ is the response vector, and $\hat{\mu}$ is a vector of fitted values, calculated from the estimates $\hat{\beta}$ by exponentiating the linear predictor $\eta = X\hat{\beta}$.
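The estimating equations can be verified on a small example. The sketch below is not the procedure of Appendix B; it is a plain Newton-Raphson iteration, written by us for a model with a constant and a single predictor, and run on made-up toy data. It uses the score $X'(y - \mu)$ and the information matrix $X'\operatorname{diag}(\mu)X$ implied by the log-likelihood above, then checks that the fitted values satisfy $X'y = X'\hat{\mu}$.

```python
import math

# Hypothetical toy data: one predictor x and a count response y.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 3, 4, 9, 14]


def fit_poisson(x, y, iterations=25):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i (illustrative sketch)."""
    # Start from a constant-only fit: intercept = log of the mean count.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X, a symmetric 2 x 2 matrix.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: add (information)^(-1) * score to the estimates.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]

# The two rows of X'y = X'mu_hat: the column of ones and the column x.
print("sum of y      :", sum(y), " sum of fitted     :", round(sum(mu_hat), 6))
print("sum of x * y  :", sum(xi * yi for xi, yi in zip(x, y)),
      " sum of x * fitted :", round(sum(xi * mi for xi, mi in zip(x, mu_hat)), 6))
```

The first equation says that, whenever the model includes a constant, the fitted counts add up to the observed total.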

