Survival Models - Princeton University

Chapter 7 Survival Models

Rodríguez. Revised September, 2010

Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence of a well-defined event, (2) observations are censored, in the sense that for some units the event of interest has not occurred at the time the data are analyzed, and (3) there are predictors or explanatory variables whose effect on the waiting time we wish to assess or control. We start with some basic definitions.

The Hazard and Survival Functions

Let $T$ be a non-negative random variable representing the waiting time until the occurrence of an event. For simplicity we will adopt the terminology of survival analysis, referring to the event of interest as 'death' and to the waiting time as 'survival' time, but the techniques to be studied have much wider applicability. They can be used, for example, to study age at marriage, the duration of marriage, the intervals between successive births to a woman, the duration of stay in a city (or in a job), and the length of life. The observant demographer will have noticed that these examples include the fields of fertility, mortality and migration.

The Survival Function

We will assume for now that $T$ is a continuous random variable with probability density function (p.d.f.) $f(t)$ and cumulative distribution function (c.d.f.) $F(t) = \Pr\{T < t\}$, giving the probability that the event has occurred by duration $t$.

It will often be convenient to work with the complement of the c.d.f., the survival function

$$S(t) = \Pr\{T \ge t\} = 1 - F(t) = \int_t^\infty f(x)\,dx,$$

which gives the probability of being alive just before duration $t$, or more generally, the probability that the event of interest has not occurred by duration $t$.

The Hazard Function

An alternative characterization of the distribution of $T$ is given by the hazard function, or instantaneous rate of occurrence of the event, defined as

$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr\{t \le T < t + dt \mid T \ge t\}}{dt}.$$
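As a numerical check of the definition of the survival function, here is a minimal sketch that assumes a hypothetical density chosen only for illustration: a Weibull distribution with shape 2 and scale 1, so that $f(t) = 2t\exp\{-t^2\}$ and $F(t) = 1 - \exp\{-t^2\}$. It approximates $S(t) = \int_t^\infty f(x)\,dx$ with Simpson's rule. The infinite upper limit is truncated at a hypothetical cutoff of 10, where this density is negligible. The result is compared with $1 - F(t)$.

```python
import math


def density(t: float) -> float:
    """Hypothetical example density: Weibull with shape 2, scale 1."""
    return 2.0 * t * math.exp(-t * t)


def cdf(t: float) -> float:
    """Closed-form c.d.f. F(t) = Pr{T < t} for the same Weibull."""
    return 1.0 - math.exp(-t * t)


def simpson(g, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2
        total += weight * g(a + i * h)
    return total * h / 3.0


def survival_from_density(t: float, upper: float = 10.0) -> float:
    """S(t) = integral of f from t to infinity, truncated at `upper` (an assumption)."""
    return simpson(density, t, upper)


for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t={t}: tail integral={survival_from_density(t):.6f}, "
          f"1 - F(t)={1 - cdf(t):.6f}")
```

The two columns should agree to the printed precision. Any other density with a known c.d.f. can be substituted for `density` and `cdf`.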

The numerator of this expression is the conditional probability that the event will occur in the interval $[t, t+dt)$ given that it has not occurred before, and the denominator is the width of the interval. Dividing one by the other we obtain a rate of event occurrence per unit of time. Taking the limit as the width of the interval goes down to zero, we obtain an instantaneous rate of occurrence.

The conditional probability in the numerator may be written as the ratio of two probabilities. The first is the joint probability that $T$ is in the interval $[t, t+dt)$ and $T \ge t$, which is, of course, the same as the probability that $T$ is in the interval. The second is the probability of the condition $T \ge t$. The former may be written as $f(t)\,dt$ for small $dt$, while the latter is $S(t)$ by definition. Dividing by $dt$ and passing to the limit gives the useful result

$$\lambda(t) = \frac{f(t)}{S(t)},$$

which some authors give as a definition of the hazard function.
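The limit in the definition of the hazard can be illustrated numerically. Because $S(t) = \Pr\{T \ge t\}$, the quantity inside the limit equals $(S(t) - S(t+dt))/(S(t)\,dt)$. This minimal sketch evaluates it for shrinking $dt$ and compares it with $f(t)/S(t)$. It again assumes a hypothetical Weibull distribution with shape 2 and scale 1, for which $S(t) = \exp\{-t^2\}$ and $f(t)/S(t) = 2t$.

```python
import math


def survival(t: float) -> float:
    """Hypothetical example: Weibull survival function with shape 2, scale 1."""
    return math.exp(-t * t)


def density(t: float) -> float:
    """Matching density f(t) = 2t exp(-t^2)."""
    return 2.0 * t * math.exp(-t * t)


def interval_rate(t: float, dt: float) -> float:
    """Pr{t <= T < t + dt | T >= t} / dt, the quantity inside the limit."""
    conditional_prob = (survival(t) - survival(t + dt)) / survival(t)
    return conditional_prob / dt


def hazard(t: float) -> float:
    """Hazard via lambda(t) = f(t) / S(t)."""
    return density(t) / survival(t)


t = 1.0
for dt in (0.5, 0.1, 0.01, 0.001):
    print(f"dt={dt}: interval rate={interval_rate(t, dt):.5f}")
print(f"f(t)/S(t) = {hazard(t):.5f}")
```

As $dt$ shrinks, the interval rate moves toward $f(t)/S(t)$, which is 2 at $t = 1$ for this distribution.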

In words, the rate of occurrence of the event at duration $t$ equals the density of events at $t$, divided by the probability of surviving to that duration without experiencing the event.

Note from the definition $S(t) = \int_t^\infty f(x)\,dx$ that $-f(t)$ is the derivative of $S(t)$. This suggests rewriting $\lambda(t) = f(t)/S(t)$ as

$$\lambda(t) = -\frac{d}{dt} \log S(t).$$

If we now integrate from 0 to $t$ and introduce the boundary condition $S(0) = 1$ (since the event is sure not to have occurred by duration 0), we can solve the above expression. This gives a formula for the probability of surviving to duration $t$ as a function of the hazard at all durations up to $t$:

$$S(t) = \exp\left\{ -\int_0^t \lambda(x)\,dx \right\}.$$

This expression should be familiar to demographers. The integral in curly brackets in this equation is called the cumulative hazard (or cumulative risk) and is denoted

$$\Lambda(t) = \int_0^t \lambda(x)\,dx.$$
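The relationship $S(t) = \exp\{-\Lambda(t)\}$ suggests a simple numerical recipe when only the hazard is known: accumulate the cumulative hazard and exponentiate. This minimal sketch uses the trapezoidal rule on a uniform grid, which is an implementation choice and not something the text prescribes. It applies the rule to a hypothetical hazard $\lambda(t) = 2t$, whose exact cumulative hazard is $t^2$. It then goes the other way and recovers the hazard from $-\frac{d}{dt}\log S(t)$ using a central difference.

```python
import math
from typing import Callable


def cumulative_hazard(hazard: Callable[[float], float], t: float, n: int = 1000) -> float:
    """Lambda(t) = integral of the hazard from 0 to t, by the trapezoidal rule."""
    if t == 0.0:
        return 0.0
    h = t / n
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, n):
        total += hazard(i * h)
    return total * h


def survival_from_hazard(hazard: Callable[[float], float], t: float) -> float:
    """S(t) = exp{-Lambda(t)}, which builds in the boundary condition S(0) = 1."""
    return math.exp(-cumulative_hazard(hazard, t))


def hazard_from_survival(survival: Callable[[float], float], t: float,
                         eps: float = 1e-5) -> float:
    """lambda(t) = -d/dt log S(t), approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


def linear_hazard(t: float) -> float:
    """Hypothetical hazard lambda(t) = 2t (a Weibull with shape 2, scale 1)."""
    return 2.0 * t


for t in (0.5, 1.0, 1.5):
    s_numeric = survival_from_hazard(linear_hazard, t)
    print(f"t={t}: S numeric={s_numeric:.6f}, exact exp(-t^2)={math.exp(-t * t):.6f}")

# Round trip: recover the hazard at t = 1 from the numerically built survival function.
recovered = hazard_from_survival(lambda u: survival_from_hazard(linear_hazard, u), 1.0)
print(f"recovered hazard at t=1: {recovered:.4f} (exact 2.0)")
```

For a linear hazard the trapezoidal rule happens to be exact, so the match here is limited only by rounding. For curved hazards, increase `n` to control the integration error.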

You may think of $\Lambda(t)$ as the sum of the risks you face going from duration 0 to $t$.

These results show that the survival and hazard functions provide alternative but equivalent characterizations of the distribution of $T$. Given the survival function, we can always differentiate to obtain the density and then calculate the hazard as $\lambda(t) = f(t)/S(t)$. Given the hazard, we can always integrate to obtain the cumulative hazard and then exponentiate to obtain the survival function as $S(t) = \exp\{-\Lambda(t)\}$. An example will help fix ideas.

Example: The simplest possible survival distribution is obtained by assuming a constant risk over time, so the hazard is

$$\lambda(t) = \lambda$$

for all $t$. The corresponding survival function is

$$S(t) = \exp\{-\lambda t\}.$$

This distribution is called the exponential distribution with parameter $\lambda$.
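For the exponential distribution, a quick simulation makes the constant-hazard result concrete. The minimal sketch below draws waiting times with Python's `random.expovariate`, which samples from an exponential distribution with rate `lambd`. It then compares the proportion still 'alive' at duration $t$, that is, with $T \ge t$, against $\exp\{-\lambda t\}$. The rate $\lambda = 0.5$, the sample size and the seed are arbitrary choices for illustration.

```python
import math
import random


def empirical_survival(samples: list, t: float) -> float:
    """Proportion of simulated waiting times with T >= t, an estimate of S(t)."""
    return sum(1 for x in samples if x >= t) / len(samples)


rate = 0.5                  # hypothetical constant hazard lambda
rng = random.Random(12345)  # fixed seed so the sketch is reproducible
samples = [rng.expovariate(rate) for _ in range(100_000)]

for t in (0.0, 1.0, 2.0, 4.0):
    print(f"t={t}: empirical S={empirical_survival(samples, t):.4f}, "
          f"exp(-lambda*t)={math.exp(-rate * t):.4f}")
```

With 100,000 draws the empirical proportions should typically fall within a few thousandths of $\exp\{-\lambda t\}$.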

The density may be obtained by multiplying the survivor function by the hazard:

$$f(t) = \lambda \exp\{-\lambda t\}.$$

The mean turns out to be $1/\lambda$. This distribution plays a central role in survival analysis, although it is probably too simple to be useful in applications in its own right.

Expectation of Life

Let $\mu$ denote the mean or expected value of $T$. By definition, one would calculate $\mu$ by multiplying $t$ by the density $f(t)$ and integrating, so

$$\mu = \int_0^\infty t f(t)\,dt.$$

Integrating by parts, and making use of the fact that $-f(t)$ is the derivative of $S(t)$, which has limits or boundary conditions $S(0) = 1$ and $S(\infty) = 0$, one can show that

$$\mu = \int_0^\infty S(t)\,dt.$$

In words, the mean is simply the integral of the survival function.

A Note on Improper Random Variables*

So far we have assumed implicitly that the event of interest is bound to occur, so that $S(\infty) = 0$.

In words, given enough time the proportion surviving goes down to zero. This condition implies that the cumulative hazard must diverge, i.e. we must have $\Lambda(\infty) = \infty$. Intuitively, the event will occur with certainty only if the cumulative risk over a long period is sufficiently high.

There are, however, many events of possible interest that are not bound to occur. Some men and women remain forever single, some birth intervals never close, and some people are happy enough at their jobs that they never leave. What can we do in these cases? There are two approaches.

One approach is to note that we can still calculate the hazard and survival functions, which are well defined even if the event of interest is not bound to occur. For example, we can study marriage in the entire population, which includes people who will never marry, and calculate marriage rates and proportions single.

In this example $S(t)$ would represent the proportion still single at age $t$ and $S(\infty)$ would represent the proportion who never marry.

One limitation of this approach is that if the event is not certain to occur, then the waiting time $T$ could be undefined (or infinite) and thus not a proper random variable. Its density, which could be calculated from the hazard and survival function, would be improper, i.e. it would fail to integrate to one. Obviously, the mean waiting time would not be defined. In terms of our example, we cannot calculate mean age at marriage for the entire population, simply because not everyone marries. But this limitation is of no great consequence if interest centers on the hazard and survivor functions, rather than the waiting time.
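The result $\mu = \int_0^\infty S(t)\,dt$ and its breakdown for improper waiting times can both be seen numerically. This minimal sketch integrates the survival function up to increasing upper limits for two hypothetical cases. The first is an exponential with rate $\lambda = 0.5$, whose integral should settle at $1/\lambda = 2$. The second is a decaying hazard $\lambda(t) = a e^{-bt}$ with $a = b = 1$, an illustrative choice that does not come from the text. Its cumulative hazard $\Lambda(t) = (a/b)(1 - e^{-bt})$ converges to $a/b$ instead of diverging, so $S(\infty) = e^{-a/b} > 0$. Consequently, the density integrates to $1 - S(\infty)$ rather than one, and the integral of $S$ keeps growing.

```python
import math


def simpson(g, a: float, b: float, n: int = 4000) -> float:
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * g(a + i * h)
    return total * h / 3.0


RATE = 0.5       # hypothetical exponential rate (a proper distribution)
A, B = 1.0, 1.0  # hypothetical parameters of the decaying hazard a * exp(-b t)


def exp_survival(t: float) -> float:
    """Exponential survival S(t) = exp(-rate t)."""
    return math.exp(-RATE * t)


def improper_survival(t: float) -> float:
    """S(t) = exp{-Lambda(t)} with Lambda(t) = (a/b)(1 - exp(-b t))."""
    return math.exp(-(A / B) * (1.0 - math.exp(-B * t)))


def improper_density(t: float) -> float:
    """f(t) = lambda(t) S(t) = a exp(-b t) S(t)."""
    return A * math.exp(-B * t) * improper_survival(t)


for upper in (10.0, 50.0, 100.0):
    mu_exp = simpson(exp_survival, 0.0, upper)
    area_improper = simpson(improper_survival, 0.0, upper)
    print(f"upper={upper}: integral of exponential S={mu_exp:.4f}, "
          f"integral of improper S={area_improper:.4f}")

# The improper density integrates to 1 - S(infinity), not to one.
s_inf = math.exp(-A / B)
print(f"integral of improper f = {simpson(improper_density, 0.0, 50.0):.6f}, "
      f"1 - S(inf) = {1 - s_inf:.6f}")
```

The exponential column stabilizes near 2 as the upper limit grows. The improper column increases by roughly $S(\infty) \approx 0.37$ per unit of time, which is why no finite mean exists.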

In the marriage example we can even calculate a median age at marriage, provided we define it as the age by which half the population has married.

The alternative approach is to condition the analysis on the event actually occurring. In terms of our example, we could study marriage (perhaps retrospectively) for people who eventually marry, since for this group the actual waiting time $T$ is always well defined. In this case we can calculate not just the conditional hazard and survivor functions, but also the mean. In our marriage example, we could calculate the mean age at marriage for those who marry. We could even calculate a conventional median, defined as the age by which half the people who will eventually marry have done so.

It turns out that the conditional density, hazard and survivor function for those who experience the event are related to the unconditional density, hazard and survivor function for the entire population.

The conditional density is

$$f^*(t) = \frac{f(t)}{1 - S(\infty)},$$

and it integrates to one. The conditional survivor function is

$$S^*(t) = \frac{S(t) - S(\infty)}{1 - S(\infty)},$$

and goes down to zero as $t \to \infty$. Dividing the density by the survivor function, we find the conditional hazard to be

$$\lambda^*(t) = \frac{f^*(t)}{S^*(t)} = \frac{f(t)}{S(t) - S(\infty)}.$$

Derivation of the mean waiting time for those who experience the event is left as an exercise for the reader.

Whichever approach is adopted, care must be exercised to specify clearly which hazard or survival function is being used. For example, the conditional hazard for those who eventually experience the event is always higher than the unconditional hazard for the entire population. Note also that in most cases all we observe is whether or not the event has occurred.
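The conditional formulas for $f^*$, $S^*$ and $\lambda^*$ can be checked on a hypothetical 'cure' model, assumed here only for illustration. In this model a fraction $\pi$ never experiences the event and the rest have exponential waiting times, so $S(t) = \pi + (1-\pi)e^{-\lambda t}$ and $S(\infty) = \pi$. The values $\pi = 0.3$ and $\lambda = 0.5$ are arbitrary. This minimal sketch carries out three checks. First, it confirms that $f^*$ integrates to one, using a trapezoidal rule truncated at $t = 60$. Second, it shows that $S^*$ goes to zero. Third, it verifies that $\lambda^*(t)$ exceeds the unconditional hazard $f(t)/S(t)$.

```python
import math

S_INF = 0.3  # hypothetical pi = S(infinity), the proportion never experiencing the event
RATE = 0.5   # hypothetical exponential rate among those who do experience it


def survival(t: float) -> float:
    """Unconditional survival S(t) = pi + (1 - pi) exp(-rate t)."""
    return S_INF + (1.0 - S_INF) * math.exp(-RATE * t)


def density(t: float) -> float:
    """Unconditional (improper) density f(t) = -dS/dt."""
    return (1.0 - S_INF) * RATE * math.exp(-RATE * t)


def cond_density(t: float) -> float:
    """f*(t) = f(t) / (1 - S(inf))."""
    return density(t) / (1.0 - S_INF)


def cond_survival(t: float) -> float:
    """S*(t) = (S(t) - S(inf)) / (1 - S(inf))."""
    return (survival(t) - S_INF) / (1.0 - S_INF)


def cond_hazard(t: float) -> float:
    """lambda*(t) = f*(t) / S*(t) = f(t) / (S(t) - S(inf))."""
    return cond_density(t) / cond_survival(t)


def uncond_hazard(t: float) -> float:
    """lambda(t) = f(t) / S(t) for the entire population."""
    return density(t) / survival(t)


def trapezoid(g, a: float, b: float, n: int = 20000) -> float:
    """Trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))


print(f"integral of f* over [0, 60]: {trapezoid(cond_density, 0.0, 60.0):.5f}")
for t in (0.0, 2.0, 5.0):
    print(f"t={t}: S*={cond_survival(t):.4f}, "
          f"conditional hazard={cond_hazard(t):.4f}, "
          f"unconditional hazard={uncond_hazard(t):.4f}")
```

In this particular model the conditional hazard equals the constant rate 0.5. The unconditional hazard starts lower and falls over time, because survivors are increasingly made up of people who will never experience the event.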
