
Non- and Semi-Parametric Modeling in Survival Analysis ∗


Jianqing Fan
Department of ORFE, Princeton University, Princeton, NJ 08544, USA
E-mail: jqfan@princeton.edu


Jiang
Department of Mathematics and Statistics, University of North Carolina, Charlotte, NC 28223, USA

Abstract

In this chapter, we give a selective review of the nonparametric modeling methods using Cox's type of models in survival analysis. We first introduce Cox's model (Cox 1972) and then study its variants in the direction of smoothing. The model fitting, variable selection, and hypothesis testing problems are addressed. A number of topics worthy of further study are given throughout this chapter.

Keywords and Phrases. Censoring, Cox's model, failure time, likelihood, modeling, nonparametric

1 Introduction

Survival analysis is concerned with studying the time between entry to a study and a subsequent event, and it has become one of the most important fields in statistics.

The techniques developed in survival analysis are now applied in many fields, such as biology (survival time), engineering (failure time), medicine (treatment effects or the efficacy of drugs), quality control (lifetime of a component), and credit risk modeling in finance (default time of a firm).

An important problem in survival analysis is how to model well the conditional hazard rate of failure times given certain covariates, because it involves frequently asked questions about whether or not certain independent variables are correlated with the survival or failure times. These problems have presented a significant challenge to statisticians in the last 5 decades, and their importance has motivated many statisticians to work in this area. Among the most important contributions is the proportional hazards model, or Cox's model, and its associated partial likelihood estimation method (Cox, 1972), which stimulated a great deal of work in this field.

∗ The authors are partly supported by NSF grants DMS-0532370, DMS-0704337 and [...].

In this chapter we review related work along this direction using the Cox type of models and open an academic research avenue for interested readers. Various estimation methods are considered, a variable selection approach is studied, and a useful inference method, the generalized likelihood ratio (GLR) test, is employed to address hypothesis testing problems for the models. Several topics worthy of further study are laid down in the discussion section.

The remainder of this chapter is organized as follows. We consider univariate Cox's type of models in Section 2 and study multivariate Cox's type of models using the marginal modeling strategy in Section 3. Section 4 focuses on model selection rules, Section 5 is devoted to validating Cox's type of models, and Section 6 discusses transformation models (extensions to Cox's models). Finally, we conclude this chapter in the discussion section.

2 Cox's Type of Models

Model Specification.

The celebrated Cox model has provided a tremendously successful tool for exploring the association of covariates with failure time and survival distributions and for studying the effect of a primary covariate while adjusting for other variables. This model assumes that, given a $p$-dimensional vector of covariates $Z$, the underlying conditional hazard rate (rather than expected survival time $T$),

$$\lambda(t \mid z) = \lim_{\Delta t \to 0+} \frac{1}{\Delta t} P\{t \le T < t + \Delta t \mid T \ge t, Z = z\},$$

is a function of the independent variables (covariates):

$$\lambda(t \mid z) = \lambda_0(t)\,\Psi(z), \tag{1}$$

where $\Psi(z) = \exp(\psi(z))$ with the form of the function $\psi(z)$ known, such as $\psi(z) = \beta^T z$, and $\lambda_0(t)$ is an unknown baseline hazard function. Once the conditional hazard rate is given, the conditional survivor function $S(t \mid z)$ and conditional density $f(t \mid z)$ are also determined. In general, they have the following relationship:

$$S(t \mid z) = \exp(-\Lambda(t \mid z)), \qquad f(t \mid z) = \lambda(t \mid z)\,S(t \mid z), \tag{2}$$

where $\Lambda(t \mid z) = \int_0^t \lambda(u \mid z)\,du$ is the cumulative hazard function.
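As a numerical illustration of relationship (2) under model (1), the following minimal sketch integrates an assumed hazard and compares the resulting survivor function with its closed form. The Weibull-type baseline $\lambda_0(t) = k t^{k-1}$, the scalar linear risk $\psi(z) = \beta z$, and the values of `BETA` and `SHAPE` are illustrative assumptions only; the chapter leaves $\lambda_0$ unspecified.

```python
import math

BETA, SHAPE = 0.7, 1.5  # illustrative values, not taken from the chapter

def hazard(t, z):
    # Model (1) with an assumed baseline lambda_0(t) = SHAPE * t**(SHAPE - 1)
    # and Psi(z) = exp(BETA * z)
    return SHAPE * t ** (SHAPE - 1) * math.exp(BETA * z)

def cumulative_hazard(t, z, steps=20000):
    # Lambda(t|z) = integral of lambda(u|z) over [0, t], midpoint rule
    h = t / steps
    return h * sum(hazard((i + 0.5) * h, z) for i in range(steps))

def survivor(t, z):
    # First part of (2): S(t|z) = exp(-Lambda(t|z))
    return math.exp(-cumulative_hazard(t, z))

def density(t, z):
    # Second part of (2): f(t|z) = lambda(t|z) * S(t|z)
    return hazard(t, z) * survivor(t, z)

t, z = 1.2, 1.0
# For this baseline Lambda_0(t) = t**SHAPE, so S has a closed form to compare with
closed_form_S = math.exp(-(t ** SHAPE) * math.exp(BETA * z))
print(survivor(t, z), closed_form_S, density(t, z))
```

The two printed survivor values should agree up to the quadrature error of the midpoint rule.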

Since no assumptions are made about the nature or shape of the baseline hazard function, the Cox regression model may be considered to be a semiparametric model.

The Cox model is very useful for handling censored data, which often occur in practice. For example, due to termination of the study or early withdrawal from a study, not all of the survival times $T_1, \ldots, T_n$ may be fully observable. Instead one observes for the $i$-th subject an event time $X_i = \min(T_i, C_i)$, a censoring indicator $\delta_i = I(T_i \le C_i)$, as well as an associated vector of covariates $Z_i$. Denote the observed data by $\{(Z_i, X_i, \delta_i) : i = 1, \ldots, n\}$, which is an i.i.d. sample from the population $(Z, X, \delta)$ with $X = \min(T, C)$ and $\delta = I(T \le C)$. Suppose that the random variables $T$ and $C$ are positive and continuous. Then by Fan, Gijbels, and King (1997), under the Cox model (1),

$$\Psi(z) = \frac{E\{\delta \mid Z = z\}}{E\{\Lambda_0(X) \mid Z = z\}}, \tag{3}$$

where $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ is the cumulative baseline hazard function.
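Identity (3) can be checked by simulation in a simple special case. The sketch below assumes a constant baseline $\lambda_0(t) = 1$ (so that $\Lambda_0(X) = X$), a scalar linear risk $\psi(z) = \beta z$, and exponential censoring that is independent of the failure time; these choices, and the helper names `simulate` and `psi_ratio`, are hypothetical and serve only the illustration.

```python
import math
import random

def simulate(z, n, beta=0.5, censor_rate=0.8, seed=0):
    """Draw n pairs (X, delta) for a fixed covariate value z.

    Illustrative assumptions: lambda_0(t) = 1, psi(z) = beta * z, and
    censoring times C ~ Exp(censor_rate) independent of T.
    """
    rng = random.Random(seed)
    risk = math.exp(beta * z)            # Psi(z)
    data = []
    for _ in range(n):
        t = rng.expovariate(risk)        # failure time with hazard Psi(z)
        c = rng.expovariate(censor_rate) # censoring time
        data.append((min(t, c), 1 if t <= c else 0))
    return data

def psi_ratio(data):
    # Sample version of the right-hand side of (3) with Lambda_0(x) = x:
    # mean(delta) / mean(X)
    mean_delta = sum(d for _, d in data) / len(data)
    mean_cum_hazard = sum(x for x, _ in data) / len(data)
    return mean_delta / mean_cum_hazard

for z in (0.0, 1.0, 2.0):
    estimate = psi_ratio(simulate(z, n=200_000))
    print(z, round(estimate, 3), round(math.exp(0.5 * z), 3))
```

Each printed row pairs the Monte Carlo ratio with the true $\Psi(z) = \exp(0.5 z)$; the two columns should be close for large `n`.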

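Equation (3) allows one to estimate the function $\Psi$ using regression techniques if $\Lambda_0(t)$ is known.

The likelihood function can also be derived. When $\delta_i = 0$, all we know is that the survival time $T_i \ge C_i$, and the probability for getting this is

$$P(T_i \ge C_i \mid Z_i) = P(T_i \ge X_i \mid Z_i) = S(X_i \mid Z_i),$$

whereas when $\delta_i = 1$, the likelihood of getting $T_i$ is $f(T_i \mid Z_i) = f(X_i \mid Z_i)$. Therefore the conditional (given covariates) likelihood for getting the data is

$$L = \prod_{\delta_i = 1} f(X_i \mid Z_i) \prod_{\delta_i = 0} S(X_i \mid Z_i) = \prod_{\delta_i = 1} \lambda(X_i \mid Z_i) \prod_{i=1}^{n} S(X_i \mid Z_i), \tag{4}$$

and using (2), we have

$$\log L = \sum_{\delta_i = 1} \log(\lambda(X_i \mid Z_i)) - \sum_{i=1}^{n} \Lambda(X_i \mid Z_i) = \sum_{i=1}^{n} \{\delta_i \log(\lambda(X_i \mid Z_i)) - \Lambda(X_i \mid Z_i)\}. \tag{5}$$

For the proportional hazards model (1), we have specifically

$$\log L = \sum_{\delta_i = 1} \log(\lambda_0(X_i)\Psi(Z_i)) - \sum_{i=1}^{n} \Lambda_0(X_i)\Psi(Z_i). \tag{6}$$

Therefore, when both $\Psi(\cdot)$ and $\lambda_0(\cdot)$ are parameterized, the parameters can be estimated by maximizing the likelihood (6).

Estimation. The likelihood inference can be made about the parameters in model (1) if the baseline $\lambda_0(\cdot)$ and the risk function $\psi(\cdot)$ are known up to a vector of unknown parameters $\beta$ (Aitkin and Clayton, 1980), that is, $\lambda_0(\cdot) = \lambda_0(\cdot; \beta)$ and $\psi(\cdot) = \psi(\cdot; \beta)$.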
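To make the fully parametric case concrete, the following sketch maximizes the log likelihood (6) for one simple parameterization. It assumes a constant baseline $\lambda_0(t) = \alpha$ and a scalar linear risk $\psi(z) = \beta z$; the toy data set and the grid search over $\beta$ are hypothetical choices for illustration, not the procedure of Aitkin and Clayton (1980).

```python
import math

def log_likelihood(alpha, beta, data):
    """Log likelihood (6) with lambda_0(t) = alpha and psi(z) = beta * z.

    data is a list of (z, x, delta) triples. With a constant baseline,
    Lambda_0(x) = alpha * x.
    """
    total = 0.0
    for z, x, delta in data:
        risk = math.exp(beta * z)                      # Psi(z)
        total += delta * math.log(alpha * risk) - alpha * x * risk
    return total

def best_alpha(beta, data):
    # Setting d(log L)/d(alpha) = 0 gives
    # alpha = (number of failures) / sum_i x_i * exp(beta * z_i)
    failures = sum(delta for _, _, delta in data)
    exposure = sum(x * math.exp(beta * z) for z, x, _ in data)
    return failures / exposure

# Hypothetical toy data: (covariate, observed time, censoring indicator)
data = [(0.0, 2.1, 1), (0.0, 3.4, 0), (1.0, 0.9, 1), (1.0, 1.6, 1), (1.0, 2.0, 0)]

# Profile out alpha for each beta on a grid and keep the best beta
grid = [i / 100 for i in range(-300, 301)]
beta_hat = max(grid, key=lambda b: log_likelihood(best_alpha(b, data), b, data))
alpha_hat = best_alpha(beta_hat, data)
print(beta_hat, alpha_hat, log_likelihood(alpha_hat, beta_hat, data))
```

With a binary covariate the maximizer can also be found by hand from the two group-wise failure rates, which gives a simple check on the grid search.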

When the baseline is completely unknown and the form of the function $\psi(\cdot)$ is given, inference can be based on the partial likelihood (Cox, 1975). Since the full likelihood involves both $\beta$ and $\lambda_0(t)$, Cox decomposed the full likelihood into a product of the term corresponding to identities of successive failures and the term corresponding to the gap times between any two successive failures. The first term inherits the usual large-sample properties of the full likelihood and is called the partial likelihood.

The partial likelihood can also be derived from counting process theory (see for example Andersen, Borgan, Gill, and Keiding 1993) or from a profile likelihood in Johansen (1983). In the following we introduce the latter.

Example 1 [The partial likelihood as profile likelihood; Fan, Gijbels, and King (1997)] Consider the case that $\psi(z) = \psi(z; \beta)$. Let $t_1 < \cdots < t_N$ denote the ordered failure times and let $(i)$ denote the label of the item failing at $t_i$.

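Denote by $R_i$ the risk set at time $t_i$, that is, $R_i = \{j : X_j \ge t_i\}$. Consider the least informative nonparametric modeling for $\Lambda_0(\cdot)$, that is, $\Lambda_0(t)$ puts point mass $\theta_j$ at time $t_j$ in the same way as constructing the empirical distribution:

$$\Lambda_0(t; \theta) = \sum_{j=1}^{N} \theta_j I(t_j \le t). \tag{7}$$

Then

$$\Lambda_0(X_i; \theta) = \sum_{j=1}^{N} \theta_j I(i \in R_j). \tag{8}$$

Under the proportional hazards model (1), using (6), the log likelihood is

$$\log L = \sum_{i=1}^{n} \left[\delta_i \{\log \lambda_0(X_i; \theta) + \psi(Z_i; \beta)\} - \Lambda_0(X_i; \theta) \exp\{\psi(Z_i; \beta)\}\right]. \tag{9}$$

Substituting (7) and (8) into (9), one establishes that

$$\log L = \sum_{j=1}^{N} \left[\log \theta_j + \psi(Z_{(j)}; \beta)\right] - \sum_{i=1}^{n} \sum_{j=1}^{N} \theta_j I(i \in R_j) \exp\{\psi(Z_i; \beta)\}. \tag{10}$$

Maximizing $\log L$ with respect to $\theta_j$ leads to the following Breslow estimator of the baseline hazard [Breslow (1972, 1974)]:

$$\hat{\theta}_j = \Big[\sum_{i \in R_j} \exp\{\psi(Z_i; \beta)\}\Big]^{-1}. \tag{11}$$

Substituting (11) into (10), we obtain

$$\max_{\lambda_0} \log L = \sum_{i=1}^{N} \Big(\psi(Z_{(i)}; \beta) - \log\Big[\sum_{j \in R_i} \exp\{\psi(Z_j; \beta)\}\Big]\Big) - N.$$

This leads to the log partial likelihood function (Cox 1975)

$$\ell(\beta) = \sum_{i=1}^{N} \Big(\psi(Z_{(i)}; \beta) - \log\Big[\sum_{j \in R_i} \exp\{\psi(Z_j; \beta)\}\Big]\Big). \tag{12}$$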
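The profile-likelihood argument of this example can be verified numerically. The sketch below assumes a scalar linear risk $\psi(z; \beta) = \beta z$ and no tied failure times; the data set and the function names are hypothetical. It evaluates (10) at the Breslow jumps (11) and compares the result with the log partial likelihood (12) minus the number of failures $N$.

```python
import math

def risk_sum(beta, data, t):
    # Sum of exp(beta * z_j) over the risk set {j : X_j >= t}
    return sum(math.exp(beta * z_j) for z_j, x_j, _ in data if x_j >= t)

def log_partial_likelihood(beta, data):
    """Log partial likelihood (12) for psi(z; beta) = beta * z."""
    return sum(beta * z_i - math.log(risk_sum(beta, data, x_i))
               for z_i, x_i, delta_i in data if delta_i == 1)

def breslow_jumps(beta, data):
    """Jump sizes theta_j from (11), keyed by the failure time t_j."""
    return {x_i: 1.0 / risk_sum(beta, data, x_i)
            for _, x_i, delta_i in data if delta_i == 1}

def full_log_likelihood(beta, jumps, data):
    """Log likelihood (10) for given jump sizes theta_j."""
    total = 0.0
    for z_i, x_i, delta_i in data:
        if delta_i == 1:
            total += math.log(jumps[x_i]) + beta * z_i
        # Lambda_0(X_i; theta) as in (8): sum of the jumps at t_j <= X_i
        cum_hazard = sum(theta for t_j, theta in jumps.items() if t_j <= x_i)
        total -= cum_hazard * math.exp(beta * z_i)
    return total

# Hypothetical data: (covariate, observed time, censoring indicator)
data = [(0.5, 1.2, 1), (1.5, 0.7, 1), (-0.3, 2.5, 0), (0.8, 1.9, 1), (0.0, 3.1, 1)]
beta = 0.4
n_failures = sum(delta for _, _, delta in data)
profile = full_log_likelihood(beta, breslow_jumps(beta, data), data)
print(profile, log_partial_likelihood(beta, data) - n_failures)
```

The two printed numbers should coincide up to rounding, since the maximized value of (10) differs from (12) only by the constant $N$.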

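An alternative expression for the log partial likelihood (12) is

$$\ell(\beta) = \sum_{i=1}^{N} \Big(\psi(Z_{(i)}; \beta) - \log\Big[\sum_{j=1}^{n} Y_j(t_i) \exp\{\psi(Z_j; \beta)\}\Big]\Big),$$

where $Y_j(t) = I(X_j \ge t)$ is the survival indicator on whether the $j$-th subject survives at the time $t$.

The above partial likelihood function is a profile likelihood and is derived from the full likelihood using the least informative nonparametric modeling for $\Lambda_0(\cdot)$, that is, $\Lambda_0(t)$ has a jump $\theta_i$ at $t_i$.

Let $\hat{\beta}$ be the partial likelihood estimator of $\beta$ maximizing (12) with respect to $\beta$. By standard likelihood theory, it can be shown (see for example Tsiatis 1981) that the asymptotic distribution of $\sqrt{n}(\hat{\beta} - \beta)$ is multivariate normal with mean zero and a covariance matrix which may be estimated consistently by $(n^{-1} I(\hat{\beta}))^{-1}$, where

$$I(\beta) = \int_0^{\infty} \left[\frac{S_2(\beta, t)}{S_0(\beta, t)} - \left(\frac{S_1(\beta, t)}{S_0(\beta, t)}\right)^{\otimes 2}\right] d\bar{N}(t)$$

and for $k = 0, 1, 2$,

$$S_k(\beta, t) = \sum_{i=1}^{n} Y_i(t)\, \psi'(Z_i; \beta)^{\otimes k} \exp\{\psi(Z_i; \beta)\},$$

where $\bar{N}(t) = \sum_{i=1}^{n} N_i(t)$ with $N_i(t) = 1(X_i \le t, \delta_i = 1)$, and $x^{\otimes k} = 1, x, xx^T$, respectively, for $k = 0, 1$ and $2$.

Since the baseline hazard $\lambda_0$ does not appear in the partial likelihood, it is not estimable from the likelihood.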
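The estimator $\hat{\beta}$ and the information $I(\beta)$ described above can be computed with a few lines of code in the simplest setting. The sketch assumes a scalar covariate, a linear risk $\psi(z; \beta) = \beta z$ (so that $\psi'(z; \beta) = z$), and no tied failure times. The Newton-Raphson iteration, the data set, and the function names are illustrative assumptions rather than a prescribed algorithm.

```python
import math

def score_and_information(beta, data):
    """Score U(beta) and information I(beta) of the partial likelihood,
    for scalar beta and psi(z; beta) = beta * z."""
    score, info = 0.0, 0.0
    for z_i, x_i, delta_i in data:
        if delta_i != 1:
            continue                      # only failures contribute (dN_i = 1)
        s0 = s1 = s2 = 0.0
        for z_j, x_j, _ in data:
            if x_j >= x_i:                # Y_j(t) = 1 at t = x_i
                w = math.exp(beta * z_j)
                s0 += w
                s1 += z_j * w
                s2 += z_j * z_j * w
        score += z_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit(data, beta=0.0, tol=1e-10, max_iter=50):
    # Newton-Raphson on the score equation U(beta) = 0
    for _ in range(max_iter):
        score, info = score_and_information(beta, data)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: (covariate, observed time, censoring indicator)
data = [(0.0, 0.7, 1), (0.2, 0.9, 0), (0.5, 1.2, 1), (0.8, 1.9, 1),
        (1.5, 2.0, 1), (1.0, 2.2, 1), (-0.3, 2.5, 0), (0.4, 3.1, 1)]

beta_hat = fit(data)
score_at_hat, info_at_hat = score_and_information(beta_hat, data)
# (n^{-1} I)^{-1} is the covariance of sqrt(n)(beta_hat - beta), so the
# standard error of beta_hat itself is approximately I(beta_hat)^{-1/2}
std_error = 1.0 / math.sqrt(info_at_hat)
print(beta_hat, std_error)
```

With so few observations the normal approximation is rough; the example is meant only to show how $S_0$, $S_1$ and $S_2$ enter the computation.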

There are several methods for estimating parameters related to $\lambda_0$. One appealing estimate among them is the Breslow estimator (Breslow 1972, 1974)

$$\hat{\Lambda}_0(t) = \int_0^t \Big[\sum_{i=1}^{n} Y_i(u) \exp\{\hat{\beta}^T Z_i\}\Big]^{-1} \Big\{\sum_{i=1}^{n} dN_i(u)\Big\}, \tag{13}$$

where $N_i(u) = 1(X_i \le u, \delta_i = 1)$.

Hypothesis testing. After fitting the Cox model, one might be interested in checking if covariates really contribute to the risk function, for example, checking if the coefficient vector $\beta$ is zero. More generally, one considers the hypothesis testing problem

$$H_0: \beta = \beta_0.$$

From the asymptotic normality of the estimator $\hat{\beta}$, it follows that the asymptotic null distribution of the Wald test statistic

$$(\hat{\beta} - \beta_0)^T I(\hat{\beta}) (\hat{\beta} - \beta_0)$$

is the chi-squared distribution with $p$ degrees of freedom. Standard likelihood theory also suggests that the partial likelihood ratio test statistic

$$2[\ell(\hat{\beta}) - \ell(\beta_0)] \tag{14}$$

and the score test statistic

$$U(\beta_0)^T I^{-1}(\beta_0) U(\beta_0)$$

have the same asymptotic null distribution as the Wald statistic, where $U(\beta_0) = \ell'(\beta_0)$ is the score function (see for example, Andersen et al.)

