

THE TITANIC SHIPWRECK: WHO WAS MOST LIKELY TO SURVIVE? A STATISTICAL ANALYSIS

This paper examines the probability of surviving the Titanic shipwreck using limited dependent variable regression analysis. This applied analysis explains the impact of sex, age, and passenger class on the likelihood of survival. The probability of survival is examined using a linear probability model, a probit model, and a logit model.

1. INTRODUCTION

On April 14, 1912, the unthinkable happened when the "unsinkable" RMS Titanic struck an iceberg and sank into the Atlantic Ocean. The 20 lifeboats aboard the ship, a number actually larger than that required by the British Board of Trade at the time, were not enough to save a majority of the passengers, leaving over 1500 passengers aboard the sinking vessel. A total of 705 passengers escaped onto lifeboats and to safety. But not all passengers had an equal chance of getting onto a lifeboat and surviving the disaster.

It is the purpose of this paper to explain, using regression analysis, the impact of sex, passenger class, and age on a person's likelihood of surviving the shipwreck.

2. HYPOTHESES

The first hypothesis is that upper class women have the best chance of survival, followed by middle class women and then lower class women. Second, men's survival rate will also decrease as class lowers. Finally, it is predicted that age will be negatively related to the probability of survival.

3. THE DATA SET

The data used in this paper consist of 1046 observations of single passengers aboard the Titanic. It is important to note that not all passengers aboard the ship are accounted for in this analysis, because some characteristics of these passengers were missing. The information provided for each passenger includes age, sex, passenger class (1st, 2nd, or 3rd), and whether the passenger survived or died in the shipwreck. In this sample, 41 percent of the passengers survived and 59 percent died.

The sample mean of age is 30 years. Sex is a dummy variable equal to one for male passengers. In this sample, 63 percent of the passengers were male and 37 percent were female. Dummy variables were also created for 1st class and 2nd class passengers. For instance, the dummy variable for first class passengers (pclass1) is equal to one if a passenger was in the upper class and equal to zero if the passenger was in any other class (middle class or lower class). In this sample, 27 percent of the passengers were upper class, 25 percent were middle class, and 48 percent were lower class.

4. THE MODEL

To test the probability of survival, a binary choice model is used. The binary choice model may be characterized as a classical regression model subject to qualitative observation of the dependent variable. In this case, the dependent variable is whether or not the passenger survived the shipwreck. Specifically, assume that

$$Y_i = X_i\beta - \varepsilon_i$$

satisfies all classical assumptions except that $Y_i$ is not observed. Instead, a binary variable, $J_i$, is observed such that

$$J_i = \begin{cases} 1 & \text{if } Y_i > 0 \\ 0 & \text{if } Y_i \le 0 \end{cases}$$

The model specifies the relationship between the latent variable $Y_i$ and the regressors. In order to estimate anything, the relationship between the observed variable $J_i$ and the regressors is needed. The relationship follows from the definition of $J_i$. That is,

$$J_i = 1 \iff Y_i > 0 \iff X_i\beta - \varepsilon_i > 0 \iff \varepsilon_i < X_i\beta,$$

which implies that

$$P(J_i = 1) = P(\varepsilon_i < X_i\beta) = F(X_i\beta),$$

where $F(\cdot)$ is the distribution function of the $\varepsilon_i$. Since the space of $J_i$ is {0, 1}, it follows that

$$P(J_i = 0) = 1 - F(X_i\beta).$$

Since $J_i$ is discrete, the density function for $J_i$ is

$$f(J_i) = F(X_i\beta)^{J_i}\,[1 - F(X_i\beta)]^{1 - J_i}.$$

Then it follows that the log-likelihood function is

$$\ln L(\beta) = \sum_{i=1}^{n} \left\{ J_i \ln F(X_i\beta) + (1 - J_i) \ln[1 - F(X_i\beta)] \right\}.$$

Clearly, the binary choice model encompasses a wide class of models indexed by the choice of the distribution function, $F(X_i\beta)$. In order to examine the probability of survival, the linear probability model, the probit model, and the logit model will be used.
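As a minimal sketch of the log-likelihood function above, the following Python code evaluates ln L for any choice of the distribution function F. The function names, the toy data, and the coefficient vector are hypothetical and chosen only for illustration; they are not the paper's data or estimates.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function (the probit choice of F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic distribution function (the logit choice of F)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, J, F):
    """Sum over i of J_i ln F(X_i beta) + (1 - J_i) ln[1 - F(X_i beta)]."""
    total = 0.0
    for x_i, j_i in zip(X, J):
        index = sum(b * x for b, x in zip(beta, x_i))  # X_i beta
        p = F(index)                                   # P(J_i = 1)
        total += j_i * math.log(p) + (1 - j_i) * math.log(1.0 - p)
    return total

# Hypothetical toy data: each row holds a constant and one regressor.
X = [(1.0, 0.5), (1.0, -1.0), (1.0, 2.0)]
J = [1, 0, 1]
beta_zero = (0.0, 0.0)  # hypothetical coefficient vector

ll_probit = log_likelihood(beta_zero, X, J, normal_cdf)
ll_logit = log_likelihood(beta_zero, X, J, logistic_cdf)
print(ll_probit, ll_logit)
```

With all coefficients equal to zero, both distribution functions assign probability 0.5 to every observation, so the two log-likelihoods coincide. In practice the coefficients would be chosen by a numerical optimizer to maximize this function.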

LINEAR PROBABILITY MODEL

The simplest binary choice model is the linear probability model, where the error term is assumed to be independently and identically distributed uniformly on the interval [0, 1]. That is, $\varepsilon_i \sim \text{iid } U(0, 1)$. Therefore, $F(X_i\beta) = X_i\beta$ for $0 < X_i\beta < 1$. So, the linear probability model is a linear regression model with heteroskedastic errors. Specifically,

$$SURV_i = \alpha + \beta_{age}\,AGE_i + \beta_{sex}\,SEX_i + \beta_{pclass1}\,PCLASS1_i + \beta_{pclass2}\,PCLASS2_i + \varepsilon_i$$

This linear probability model will be estimated using OLS, but it is important to note that the resulting estimates will be unbiased but inefficient. This is one major drawback of the linear probability model. Table 1 summarizes the OLS estimation.

Table 1: Linear Probability Model Estimates using OLS

Regressor    Coefficient    Std. Error    t-stat    Prob > |t|
Con
AGE
SEX
PCLASS1
PCLASS2

One can obtain the following conditional probabilities of surviving from the model:

Probability of survival given female & 3rd class: $P(S \mid F, 3) = \hat{\alpha} + \hat{\beta}_{age}\,Age$
Probability of survival given female & 2nd class: $P(S \mid F, 2) = (\hat{\alpha} + \hat{\beta}_{pclass2}) + \hat{\beta}_{age}\,Age$
Probability of survival given female & 1st class: $P(S \mid F, 1) = (\hat{\alpha} + \hat{\beta}_{pclass1}) + \hat{\beta}_{age}\,Age$
Probability of survival given male & 3rd class: $P(S \mid M, 3) = (\hat{\alpha} + \hat{\beta}_{sex}) + \hat{\beta}_{age}\,Age$
Probability of survival given male & 2nd class: $P(S \mid M, 2) = (\hat{\alpha} + \hat{\beta}_{sex} + \hat{\beta}_{pclass2}) + \hat{\beta}_{age}\,Age$
Probability of survival given male & 1st class: $P(S \mid M, 1) = (\hat{\alpha} + \hat{\beta}_{sex} + \hat{\beta}_{pclass1}) + \hat{\beta}_{age}\,Age$

Therefore, it is clear that these regressions differ only in the intercepts and not in the slope coefficient for age, $\hat{\beta}_{age}$. This can be seen in the following graph.

Graph 1: Probability of Survival (x-axis: Age (years); y-axis: Probability of Survival (%); series: F,3; F,2; F,1; M,3; M,2; M,1)

By looking at the graph, another major drawback of the linear probability model is clear. The probabilities do not lie within the unit interval: the graph shows fitted probabilities greater than 100% and less than 0%. This is a fundamental limitation.

From the graph, it is also easy to understand the interpretations of the coefficients. Ceteris paribus, $\hat{\beta}_{age}$ is the effect on the probability of survival due to a one-year increase in age.

Therefore, a one-year increase in age decreases the probability of survival by $|\hat{\beta}_{age}|$. $\hat{\beta}_{sex}$ is interpreted as the difference in the probability of survival between a male and a female, holding age and passenger class constant. $\hat{\beta}_{pclass1}$ is interpreted as the difference in the probability of survival between a first class passenger and a third class passenger, holding sex and age constant. $\hat{\beta}_{pclass2}$ is interpreted as the difference in the probability of survival between a second class passenger and a third class passenger, holding sex and age constant.

In order to test the significance of each regressor, the following hypothesis test was conducted:

$H_0: \beta_i = 0$ and $H_A: \beta_i \neq 0$, where $i$ = AGE, SEX, PCLASS1, PCLASS2

A two-tailed test is used at the 5% level of significance. Based on this test, we reject the null hypothesis and conclude that individually all regression coefficients are statistically significant, that is, different from zero.

PROBIT MODEL

The probit model is the specification of the binary choice model that results when the error term is independently and identically distributed normally with mean 0 and variance 1.

That is, $\varepsilon_i \sim \text{iid } N(0, 1)$. In this case, $F(X_i\beta) = \Phi(X_i\beta)$, where $\Phi(\cdot)$ is the standard normal distribution function. Specifically,

$$P(SURV_i = 1) = \Phi(\alpha + \beta_{age}\,AGE_i + \beta_{sex}\,SEX_i + \beta_{pclass1}\,PCLASS1_i + \beta_{pclass2}\,PCLASS2_i)$$

A probit model was estimated and the results are summarized in Table 2.

Table 2: Probit Model Estimation

Regressor    Coefficient    Std. Error    t-stat    Prob > |t|
Con
AGE
SEX
PCLASS1
PCLASS2

In order to test the significance of each regressor, the following hypothesis test was conducted:

$H_0: \beta_i = 0$ and $H_A: \beta_i \neq 0$, where $i$ = AGE, SEX, PCLASS1, PCLASS2

A two-tailed test is used at the 5% level of significance. Based on this test, we reject the null hypothesis and conclude that individually all regression coefficients are statistically significant, that is, different from zero.
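The contrast between the linear probability model and the probit model can be illustrated with a short sketch. The coefficient values below are hypothetical placeholders, not the estimates from Table 1 or Table 2, and the function names are likewise assumptions made for this example. The sketch shows that a linear index used directly as a probability can leave the unit interval, while passing an index through the standard normal distribution function always yields a value between 0 and 1.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(coefs, age, male, pclass):
    """alpha + b_age*AGE + b_sex*SEX + b_pclass1*PCLASS1 + b_pclass2*PCLASS2."""
    return (coefs["const"]
            + coefs["age"] * age
            + coefs["sex"] * male
            + coefs["pclass1"] * (1 if pclass == 1 else 0)
            + coefs["pclass2"] * (1 if pclass == 2 else 0))

# HYPOTHETICAL coefficients for illustration only (not the paper's estimates).
HYPOTHETICAL_LPM = {"const": 0.8, "age": -0.005, "sex": -0.5,
                    "pclass1": 0.3, "pclass2": 0.1}
HYPOTHETICAL_PROBIT = {"const": 0.5, "age": -0.02, "sex": -1.5,
                       "pclass1": 1.0, "pclass2": 0.5}

def lpm_probability(age, male, pclass):
    """Linear probability model: the index itself is the fitted probability."""
    return linear_index(HYPOTHETICAL_LPM, age, male, pclass)

def probit_probability(age, male, pclass):
    """Probit model: the index is mapped into (0, 1) by the normal CDF."""
    return normal_cdf(linear_index(HYPOTHETICAL_PROBIT, age, male, pclass))

# A young first class female and an elderly third class male.
for age, male, pclass in [(5, 0, 1), (70, 1, 3)]:
    print(age, male, pclass,
          lpm_probability(age, male, pclass),
          probit_probability(age, male, pclass))
```

Under these made-up coefficients the linear probability model returns a value above 1 for the first passenger and below 0 for the second, whereas both probit values are valid probabilities.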

It is also important to test whether the regressors are jointly significant. The null hypothesis is as follows:

$H_0: \beta_{age} = \beta_{sex} = \beta_{pclass1} = \beta_{pclass2} = 0$

That is, under the null hypothesis, all of the explanatory variables together explain zero percent of the variation in the dependent variable, SURV. A likelihood ratio test can be performed as follows:

$$LR = 2\,[\ln L(\hat{\beta}) - \ln L(\tilde{\beta})] \overset{A}{\sim} \chi^2_{K-1}$$

where $\ln L(\hat{\beta})$ is the value of the unconstrained log-likelihood function and $\ln L(\tilde{\beta})$ is the value of the constrained log-likelihood function. The value of the test statistic exceeds the critical value for $\chi^2_4$ at the 1% level of significance. Therefore, the regressors are jointly significant at the 1% level.

The marginal effects were also estimated for the probit model and are summarized in Table 3 as follows:

Table 3: Probit Model Marginal Effects

Regressor    Marginal Effects    Std. Error    t-stat    Prob > |t|
AGE
SEX
PCLASS1
PCLASS2

Unlike in the linear probability model, the marginal effect of a one-unit increase in an independent variable is not the estimated coefficient in the probit model.
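The likelihood ratio test for joint significance can be sketched as follows. The two log-likelihood values are hypothetical placeholders, not the values from the paper, and the helper names are assumptions for this example. For an even number of degrees of freedom the chi-square tail probability has a closed form, which avoids any dependency outside the standard library.

```python
import math

def lr_statistic(loglik_unconstrained, loglik_constrained):
    """LR = 2 * [ln L(unconstrained) - ln L(constrained)]."""
    return 2.0 * (loglik_unconstrained - loglik_constrained)

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only.

    Closed form: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))

# HYPOTHETICAL log-likelihood values for illustration only.
loglik_unconstrained = -500.0  # model with AGE, SEX, PCLASS1, PCLASS2
loglik_constrained = -700.0    # model with a constant only

lr = lr_statistic(loglik_unconstrained, loglik_constrained)
p_value = chi2_sf_even_df(lr, df=4)  # four restrictions, so K - 1 = 4
reject_at_1_percent = p_value < 0.01
print(lr, p_value, reject_at_1_percent)
```

Comparing the p-value with 0.01 is equivalent to comparing the statistic with the 1% critical value of the chi-square distribution with four degrees of freedom.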

Theoretically, the marginal effect of $X_i'$ for the probit model is defined as follows:

$$\frac{\partial \Phi(X_i\beta)}{\partial X_i'} = \phi(X_i\beta)\,\beta$$

where $\phi(\cdot)$ is the standard normal density function. Therefore, the marginal effect of one regressor depends on the entire vector of regressors. As can be seen from Table 3 above, the marginal effect of each regressor is significant at the 1% level.

LOGIT MODEL

The logit model is the binary choice model that results when the error term is independently and identically distributed with the logistic distribution. In this case,

$$F(X_i\beta) = \Lambda(X_i\beta) = \frac{e^{X_i\beta}}{1 + e^{X_i\beta}}.$$

Specifically,

$$P(SURV_i = 1) = \Lambda(\alpha + \beta_{age}\,AGE_i + \beta_{sex}\,SEX_i + \beta_{pclass1}\,PCLASS1_i + \beta_{pclass2}\,PCLASS2_i)$$

A logit model was estimated and the results are summarized in Table 4.

Table 4: Logit Model Estimation

Regressor    Coefficient    Std. Error    t-stat    Prob > |t|
Con
AGE
SEX
PCLASS1
PCLASS2

In order to test the significance of each regressor, the following hypothesis test was conducted:

$H_0: \beta_i = 0$ and $H_A: \beta_i \neq 0$, where $i$ = AGE, SEX, PCLASS1, PCLASS2

A two-tailed test is used at the 5% level of significance. Based on this test, we reject the null hypothesis and conclude that individually all regression coefficients are statistically significant, that is, different from zero.
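The marginal effects for the probit model, and the analogous derivative for the logit model, can be sketched as follows. The logit expression used here, the logistic distribution function times one minus itself times the coefficient vector, is the standard derivative of the logistic function and is not stated in the text above. The coefficient vector and the passenger profile are hypothetical, and the derivative treats every regressor as continuous, so for dummy variables such as SEX it is only an approximation.

```python
import math

def normal_pdf(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_cdf(z):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def index_value(beta, x):
    """X_i beta for one observation."""
    return sum(b * v for b, v in zip(beta, x))

def probit_marginal_effects(beta, x):
    """phi(X_i beta) * beta, evaluated at the regressor vector x."""
    scale = normal_pdf(index_value(beta, x))
    return [scale * b for b in beta]

def logit_marginal_effects(beta, x):
    """Lambda(X_i beta) * [1 - Lambda(X_i beta)] * beta, evaluated at x."""
    lam = logistic_cdf(index_value(beta, x))
    return [lam * (1.0 - lam) * b for b in beta]

# HYPOTHETICAL coefficients, ordered (const, AGE, SEX, PCLASS1, PCLASS2).
beta = [0.5, -0.02, -1.5, 1.0, 0.5]
# Hypothetical profile: 30-year-old male in 2nd class.
x = [1.0, 30.0, 1.0, 0.0, 1.0]

print(probit_marginal_effects(beta, x))
print(logit_marginal_effects(beta, x))
```

Because the scale factor is evaluated at a particular regressor vector, changing any element of x changes the marginal effect of every regressor, which is the dependence on the entire vector noted above.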

