
SPSS Discriminant Function Analysis


By Hui Bian, Office for Faculty Excellence

What is discriminant function analysis?
Discriminant function analysis builds a predictive model for group membership. The model is composed of a discriminant function based on linear combinations of the predictor variables that provide the best discrimination between groups.

Purpose of discriminant analysis
- to maximally separate the groups
- to determine the most parsimonious way to separate the groups
- to discard variables that are little related to group distinctions

Summary
We are interested in the relationship between a set of independent variables and one categorical variable. We would like to know how many dimensions we need to express this relationship. Using this relationship, we can predict a classification based on the independent variables, or assess how well the independent variables separate the categories in the classification.

Discriminant function analysis is similar to regression analysis. A discriminant score can be calculated as a weighted combination of the independent variables:

$D_i = a + b_1x_1 + b_2x_2 + \dots + b_nx_n$

where $D_i$ is the predicted (discriminant) score, the $x$'s are the predictors, and the $b$'s are the discriminant coefficients. A maximum likelihood rule is used to assign a case to a group based on a specified cut-off score. If the groups are of equal size, the cut-off is the mean of the group centroids; if the group sizes are unequal, the cut-off is calculated from weighted means.

Grouping variable
- categorical, and can have more than two values
- its codes must be integers

Independent variables
- continuous; nominal variables must be recoded into dummy variables

Discriminant functions
A discriminant function is a latent variable formed as a linear combination of the independent variables. A two-group discriminant analysis has one discriminant function. With more groups, the number of discriminant functions is g - 1, where g is the number of categories of the dependent (grouping) variable, unless there are fewer predictors than that, in which case the number of predictors is the limit.
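The scoring formula and the count of discriminant functions can be written out directly. The Python sketch below is illustrative only: the helper names and the coefficient values are hypothetical (not SPSS output), and it assumes the standard rule that the number of functions is the smaller of g - 1 and the number of predictors p.

```python
from typing import Sequence


def discriminant_score(a: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Return D_i = a + b1*x1 + ... + bn*xn for a single case."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return a + sum(bj * xj for bj, xj in zip(b, x))


def n_discriminant_functions(g: int, p: int) -> int:
    """Number of discriminant functions for g groups and p predictors."""
    return min(g - 1, p)


# Hypothetical constant and coefficients, NOT taken from any SPSS output
a = -1.0
b = [0.5, 0.25, -0.5, 1.0]
x = [2.0, 4.0, 1.0, 0.5]                    # one case's predictor values
print(discriminant_score(a, b, x))          # 1.0
print(n_discriminant_functions(g=2, p=4))   # 1
```

With two groups there is a single function, so each case gets one score that is then compared with the cut-off.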

The first function maximizes the separation between the groups of the dependent variable. The second function maximizes the separation between the groups while controlling for the first function (the two functions are uncorrelated), and so on. The first function is therefore the most powerful differentiating dimension, but the second and later functions may also represent additional significant dimensions of differentiation.

Assumptions (from SPSS help)
- Cases should be independent.
- Predictor variables should have a multivariate normal distribution, and within-group variance-covariance matrices should be equal across groups.
- Group membership is assumed to be mutually exclusive: no case belongs to more than one group.
- The procedure is most effective when group membership is a truly categorical variable. If group membership is based on values of a continuous variable (for example, high IQ versus low IQ), consider using linear regression to take advantage of the richer information offered by the continuous variable itself.

Assumptions (similar to those for linear regression)
- linearity, normality, absence of multicollinearity, and equal variances
- predictor variables should have a multivariate normal distribution
- correct model specification

The procedure is fairly robust to violations of most of these assumptions, but it is highly sensitive to outliers.

Test of significance
For two groups, the null hypothesis is that the means of the two groups on the discriminant function (the centroids) are equal. Centroids are the mean discriminant scores for each group. Wilks' lambda is used to test for significant differences between the groups. It ranges from 0 to 1 and represents the proportion of variance not explained by the differences between the groups, so smaller values indicate better separation. Wilks' lambda is also used to test for significant differences between the groups on the individual predictor variables.

It tells which variables contribute a significant amount of prediction to help separate the groups.

Two groups: an example from the SPSS manual
The purpose of this example is to identify characteristics that indicate which people are likely to default on loans, and to use those characteristics to identify good and bad credit risks. The sample includes a total of 850 cases (existing and prospective customers). The first 700 cases are customers who were previously given loans. The first 700 customers are used to create a discriminant analysis model, and the remaining 150 customers are set aside to validate the analysis. The model is then used to classify the 150 prospective customers as good or bad credit risks.

Grouping variable: default
Predictors: employ, address, debtinc, and creddebt

Obtaining a discriminant function analysis: Analyze > Classify > Discriminant
[Screenshots: Discriminant Analysis dialog boxes; click Classify to open the classification options window, and click Save to open the save options window.]
[Screenshot: SPSS output, descriptive statistics]

SPSS output: ANOVA table
In the ANOVA table, the smaller the Wilks' lambda, the more important the independent variable is to the discriminant function.
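For a single predictor, the Wilks' lambda in the ANOVA table is the within-groups sum of squares divided by the total sum of squares, and the F statistic follows from it as ((1 - lambda) / lambda) x (N - g) / (g - 1), which equals the ordinary one-way ANOVA F. A minimal standard-library sketch, using a hypothetical helper name and toy numbers rather than the loan data:

```python
from statistics import mean


def univariate_wilks(groups):
    """Wilks' lambda and F for one predictor.

    groups: one list of the predictor's values per group.
    Returns (lambda, F, (df1, df2)).
    """
    values = [v for grp in groups for v in grp]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for grp in groups:
        m = mean(grp)
        ss_within += sum((v - m) ** 2 for v in grp)
    lam = ss_within / ss_total                  # share of variation not explained by groups
    n, g = len(values), len(groups)
    df1, df2 = g - 1, n - g
    f_stat = ((1 - lam) / lam) * (df2 / df1)    # same value as the one-way ANOVA F
    return lam, f_stat, (df1, df2)


# Toy data for one predictor in two groups (hypothetical)
lam, f_stat, dfs = univariate_wilks([[1, 2, 3], [4, 5, 6]])
print(round(lam, 4), round(f_stat, 2), dfs)
```

A significance level would still need the F distribution, for example `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.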

Wilks' lambda is significant by the F test for all independent variables.

SPSS output: correlation matrix
The within-groups correlation matrix shows the correlations between the predictors.

SPSS output: test of homogeneity of covariance matrices
The table reports a log determinant for each group. The more a group's log determinant differs from those of the other groups, the more that group's covariance matrix differs. The "Rank" column indicates the number of independent variables in this case. Because discriminant analysis assumes homogeneity of covariance matrices between groups, we would like to see the log determinants be relatively equal.

SPSS output: test of homogeneity of covariance matrices
1. Box's M tests the assumption of homogeneity of covariance matrices. This test is very sensitive to departures from multivariate normality.
2. Discriminant function analysis is robust even when the homogeneity of covariance matrices assumption is not met, provided the data do not contain important outliers.

3. For our data, we conclude that the groups do differ in their covariance matrices, violating an assumption of discriminant analysis.
4. When n is large, small deviations from homogeneity will be found significant, which is why Box's M must be interpreted in conjunction with inspection of the log determinants.

SPSS output: summary of canonical discriminant functions
- The larger the eigenvalue, the more of the variance in the dependent variable is explained by that function.
- Because the dependent variable has two categories, there is only one discriminant function.
- The canonical correlation is the measure of association between the discriminant function and the dependent variable.
- The square of the canonical correlation coefficient is the proportion of variance in the dependent variable explained by the function.

SPSS output: summary of canonical discriminant functions
When there are two groups, the canonical correlation is the most useful measure in the table, and it is equivalent to Pearson's correlation between the discriminant scores and the groups.
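The statistics in the canonical discriminant function tables are tied together by standard formulas. For a function with eigenvalue e, the canonical correlation is sqrt(e / (1 + e)); Wilks' lambda for functions k through the last is the product of 1 / (1 + e) over those functions, so with a single function it equals 1 minus the squared canonical correlation. The chi-square shown next to Wilks' lambda is assumed here to be Bartlett's approximation, -(N - 1 - (p + g) / 2) ln(lambda). The sketch uses a hypothetical eigenvalue and sample size, not the values from this example, and the helper name is ours.

```python
import math


def canonical_summary(eigenvalues, n, p, g):
    """Derive the usual summary statistics from discriminant eigenvalues.

    eigenvalues: one per discriminant function, largest first
    n: number of cases, p: number of predictors, g: number of groups
    """
    total = sum(eigenvalues)
    rows = []
    for k, eig in enumerate(eigenvalues):
        # Wilks' lambda for the test "function k+1 through the last function"
        lam = math.prod(1 / (1 + e) for e in eigenvalues[k:])
        rows.append({
            "pct_of_variance": 100 * eig / total,
            "canonical_r": math.sqrt(eig / (1 + eig)),
            "wilks_lambda": lam,
            # Bartlett's chi-square approximation (assumed) and its degrees of freedom
            "chi_square": -(n - 1 - (p + g) / 2) * math.log(lam),
            "df": (p - k) * (g - k - 1),
        })
    return rows


# Hypothetical two-group case: one eigenvalue, 4 predictors, 100 cases
row = canonical_summary([0.25], n=100, p=4, g=2)[0]
print(row)
```

A p-value would again need a distribution function, such as `scipy.stats.chi2.sf(chi_square, df)`.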

Wilks' lambda is a measure of how well each function separates cases into groups; smaller values indicate greater discriminatory ability. The associated chi-square statistic tests the hypothesis that the means of the listed functions are equal across groups. The small significance value indicates that the discriminant function does better than chance at separating the groups.

SPSS output: summary of canonical discriminant functions
The standardized discriminant function coefficients serve the same purpose as beta weights (partial coefficients) in multiple regression: they indicate the relative importance of the independent variables in predicting the dependent variable, and they allow you to compare variables measured on different scales. Coefficients with large absolute values correspond to variables with greater discriminating ability.

SPSS output: summary of canonical discriminant functions
1. The structure matrix table shows the correlations of each variable with each discriminant function.

2. There is only one discriminant function in this study.
3. The correlations serve like factor loadings in factor analysis: by identifying the largest absolute correlations associated with each discriminant function, the researcher gains insight into how to name each function.

SPSS output: summary of canonical discriminant functions
- The function is a latent variable created as a linear combination of the independent variables.
- The variables listed in the table are the independent variables.
- The table shows the Pearson correlations between the predictors and the standardized canonical discriminant functions.
- Predictors with correlations below .30 (in absolute value) may be removed from the model.

SPSS output: summary of canonical discriminant functions
This table contains the unstandardized discriminant function coefficients. They are used like unstandardized b (regression) coefficients in multiple regression: they form the actual prediction equation, which can be used to classify new cases.
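The standardized coefficients discussed earlier are the unstandardized coefficients multiplied by each predictor's pooled within-group standard deviation, which puts predictors measured on different scales on a common footing. A minimal sketch, with hypothetical helper names and toy data:

```python
import math
from statistics import mean


def pooled_within_sd(groups):
    """Pooled within-group standard deviation of one predictor.

    groups: one list of the predictor's values per group.
    """
    ss_within = 0.0
    for grp in groups:
        m = mean(grp)
        ss_within += sum((v - m) ** 2 for v in grp)
    n = sum(len(grp) for grp in groups)
    return math.sqrt(ss_within / (n - len(groups)))


def standardize_coefficients(b, pooled_sds):
    """Rescale unstandardized coefficients to the standardized metric."""
    return [bj * sj for bj, sj in zip(b, pooled_sds)]


# Hypothetical data: two predictors, each split into the same two groups
x1 = [[1, 2, 3], [4, 5, 6]]
x2 = [[2, 4, 6], [8, 10, 12]]
sds = [pooled_within_sd(x1), pooled_within_sd(x2)]
print(standardize_coefficients([0.5, 0.25], sds))
```

Here the two raw coefficients differ by a factor of two, but once the scale of each predictor is taken into account they carry the same weight.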

Discriminant function: our model takes the form

$D_i = a + b_1x_1 + b_2x_2 + \dots + b_nx_n$

where the constant $a$ and the coefficients $b_1, \dots, b_n$ are taken from the unstandardized coefficients table and the $x$'s are the predictor variables (such as creddebt).

SPSS output: summary of canonical discriminant functions
Centroids are the mean discriminant scores for each group. This table is used to establish the cutting point for classifying cases. If the two groups are of equal size, the best cutting point is halfway between the values of the functions at the group centroids (that is, their average). If the groups are unequal, the optimal cutting point is the weighted average of the two values. The computer does the classification automatically, so these values are for informational purposes.

The centroids are calculated from the same function: a group's centroid is its discriminant score when the variable means for that group (rather than the individual values for each case) are entered into the function.
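Because the discriminant function is linear, a group's centroid (the function evaluated at the group's variable means) equals the average discriminant score of that group's cases. The sketch below computes two centroids, a cutting point and a classification. The text does not say which weights to use for unequal groups, so the sketch assumes one commonly used textbook rule, (n1 * c2 + n2 * c1) / (n1 + n2), which moves the cut toward the smaller group's centroid. SPSS itself classifies using prior probabilities, so its assignments may differ. All helper names and numbers are hypothetical.

```python
def score(a, b, x):
    """Discriminant score a + sum(b_j * x_j) for one case."""
    return a + sum(bj * xj for bj, xj in zip(b, x))


def centroid(a, b, cases):
    """Function evaluated at the group's variable means."""
    means = [sum(col) / len(cases) for col in zip(*cases)]
    return score(a, b, means)


def cutting_point(c1, c2, n1, n2):
    """Cut-off between centroids c1 and c2 for groups of size n1 and n2."""
    if n1 == n2:
        return (c1 + c2) / 2  # halfway between the centroids
    # ASSUMPTION: size-weighted rule; the weights are not given in the source
    return (n1 * c2 + n2 * c1) / (n1 + n2)


def classify(d, c1, c2, cut):
    """Return 1 or 2: the group whose centroid lies on the same side of the cut as d."""
    below_is_group1 = c1 < c2
    return 1 if (d < cut) == below_is_group1 else 2


# Hypothetical coefficients and cases (two predictors, two cases per group)
a, b = -1.0, [0.5, 0.25]
g1 = [[2.0, 4.0], [4.0, 8.0]]
g2 = [[0.0, 0.0], [2.0, 0.0]]
c1, c2 = centroid(a, b, g1), centroid(a, b, g2)
cut = cutting_point(c1, c2, len(g1), len(g2))
print(c1, c2, cut, classify(score(a, b, [3.0, 2.0]), c1, c2, cut))
```

The equal-size branch reproduces the "halfway between the centroids" rule described above; only the unequal-size branch depends on the assumed weighting.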

