Stata: Bivariate Statistics - Population Survey Analysis


Topics: Chi-square test, t-test, Pearson's R correlation coefficient

There are three situations during survey data analysis in which bivariate statistics are commonly used.

1. Compare two groups

First, bivariate statistics are used to compare two study groups to see if they are similar. For example, we might compare two groups at baseline before an intervention is implemented, or compare participants who are lost to follow-up to those who remained in the study.

When comparing groups, we want to provide strong evidence of any group differences, so we use a conservative threshold of p< to determine statistical significance. In this course, we are learning to analyze research questions with binary outcomes. Bivariate statistics can be used to summarize and compare characteristics across groups. For example, were there differences in the socio-demographic characteristics of women who did and did not experience intimate partner violence in the last 12 months?

2. Identify covariates for general explanatory model

When a characteristic like age is different in people who did and did not experience the outcome, we say that the characteristic is associated with the outcome.

This is because the characteristic helps to explain variance in the outcome. In cross-sectional data analysis, we cannot draw causal conclusions. We are not talking about causal mechanisms that predict the outcome. Although a woman's age group might be associated with whether or not she experienced intimate partner violence in the last 12 months, the biological process of aging does not cause her partner to act violently toward her. Rather, we are saying that a characteristic (like older age) tends to be present when the outcome is present.

When we are developing a general explanatory model, that is, when the research question is "Which factors are associated with [the outcome]?", we use bivariate statistics to identify potential covariates that are worth testing in a multivariable model. If a variable is independently associated with the outcome, it might continue to explain the outcome once other factors are taken into account. In this case, when bivariate statistics are used to filter potential covariates for multivariate analysis, we use a generous threshold of p< to determine statistical significance, to ensure that we do not drop any potentially useful variables from the analysis.
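As a minimal sketch of this screening step in Stata, the loop below cross tabulates each candidate covariate with a binary outcome using the survey chi-square test described later in this handout, and prints the p-value for each. It assumes Stata's example dataset nhanes2 (loaded with webuse) rather than the course data: diabetes stands in for the outcome, and race, sex, region, and agegrp are hypothetical candidate covariates. The significance threshold is left for you to apply.

```stata
* Illustrative only: nhanes2 is a Stata example dataset, not the course data
webuse nhanes2, clear

* Declare the survey design (PSU, sampling weight, strata)
svyset psu [pweight=finalwgt], strata(strata)

* Cross tabulate each candidate covariate with the binary outcome
foreach v of varlist race sex region agegrp {
    quietly svy: tabulate `v' diabetes, pearson
    * e(p_Pear) is the p-value stored by svy: tabulate for the Pearson test
    display "`v'" _col(12) "p = " %6.4f e(p_Pear)
}
```

Covariates whose p-value falls below the chosen threshold would then be carried forward as candidates for the multivariable model.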

Note that the statistical test used to compare two groups (usually the chi-square test in logistic regression) is the same test and output that we use here to filter variables. The only difference is in the purpose of the test, and therefore our interpretation of its results is different.

3. Chi-square test

The chi-square test is a common bivariate statistic used to test whether the distribution of a categorical variable is statistically different in two or more groups. The chi-square test gives a yes/no answer: a p-value less than the threshold means, yes, there are differences between the groups.

In a manuscript, if you see a p-value next to a categorical variable (with data summarized as percentages), this is usually a chi-square test statistic. The chi-square test statistic p-value is easy to interpret after you have set a threshold for statistical significance: either the distributions are, or are not, the same. The chi-square test is a global statistic; it tells you if there are any differences across cells, though it does not tell you which cell(s) are different. You can often tell which cells are different qualitatively based on the percentages, though additional or different testing might be performed to isolate whether certain cells are statistically different from the rest.

You should not use the chi-square test statistic if one or more cells in the cross tabulation has fewer than five observations, though this is incredibly rare in survey data analysis when tens of thousands of respondents are interviewed. If we have a response category with fewer than five observations, then we should combine it with another category. The chi-square test statistic is simple to implement in Stata. In fact, we have been doing it all along! Each time we use the tabulate command with survey data (by starting with svy:), we are producing a Pearson chi-square F-statistic and p-value.
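A minimal sketch of the command described above, again assuming Stata's example dataset nhanes2 in place of the course data (race and diabetes are stand-in variables, and the svyset line reflects that dataset's design variables, not yours):

```stata
* Illustrative only: nhanes2 is a Stata example dataset, not the course data
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Weighted cross tabulation with row percentages.
* The output footer reports the uncorrected Pearson chi-square,
* the design-based F-statistic, and its p-value.
svy: tabulate race diabetes, row pearson

* The obs option adds the unweighted number of observations per cell,
* which helps to spot cells with fewer than five observations
svy: tabulate race diabetes, row obs
```

The p-value to compare against your threshold is the one reported next to the design-based F-statistic.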

Source: Manzi, A., et al. (2014) BMC Pregnancy and Childbirth

4. T-test

A t-test is used to test whether the distribution of a continuous variable is statistically different across groups: a p-value less than the threshold means, yes, there are differences. Do NOT use a t-test when the distribution of outcomes within groups is not normal, or when the variance is not the same across groups. In these situations, consider transforming the variable (we do not discuss this further in this course), or categorize the continuous values and test them as a categorical variable.
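With survey data, one way to sketch this comparison in Stata is a survey-weighted linear regression of the continuous variable on the grouping variable. The example assumes Stata's example dataset nhanes2 rather than the course data, with bmi as a stand-in continuous variable and diabetes and race as stand-in grouping variables.

```stata
* Illustrative only: nhanes2 is a Stata example dataset, not the course data
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Two groups: the coefficient on 1.diabetes is the weighted difference
* in mean bmi, and its t-statistic and p-value serve as the t-test
svy: regress bmi i.diabetes

* More than two groups: fit the group indicators, then test them jointly
svy: regress bmi i.race
testparm i.race
```

For two groups, read the p-value from the coefficient row; for three or more groups, read it from the joint test produced by testparm.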

You can produce t-test statistics for a continuous variable across two or more groups with survey data by specifying a linear regression and testing for differences in the outcome across group categories.

5. Test for collinearity between two covariates

Before fitting any kind of multivariate model, whether a general explanatory model or a hypothesis test model, you should test for collinearity. Collinearity occurs when two covariates in a multivariable model are highly related; usually this is because the two variables represent the same thing (the same concept, or they happen simultaneously).

For example, in a society where husbands and wives tend to have the same level of education, women's education status and men's education status represent the same construct within households. The wife's education might do a good job explaining variance in the outcome, leaving little leftover variance to be explained by the husband's education. As a result, the model becomes unstable. To produce parsimonious (efficient) multivariable models, and to prevent strange, unstable results, we test for strong associations among covariates and remove any collinear covariates from the analysis.
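One possible way to check a pair of covariates in Stata is sketched below. It assumes Stata's example dataset nhanes2 rather than the course data, and the variables (height, weight, bmi, race, region) are stand-ins chosen for illustration. How strong an association must be before a covariate is dropped is a judgment this sketch does not make.

```stata
* Illustrative only: nhanes2 is a Stata example dataset, not the course data
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Continuous covariates: weighted Pearson correlation coefficients
* (correlate accepts aweights rather than pweights; only the
* coefficients themselves are of interest here)
correlate height weight bmi [aweight=finalwgt]

* Categorical covariates: a survey-weighted cross tabulation shows
* whether the two covariates are strongly associated
svy: tabulate race region, row pearson
```

Coefficients close to 1 or -1, or a cross tabulation in which the categories of one covariate line up almost exactly with those of the other, suggest that the pair measures the same construct.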

