Transcription of t-Test Statistics
t-Test Statistics
Overview of Statistical Tests
Assumption: Testing for Normality
The Student's t-distribution
Inference about one mean (one-sample t-Test)
Inference about two means (two-sample t-Test)
Assumption: F-test for Variance
Student's t-Test
- For homogeneous variances
- For heterogeneous variances
Statistical Power

Overview of Statistical Tests
During the design of your experiment you must specify what statistical procedures you will use. You require at least 3 pieces of info:
Type of Variable
Number of Variables
Number of Samples
Then refer to the end-papers of Sokal and Rohlf (1995).
- REVIEW -

Assumptions
Virtually every statistic, parametric or nonparametric, has assumptions which must be met prior to the application of the experimental design and subsequent statistical analysis.
We will discuss specific assumptions associated with individual tests as they come up.
All parametric statistics have an assumption that the data come from a population that follows a known distribution.
Most of the tests we will evaluate in this module require a normal distribution.

Assumption: Testing for Normality
Review and Commentary:
D'Agostino, R. B., A. Belanger, and R. B. D'Agostino, Jr. 1990. A suggestion for using powerful and informative tests of normality. The American Statistician 44: 316-321.
(See Course Web Page for PDF version.)
Most major normality tests have corresponding R code available in either the base stats package or an affiliated package. We will review the options as we proceed.

There are 5 major tests used:
Shapiro-Wilk W test
Anderson-Darling test
Martinez-Iglewicz test
Kolmogorov-Smirnov test
D'Agostino Omnibus test
NB: Power of all is weak if N < 10

Shapiro-Wilk W test
Developed by Shapiro and Wilk (1965).
One of the most powerful overall tests.
It is the ratio of two estimates of variance (actual calculation is cumbersome; adversely affected by ties).
Test statistic is W; roughly a measure of the straightness of the quantile-quantile plot.
The closer W is to 1, the more normal the sample.
Available in R and most other major stats applications.
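As a minimal sketch of the Shapiro-Wilk W test and the quantile-quantile plot it summarizes, the R code below uses simulated data. The sample `x`, its size, and the seed are illustrative assumptions, not values from the slides.

```r
# Illustrative data only: 30 simulated draws from a normal population
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)

# Shapiro-Wilk W test (base stats package); W near 1 is consistent with normality
sw <- shapiro.test(x)
print(sw)

# Normal Q-Q plot: the straighter the points along the line, the closer W is to 1
qqnorm(x)
qqline(x)
```

A small p-value would argue against normality. With N < 10 the test has little power, so a large p-value from a tiny sample is weak evidence.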

[Figure: Q-Q Plot; axes: Theoretical Quantiles, Sample Quantiles.]
Example in R (Tutorial-3)

Anderson-Darling test
Developed by Anderson and Darling (1954).
Very popular test.
Based on EDF (empirical distribution function) percentile statistics.
Almost as powerful as the Shapiro-Wilk W test.

Martinez-Iglewicz test
Based on the median and a robust estimator of dispersion.
Powerful.
Works well with small sample sizes.
Useful for symmetrically skewed samples.
A value close to 1 indicates normality.
Recommended during exploratory data analysis.
... available in R.

Kolmogorov-Smirnov Test
Calculates the expected normal distribution and compares it with the observed cumulative distribution.
Based on the max difference between the two distributions.
Poor discrimination at small sample sizes.
Power to detect differences is low.
Available in R and most other stats applications.

D'Agostino et al. Tests
Based on coefficients of skewness (√b1) and kurtosis (b2).
If normal, √b1 = 0 and b2 = 3 (tests based on this).
Provides separate tests for skew and kurt:
- Skewness test requires N ≥ 8
- Kurtosis test best if N > 20
Provides combined Omnibus test of normality.
Available in R.

The t-Distribution
The t-distribution is similar to the Z-distribution.
Note the similarity between the Z and t statistics: the functional difference is between σ and S.
The two are virtually identical when N is large.
Like Z, the t-distribution can be used for inferences about μ.
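The convergence of t toward Z can be seen by comparing two-tailed critical values. This is a rough sketch: the significance level and the grid of degrees of freedom are illustrative choices.

```r
# Two-tailed critical values at an illustrative alpha of 0.05
alpha <- 0.05
df <- c(6, 12, 30, 120, 1000)

t_crit <- qt(1 - alpha / 2, df)   # Student's t critical values, one per df
z_crit <- qnorm(1 - alpha / 2)    # standard normal critical value

# t critical values shrink toward the Z value as df grows
print(data.frame(df = df, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3)))
```

The t critical value always exceeds the Z value, which reflects the extra uncertainty from estimating σ with S.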
One would use the t-statistic when σ is not known and S is (the general case).
$$Z = \frac{\bar{y} - \mu}{\sigma / \sqrt{N}} \quad \text{vs.} \quad t = \frac{\bar{y} - \mu}{S / \sqrt{N}}$$

The t-Distribution
[Figure: density curves for the Standard Normal (Z), t with ν = 12, and t with ν = 6.]
See Appendix, Statistical Table C.

One Sample t-Test - Assumptions -
The data must be continuous.
The data must follow the normal probability distribution.
The sample is a simple random sample from its population.

One Sample t-test
$$t = \frac{\bar{y} - \mu}{S / \sqrt{N}}$$
$$\bar{y} - t_{\alpha(2),df}\, SE_{\bar{y}} \le \mu \le \bar{y} + t_{\alpha(2),df}\, SE_{\bar{y}}$$
$$\frac{df \cdot s^2}{\chi^2_{\alpha/2,df}} \le \sigma^2 \le \frac{df \cdot s^2}{\chi^2_{1-\alpha/2,df}}$$

One Sample t-Test - Example -
Twelve (N = 12) rats are weighed before and after being subjected to a regimen of forced exercise. Each weight change (g) is the weight after exercise minus the weight before.
Ho: μ = 0
HA: μ ≠ 0

One Sample t-Test - Example -
> W<-c( , , , , , , , , , , , )
> summary(W)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> hist(W, col="red")
> shapiro.test(W)
        Shapiro-Wilk normality test
data:  W
W = , p-value =

One-sample t-Test using R
> W<-c( , , , , , , , ,
+ , , , )
> W
 [1]
> t.test(W, mu=0)
        One Sample t-test
data:  W
t = , df = 11, p-value =
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x

One Sample t-test
For most statistical procedures, one will want to do a post-hoc test (particularly in the case of failing to reject Ho) of the sample size required to test the hypothesis.
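One way to approach the sample-size question directly is to leave `n` unspecified in `power.t.test` and request a target power. This is only a sketch: `delta = 1` echoes the printed output in these slides, while the standard deviation, significance level, and target power are hypothetical planning values.

```r
# Solve for n by leaving it unspecified; sd, sig.level and power are assumed values
res <- power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.80,
                    type = "one.sample", alternative = "two.sided")
print(res)

# Round up to a whole number of observations
n_needed <- ceiling(res$n)

# Check: the achieved power at n_needed should meet the target
achieved <- power.t.test(n = n_needed, delta = 1, sd = 2, sig.level = 0.05,
                         type = "one.sample")$power
cat("n needed:", n_needed, " achieved power:", round(achieved, 3), "\n")
```

Supplying `n` and omitting `power` instead reports the power achieved at a given sample size.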
For example, how large a sample would be needed to reject the null hypothesis of the one-sample t-Test we just did?
Sample size questions and related error rates are best explored through a power analysis.

> power.t.test(n=15, delta= , sd= , sig.level= , type="one.sample")
     One-sample t test power calculation
              n = 15
          delta = 1
             sd =
      sig.level =
          power =
    alternative = two.sided

> power.t.test(n=20, delta= , sd= , sig.level= , type="one.sample")
     One-sample t test power calculation
              n = 20
          delta = 1
             sd =
      sig.level =
          power =
    alternative = two.sided

> power.t.test(n=25, delta= , sd= , sig.level= , type="one.sample")
     One-sample t test power calculation
              n = 25
          delta = 1
             sd =
      sig.level =
          power =
    alternative = two.sided

Two Sample t-Test - Assumptions -
The data are continuous (not discrete).
The data follow the normal probability distribution.
The variances of the two populations are equal.
(If not, the Aspin-Welch Unequal-Variance test is used.)
The two samples are independent. There is no relationship between the individuals in one sample as compared to the other.
Both samples are simple random samples from their respective populations.

Two Sample t-test
Determination of which two-sample t-test to use is dependent upon first testing the variance assumption:
Two-Sample t-Test for Homogeneous Variances
Two-Sample t-Test for Heterogeneous Variances

Variance Ratio F-test - Variance Assumption -
Must explicitly test for homogeneity of variance.
Ho: σ1² = σ2²
Ha: σ1² ≠ σ2²
Requires the use of the F-test, which relies on the F-distribution.
Fcalc = S²max / S²min
Get Ftable at N-1 df for each sample.
If Fcalc < Ftable then fail to reject Ho.

Variance Ratio F-test - Example -
Suppose you had the following sample data:
Sample-A: Sa² = , N = 12
Sample-B: Sb² = , N = 8
Fcalc = / =
Ftable = (df = 11, 7)
Decision: Fcalc < Ftable, therefore fail to reject Ho.
Conclusion: the variances are homogeneous.
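The variance ratio procedure above can be sketched in R as follows. The two samples are simulated stand-ins with the example's sample sizes (12 and 8); the data, the seed, and the choice of the upper α/2 quantile for the table value are assumptions rather than details taken from the slides.

```r
# Illustrative samples only, with the example's sample sizes (12 and 8)
set.seed(1)
a <- rnorm(12, mean = 50, sd = 5)
b <- rnorm(8, mean = 50, sd = 4)

# Variance ratio with the larger sample variance in the numerator
a_larger <- var(a) >= var(b)
f_calc <- max(var(a), var(b)) / min(var(a), var(b))
df_num <- if (a_larger) length(a) - 1 else length(b) - 1
df_den <- if (a_larger) length(b) - 1 else length(a) - 1

# Table value: upper alpha/2 quantile, because Ha is two-sided (assumption)
alpha <- 0.05
f_table <- qf(1 - alpha / 2, df_num, df_den)
decision <- if (f_calc < f_table) "fail to reject Ho" else "reject Ho"
cat("Fcalc =", round(f_calc, 3), " Ftable =", round(f_table, 3), "->", decision, "\n")

# Built-in equivalent: var.test() reports F = var(a) / var(b) and a two-sided p-value
vt <- var.test(a, b)
print(vt)
```

The manual decision and the `var.test` p-value should agree at the same α, since both rest on the same F-distribution.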

Variance Ratio F-test - WARNING -
Be careful! The F-test for variance requires that the two samples are drawn from normal populations (i.e., you must test the normality assumption first).
If the two samples are not normally distributed, do not use the Variance Ratio F-test!
Use the Modified Levene Equal-Variance Test.

Modified Levene Equal-Variance Test
First, redefine all of the variates as a function of the difference with their respective median.
$$z_{1j} = |x_j - Med_x| \qquad z_{2j} = |y_j - Med_y|$$
Then perform a two-sample ANOVA to get F for the redefined values.
No such test of homogeneity of variance is currently available in R, but the code is easily written.

Two-sample t-Test - for Homogeneous Variances -
Begin by calculating the mean & variance for each of your two samples.
Then determine the pooled variance Sp² (theoretical formula, then machine formula):
$$S_p^2 = \frac{\sum_{i=1}^{N_1}(y_{1i} - \bar{y}_1)^2 + \sum_{j=1}^{N_2}(y_{2j} - \bar{y}_2)^2}{(N_1 - 1) + (N_2 - 1)} = \frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}$$

Two-sample t-Test - for Homogeneous Variances -
Determine the test statistic tcalc.
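A hedged sketch of the pooled-variance calculation follows. It assumes the usual pooled t statistic, t = (ȳ1 - ȳ2) / sqrt(Sp²/N1 + Sp²/N2) with df = N1 + N2 - 2, and the two data vectors are made-up illustrative values, not data from the slides.

```r
# Illustrative data only (hypothetical measurements for two independent samples)
y1 <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3)
y2 <- c(5.9, 5.4, 6.1, 5.2, 5.8, 6.0, 5.5)
n1 <- length(y1)
n2 <- length(y2)

# Pooled variance, theoretical formula: summed squared deviations over summed df
sp2_theory <- (sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)) /
  ((n1 - 1) + (n2 - 1))

# Pooled variance, machine formula: df-weighted average of the sample variances
sp2_machine <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)

# Test statistic and degrees of freedom
t_calc <- (mean(y1) - mean(y2)) / sqrt(sp2_machine / n1 + sp2_machine / n2)
df <- n1 + n2 - 2

# Two-tailed table value at an assumed alpha of 0.05
t_table <- qt(1 - 0.05 / 2, df)
cat("tcalc =", round(t_calc, 3), " ttable =", round(t_table, 3), " df =", df, "\n")

# Built-in equivalent with the equal-variance option
tt <- t.test(y1, y2, var.equal = TRUE)
print(tt)
```

Comparing |tcalc| with ttable gives the same decision as the p-value reported by `t.test`.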
Go to the t-table (Appendix) at the appropriate α and df to determine ttable.
$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_p^2}{N_1} + \frac{S_p^2}{N_2}}} \qquad df = N_1 + N_2 - 2$$

Two-sample t-Test - Example -
Suppose a microscopist has just identified two potentially different types of cells based upon differential staining.
She separates them into two groups (amber cells and blue cells). She suspects there may be a difference in cell wall thickness (cwt), so she wishes to test the hypothesis:
Ho: ACcwt = BCcwt
Ha: ACcwt ≠ BCcwt

Two-sample t-Test - Example -
[Table: Parameter values for each cell type.]
She counts the number of cells in one randomly chosen field of view.
SS is the sum of squares (theor. formula), or numerator of the variance equation.

Two-sample t-Test - Example -
Ho: ACcwt = BCcwt
Ha: ACcwt ≠ BCcwt
df = N1 + N2 - 2 = 30
At α = , df = 30: ttable =
tcalc < ttable
Therefore fail to reject Ho.
Cell wall thickness is similar between the 2 cell types.

Two-sample t-Test using R
> B<-c( , , , , , )
> G<-c( , , , , , , )
> var.test(B,G)
        F test to compare two variances
data:  B and G
F = , num df = 5, denom df = 6, p-value =
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
sample estimates:
ratio of variances

Two-sample t-Test using R
> t.test(B,G, var.equal=TRUE)
        Two Sample t-test
data:  B and G
t = , df = 11, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x mean of y

Two-sample t-Test - for Heterogeneous Variances -
Q.
Suppose we were able to meet the normality assumption, but failed the homogeneity of variance test. Can we still perform a t-Test?
A. Yes, but we must calculate an adjusted degrees of freedom (df).

Two-sample t-Test - Adjusted df for Heterogeneous Variances -
Perform the t-Test in exactly the same fashion as for homogeneous variances; but, you must enter the table at a different df. Note that this can have a big effect on the result.
$$df = \frac{\left(\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}\right)^2}{\frac{\left(S_1^2 / N_1\right)^2}{N_1 - 1} + \frac{\left(S_2^2 / N_2\right)^2}{N_2 - 1}}$$

Two-sample t-Test using R - Heterogeneous Variance -
> Captive<-c(10,11,12,11,10,11,11)
> Wild<-c(9,8,11,12,10,13,11,10,12)
> var.test(Captive,Wild)
        F test to compare two variances
data:  Captive and Wild
F = , num df = 6, denom df = 8, p-value =
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
sample estimates:
ratio of variances

> t.test(Captive, Wild)
        Welch Two Sample t-test
data:  Captive and Wild
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x mean of y

Matched-Pair t-test
It is not uncommon in biology to conduct an experiment whereby each observation in a treatment sample has a matched pair in a control sample.
Thus, we have violated the assumption of independence and cannot do a standard t-test.
The matched-pair t-Test was developed to address this type of experimental design.

Matched-Pair t-test
By definition, sample sizes must be equal.
Such designs arise when:
Same obs are exposed to 2 treatments over time.
Before and after experiments (temporally related).
Side-by-side experiments (spatially related).
Many early fertilizer studies used this design. One plot received fertilizer, an adjacent plot did not. Plots were replicated in a field and plant yield measured.

Matched-Pair t-test
The approach to this type of analysis is a bit counterintuitive.
Even though there are two samples, you will work with only one sample composed of:
STANDARD DIFFERENCES
and df = Nab - 1, where Nab is the number of pairs.

The data are continuous (not discrete).
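The matched-pair approach described above, working with the single sample of within-pair differences at df = Nab - 1, might be sketched as follows. The plot yields are hypothetical values invented for illustration, and the variable names are not from the slides.

```r
# Illustrative paired data only: yield on adjacent plots (hypothetical values)
fertilized   <- c(24.1, 26.3, 22.8, 25.0, 27.4, 23.9, 26.1, 24.7)
unfertilized <- c(22.5, 25.1, 23.0, 23.2, 25.9, 22.4, 25.0, 23.8)

# Work with the single sample of within-pair differences
d <- fertilized - unfertilized
n_pairs <- length(d)
df <- n_pairs - 1

# One-sample t statistic on the differences, testing mean difference = 0
t_calc <- mean(d) / (sd(d) / sqrt(n_pairs))
t_table <- qt(1 - 0.05 / 2, df)   # two-tailed table value at an assumed alpha of 0.05
cat("tcalc =", round(t_calc, 3), " ttable =", round(t_table, 3), " df =", df, "\n")

# Built-in equivalents: a paired test, or a one-sample test on the differences
pt <- t.test(fertilized, unfertilized, paired = TRUE)
os <- t.test(d, mu = 0)
print(pt)
```

Both built-in calls should reproduce the hand-computed statistic, which shows that the matched-pair test is a one-sample t-Test on the differences.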