
Title stata.com ranksum — Equality tests on unmatched data


Description

ranksum tests the hypothesis that two independent samples (that is, unmatched data) are from populations with the same distribution by using the Wilcoxon rank-sum test, which is also known as the Mann-Whitney two-sample statistic (Wilcoxon 1945; Mann and Whitney 1947).

median performs a nonparametric k-sample test on the equality of medians. It tests the null hypothesis that the k samples were drawn from populations with the same median. For two samples, the χ² test statistic is computed both with and without a continuity correction.

ranksum and median are for use with unmatched data. For equality tests on matched data, see [R] signrank.

Quick start

Wilcoxon rank-sum test

    Test for equality of distributions of v over two groups defined by the levels of catvar1
        ranksum v, by(catvar1)

    Compute an exact p-value for the Wilcoxon rank-sum test
        ranksum v, by(catvar1) exact

    Estimate the probability that a case from the first level of catvar1 has a greater value of v than a case from the second level of catvar1
        ranksum v, by(catvar1) porder

Nonparametric equality-of-medians test

    Equality of medians test for v over two or more groups defined by the levels of catvar2
        median v, by(catvar2)

    Also report Fisher's exact test
        median v, by(catvar2) exact

    As above, but split cases at the median evenly between the above and below groups
        median v, by(catvar2) exact medianties(split)

Menu

ranksum
    Statistics > Nonparametric analysis > Tests of hypotheses > Wilcoxon rank-sum test

median
    Statistics > Nonparametric analysis > Tests of hypotheses > K-sample equality-of-medians test

Syntax

Wilcoxon rank-sum test

    ranksum varname [if] [in], by(groupvar) [exact porder]

Nonparametric equality-of-medians test

    median varname [if] [in] [weight], by(groupvar) [median_options]

ranksum_options        Description
Main
  * by(groupvar)       grouping variable
    exact              report exact p-value for rank-sum test; by default, exact p-value is computed when total sample size ≤ 200
    porder             probability that variable for first group is larger than variable for second group

median_options         Description
Main
  * by(groupvar)       grouping variable
    exact              report p-value from Fisher's exact test
    medianties(below)  assign values equal to the median to below group
    medianties(above)  assign values equal to the median to above group
    medianties(drop)   drop values equal to the median from the analysis
    medianties(split)  split values equal to the median equally between the two groups

* by(groupvar) is required.
by is allowed with ranksum and median; see [U] Prefix commands.
fweights are allowed with median; see [U] weight.

Options for ranksum

Main

by(groupvar) is required.

It specifies the name of the grouping variable.

exact specifies that the exact p-value be computed in addition to the approximate p-value. The exact p-value is based on the actual randomization distribution of the test statistic. The approximate p-value is based on a normal approximation to the randomization distribution. By default, the exact p-value is computed for sample sizes n = n1 + n2 ≤ 200 because the normal approximation may not be precise in small samples. The exact computation can be suppressed by specifying noexact. For sample sizes larger than 200, you must specify exact to compute the exact p-value. The exact computation is available for sample sizes n ≤ 1000. As the sample size approaches 1,000, the computation takes significantly longer.

porder displays an estimate of the probability that a random draw from the first population is larger than a random draw from the second population.

Options for median

Main

by(groupvar) is required.

It specifies the name of the grouping variable.

exact displays the p-value calculated by Fisher's exact test. For two samples, both one- and two-sided p-values are displayed.

medianties(below|above|drop|split) specifies how values equal to the overall median are to be handled. The median test computes the median for varname by using all observations and then divides the observations into those falling above the median and those falling below the median. When values for an observation are equal to the sample median, they can be dropped from the analysis by specifying medianties(drop); added to the group above or below the median by specifying medianties(above) or medianties(below), respectively; or, if there is more than 1 observation with values equal to the median, they can be equally divided into the two groups by specifying medianties(split).

If this option is not specified, medianties(below) is assumed.

Remarks and examples

Example 1

We are testing the effectiveness of a new fuel additive. We run an experiment with 24 cars: 12 cars with the fuel treatment and 12 cars without. We input these data by creating a dataset with 24 observations. mpg records the mileage rating, and treat records 0 if the mileage corresponds to untreated fuel and 1 if it corresponds to treated fuel.

    . use
    . ranksum mpg, by(treat)

    Two-sample Wilcoxon rank-sum (Mann-Whitney) test

            treat |      Obs    Rank sum    Expected
      ------------+---------------------------------
        Untreated |       12         128         150
          Treated |       12         172         150
      ------------+---------------------------------
         Combined |       24         300         300

    Unadjusted variance
    Adjustment for ties
    Adjusted variance

    H0: mpg(treat==Untreated) = mpg(treat==Treated)
             z =
    Prob > |z| =
    Exact prob =

Because the total sample is only 24 cars, the exact p-value is computed by default.
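The summary numbers in this output can be cross-checked against the formulas in Methods and formulas. The following is a minimal Python sketch that uses only the group sizes and rank sums shown above. E(T) = n1(n+1)/2 gives the Expected column, and the ranks 1, ..., n always sum to n(n+1)/2. With no ties, s² = n(n+1)/12, so Var(T) = n1 n2 s²/n reduces to n1 n2 (n+1)/12, the variance before any adjustment for ties. The U and porder values are what U = T − n1(n1+1)/2 and p = U/(n1 n2) give for the first group (Untreated). They illustrate the formulas and are not output copied from Stata.

```python
from fractions import Fraction

# Group sizes and observed rank sums from the fuel-additive output
n1, n2 = 12, 12            # Untreated (first group), Treated
n = n1 + n2
T_untreated, T_treated = 128, 172

# E(T) = n1(n+1)/2: the "Expected" column for the first group
expected_T = Fraction(n1 * (n + 1), 2)
# Ranks 1..n always sum to n(n+1)/2: the "Combined" row
total_ranks = Fraction(n * (n + 1), 2)
# Var(T) = n1*n2*s^2/n with untied s^2 = n(n+1)/12, i.e. n1*n2*(n+1)/12
unadjusted_var = Fraction(n1 * n2 * (n + 1), 12)
# Mann-Whitney U and the porder estimate for the first group
U = T_untreated - Fraction(n1 * (n1 + 1), 2)
p_order = U / (n1 * n2)

print(expected_T, total_ranks, unadjusted_var)  # 150 300 300
print(U, p_order, round(float(p_order), 4))     # 50 25/72 0.3472
```

The adjustment for ties that Stata subtracts from the unadjusted variance depends on the raw mpg values, which are not listed here. For that reason, the sketch stops at the unadjusted figure.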

If the sample size were greater than 200, we would have to specify the exact option if we wanted the exact p-value. Given the small sample size, the p-value computed using a normal approximation is similar to the exact p-value. These results indicate that the distributions are not statistically different at conventional significance levels.

We can also use the median test:

    . median mpg, by(treat) exact

    Median test

       Greater |  Whether car received
      than the |     fuel additive
        median | Untreated    Treated |     Total
    -----------+----------------------+----------
            no |         7          5 |        12
           yes |         5          7 |        12
    -----------+----------------------+----------
         Total |        12         12 |        24

              Pearson chi2(1) =            Pr =
               Fisher's exact =
       1-sided Fisher's exact =

    Continuity corrected:
              Pearson chi2(1) =            Pr =

We fail to reject the null hypothesis that there is no difference between the fuel with the additive and the fuel without the additive.

Compare the results from these two tests with those obtained from signrank and signtest, where we found significant differences; see [R] signrank.
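The two-sample median test can be sketched in a few lines of Python. The helper median_table (a hypothetical name) classifies each observation relative to the overall median and applies one of the four medianties() rules. The text says only that tied values are divided equally for split, so the sketch assumes they alternate between the below and above groups in data order. two_by_two_tests (also hypothetical) computes three things from the resulting 2×2 table:

- the Pearson χ² statistic;
- the Yates continuity-corrected χ², assumed to be the correction meant here;
- Fisher's exact p-values from the hypergeometric distribution.

For Fisher's test, the two-sided value is taken as the total probability of all tables no more likely than the observed one, and the one-sided value uses the tail in the observed direction. Both conventions are assumptions of this sketch. Applied to the 7/5/5/7 table from the example, it computes the corresponding statistics for that table.

```python
import math
from statistics import median

def median_table(values, groups, ties="below"):
    """Count observations below/above the overall median in each group
    (hypothetical helper). ties: 'below', 'above', 'drop', or 'split'."""
    if ties not in ("below", "above", "drop", "split"):
        raise ValueError(f"unknown ties rule: {ties}")
    m = median(values)                  # overall median of all observations
    counts = {g: {"below": 0, "above": 0} for g in sorted(set(groups))}
    next_side = "below"                 # assumption: 'split' alternates in data order
    for v, g in zip(values, groups):
        if v != m:
            counts[g]["above" if v > m else "below"] += 1
        elif ties in ("below", "above"):
            counts[g][ties] += 1
        elif ties == "split":
            counts[g][next_side] += 1
            next_side = "above" if next_side == "below" else "below"
        # ties == "drop": an observation equal to the median is skipped
    return m, counts

def chi2_1df_pvalue(x):
    # Upper tail of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x/2))
    return math.erfc(math.sqrt(x / 2))

def two_by_two_tests(a, b, c, d):
    """Pearson chi2, Yates-corrected chi2, and Fisher exact p-values
    for the 2x2 table [[a, b], [c, d]]."""
    obs = ((a, b), (c, d))
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = chi2_cc = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n            # expected count under H0
            dev = abs(obs[i][j] - e)
            chi2 += dev ** 2 / e
            chi2_cc += (dev - 0.5) ** 2 / e      # Yates continuity correction
    # Fisher: with all margins fixed, cell a is hypergeometric
    lo, hi = max(0, cols[0] - rows[1]), min(rows[0], cols[0])
    probs = {k: math.comb(rows[0], k) * math.comb(rows[1], cols[0] - k)
                / math.comb(n, cols[0]) for k in range(lo, hi + 1)}
    p_two = sum(q for q in probs.values() if q <= probs[a] * (1 + 1e-9))
    if a >= rows[0] * cols[0] / n:               # one-sided: observed direction
        p_one = sum(q for k, q in probs.items() if k >= a)
    else:
        p_one = sum(q for k, q in probs.items() if k <= a)
    return {"chi2": chi2, "p": chi2_1df_pvalue(chi2),
            "chi2_cc": chi2_cc, "p_cc": chi2_1df_pvalue(chi2_cc),
            "p_exact": p_two, "p1_exact": p_one}

# Tie handling on a toy dataset: the three 3s equal the median
m, table = median_table([1, 2, 3, 3, 3, 4, 5], [0, 0, 0, 1, 1, 1, 1], ties="split")
print(m, table)

# Rows: not greater / greater than the median; columns: Untreated, Treated
res = two_by_two_tests(7, 5, 5, 7)
print({k: round(v, 4) for k, v in res.items()})
```

For k > 2 groups, the same classification yields a 2×k table, and the Pearson statistic then has k − 1 degrees of freedom. The Fisher part of the sketch handles only the 2×2 case.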

An experiment run on 24 different cars is not as powerful as a before-and-after comparison using the same 12 cars.

Stored results

ranksum stores the following in r():

Scalars
    r(N)             sample size
    r(N_1)           sample size of first group
    r(N_2)           sample size of second group
    r(z)             z statistic
    r(Var_a)         adjusted variance
    r(group1)        value of variable for first group
    r(sum_obs)       observed sum of ranks for first group
    r(sum_exp)       expected sum of ranks for first group
    r(p)             two-sided p-value from normal approximation
    r(p_l)           lower one-sided p-value from normal approximation
    r(p_u)           upper one-sided p-value from normal approximation
    r(p_exact)       two-sided exact p-value
    r(p_l_exact)     lower one-sided exact p-value
    r(p_u_exact)     upper one-sided exact p-value
    r(porder)        probability that draw from first population is larger than draw from second population

median stores the following in r():

Scalars
    r(N)             sample size
    r(chi2)          Pearson's χ²
    r(chi2_cc)       continuity-corrected Pearson's χ²
    r(groups)        number of groups compared
    r(p)             p-value for Pearson's χ² test
    r(p_cc)          continuity-corrected p-value
    r(p_exact)       Fisher's exact p-value
    r(p1_exact)      one-sided Fisher's exact p-value

Methods and formulas

For a practical introduction to these techniques with an emphasis on examples rather than theory, see Acock (2018), Bland (2015), or Sprent and Smeeton (2007). For a summary of these tests, see Snedecor and Cochran (1989).

Methods and formulas are presented under the following headings:

    ranksum
    median

ranksum

For the Wilcoxon rank-sum test, there are two independent random variables, $X_1$ and $X_2$, and we test the null hypothesis that $X_1 \sim X_2$. We have a sample of size $n_1$ from $X_1$ and another of size $n_2$ from $X_2$.

The data are then ranked without regard to the sample to which they belong.

If the data are tied, averaged ranks are used. Wilcoxon's test statistic (1945) is the sum of the ranks for the observations in the first sample:

$$T = \sum_{i=1}^{n_1} R_{1i}$$

Mann and Whitney's U statistic (1947) is the number of pairs $(X_{1i}, X_{2j})$ such that $X_{1i} > X_{2j}$. These statistics differ only by a constant:

$$U = T - \frac{n_1(n_1+1)}{2}$$

Fisher's principle of randomization provides a method for calculating the distribution of the test statistic. The randomization distribution consists of all the possible values of $T$ resulting from the $\binom{n}{n_1}$ ways to choose $n_1$ ranks from the set of all $n = n_1 + n_2$ observed ranks (untied or tied) and assign them to the first sample. When the exact option is specified (or implied for $n \le 200$), this distribution is computed using a recursive algorithm whose computational time is proportional to $n^4$.
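The ranking, $T$, $U$, and the randomization distribution can be illustrated with a small brute-force sketch in Python. It does not implement the recursive algorithm mentioned above. Instead, it enumerates all $\binom{n}{n_1}$ assignments of ranks to the first sample, which is feasible only for very small samples. Ranks are kept as exact fractions so that averaged ranks for ties are represented exactly. The sketch takes the two-sided exact p-value as the share of assignments whose $|T - E(T)|$, with $E(T) = n_1(n+1)/2$, is at least as large as the observed value; this convention is an assumption. With averaged ranks, $T - n_1(n_1+1)/2$ counts each tied pair $(X_{1i}, X_{2j})$ as one half. When there are no ties, this reduces to the count of pairs with $X_{1i} > X_{2j}$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def averaged_ranks(values):
    """Rank all observations jointly (1-based); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [Fraction(0)] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        avg = Fraction(i + j + 2, 2)        # mean of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test_exact(x1, x2):
    """Return T, U, and a brute-force two-sided exact p-value."""
    n1, n = len(x1), len(x1) + len(x2)
    ranks = averaged_ranks(list(x1) + list(x2))  # ranked without regard to sample
    T = sum(ranks[:n1])                          # Wilcoxon rank sum of first sample
    U = T - Fraction(n1 * (n1 + 1), 2)           # Mann-Whitney U
    ET = Fraction(n1 * (n + 1), 2)
    # Randomization distribution: T for every choice of n1 ranks out of n
    dist = [sum(ranks[i] for i in idx) for idx in combinations(range(n), n1)]
    extreme = sum(1 for t in dist if abs(t - ET) >= abs(T - ET))
    return T, U, Fraction(extreme, len(dist))

print(rank_sum_test_exact([1, 2, 3], [4, 5, 6]))  # no ties
print(rank_sum_test_exact([1, 2, 2], [2, 3, 4]))  # three values tied at 2
```

In the second call, no first-sample value exceeds a second-sample value. The two ties between a first-sample 2 and the second-sample 2 each contribute one half, so $U = 1$.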

See Fisher (1935) for the principle of randomization; Wilcoxon, Katti, and Wilcox (1970) for the computation with untied ranks; and Hill and Peto (1971) for the general recursive algorithm.

p-values can also be computed using a normal approximation to the randomization distribution. It is a straightforward exercise to verify that

$$E(T) = \frac{n_1(n+1)}{2} \qquad \text{and} \qquad \operatorname{Var}(T) = \frac{n_1 n_2 s^2}{n}$$

where $s$ is the standard deviation of the combined ranks, $r_i$, for both groups:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (r_i - \bar{r})^2$$

This formula for the variance is exact and holds both when there are no ties and when there are ties and we use averaged ranks. (Indeed, the variance formula holds for the randomization distribution of choosing $n_1$ numbers from any set of $n$ numbers.)

For the normal approximation, we calculate

$$z = \frac{T - E(T)}{\sqrt{\operatorname{Var}(T)}}$$

When the porder option is specified, the probability

$$p = \frac{U}{n_1 n_2}$$

is computed.

Technical note

We follow the great majority of the literature in naming these tests for Wilcoxon, Mann, and Whitney.
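The normal approximation can be checked numerically, along with the statement that $\operatorname{Var}(T) = n_1 n_2 s^2/n$ is exact for the randomization distribution even with averaged ranks. The Python sketch below computes $E(T)$, $\operatorname{Var}(T)$, $z$, and a two-sided p-value $2\{1 - \Phi(|z|)\}$, written with math.erfc. It then compares $\operatorname{Var}(T)$ with the variance of $T$ over all $\binom{n}{n_1}$ assignments, obtained by brute force. The function names are illustrative, and the brute-force check is practical only for small $n$.

```python
import math
from itertools import combinations
from statistics import pvariance, variance

def rank_sum_normal(ranks, n1):
    """z and two-sided p for T = sum of the first n1 ranks (normal approximation)."""
    n = len(ranks)
    n2 = n - n1
    T = sum(ranks[:n1])
    ET = n1 * (n + 1) / 2                 # E(T) = n1(n+1)/2
    s2 = variance(ranks)                  # s^2 with the 1/(n-1) divisor
    var_T = n1 * n2 * s2 / n              # Var(T) = n1*n2*s^2/n
    z = (T - ET) / math.sqrt(var_T)
    p = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p, var_T

def randomization_variance(ranks, n1):
    """Variance of T over all ways of assigning n1 of the ranks to sample 1."""
    return pvariance([sum(c) for c in combinations(ranks, n1)])

# Averaged ranks for samples (1, 2, 2) and (2, 3, 4): the three 2s share rank 3
ranks = [1, 3, 3, 3, 5, 6]
z, p, var_T = rank_sum_normal(ranks, 3)
print(var_T, randomization_variance(ranks, 3))  # both 4.65 up to rounding

# Untied ranks 1..24 with n1 = 12 give n1*n2*(n+1)/12 = 300
print(rank_sum_normal(list(range(1, 25)), 12)[2])
```

The last line uses the group sizes of the fuel-additive example. With untied ranks, $s^2 = n(n+1)/12$, so the formula gives the variance before any adjustment for ties.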

