
Summary Statistics in SAS - Mark Irwin

Statistics 135, Autumn 2005
Copyright © 2005 by Mark E. Irwin

Summary Statistics in SAS

There are a number of approaches to calculating summary statistics in SAS. The most common three are

- PROC MEANS: Provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations.
- PROC UNIVARIATE: Calculates many of the statistics that PROC MEANS does, plus some standard univariate graphical summaries, comparison of data to fixed distributions, and parameter estimation.
- PROC TABULATE: Displays descriptive statistics in tabular format, using some or all of the variables in a data set. You can create a variety of tables ranging from simple to highly customized.

PROC TABULATE computes many of the same statistics that are computed by other descriptive statistical procedures such as PROC MEANS, PROC FREQ, and PROC REPORT.

Example: Roofing Shingle Sales

Data on sales last year in 49 sales districts were collected for a maker of asphalt roofing shingles.



The variables recorded for each district are

- Sales in 1000s of squares (sales)
- Promotional expenditures in 1000s of $ (promotion)
- Number of active accounts (accounts)
- Number of competing brands (brands)
- District potential (potential)

PROC MEANS

- Calculates descriptive statistics based on moments
- Estimates quantiles, which includes the median
- Calculates confidence limits for the mean
- Identifies extreme values
- Performs a t test

    PROC MEANS <option(s)> <statistic-keyword(s)>;
        BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;
        CLASS variable(s) </ option(s)>;
        FREQ variable;
        ID variable(s);
        OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)>
               <id-group-specification(s)> <maximum-id-specification(s)>
               <minimum-id-specification(s)> </ option(s)>;
        TYPES request(s);
        VAR variable(s) < / WEIGHT=weight-variable>;
        WAYS list;
        WEIGHT variable;

A wide range of statistics can be calculated by this PROC. These include

- Descriptive statistics: N, NMISS, MEAN, STDDEV|STD, VAR, MIN, MAX, RANGE, CV, SKEWNESS|SKEW, KURTOSIS|KURT, STDERR, CSS, SUM, SUMWGT, USS, CLM (2-sided CI of μ), LCLM, UCLM (1-sided CI of μ). The default statistics are N, MEAN, STD, MIN, MAX.
- Quantile statistics: MEDIAN|P50, Q3|P75, P1, P90, P5, P95, P10, P99, Q1|P25, QRANGE
- Hypothesis testing: PROBT, T

There are many options available in this PROC.
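As a rough sketch of how the PROC MEANS statements and statistic keywords fit together, the step below summarizes two variables within groups. It assumes the roofing shingle data are stored in a data set named shingles and treats brands as a grouping variable purely for illustration. The output data set name sales_by_brands and the variable names mean_sales and mean_promotion are hypothetical.

```sas
/* Sketch only: assumes a data set named shingles with the        */
/* variables sales, promotion, and brands described above.        */
PROC MEANS DATA = shingles N MEAN STD MEDIAN QRANGE;
  CLASS brands;                 /* one set of statistics per value of brands */
  VAR sales promotion;          /* analysis variables */
  /* Hypothetical output data set holding the group means */
  OUTPUT OUT = sales_by_brands MEAN = mean_sales mean_promotion;
RUN;
```

Unlike BY, the CLASS statement does not need the data to be sorted by the grouping variable first.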

The most useful options are

- DATA = SAS-data-set: Sets the data set for the PROC.
- ALPHA = α (default = 0.05): Sets the confidence level to 1 - α for the confidence procedures.
- FW = field-width: Specifies the field width used to display statistics in displayed output. Has no effect on values saved in an output data set.
- PRINT|NOPRINT (default = PRINT): Specifies whether output is to be displayed.

    PROC MEANS DATA = shingles;
      TITLE 'PROC MEANS Output of Roofing Shingle Sales';
      TITLE2 'Default Output';
      VAR sales promotion accounts brands potential;
    RUN;

[Output, "The MEANS Procedure", Default Output: one row per variable with columns N, Mean, Std Dev, Minimum, Maximum. N = 49 for each of the five variables.]

    PROC MEANS DATA = shingles
        MEAN STD MIN Q1 MEDIAN Q3 MAX CLM PROBT T   /* Statistics */
        ALPHA = 0.01 FW = 8;                        /* options */
      TITLE 'PROC MEANS Output of Roofing Shingle Sales';
      TITLE2 'Statistics Selected';
      VAR sales promotion accounts brands potential;
    RUN;

[Output, "The MEANS Procedure", Statistics Selected: one row per variable with columns Mean, Std Dev, Minimum, Lower Quartile, Median, Upper Quartile, Maximum, Lower 99% CL for Mean, Upper 99% CL for Mean, Pr > |t|, t Value. Pr > |t| is <.0001 for each of the five variables.]

PROC UNIVARIATE

PROC UNIVARIATE provides

- descriptive statistics based on moments (including skewness and kurtosis), quantiles or percentiles (such as the median), frequency tables, and extreme values
- histograms and comparative histograms. Optionally, these can be fitted with probability density curves for various distributions and with kernel density estimates.
- quantile-quantile plots (Q-Q plots) and probability plots. These plots facilitate the comparison of a data distribution with various theoretical distributions.
- goodness-of-fit tests for a variety of distributions including the normal
- the ability to inset summary statistics on plots produced on a graphics device
- the ability to analyze data sets with a frequency variable
- the ability to create output data sets containing summary statistics, histogram intervals, and parameters of fitted curves

    PROC UNIVARIATE < options > ;
        BY variables ;
        CLASS variable-1 <(v-options)> < variable-2 <(v-options)> >
              < / KEYLEVEL= value1 | ( value1 value2 ) >;
        FREQ variable ;
        HISTOGRAM < variables > < / options > ;
        ID variables ;
        INSET keyword-list < / options > ;
        OUTPUT < OUT=SAS-data-set > < keyword1= > < percentile-options >;
        PROBPLOT < variables > < / options > ;
        QQPLOT < variables > < / options > ;
        VAR variables ;
        WEIGHT variable ;

This PROC generates a very large amount of output by default, and other options will increase it.
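A minimal sketch of a PROC UNIVARIATE step built from the statements listed above may help fix the syntax. It assumes the roofing shingle data are in a data set named shingles. The output data set sales_summary and the variable names mean_sales, median_sales, and sd_sales are hypothetical names chosen for illustration.

```sas
/* Sketch only: assumes a data set named shingles containing sales. */
PROC UNIVARIATE DATA = shingles;
  VAR sales;                         /* analysis variable */
  HISTOGRAM sales / NORMAL;          /* histogram with a fitted normal curve */
  /* Hypothetical output data set with a few summary statistics */
  OUTPUT OUT = sales_summary
         MEAN   = mean_sales
         MEDIAN = median_sales
         STD    = sd_sales;
RUN;
```

Restricting the VAR statement to the variables of interest is one simple way to keep the volume of default output manageable.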

Some useful options are

- ALPHA = α (default = 0.05): Sets the default confidence level to 1 - α for the confidence procedures. Can be overridden for specific intervals.
- CIBASIC <(<TYPE = keyword> <ALPHA = α>)>: Gives confidence intervals for μ, σ, and σ² assuming the data are normally distributed. TYPE specifies whether the interval is TWOSIDED (default), LOWER, or UPPER.
- CIPCTLDF <(<TYPE = keyword> <ALPHA = α>)>
  CIQUANTDF <(<TYPE = keyword> <ALPHA = α>)>: Calculates confidence intervals for quantiles by a distribution-free method. TYPE takes the keywords LOWER, UPPER, SYMMETRIC (default), or ASYMMETRIC.
- CIPCTLNORMAL <(<TYPE = keyword> <ALPHA = α>)>
  CIQUANTNORMAL <(<TYPE = keyword> <ALPHA = α>)>: Calculates confidence intervals for quantiles assuming normally distributed data. The options are the same as those for CIBASIC.
- MU0 = μ0: Sets the null hypothesis value of the location parameter for tests of location. If you specify one value, it is used for all variables.

  If you specify more than one MU0 value, you must specify the variables with a VAR statement. The default value is 0.
- NEXTROBS = n: Specifies the number of extreme observations (n smallest and n largest) to be displayed for each variable.
- NORMAL: Generates 4 tests of normality - Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises. I suspect, but can't confirm, that the Kolmogorov-Smirnov test is actually the Lilliefors test, as you don't want to specify a mean and variance of the normal for the test, which would be required for the strict use of the Kolmogorov-Smirnov test.
- PLOT: Produces stem-and-leaf, box plot, and normal probability plot in line-printer output. If a BY statement is used, side-by-side box plots are generated.
- ROBUSTSCALE: Generates a table of robust estimates of scale. These include the interquartile range, Gini's mean difference, median absolute deviation around the median (MAD), plus a couple more due to Rousseeuw and Croux (1993).

- TRIMMED=values <(<TYPE = keyword> <ALPHA = α>)>
  TRIM=values <(<TYPE = keyword> <ALPHA = α>)>: Generates a table of trimmed means, where values specifies the number or proportion of observations trimmed.
- WINSORIZED=values <(<TYPE = keyword> <ALPHA = α>)>
  WINSOR=values <(<TYPE = keyword> <ALPHA = α>)>: Generates a table of Winsorized means, a robust measure of location. The options work the same as for TRIMMED.
- VARDEF=divisor: Specifies the divisor to use in calculating variances and standard deviations. There are 4 choices:

    Value        Divisor                     Formula for divisor
    DF           Degrees of freedom          $n - 1$
    N            Number of observations      $n$
    WDF          Sum of weights minus one    $(\sum_i w_i) - 1$
    WEIGHT|WGT   Sum of weights              $\sum_i w_i$

Let's now look at the various statements that can be included in a PROC UNIVARIATE block.

- VAR: Specifies the analysis variables and their order in the results. If omitted, all numeric variables will be analyzed. If you are going to store results from the analysis, this is required.
- BY: Generates separate analyses for each combination of the variables given.

  The default is for BY to expect the data set to be sorted by the BY variables. This can be overridden by the NOTSORTED option.
- CLASS: Specifies one or two variables that the procedure uses to group the data into classification levels. An alternative to BY that doesn't require sorting your data. However, it is restricted to at most 2 variables, whereas BY can have more.
- FREQ: Allows specification of a numeric variable whose value represents the frequency of the observation.
- WEIGHT: Specifies numerical weights for analysis variables in the calculations. This is similar to FREQ, but allows for non-integer values. The main use of this is to assume that the variance of observation i satisfies

  $$\mathrm{Var}(X_i) = \frac{\sigma^2}{w_i}$$

  When calculating summary moments, the weighted versions look like

  $$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

  $$s_w^2 = \frac{1}{d} \sum_i w_i (x_i - \bar{x}_w)^2$$

  where d is taken from the VARDEF option.
- ID: Specifies one or more variables to include in the table of extreme observations.
- HISTOGRAM: Creates histograms and optionally superimposes estimated parametric and non-parametric density curves.

  The parametric distributions that HISTOGRAM can fit are Beta, Exponential, Gamma, Lognormal, Normal, and Weibull. (Will discuss more later when discussing graphics.)
- PROBPLOT: Creates a probability plot, which compares the ordered variable values with the percentiles of a specified theoretical distribution (default = NORMAL). The distributions available are the beta, exponential, gamma, lognormal, normal, two-parameter Weibull, and three-parameter Weibull.
- QQPLOT: Creates quantile-quantile plots (Q-Q plots) using high-resolution graphics and compares ordered variable values with quantiles of a specified theoretical distribution.
  Q-Q plots are preferable for graphical estimation of distribution parameters, whereas probability plots are preferable for graphical estimation of percentiles. (Will look at the differences between the two later.)
- INSET: Places a box or table of summary statistics in a high-resolution HISTOGRAM, PROBPLOT, or QQPLOT.
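To tie several of the PROC UNIVARIATE options and statements together, here is a hedged sketch rather than a definitive recipe. It assumes the roofing shingle data are in a data set named shingles. The values used for MU0, ALPHA, TRIMMED, and WINSORIZED are arbitrary illustrations and do not come from the course notes.

```sas
/* Sketch only: option values below are arbitrary illustrations. */
PROC UNIVARIATE DATA = shingles
                NORMAL                  /* tests of normality */
                ROBUSTSCALE             /* robust estimates of scale */
                CIBASIC (ALPHA = 0.01)  /* 99% intervals for mean, SD, variance */
                TRIMMED = 0.1           /* trim 10% from each tail */
                WINSORIZED = 0.1        /* Winsorize 10% in each tail */
                MU0 = 100;              /* hypothetical null value for location tests */
  VAR sales;
  HISTOGRAM sales / NORMAL;             /* histogram with fitted normal density */
  INSET N MEAN STD / POSITION = NE;     /* summary box on the histogram */
  QQPLOT sales / NORMAL (MU = EST SIGMA = EST);  /* Q-Q plot with estimated reference line */
RUN;
```

The INSET statement applies to the plot statement that precedes it, so here the box of statistics appears on the histogram and not on the Q-Q plot.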

