Introduction to the UNIVARIATE Procedure

Kim L. Kolbe Ritzow of Systems Seminar Consultants, Kalamazoo, MI

Abstract

PROC UNIVARIATE is a powerful BASE SAS® PROC that combines many of the features found in other analytical PROCs such as FREQ, MEANS, SUMMARY, and TABULATE into a single PROC step. PROC UNIVARIATE is an excellent exploratory data analysis tool. It provides more information, both descriptively and graphically, in a single pass of the data than any other BASE SAS PROC. In some cases it provides information that cannot be found on any other BASE SAS PROC, such as information on the data's median, mode, quartiles, and percentiles. This paper will discuss not only how to interpret some of the results generated by PROC UNIVARIATE, but also its syntax, and it will provide efficiency tips and techniques.

A Simple PROC UNIVARIATE

PROC UNIVARIATE without any options or statements will produce a UNIVARIATE report for all numeric variables on the data set, which may give you more information than you desire.

    PROC UNIVARIATE DATA= ;
    RUN;

It is more efficient to limit the scope of the analysis by naming only the numeric variables you are interested in on the VAR statement of PROC UNIVARIATE.

    PROC UNIVARIATE DATA= ;
      VAR WEIGHT;
    RUN;

(see Table 1 for resulting output)

The ID Statement

The ID statement on PROC UNIVARIATE names a variable that identifies the highest and lowest values in the EXTREMES section of the default report by the value of the identifying variable rather than by the observation number. The ID statement does not affect any part of the report other than the EXTREMES section.

    PROC UNIVARIATE DATA= ;
      VAR WEIGHT;
      ID GENDER;
    RUN;

(see Table 2 for resulting output)

FREQ, PLOT, and NORMAL Options

Further information can be derived from PROC UNIVARIATE by using the FREQ, PLOT, or NORMAL options on the PROC UNIVARIATE statement.

The FREQ option will generate a frequency table, much like that of PROC FREQ, except that PROC FREQ generates cumulative counts, which the FREQ option does not. The PLOT option creates a histogram (or a stem-and-leaf plot), a box plot, and a normal probability plot, which plots the data values against a normal distribution, so that you can check how the values are distributed. The NORMAL option provides another way to check the distribution by computing a test statistic for the hypothesis that the data come from a normal distribution.

    PROC UNIVARIATE DATA= FREQ PLOT NORMAL;
      VAR WEIGHT;
    RUN;

(see Table 3 for resulting output and an explanation of the statistics)

Other Useful Options

Two other useful options on the PROC UNIVARIATE statement are the NOPRINT and ROUND= options. The NOPRINT option suppresses the default report when an output SAS data set is being generated by PROC UNIVARIATE (we'll see later on how an output SAS data set can be built).

The ROUND= option, which is new on UNIVARIATE starting with Version , specifies a level of precision for the statistics. The ROUND= option can improve efficiency by reducing the amount of memory required (it does not have to store as many unique values for each variable).

    PROC UNIVARIATE DATA= ROUND= 1 NOPRINT;
      VAR WEIGHT;
    RUN;

(this example shows how the NOPRINT option is specified, but it only really makes sense to use it when the OUTPUT OUT= statement is being used to build an output SAS data set).

The ROUND= option defines how the values will be internally rounded prior to the calculation of the statistics. It does not affect the display of the values on the report. When the ROUND= option is used, a message will appear next to the VARIABLE= text on the top of the report which reads: "Rounded to the nearest multiple of X", where X is the value specified in the ROUND= option.

If the ROUND= option contains a single value, it applies to all specified variables. If the ROUND= option specifies more than one value, a VAR statement must be used, and the values correspond, in order, to the variables specified in the VAR statement.

    PROC UNIVARIATE DATA= ROUND= ;
      VAR WEIGHT HEIGHT AGE;
    RUN;

The value specified on the ROUND= option must be greater than or equal to zero. A value of zero has no effect on the rounding. More information regarding the specifics of the ROUND= option is available in the SAS Procedures Guide under PROC UNIVARIATE.

The BY Statement

The BY statement on PROC UNIVARIATE allows us to obtain separate sub-group analyses for each value of the BY variable. The BY statement requires that the data be in BY order. If they are not, the data must be sorted prior to the PROC step unless the data set is indexed on the BY variable, or the NOTSORTED or DESCENDING options are used on the BY statement.
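The sorting requirement described above can be sketched as follows. This is only an illustration: the data set names WORK.PEOPLE and WORK.PEOPLE_SORTED and the data values are hypothetical and do not come from the paper.

```sas
/* Hypothetical sample data; names and values are made up for illustration. */
DATA WORK.PEOPLE;
  INPUT GENDER $ WEIGHT HEIGHT;
  DATALINES;
FEMALE 98 62
MALE 170 70
FEMALE 110 64
MALE 155 68
FEMALE 125 66
MALE 182 73
;
RUN;

/* Put the observations in BY-variable order before the analysis. */
PROC SORT DATA=WORK.PEOPLE OUT=WORK.PEOPLE_SORTED;
  BY GENDER;
RUN;

/* One UNIVARIATE report is produced for each value of GENDER. */
PROC UNIVARIATE DATA=WORK.PEOPLE_SORTED;
  VAR WEIGHT;
  BY GENDER;
RUN;
```

Writing the sorted copy to a separate data set with OUT= leaves the original data set in its original order, which is a design choice rather than a requirement.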

When the BY statement is used with the PLOT option on the PROC statement, an additional graph will appear labeled Schematic Plots, which will contain side-by-side box plots for each BY value.

    PROC UNIVARIATE DATA= PLOT;
      VAR WEIGHT;
      BY GENDER;
    RUN;

(see Table 4 for resulting output)

Creating Output SAS Data Sets

PROC UNIVARIATE has the unique ability to create multiple output SAS data sets in a single pass of the data. When creating output SAS data sets on PROC UNIVARIATE, the VAR statement must be used. It is also a good idea to use the NOPRINT option on the PROC UNIVARIATE statement to suppress the default report when building an output SAS data set.

    PROC UNIVARIATE DATA= NOPRINT;
      VAR WEIGHT HEIGHT;
      OUTPUT OUT=AVGS MEAN=AVGWGT AVGHGT MAX=MAXWGT;
      OUTPUT OUT=NEWD MEDIAN=MEDWGT Q1=Q1WGT Q3=Q3WGT;
    RUN;

    PROC PRINT DATA=AVGS;
      TITLE 'AVGS DATA SET';
    RUN;

    PROC PRINT DATA=NEWD;
      TITLE 'NEWD DATA SET';
    RUN;

Other Statements Available

Other statements available on PROC UNIVARIATE are the FREQ and WEIGHT statements.

They, like the other statements we have seen (BY, VAR, and ID), come after the PROC and before the RUN. There is a subtle difference between the FREQ and WEIGHT statements. The FREQ statement identifies a variable which contains the number of observations each observation is to represent. For instance, let's say we had a variable on our data set called HOWMANY and our data set looked something like this:

    GENDER    WEIGHT    HOWMANY
    FEMALE        98          5
    FEMALE       110          2
    .. etc ..

    PROC UNIVARIATE DATA= ;
      VAR WEIGHT;
      FREQ HOWMANY;
    RUN;

In the case of our first observation, a 98 pound female, the FREQ statement produces the same result as if that same observation appeared on the data set five separate times. Without the FREQ statement, UNIVARIATE assumes that each observation represents itself (1 observation). Therefore, in this example with this data, the use of the FREQ statement will produce dramatically different statistics than if we had not used it.
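The equivalence described above can be checked with a small sketch. It uses only the two rows shown in the example; the data set names (WORK.COUNTED, WORK.EXPANDED, WORK.STATS_FREQ, WORK.STATS_EXPANDED) and the output variable names are hypothetical.

```sas
/* Hypothetical data: each row stands for HOWMANY people of that weight. */
DATA WORK.COUNTED;
  INPUT GENDER $ WEIGHT HOWMANY;
  DATALINES;
FEMALE 98 5
FEMALE 110 2
;
RUN;

/* The same information written out with one row per person. */
DATA WORK.EXPANDED;
  SET WORK.COUNTED;
  DO I = 1 TO HOWMANY;
    OUTPUT;
  END;
  DROP I HOWMANY;
RUN;

/* Analysis of the compact data set using the FREQ statement. */
PROC UNIVARIATE DATA=WORK.COUNTED NOPRINT;
  VAR WEIGHT;
  FREQ HOWMANY;
  OUTPUT OUT=WORK.STATS_FREQ N=NWGT MEAN=AVGWGT;
RUN;

/* Analysis of the expanded data set without the FREQ statement. */
PROC UNIVARIATE DATA=WORK.EXPANDED NOPRINT;
  VAR WEIGHT;
  OUTPUT OUT=WORK.STATS_EXPANDED N=NWGT MEAN=AVGWGT;
RUN;

PROC PRINT DATA=WORK.STATS_FREQ;
RUN;

PROC PRINT DATA=WORK.STATS_EXPANDED;
RUN;
```

With these two rows, both printed data sets should show seven observations and a mean weight of 710/7, roughly 101.43. Running the first PROC UNIVARIATE step without the FREQ statement would instead treat the data as two observations with a mean of 104.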

8 With the FREO statement, only the integer portion of its value is used. If its value is , it is considered to be 3. If the value is less than 1 or missing, it is not used in the analysis. The WEIGHT statement on the other hand, specifies a variable name whose values are used to weight each observation. WEIGHTing values affects only the mean, variance and sum (they become weighted statistics). Whereas the FREQ statement will change the meaning of all the statistics reported. Changes and Enhancements Version of PROC UNIVARIATE offers some new features. The functionality of PROC PCTl from the Version 5 supplemental library has been incorporated into PROC UNIVARIATE under Version (the PCTLNAME=, PCTLPTS=, and the PCTlRPRE= options can be used on the OUTPUT statement to specify user-defined percentiles). Also new are the PROBS and PROBN statistics used on the OUTPUT statement.

PROBS gives the probability of a greater absolute value for the centered, signed rank statistic. PROBN gives the probability for testing the hypothesis that the data are from a normal distribution. The ROUND= option, as seen in an earlier example, is also new. It specifies the level of precision for the variable's values. Using the ROUND= option can improve efficiency by reducing the amount of memory required. Another option specified on the PROC statement, PCTLDEF=, has changed its default value from 5 to 4. There have been no enhancements to PROC UNIVARIATE with Version or of SAS Software.

In Summary

Other PROCs like MEANS, SUMMARY, and FREQ can give us similar information with a few key differences. PROC UNIVARIATE provides both statistical and graphical information that can be used to analyze data. It offers statistics that cannot be found on any other PROC (quartiles, median, and user-defined percentiles), details on outlying or extreme values, graphical information to analyze the distribution of the data, and the ability to build multiple output SAS data sets.

MEANS and SUMMARY can only create one output SAS data set at a time. They can, however, summarize statistics at various levels by the use of the CLASS statement, which PROC UNIVARIATE cannot. While SUMMARY and MEANS are a bit faster and require less memory than UNIVARIATE, PROC UNIVARIATE provides more descriptive information in a single pass of the data than any other SAS PROC available.

Trademark Notice

SAS is a registered trademark of the SAS Institute Inc., Cary, NC, USA and other countries.

Useful Publications

SAS Institute Inc. (1990), SAS® Procedures Guide, Version 6, Third Edition, Cary, NC: SAS Institute Inc.

SAS Institute Inc. (1987) (written by Sandra D. Schlotzhauer and Dr. Ramon Littell), SAS® System for Elementary Statistical Analysis, Cary, NC: SAS Institute Inc.

Cody, Ronald P. and Smith, Jeffrey K. (1991), Applied Statistics and the SAS® Programming Language, Third Edition, North Holland, New York.

Hartwig, Frederick and Dearing, Brian E.

