Example: tourism industry

THE POWER OF PROC FORMAT - IDRE Stats

ABSTRACTThe FORMAT procedure in SAS is a very powerfuland productive tool. Yet many beginning program-mers rarely make use of them. The FORMAT pro-cedure provides a convenient way to do a tablelookup in SAS. User-generated SAS Formats canbe used to assign descriptive labels to data vales,create new variables and find unexpected FORMAT can also be used to generate dataextracts and to merge data sets. This paper willprovide an introductory look at PROC FORMAT forthe beginning user and provide sample code ( ) that will illustrate the POWER of PROC FORMATin a number of FORMAT is a procedure that creates map-pings of data values into data labels. The user de-fined FORMAT mapping is independent of a SASDATASET and variables and must be explicitly as-signed in a subsequent DATASTEP and/or FORMAT can be viewed as a table lookupallowing 1-to-1 mapping or many-to-1 mapping.

USING PROC FORMAT FOR DATA MERGES SAS now includes an INDEX option on data sets to merge DATASETS that are not sorted, but PROC FORMAT also offers a method of merging large data sets (up to 100,000 records) to very large (millions of records) unsorted DATASET or flat file. The method first builds a user defined format from a

Tags:

  Data, Corps, Format, Proc format, Merging

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of THE POWER OF PROC FORMAT - IDRE Stats

1 ABSTRACTThe FORMAT procedure in SAS is a very powerfuland productive tool. Yet many beginning program-mers rarely make use of them. The FORMAT pro-cedure provides a convenient way to do a tablelookup in SAS. User-generated SAS Formats canbe used to assign descriptive labels to data vales,create new variables and find unexpected FORMAT can also be used to generate dataextracts and to merge data sets. This paper willprovide an introductory look at PROC FORMAT forthe beginning user and provide sample code ( ) that will illustrate the POWER of PROC FORMATin a number of FORMAT is a procedure that creates map-pings of data values into data labels. The user de-fined FORMAT mapping is independent of a SASDATASET and variables and must be explicitly as-signed in a subsequent DATASTEP and/or FORMAT can be viewed as a table lookupallowing 1-to-1 mapping or many-to-1 mapping.

2 Anexample of a 1-to-1 mapping is the mapping of ap-prove/decline codes used to process credit applica-tions. In the consumer credit card industry, applica-tion processing decisions are often abbreviated. Asimple example is; a = Approve d = Decline If we have many approval codes and many declinecodes, these can be mapped or assigned with amany-to-1 mapping. For example; a1 , a2 , a4 = Approve d1 , d6 = Decline PROC FORMAT will not allow 1-to-many or many-to-many mappings. This is a good feature of PROCFORMAT since we don t want data values to take onmore than one label. When using a DATASTEP andIF-THEN-ELSE logic to assign labels, the SAS LOGwill not indicate if you have data values pointing tomore than one label. Using PROC FORMAT insteadof IF-THEN-ELSE code will provide an extra qualitycontrol check of your assignments of labels to s look at a few sample problems and see howPROC FORMAT can be used to generate more effi-cient code.

3 These examples come from the consumercredit card industry but the concepts have applicationto many APPLICATION THAT ASSIGNS VALUES WITH-OUT USING PROC FORMATWhen credit bureau data on individuals are re-quested, a generic credit bureau score can also bepurchased. This score ranks predicted risk for theindividual with higher scores being less risky thanlower scores. One such score has integer valuesfrom 370 to 870 with missing scores assigned to val-ues outside this wish to run a frequency distribution on individualswith scores less than 671 and those above 670. Abeginning programmer would often handle this bycreating another DATASET where a new variable isgenerated to assign membership into low scores,high scores and missing groups. A PROC FREQ isthen submitted to get the desired frequency stuff; set cb; if 370<= score <= 670 then group= 670- ; else if 670 < score <= 870 then group= 671+ ; else group= unscored ;proc freq data =stuff; tables group; run;The results from the above code are are shown inFigure on the following SAS code above did the job, but it required that anew data -Step be created and the label unscored was truncated to unsc.

4 THE SAME VALUE ASSIGNMENT PROBLEMSOLVED USING PROC FORMATAn alternative approach entails using a user-definedSAS FORMAT . This saves some processing time andresources. Below (next page) is the SAS Code withTHE POWER OF PROC FORMATJ onas V. Bilenas, Chase Manhattan Bank, New York, NYthe solution to same problem but using example here is of the simple value replace-ment variety of SAS FORMAT . We will discuss how toset up a FORMAT that assigns values to ranges in asubsequent FORMAT ; value score 370 - 670 = 670- 671 - 870 = 671+ other = unscored ;proc freq data =cb; tables score; FORMAT score score.;run;The SAS Code returns the output shown in :Some observations about this result are:1. The name of the FORMAT does not have to be thename of the variable that it will be assigned The assignment of the FORMAT occurs in thePROC with a FORMAT The FORMAT definition ends with the ; on a newline.

5 This is just my preference but I find it eas-ier to debug PROC FORMAT code especially if Iadd more values to the FORMAT later The unscored label now appears without trun-cation. When assigning a character label in adataset, the length of the first evaluation of thelabel will be applied to all PROC FORMAT TO FIND UNEXPECTEDVALUESUser defined formats can be used to list out unex-pected values. If a range of values are not mapped ina PROC FORMAT , the labels will be the original val-ues. Here is an example:proc FORMAT ; value looky 370-870 = 370-870 ;proc freq data =cb; tables score; FORMAT score looky.;run;The output of this code is shown in Figure below:Figure CumulativeGROUP Frequency Percent Frequency Percent--------------------------------- ------------------671+ 623 623 5170 5793 5 5798 CumulativeSCORE Frequency Percent Frequency Percent--------------------------------- ---------------------670- 5170 5170 + 623 5793 5 5798 * * * * * * * * * * * * * * * * *Figure CumulativeSCORE Frequency Percent Frequency Percent--------------------------------- ---------------------370-870 30320 30320 9003 1264 31584 NEW VARIABLES WITH PROCFORMAT AND A VALUE REPLACEMENT FOR-MATNew variables can be assigned within a data -Stepusing user defined FORMATS.

6 A nice feature ofusing FORMATS for generating new variables is thatthe method can replace IF/THEN/ELSE code. Inthis example we wish to assign expected delin-quency rates for given score ranges for a givenportfolio. proc FORMAT ; value edr low-159 = 160-169 = 170-179 = 180-high = ; data stuff; set cb2; edr=input(put(score,edr.), );run;With the above code a new variable called edr isgenerated from the call of the FORMAT in the PUTfunction. PUT always return a character, so the IN-PUT function was used to convert the variable tonumeric since we required that the edr variable be anumeric PROC FORMAT TO EXTRACT DATAUser defined formats can be used to extract a sub-set of data from a larger DATASET. Here is an FORMAT ; value $key 06980 = Mail1 06990 , 0699F , 0699H = Mail2 other = NG ; data stuff; set ; if put(seqnum,$key.)

7 Ne NG ;run;We note the following observations about this us-age:1. If values are character, use a FORMAT name thatbegins with a $ .2. Note that you can create many formats with asingle PROC FORMAT statement. Just starteach new FORMAT with a VALUE or PICTURE statement and end each definition with a ; .SPECIFYING RANGES IN PROC FORMATR anges of values can be specified in a number ofways and special keywords can be used to simplifythe expression of the Ranges can be constant values or values sepa-rated by commas: a , b , c 1,22,432. Ranges can include intervals such as:<lower> <higher>. means that the interval in-cludes both endpoints.<lower> <- <higher>. means that the intervalincludes higher endpoint, but not the lowerone.<lower> - < <higher> means that the intervalincludes lower endpoint, but not the higherone.<lower> <- < <higher> means that the intervaldoes not include either The numeric.

8 And character missingvalues can be individually assigned Ranges can be specified with special kewords:LOWFrom the least (most negative) possi-ble the largest (positve) possible other numbers not The LOW keyword does not FORMAT missing The OTHER keyword does include missing val-ues unless accounted for with a . or .USING PROC FORMAT FOR data MERGESSAS now includes an INDEX option on data sets tomerge DATASETS that are not sorted, but PROCFORMAT also offers a method of merging large datasets (up to 100,000 records) to very large (millions ofrecords) unsorted DATASET or flat method first builds a user defined FORMAT from aDATASTEP using a CNTLIN= option. The smallerfile must not have any duplicates of the key variableused for matching. The DATASET created musthave these elements: FMTNAME: name of FORMAT to create TYPE: C for character or N for numeric START: the value you want to FORMAT into a la-bel.

9 If you are specifying a range, START specifies the lower end of the range and ENDspecifies the upper end. LABEL: the label you wish to the data is generated, a FORMAT is gener-ated from the data and then applied to match rec-ords from the larger DATASET or larger flat is an example of code:proc sort data =small out=temp nodupkey force; by seqnum; data fmt (rename=(seqnum=start)); retain fmtname $key type C label Y ; set temp;proc FORMAT cntlin=fmt;run; data _null_; infile bigfile; file extract; if put(seqnum,$key.)= Y then put _infile_ ;run;We observe the following about this code: The sort of the small DATASET was done toensure no duplicates of the key field SEGNUM. This code extracted the entire record of the largeflat file if there was a match in the key field andput that record into a file with a filename of ex-tract.

10 A match merge extract could work just as wellusing a SAS Dataset by modifying the data -Step as follows: data match; set bigfile; if put(seqnum,$key.)= Y ;run;SAS PICTURE FORMATSSAS PICTURE Formats provide a template forprinting numbers. The template can specify hownumbers are displayed and provide a method to dealwith: Leading zeros. Decimal and comma placement. Embedding characters within numbers ( , %) Prefixes. Truncation or rounding of example of using PICTURE FORMATS is totruncate numbers that represents dates fromYYMMDD display to FORMAT ; picture dt 0-991231= 11/11 (multiplier=.01) ;The 11/11 specifies that all leading zeros will be dis-played. If the label was 01/11 then 001213 wouldbe displayed as 0/12 instead of 00 MULTIPLIER option truncates the DD portion ofthe date when example of PICTURE FORMATS is to add atrailing % in PROC TABULATE output when aPCTSUM or PCTN calculation is specified.


Related search queries