Transcription of SUGI 27: Programming Tricks for Reducing …
1 Paper 23-271 Programming Tricks For Reducing storage And Work SpaceCurtis A. Smith, Defense Contract Audit Agency, La Mirada, you ever had trouble getting a SAS job to complete,although your SAS code is bug-free? Often, especially whenprocessing large amounts of data, you can exhaust your storagespace or the WORK library. Likely, there is a SAS code , the author will share some of his favorite Tricks tosqueeze more out of storage and WORK library space using SAScode to resolve resource limitations. All of the Tricks will use SAScode from base SAS. This paper will cover subjects such asreducing the number of SAS data sets needed in the work space, Reducing the size of SAS data sets, and cleaning up the worklibrary. The author will concentrate on the DATASETS, DELETE,SUMMARY, and SORT procedures, the DROP, KEEP,COMPRESS, and OUT options of the Data Step, and theWHERE statement.
2 SAS programmers at any level can benefitfrom these my endeavors to get my SAS jobs to complete whenprocessing large amounts of data, I encounter temporary workspace shortages as well as permanent storage space , of course, uses its WORK library as the place to create andstore temporary work data sets. A SAS job can require manytemporary SAS data sets, even some that you may not referencein a DATA step or procedure. Of course, you decide where tostore permanent SAS data sets. But sometimes, you may nothave enough free space for the temporary or permanent SASdata sets you want to create. Exhausting temporary andpermanent storage space is common. Many approaches to solvethese problems are available. Many solutions involve theoperating system. But, I will present solutions that involve onlyusing SAS code and fall into three categories: Reduce the size of the SAS data sets Reduce the number of SAS data sets Clean up the storage spaceREDUCE THE SIZE OF THE SAS DATA SETST here are many ways to reduce the size of SAS data sets thatyou place in your temporary work space and inpermanent storage .
3 Below I share my FILESSAS can use compression algorithms to compressSAS data sets. This is a process of Reducing theamount of space needed to store a SAS data set - it does notaffect the data stored within that SAS data set. Using theCOMPRESS= system or data set option, any SAS data setcreated on disk will be compressed. SAS data set compressioncan greatly reduce the size of SAS data sets. To use theCOMPRESS= system or data set option, set the option to either"YES" or "BINARY." (In newer versions of SAS, CHAR can beused as an alternative to YES with the same result.) TheCOMPRESS=YES value uses an algorithm that works better withSAS data sets that are primarily comprised of charactervariables. On the other hand, COMPRESS=BINARY uses adifferent algorithm that works better with SAS data sets that areprimarily comprised of many variables including many numericvariables.
4 My experience has been that COMPRESS=YESreduces the size of the SAS data set by about 50 option to use with COMPRESS= is REUSE=. Specifying thisoption allows SAS to reuse space within the compressed SASdata set that has been freed by deleted observations. Otherwise,SAS cannot reclaim the space made available by the following running the DATA step with the COMPRESS= set to YES or BINARY you will see a message in your SAS log that lookssomething like this:I thought you might want to see some benchmark results. I tookfour SAS data sets with differing numbers of observations andvariables. I started with uncompressed SAS data sets, thencompressed with normal compression, then compressed withbinary compression. File1 and File3 have 20 variables, 5 of whichare numeric; and File2 and File4 have 29 variables, 10 of whichare REDUCTIONObsCharNumYESBINARYFile1120, , ,542, ,976, my examples, COMPRESS=YES always produced betterresults.
5 However, the files with more variables and more numericvariables (File2 and File4) got almost the same benefit using/*USE THE DATA STEP COMPRESS OPTION*/DATA (COMPRESS=BINARY REUSE=YES); SET ; WHERE REC_TYPE='1';RUN;/*USE THE SYSTEM COMPRESS OPTION*/OPTIONS COMPRESS=YES REUSE=YES;DATA ; SET ; WHERE REC_TYPE='1';RUN;NOTE: The data set has 117658observations and 20 : Compressing data set size by is 935 pages; un-compressedwould require 1369 27 Applications Development2/*DROP UNNEEDED VARIABLES*/PROC SORT DATA= (DROP=TTDOTAMT TTDOTHRS TTDTOAMT TTDTOHRS TTDSRAMT) OUT= (DROP=REC_TYPE); BY POOL ACCOUNT DEPT DATE; WHERE REC_TYPE NE ' ';RUN;/*SUMMARIZE SORTED FILE*/PROC SUMMARY DATA= MISSING; BY POOL ACCOUNT DEPT DATE; VAR YTDOTHRS YTDTOHRS YTDOTAMT YTDTOAMT YTDSRAMT; OUTPUT OUT= (INDEX=(PBCODE) DROP=_TYPE_) SUM=YTDOTHRS YTDTOHRS YTDOTAMT YTDTOAMT;RUN;COMPRESS=BINARY. I recommend you try both methods withyour files to determine which provides the better some cases, compressing a SAS data set will result in a filethat is larger in storage size than the uncompressed this happens, SAS will warn you with a message.
6 Version8 of the SAS System will not compress the SAS data set whenthe result would be a larger file take a look at the space savings (in this example, from MS-Windows).As you can see, the space savings are well worththe effort. In the case of File4, a reduction of nearly500MB is nothing to gloss UNNEEDED VARIABLES Deleting unneeded variables can have a dramaticimpact on the size of the SAS data set. Forexample, a variable of only five bytes in a SASdata set of one million observations will require fivemillion bytes, or approximately 5MB. As soon aspossible in your DATA steps and procedures, delete any SASdata set variables that you do not need. Use the data set DROP=option to identify which variables to delete, or use the KEEP=option to identify which variables to retain. Both will accomplishthe same thing, one will be easier to use than the otherdepending the number of the existing variables you want , when creating a subset or summary SAS data set orjust processing a SAS data set with a procedure, you do not needall of the variables in the source SAS data set.
7 So delete thoseyou do not need from the source SAS data set, and do so assoon as possible. However, be careful not to delete any variablesthat you will need. Many times I have fallen into the trap ofdropping a variable from my input SAS data set and then referredto it in a WHERE statement. The solution, ofcourse, is to keep the variable in the input SASdata set and drop it from the output SAS dataset. Consider the example at the bottom of theprevious in the SORT procedure that we do notdrop the REC_TYPE variable from the input(DATA=) SAS data set because we need it forthe WHERE statement. If we had included it inthe DROP= option on the input SAS data set, wewould have received an error that theREC_TYPE variable was not on the input SASdata set. In this example, the SUMMARY procedure will produce a SAS data set with onlythe character variables identified in the BYstatements and the four numeric variablesidentified on the VAR statement (plus, of course,the _FREQ_ and _TYPE_ variables created bythe SUMMARY procedure, unless we dropthem).
8 So, why bother dropping the unwantedvariables during the SORT procedure? Because by droppingthose variables in the SORT procedure, our intermediate WORKSAS data set is greatly reduced in size. (And, we will also see areduction in the time and space needed for sorting.)Let s take a look at the WIP_DTL and the SORTED file fromWindows Explorer s point of view. Dropping five variables madea significant look at the file properties from SAS point of view. Firstnotice the SAS data set WIP_DTL. It contains 22 Not Overlook theObviousDelete unneeded variables assoon as possible. Even a twocharacter variable can add upto a significant amount ofspace if you have 27 Applications Development3 Now, notice the SAS data set SORTED. It contains only unneeded variables is such a great way to reducespace, I really want to emphasize this trick. I really want toemphasize this trick.
9 When you reduce space, you can alsoreduce processing time and I/O time. Let s look at an example ofsorting a SAS data set. First, we will sort it without dropping anyof the unneeded variables. Second, we will drop the unneededvariables, but will we do so using a KEEP= option on the outputSAS data set. Third, we will drop the unneeded variables, usinga KEEP option on the input SAS data set. Look at the SAS logcarefully, noticing the time and memory s the SAS code for our first scenario, keeping , here s the SAS log for our first s the SAS code for our second scenario. Notice the KEEP=option on the output SAS data set. In this scenario, we will besorting all of the variables from the input SAS data set, but notwriting all of them to the output SAS data , here s the SAS log from our second scenario. You willnotice some improvement in processing time over our firstscenario, but no memory , here s the SAS code for our third scenario.
10 This time, we rebeing smart and dropping all of the unneeded variables from theinput SAS data , take a look at the SA Slog from our third do not see much difference in memory usage, but, wow, lookat the processing reduction! In our first scenario, processing time/*KEEP ALL VARIABLES*/PROC SORT DATA= OUT= ; BY WEEK PROJECT DEPT ACCOUNT;RUN;NOTE: There were 167184 observationsread from the data set : The data set has167184 observations and 24 : PROCEDURE SORT used: real time 1 Memory 2130k/*DROP VARIABLES ON OUTPUT*/PROC SORT DATA= OUT= (KEEP=WEEK PROJECT DEPT ACCOUNT MTDTOAMT MTDOTAMT MTDSRAMT); BY WEEK PROJECT DEPT ACCOUNT;RUN;NOTE: There were 167184 observationsread from the data set : The data set has167184 observations and 7 : PROCEDURE SORT used: real time seconds Memory 2123k/*DROP VARIABLES ON INPUT*/ PROC SORT DATA= (KEEP=WEEK PROJECT DEPT ACCOUNT MTDTOAMT MTDOTAMT MTDSRAMT) OUT= ; BY WEEK PROJECT DEPT ACCOUNT;RUN.