Ways to Summarize Data Using SUM Function in SAS
Anjan Matlapudi and J. Daniel Knapp
Pharmacy Informatics, PerformRx, The Next Generation PBM, 200 Stevens Drive, Philadelphia, PA 19113

ABSTRACT
SUM is one of the most frequently used SAS functions for aggregating numeric variables. Although summarizing data using the SUM function is a simple concept, it becomes more complex when we deal with large data sets and many variables, and this can sometimes lead to inaccurate results. Choosing the most appropriate function or procedure for each situation therefore requires careful logic if we want accurate results when we roll up or group data.
There are several ways to summarize data using the SUM function. This paper illustrates various methods, ranging from using the SUM function in a simple DATA step to using SUM in SAS procedures such as PROC PRINT, PROC SUMMARY, PROC MEANS, PROC TABULATE, and PROC SQL. It also covers how SAS handles missing values when you sum data.

INTRODUCTION
Let us start with the most basic concepts of the SUM function and then explain the best way to summarize data, including horizontal summation (across variables), vertical summation (across observations), and cumulative summation (running totals).
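The three kinds of summation named above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not code from the paper: the data set Demo and the variables Q1-Q3, RowTotal, and RunningTotal are hypothetical names chosen for the example.

```sas
/* Hypothetical demo data: three quarterly amounts per observation */
data Demo;
  input Q1 Q2 Q3;
  /* Horizontal: add across variables within one observation */
  RowTotal = sum(of Q1-Q3);
  /* Cumulative: the sum statement retains RunningTotal across
     observations and treats a missing RowTotal as 0 */
  RunningTotal + RowTotal;
  datalines;
10 20 30
5 . 15
;
run;

/* Vertical: add down the observations, one total per variable */
proc means data=Demo sum;
  var Q1 Q2 Q3;
run;
```

The sum statement RunningTotal + RowTotal; is what makes the running total work: it implicitly retains its value between observations. The same expression written as an assignment (RunningTotal = RunningTotal + RowTotal;) would be reset to missing at the start of every iteration and would never accumulate.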
Sample code is incorporated in this paper to generate test data and output results.

GENERATE SAMPLE DATA
The following code generates a sample data set to test various SAS functions and procedures. This sample pharmacy claims data contains the number of drug prescriptions and the total drug spend for each drug type (brand drugs versus generic drugs) by pharmacy for the years 2010-2012.

*---Generate Sample Dataset---*;
data SampleData;
   input Pharmacy $ 1-10 DrugClass $ 11-17 Prescriptions 19-21
         Y2010 23-24 Y2011 26-28 Y2012 @30 ;
   label Prescriptions = 'Prescription Volume in Million'
         Y2010 = 'Year 2010 (Drug Spend in Millions)'
         Y2011 = 'Year 2011 (Drug Spend in Millions)'
         Y2012 = 'Year 2012 (Drug Spend in Millions)';
   format Y2010 Y2011 Y2012 ;
   datalines;
CVS       Generic 100 50 20 30
Rite Aid  Generic 200 30 10 40
Walgreens Generic 300 60 20 20
Walmart   Generic 400  . 30 30
CVS       Brand   100 50 20 30
Rite Aid  Brand   200 30 10 40
Walgreens Brand   300 60 20 20
Walmart   Brand   400 70 30 30
          Unknown 500 70 10 10
;
run;

Notice that in row #4 the Y2010 field has a missing numeric value. Also note that in row #9 the Pharmacy field has a missing character value. We purposely placed these two examples in this data set to test how SAS handles missing values while we summarize the data.
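A quick way to confirm where the planted missing values sit is to count them per observation. The sketch below is our own illustration, not part of the paper: it rebuilds only the two affected rows with list input so it runs on its own, and the names MissCheck, NumMissing, and CharMissing are hypothetical. NMISS counts missing numeric arguments; CMISS (SAS 9.2 and later) also accepts character arguments and treats a blank value as missing.

```sas
/* Hypothetical two-row subset mirroring rows #4 and #9 of SampleData.
   With list input, a lone period is read as missing, and the $ informat
   turns it into a blank for character variables. */
data MissCheck;
  length Pharmacy $10 DrugClass $7;
  input Pharmacy $ DrugClass $ Prescriptions Y2010 Y2011 Y2012;
  NumMissing  = nmiss(of Y2010-Y2012);     /* missing numeric values in the row */
  CharMissing = cmiss(Pharmacy, DrugClass); /* blank character values in the row */
  datalines;
Walmart Generic 400 . 30 30
. Unknown 500 70 10 10
;
run;

proc print data=MissCheck noobs;
run;
```

The first row should report one missing numeric value and no missing character values; the second row reports the reverse.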
We also formatted the Y2010, Y2011, and Y2012 variables differently to examine how SAS summarizes variables with differing formats.

Coders' Corner NESUG 2012

Drugs Dispensed from Pharmacies During Year 2010-2012

Pharmacy    Drug Class   Prescriptions   Y2010   Y2011   Y2012
CVS         Generic                100     $50     $20     $30
Rite Aid    Generic                200     $30     $10     $40
Walgreens   Generic                300     $60     $20     $20
Walmart     Generic                400       .     $30     $30
CVS         Brand                  100     $50     $20     $30
Rite Aid    Brand                  200     $30     $10     $40
Walgreens   Brand                  300     $60     $20     $20
Walmart     Brand                  400     $70     $30     $30
            Unknown                500     $70     $10     $10

BASIC SUM FUNCTIONS
Let us start with the simple SUM function to see how SAS computes horizontal summation. The data step below (Figure #1) summarizes the total drug spend over all three years. First, we use the arithmetic operator (+) to add the drug spend for the three years and assign the result to the AddVar variable. Notice that SAS correctly computed the horizontal totals for every row except Walmart Generic, where AddVar shows a missing (null) value: Y2010 is missing in that row, and the + operator returns missing whenever any operand is missing. The SumVar1 and SumVar2 values for the same row (60) demonstrate that SAS ignores missing values when we use the SUM function.

Similarly, SAS has another function called SUMABS, which sums the absolute values of its arguments (see the SumVar4 variable). If you use a variable list such as Y2010-Y2012, you must place the OF keyword in front of the list. Without OF, SAS reads Y2010-Y2012 as an arithmetic expression (Y2010 minus Y2012), so SumVar4 holds the absolute value of Y2010 minus Y2012 (for example, 20 for CVS Generic) and is missing for Walmart Generic. If you place 0 in front of the argument list, SAS returns 0 instead of a missing value when the other arguments are all missing, as SumVar3 shows for Walmart Generic. The TotalCost variable shows that we can use additional arithmetic operators to perform further calculations in combination with the SUM function.

*---Basic sum functions---*;
data SumOut;
   set SampleData;
   AddVar    = Y2010 + Y2011 + Y2012;     /* + returns missing if any operand is missing  */
   SumVar1   = sum(Y2010, Y2011, Y2012);  /* SUM ignores missing arguments                 */
   SumVar2   = sum(of Y2010-Y2012);       /* OF expands the variable list                  */
   SumVar3   = sum(0, of Y2010);          /* 0 instead of missing when Y2010 is missing    */
   SumVar4   = sumabs(Y2010-Y2012);       /* no OF: absolute value of Y2010 minus Y2012    */
   TotalCost = sum(Y2010, Y2011, Y2012) * Prescriptions;
run;

Figure #1: Basic Sum Function Across Variables

Pharmacy   DrugClass  Prescriptions  Y2010  Y2011  Y2012  AddVar  SumVar1  SumVar2  SumVar3  SumVar4  TotalCost
CVS        Generic              100    $50    $20    $30     100      100      100       50       20      10000
Rite Aid   Generic              200    $30    $10    $40      80       80       80       30       10      16000
Walgreens  Generic              300    $60    $20    $20     100      100      100       60       40      30000
Walmart    Generic              400      .    $30    $30       .       60       60        0        .      24000
CVS        Brand                100    $50    $20    $30     100      100      100       50       20      10000
Rite Aid   Brand                200    $30    $10    $40      80       80       80       30       10      16000
Walgreens  Brand                300    $60    $20    $20     100      100      100       60       40      30000
Walmart    Brand                400    $70    $30    $30     130      130      130       70       40      52000
           Unknown              500    $70    $10    $10      90       90       90       70       60      45000
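The effect of the OF keyword, the + operator, and a leading 0 is easiest to see side by side on a row where every year is missing, a case the sample data does not contain. The sketch below is our own three-row illustration, not the paper's code; the data set OfDemo and the variables PlusOp, WithOf, ZeroLead, NoOfAbs, and WithOfAbs are hypothetical. It assumes SUMABS accepts an OF variable list the same way SUM does.

```sas
/* Hypothetical three-row demo (not the paper's SampleData):
   row 2 has one missing year, row 3 has all years missing */
data OfDemo;
  input Y2010 Y2011 Y2012;
  PlusOp    = Y2010 + Y2011 + Y2012;    /* missing if ANY operand is missing          */
  WithOf    = sum(of Y2010-Y2012);      /* skips missing; missing only if ALL are     */
  ZeroLead  = sum(0, of Y2010-Y2012);   /* leading 0: an all-missing row gives 0      */
  NoOfAbs   = sumabs(Y2010-Y2012);      /* no OF: one argument, Y2010 minus Y2012     */
  WithOfAbs = sumabs(of Y2010-Y2012);   /* OF: absolute values of the non-missing
                                           Y2010, Y2011, Y2012 are added together     */
  datalines;
50 20 30
. 30 30
. . .
;
run;

proc print data=OfDemo noobs;
run;
```

For the second row, PlusOp and NoOfAbs are missing while WithOf is 60, matching the Walmart Generic row of Figure #1. For the all-missing third row, WithOf is missing but ZeroLead is 0.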
SUM WITH PRINT PROCEDURE
PROC PRINT can produce vertical summation results very quickly: its SUM statement prints a total for each listed variable at the bottom of the report. However, these totals appear only in the printed output (for example, the output window). PROC PRINT does not create an output data set, so it cannot add a new variable.

proc print data = SampleData noobs;
   sum Y2010 Y2011 Y2012;
run;

Drugs Dispensed from Pharmacies During Year 2010-2012

Pharmacy    Drug Class   Prescriptions   Y2010   Y2011   Y2012
CVS         Generic      100             $       $
Rite Aid    Generic      200             $       $
Walgreens   Generic      300             $       $
Walmart     Generic      400             .