Example: bankruptcy

046-2007: Turning the Data Around: PROC …

1 Paper 046-2007 Turning the data around: PROC transpose and alternative approachesErik W. Tilanus, independent consultant, Driebergen, the NetherlandsABSTRACTPROC transpose can be used to "rotate" ( transpose ) SAS data sets. the values of one given variablebecome variable names and variables in the input data set become observations in the answers to three simple questions suffice to define the specification statements of PROC paper poses these questions and demonstrates how PROC transpose can be handled using reporting it is common to have a time dimension in the horizontal direction, like the figures you see in Table 1. Foranalysis with SAS it is often more appropriate to have these figures "rotated", a variable YEAR and variables forthe various equipment types, like in Table is exactly what PROC transpose can do for this paper we will discuss the basics of PROC transpose and demonstrate a number of special techniques withit. We also look at alternatives , using a DATA group200020012002200320042005CD players230432454322865219822063719494MP3 players83451046812270138461502917404 Cassette players125931147010343931782046971 DVD players153901748919153204582172523094 DVD recorders304633823574376045015036 VHS recorders1459013695135831145998517466 Table 1: Units sold figures of some electronics storeYEARCD playersMP3 playersCassetteplayersDVD playersDVDrecordersVHSrecorders200023043 8345125931539030461459020012454310468114 701748933821369520022286512270103431

1 Paper 046-2007 Turning the data around: PROC TRANSPOSE and alternative approaches Erik W. Tilanus, independent consultant, Driebergen, the Netherlands

Tags:

  Corps, Transpose, Alternatives, Approaches, Proc transpose and alternative approaches

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of 046-2007: Turning the Data Around: PROC …

1 1 Paper 046-2007 Turning the data around: PROC transpose and alternative approachesErik W. Tilanus, independent consultant, Driebergen, the NetherlandsABSTRACTPROC transpose can be used to "rotate" ( transpose ) SAS data sets. the values of one given variablebecome variable names and variables in the input data set become observations in the answers to three simple questions suffice to define the specification statements of PROC paper poses these questions and demonstrates how PROC transpose can be handled using reporting it is common to have a time dimension in the horizontal direction, like the figures you see in Table 1. Foranalysis with SAS it is often more appropriate to have these figures "rotated", a variable YEAR and variables forthe various equipment types, like in Table is exactly what PROC transpose can do for this paper we will discuss the basics of PROC transpose and demonstrate a number of special techniques withit. We also look at alternatives , using a DATA group200020012002200320042005CD players230432454322865219822063719494MP3 players83451046812270138461502917404 Cassette players125931147010343931782046971 DVD players153901748919153204582172523094 DVD recorders304633823574376045015036 VHS recorders1459013695135831145998517466 Table 1: Units sold figures of some electronics storeYEARCD playersMP3 playersCassetteplayersDVD playersDVDrecordersVHSrecorders200023043 8345125931539030461459020012454310468114 7017489338213695200222865122701034319153 3574135832003219821384693172045837601145 9200420637150298204217254501985120051949 41740469712309450367466 Table 2: The same data, but rotated, is often easier to analyse with SASPROC transpose , THE BASICSIn the introduction we mentioned already that PROC transpose "rotates" a SAS data set, which is to say thatTRANSPOSE turns variables into observations and observations into variables.

2 With transpose you can create acertain level of independence between program and data. Now processing and presentation can be optimizedseparately from defining the data set for optimal IN THREE EASY QUESTIONSPROC transpose is controlled by three specification statements: the ID statement, the VAR statement and the BYstatement. The variables to be declared in these statements can be determined with the help of a few simplequestions:1. Which variable in the input data set contains (after formatting!) the variable names of the output dataset?SASG lobalForum2007 Coders Corner 2 This variable is specified in the ID statement. From the question it is clear that this variable must have a unique valuein each observation (or per BY group) after formatting, since this becomes the name of the transposed variable. If thecontents of the variable does not comply with the rules for variable names, SAS will adapt it by, among other things,replacing unacceptable characters with underscores. If the ID variable is numeric it is necessary to attach a prefix toits value, to change for example the value 3 into SCORE3 or the value 12 into SCORE12.

3 Such a prefix is declared inthe PROC statement with the PREFIX= Which variable(s) in the input data set contains the values to be transposed?These variables are declared in the VAR statement. Now, in order to find out from which original variable in the inputdata set the value originates, SAS adds the variable _NAME_. Should another variable name be desired here, thename must be given in the option NAME= in the PROC the VAR statement is left out, then all numeric variables which do not as yet have any other task ( declared in anID or BY statement) will be For which group of observations is the value of the ID variable unique (forms a 'block' to betransposed)?This group of observations is designated in the BY statement. The presence of the BY statement means that the dataset will not be transposed as a whole, but transposing will take place per BY EXAMPLENow let us apply this to the data in the introduction. (Table 1) Let us assume that the table is in SAS data set format,with the variables Product_Group and Y2000 1: The variable Product_Group contains the names for the new variables, so we enter:ID Product_Group;Question 2: The variables Y2000 Y2005 contain the information to be transposed.

4 Since these are all the numericvariables, you could leave out the VAR statement, but is a good habit to include it anyway:VAR Y2000-Y2005;Question 3: There is no BY group processing in this example. All values of the variable Product_Group are we don't need a BY add one more element: we replace the default _NAME_ variable by Year, using the NAME= option in the PROC statement. So the total program now looks like this:LIBNAME Forum "E:\SASF orum";PROC transpose DATA= OUT= NAME=Year; ID Product_Group; VAR Y2000-Y2005;RUN;The result of running this code is a data set like in Table 3: PROC PRINT output of data set Corner 3 EXTENSION OF THE EXAMPLET able 4 contains an extension of Table 1 . Next to the variables in Table 1 it contains also the variable Outlet. Now wehave the situation that the variable that contains the new variable names has a repetition of values. According toQuestion 3, the variable Outlet should be in the BY statement, since per outlet the values in Product_Group areunique.

5 Note that not all product groups are present in all outlets. That is not a problem. It will simply lead to somemissing values in the OutletProduct_GroupY2000Y2001Y2002Y2003Y 2004Y20051 Electronics MarketCD players1898023967177092918919898158862 Electronics MarketMP3 players1084313141142941997222290176453 Electronics MarketDVD players1393816236163702240421302209054 Electronics MarketDVD recorders97699911711683121130355 Electronics MarketVHS recorders2652729310318172027222458115226 Media CenterCD players2073025086208842519221777199337 Media CenterMP3 players1303416677180182152223254251978 Media CenterCassette players15733146181415312402823677819 Media CenterDVD players14875172901770220379234502166810 Media CenterVHS recorders1589016240183481206710525765211 Music StoreCD players24563268372046429353198521908712 Music StoreMP3 players9888127851245418457201021832813 Music StoreCassette players19690156471739216572108641030114 Music StoreDVD players11880137791584516016155181911215 Music StoreDVD recorders10971151153719991565309716 Music StoreVHS

6 Recorders2293423917256931860716236106661 7 Video and moreCD players546773645932103995656449118 Video and moreCassette players55303343524055333534213032486319 Video and moreDVD players17026182081577427595222772357120 Video and moreDVD recorders835884105015581089256821 Video and moreVHS recorders297943760940375221683137416402 Table 4: Extension of table 1: inclusion of the outlet nameWith the BY statement the program would look like this:LIBNAME Forum "E:\SASF orum";PROC transpose DATA= OUT= NAME=Year; ID Product_Group; VAR Y2000-Y2005; BY Outlet;RUN;The result is presented in Table TRANSPOSITIONT here are situations where the data cannot be transposed at once in the way we have seen so far. Let us assumethat we would not only have the units sold in the table, but also the related turnover. Table 6 shows part of the newinput table. We renamed the Y2000-Y2005 variables to Q2000-Q2005 (Q for quantity) and added the variablesV2000-V2005 (V for Value, the turnover).In our output we would like to see two variables related to each product category: the units sold ( Q_CD_Players)and the turnover ( (V_CD_Players).)

7 This cannot be achieved with a simple are two ways to approach this first approach is to perform two standard transpositions, one for each series of variables to be transposed. Thenthe two resulting output data sets are merged (one to one) to achieve the desired Corner 4 The second approach is a single transposition of all variables, followed by a data step where you merge thetransposed data set with itself, filtering either the Q-related or V-related observations with a WHERE= outletYEAR playersplayersplayersrecordersrecordersp layers1 Electronics Electronics Electronics Electronics Electronics Electronics Media Media Media Media Media Media Music StoreY2000245639888118801097229341969014 Music StoreY2001268371278513779115123917156471 5 Music StoreY2002204641245415845153725693173921 6 Music StoreY2003293531845716016199918607165721 7 Music StoreY2004198522010215518156516236108641 8 Music StoreY2005190871832819112309710666103011 9 Video and Video and Video and Video and Video and Video and 5: The result of transposing the data in table Table transpose .

8 FIRST APPROACH, USING 2 TRANSPOSITIONSThe first approach uses two transpositions, one for the Q2000-Q2005 variables and one for the V2000-V2005variables. To rename the output variables from CD_Players to Q_CD_Players, we use the PREFIX= option in thePROC statement. We merge the data sets together using a one to one merge. This is possible since the observationsin the two data sets match course there can only be one variable Year in the output data set. According to the rules of the MERGE statement, this variable will contain the values from the last mentioned data set. But we don't need the prefix befor theyear anymore, therefore we strip the prefix with the SUBSTR is the code for the first approach:LIBNAME Forum "E:\SASF orum";PROC transpose DATA= OUT= NAME=Year PREFIX=Q_; ID Product_Group; VAR Q2000-Q2005; BY Outlet;RUN;PROC transpose DATA= OUT= NAME=Year PREFIX=V_; ID Product_Group; VAR V2000-V2005; BY Outlet;RUN;SASG lobalForum2007 Coders Corner 5 DATA ; MERGE ; Year=SUBSTR(Year,2,4);RUN.

9 Obs OutletProduct_GroupQ2000Q2001Q2002Q20031 Electronics MarketCD players189802396717709291892 Electronics MarketMP3 players108431314114294199724 Electronics MarketDVD players139381623616370224045 Electronics MarketDVD recorders976999117116836 Electronics MarketVHS recorders265272931031817202727 Media CenterCD players207302508620884251928 Media CenterMP3 6: Part of the table with units sold (Q2000-Q2005) and turn over (V2000-V2005)DOUBLE transpose : SECOND APPROACH, USING 1 TRANSPOSITIONIn the second approach we use one transposition, transposing all variables at once. This will lead to an output dataset with observations for each value of the Year variable: Q2000-Q2005 and V2000-V2005. Then we merge the dataset to itself, filtering either the Q- or the V- values. At the same time we rename the transposed Forum "E:\SASF orum";PROC transpose DATA= OUT= NAME=Year; ID Product_Group; VAR Q2000-Q2005 V2000-V2005; BY Outlet;RUN;DATA ; MERGE (WHERE=(Year=:"Q") RENAME=(CD_Players=Q_CD_players MP3_Players=Q_MP3_Players Cassette_Players=Q_Cassette_Players DVD_Players=Q_DVD_Players DVD_Recorders=Q_DVD_Recorders VHS_Recorders=Q_VHS_Recorders)) (WHERE=(Year=:"V") RENAME=(CD_Players=V_CD_players MP3_Players=V_MP3_Players Cassette_Players=V_Cassette_Players DVD_Players=V_DVD_Players DVD_Recorders=V_DVD_Recorders VHS_Recorders=V_VHS_Recorders)).

10 Year=SUBSTR(Year,2,4);RUN;SASG lobalForum2007 Coders Corner 6A part of the resulting data set of both approaches is presented in Table outletYearplayersplayersplayersrecorders recorders1 Electronics Market2000189801084313938976265272 Electronics Market2001239671314116236999293103 Electronics Market20021770914294163701171318174 Electronics Market20032918919972224041683202725 Electronics Market20041989822290213021211224586 Electronics Market20051588617645209053035115227 Media Media 7: The result of the double transpositionANOTHER EXAMPLE: BUILDING A TWO-WAY DISTANCE TABLEA distance table for a road map often contains city names both in the columns and the rows, so that a distance canalways be determined in both directions. By means of PROC transpose , this can be realized quite simply from astraight distance data set. Table 8 shows the input data set and Table 9 shows the desired 2-way distance first step is to generate additional observations in such way that there is a two-way pair of all Expand(DROP=temp); SET Distance; * First write the original observation; OUTPUT; * Exchange To and From; temp = From; From = To; To = temp; OUTPUT;RUN;Next we use PROC transpose to create the table.


Related search queries