Example: biology

PROC TRANSPOSE FOR MULTIPLE VALUES …

PROC TRANSPOSE FOR MULTIPLE VALUESKANHAIYA GUPTA, TCS, FOND DU LAC, WISCONSINABSTRACTPROC TRANSPOSE can be used to rotate ( TRANSPOSE ) SAS data sets. This procedure transforms the data from rows to columns or from columns to rows. But PROC TRANSPOSE has some limitations. It doesn't works as required for MULTIPLE VALUES of VAR parameter FOR ID/BY parameter. This paper demonstrates how TRANSPOSE can be done when VAR parameter has MULTIPLE VALUES without losing any record in TRANSPOSEPROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET> <NAME=name> <OUT=output-data-set> <PREFIX=prefix>;BY <DESCENDING> variable-1 <..<DESCENDING> variable-n> <NOTSORTED>;COPY variable(s);ID variable;IDLABEL variable;VAR variable(s);OptionsDATA= input-data-set names the SAS data set to : most recently created SAS data setLABEL= label specifies a name for the variable in the output data set that contains the label of the variable that is being transposed to create the current : _LABEL_LET allows duplicate VALUES of an ID variable.

PROC TRANSPOSE FOR MULTIPLE VALUES KANHAIYA GUPTA, TCS, FOND DU LAC, WISCONSIN ABSTRACT PROC TRANSPOSE can be …

Tags:

  Multiple, Value, Corps, Transpose, Proc transpose for multiple values, Proc transpose for multiple values kanhaiya, Kanhaiya

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Advertisement

Transcription of PROC TRANSPOSE FOR MULTIPLE VALUES …

1 PROC TRANSPOSE FOR MULTIPLE VALUESKANHAIYA GUPTA, TCS, FOND DU LAC, WISCONSINABSTRACTPROC TRANSPOSE can be used to rotate ( TRANSPOSE ) SAS data sets. This procedure transforms the data from rows to columns or from columns to rows. But PROC TRANSPOSE has some limitations. It doesn't works as required for MULTIPLE VALUES of VAR parameter FOR ID/BY parameter. This paper demonstrates how TRANSPOSE can be done when VAR parameter has MULTIPLE VALUES without losing any record in TRANSPOSEPROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET> <NAME=name> <OUT=output-data-set> <PREFIX=prefix>;BY <DESCENDING> variable-1 <..<DESCENDING> variable-n> <NOTSORTED>;COPY variable(s);ID variable;IDLABEL variable;VAR variable(s);OptionsDATA= input-data-set names the SAS data set to : most recently created SAS data setLABEL= label specifies a name for the variable in the output data set that contains the label of the variable that is being transposed to create the current : _LABEL_LET allows duplicate VALUES of an ID variable.

2 PROC TRANSPOSE transposes the observation containing the last occurrence of a particular ID value within the dataset or BY name specifies the name for the variable in the output data set that contains the name of the variable being transposed to create the current : _NAME_PROBLEMS uppose we have a SAS dataset that looks like below where a UPC can have MULTIPLE type and a type can have MULTIPLE we are required to manipulate the data by UPC so that output has one column for each TYPE, and the rows for each UPC represent ALL possible combinations of the VALUES for that UPC (also called a Cartesian expansion).The output dataset should look likeUPCABCD11211324524638191381213849138 4213889138821391913912139491394213989139 821If we do simple PROC TRANSPOSE then that will not help. PROC TRANSPOSE DATA = AA OUT = BB;BY UPC;ID type;VAR value ;RUN;Error will be thrown like : The ID value "B" occurs twice in the same BY : The above message was for the following by-group: UPC=1 ERROR: The ID value "C" occurs twice in the same BY : The above message was for the following by-group: UPC=2 ERROR: The ID value "A" occurs twice in the same BY : The ID value "B" occurs twice in the same BY : The ID value "B" occurs twice in the same BY : The ID value "C" occurs twice in the same BY : The above message was for the following by-group: UPC=3 ERROR: All BY groups were can use LET option.

3 LET allows duplicate VALUES for an ID TRANSPOSE DATA = AA OUT = BB LET;BY UPC;ID type;VAR value ;RUN;The output will look this is not what we are looking for. LET option will pick up only last occurrence of a particular ID value within the data set or BY Solution of this problem can be to first separate single and MULTIPLE value records. Arrange dataset AA in order of UPC and TYPE and make two datasets one (SV) having only one value for UPC and TYPE combination and another (MV) datasets containing MULTIPLE occurrences for a UPC and TYPE combination. For MV dataset, generate all the right combinations and assign proper index to each of the combination. Then combine SV and MV datasets. Now we have an index assigned to each unique combination of UPC, TYPE and value . Now apply TRANSPOSE on UPC and code to generate the desired reporting dataset looks like:%let MAXFIELDS = 100 ;%let MAXVALUES = 25 ;%let MAXFVMATRIX = %eval(&MAXFIELDS.)

4 * &MAXVALUES.) ;/* A Macro to sort a Dataset */%macro sort(ds,by) ;proc sort data= by run;%mend ;/* A macro to find out the number of records in a Dataset */ %macro nobs(ds, mvar) ;%global %let &mvar = 0 ;data _null_ ; set &ds nobs=nobs ; call symput("&mvar", nobs) ; stop ;run ;%mend nobs ;/* A macro to do TRANSPOSE for MULTIPLE VALUES */%macro xtrans(in,out) ;%sort(&in, upc chrtyp) ;/* split by single and MULTIPLE VALUES */data sv mv ; set by upc chrtyp ; if and then output sv ; else output mv ;run ;/* if there are MULTIPLE VALUES expand, TRANSPOSE , merge with the single VALUES */%nobs(mv, nobs) ;%if &nobs %then %do ;data tmp_xtrans (keep=upc pi_index output_type output_value sortedby=upc pi_index )sv_xid1(keep=upc pi_index sortedby=upc pi_index);set mv ;by upc chrtyp ;retain num_fields 0.

5 Array buf_field_names(&MAXFIELDS.) $50 _temporary_ ;array buf_reps(&MAXFIELDS.) _temporary_ ;array buf_field_values(&MAXFIELDS., &MAXVALUES.) $100 _temporary_ ;array buf_field_counts(&MAXFIELDS.) _temporary_ ;/* initialize upc info */if then do ;num_fields = 0 ;end ; /* *//* create list of field names and */if then do ;num_fields = num_fields + 1 ;buf_field_names(num_fields) = chrtyp ;buf_field_counts(num_fields) = 0 ;end ;/* add value to list of VALUES for field */if buf_field_counts(num_fields) < &MAXVALUES. then do ;buf_field_counts(num_fields) = buf_field_counts(num_fields) + 1 ;buf_field_values(num_fields, buf_field_counts(num_fields)) = chrvl ;end ;/* done with this UPC, output rows */if then do ;/* create combination index. compute the number of rows to output */buf_reps(num_fields) = 1 ;pi_count = buf_field_counts(1) ;do i = num_fields to 2 by -1 ;buf_reps(i-1) = buf_reps(i) * buf_field_counts(i) ;pi_count = pi_count * buf_field_counts(i) ;end ;/* output a row for each expanded type and value upc,pi_id,type, value *//* generate the right combinations for this type with proper index */do j = 1 to pi_count ;pi_index = j ;output sv_xid1 ;do i = 1 to num_fields ;output_type = buf_field_names(i) ;xind = mod(ceil(j/buf_reps(i)), buf_field_counts(i)) ;if xind = 0 then xind = buf_field_counts(i) ;output_value = buf_field_values(i,xind) ;output tmp_xtrans ;end ;end ;end ; /* done with this UPC */run ;proc sql ; create table sv2 as select * from sv a left join sv_xid1 b on = ;run ;data mv_t.

6 Set sv2 tmp_xtrans(rename=(output_type=chrtyp output_value=chrvl)) ; by upc pi_index ;run ;proc TRANSPOSE data=mv_t out=&out. (drop=_: pi_index) ; by upc pi_index ; id chrtyp ; var chrvl ;run ;%end ;%else %do ;/* upc with single VALUES */ proc TRANSPOSE data=sv out=&out.(bufno=4 drop=_: ) ; by upc ; id chrtyp ; var chrvl ;run ;%end ;%mend ;/* create input Dataset */data AA;infile datalines dlm = ',';input upc chrtyp $ chrvl $;datalines;001, A, 1001, B, 2001, B, 3002, B, 4002, C, 5002, C, 6003, A, 8003, A, 9003, B, 1003, B, 4003, B, 8003, C, 9003, C, 2003, D, 1;run;/* Call Macro */%xtrans(AA,BB);Num_field contains the number of unique TYPE for a UPC. buf_field_counts array contains the number of occurrences of a TYPE for a UPC. buf_field_namesarray contains name of all unique TYPE for a array contains all VALUES of a TYPE for a UPC.

7 In the above example it is assumed that maximum number of TYPE possible is is an extremely powerful programming technique, which can be used to generate TRANSPOSE of a datasets where VAR has MULTIPLE occurances for ID/BY parameters. This code provides the basic programming structure for transposing datasets which with little bit modification can be used to get desired INFORMATIONYour comments and questions are valued and encouraged. Contact the author at: kanhaiya Lal GuptaTata Consultancy Services LtdFond Du Lac, WI 54935 Work Phone: (920) 929-7870E- mail.


Related search queries