
SUGI 23: Proving it Works: Using PROC COMPARE …


Proving it Works: Using PROC COMPARE to Verify an Analysis Converted into SAS Software

by Lauren Haworth, MA, Programmer, and Njeri Karanja, Nutrition Scientist
Kaiser Permanente Center for Health Research, Portland, Oregon

Abstract

When SAS software is used to replace legacy software systems, programmers are often asked to recreate analyses run on the old system. The challenge for the programmer is to make the results from the new system match the results from the old system, so that analyses will be comparable over time. This paper explores the use of PROC COMPARE to prove that a new analysis system written in SAS produces results that exactly match those produced by a legacy software system. The example in this paper involves the analysis of a survey form called the Food Frequency Questionnaire (FFQ), which is used in nutrition research to capture information about a respondent's typical diet.

Food frequency assessment is one of a variety of methods for assessing the dietary intake of groups and individuals. The frequency method asks respondents to report how often they consume each food, from a pre-selected list of foods, over a specified time period.[…],3 The information is then used to develop a dietary profile for the individual respondent or the group. There are many food frequency instruments, developed by different groups to meet a variety of research needs.

One such instrument is the Food Frequency Questionnaire that is part of the Health Habits and History Questionnaire developed by the National Cancer Institute (NCI).4 The NCI FFQ has been calibrated and validated against other dietary assessment methods in different populations.[…],6 The FFQ is analyzed using special NCI-designed software called DietSys. DietSys was originally developed in the 1980s as an analysis tool for the FFQ. In […], the FFQ and DietSys were updated again. DietSys estimates the intake of 33 nutrients and up to 20 user-defined food groups. The software allows users to enter and verify data, standardize editing, and calculate a variety of health habits, such as the frequency of restaurant eating and vitamin use.

The FFQ was used to assess baseline food and nutrient intake in the Dietary Approaches to Stop Hypertension (DASH) study. The study tested the impact of dietary change on blood pressure.

The problem

The NCI recently discontinued its policy of updating DietSys and providing direct technical support to users of the software.

This leaves research scientists with a software package that cannot support newer versions of the FFQ, which limits their ability to assess changes in the food supply and in eating habits. Additionally, DietSys has always had a few inherent operating glitches that make analysis inefficient. For example, DietSys does not provide much flexibility for running custom analyses. The user has to run the program repeatedly to test different analysis options, and there is no support for automation or batch processing of food records. For these reasons, we decided to develop a more flexible and upgradeable system using SAS.

For our new program to be credible, we needed to demonstrate that results obtained using our new SAS program (called CHRFFQ) were comparable to those that would be obtained with the DietSys program. To accomplish this, we ran the DASH baseline dietary data through both CHRFFQ and DietSys and used PROC COMPARE to validate the results.

This paper describes the development and testing of CHRFFQ, showing how SAS was an invaluable tool in the process.

Step 1: Rewriting the program in SAS

Our first step in building CHRFFQ was to use the DietSys documentation to write SAS code that we thought would match the DietSys results. Thankfully, the DietSys software is in the public domain, so we did not have to deal with any copyright issues.

The analysis of FFQ forms involves computing nutrient intakes by multiplying the reported frequency for a given food by the amount of the nutrient in a specified quantity of that food. Total daily intake of a nutrient is the sum of these frequency-times-amount products over all foods. The FFQ also asks more general questions, such as how many vegetables one eats, what fats one uses in cooking, favorite brands of cereal, and what vitamins one takes. These answers are used to adjust the nutrient calculations.
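The core calculation can be sketched as a SAS DATA step with arrays. This is a minimal illustration under stated assumptions, not CHRFFQ's actual code: it assumes a hypothetical layout in which FREQ1-FREQ3 hold the reported servings per day for three foods, and the temporary array PRO holds made-up grams of protein per serving for those foods. The DietSys adjustments for cooking fats, cereal brands, and vitamin use are left out.

```sas
/* Hypothetical input: one row per FFQ form, servings per day for */
/* three foods. Layout and values are illustrative only.          */
data ffq;
   input id freq1-freq3;
   datalines;
101 1 0.5 2
102 0 1 0.25
;
run;

data intake;
   set ffq;
   array freq{3} freq1-freq3;            /* reported servings per day   */
   array pro{3} _temporary_ (8 6.5 3);   /* assumed g protein / serving */
   protein = 0;
   do i = 1 to 3;
      /* daily intake = sum over foods of frequency x nutrient amount */
      protein = protein + freq{i} * pro{i};
   end;
   drop i;
run;

proc print data=intake;
   var id protein;
run;
```

For form 101 this gives 1 x 8 + 0.5 x 6.5 + 2 x 3 = 17.25 grams of protein per day. A full version would cover every food and nutrient, most likely reading nutrient amounts from a lookup data set rather than hard-coding them.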

The result is an extremely complex analysis that is not […]. The DietSys documentation outlined in great detail each of the nutrient calculations and the effect of each of the adjustments. It also explained how each of the analysis options was implemented. This documentation proved invaluable to the development process. The SAS code was written to use the same calculations and adjustments. The resulting program had over three thousand lines of code. About half of the CHRFFQ development time was spent building the basic program and getting it to run.

Step 2: Setting up the test

Once the SAS program was running without errors, we started the long process of getting it to match the results produced by DietSys. Though we tried to copy the DietSys algorithm, in many cases the documentation did not make clear how DietSys was calculating certain results. The only way to see if we had gotten it right was to test CHRFFQ against DietSys. We set up a test dataset for comparing the two systems.

Over 400 FFQ forms from the DASH study were entered into DietSys. This produced an ASCII file that DietSys could process and that could also be read into SAS using an INPUT statement. Once the data file was ready, we ran it through both the DietSys analysis and the CHRFFQ analysis using the same option settings. For the first test run, we left all of the options at their simplest setting (in most cases, this meant turning the option off). Next, we converted the DietSys results into a SAS dataset. Now we were ready to compare the results. Our goal was to keep testing and revising CHRFFQ until any differences between the CHRFFQ and DietSys results were less than 1% of each […].

Step 3: Running PROC COMPARE

PROC COMPARE is the perfect tool for comparing two files. When you specify a common variable as an identifier, it conducts an observation-by-observation, variable-by-variable comparison to see if each of the data points is the same.
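Because the ID statement matches observations by identifier, PROC COMPARE expects both data sets to be sorted by the ID variable. A minimal sketch of that preparation follows. The three-variable layout (ID, total calories CALS, and PROTEIN) and all values are hypothetical; the real DietSys output format is not described here, and in practice an INFILE statement pointing at the DietSys output file would replace the in-stream DATALINES.

```sas
/* Read the DietSys results into SAS. The layout is an assumption; */
/* DATALINES stands in for the real DietSys output file.           */
data dietsys;
   input id cals protein;
   datalines;
102 2104.7 88.0
101 1850.2 71.3
;
run;

/* Hypothetical CHRFFQ results for the same forms */
data chrffq;
   input id cals protein;
   datalines;
101 1850.2 71.3
102 2110.1 88.4
;
run;

/* PROC COMPARE's ID statement expects both inputs sorted by ID */
proc sort data=dietsys; by id; run;
proc sort data=chrffq;  by id; run;
```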

PROC COMPARE then produces a detailed report outlining the differences. It tells you which observations are in one file but not the other, which variables are in one file but not the other, and which variables have different values in the two files. The last of these checks was the most useful for our purposes: we used it to make sure that both systems gave the same result for each of the nutrients on each FFQ form.

On the first pass, we tried running PROC COMPARE with the default option settings. We specified the base dataset (DIETSYS), the comparison dataset (CHRFFQ), and the variable used to identify matching observations (ID):

    PROC COMPARE BASE=DIETSYS COMPARE=CHRFFQ;
       ID ID;
    RUN;

This proved to be a mistake. At this stage of the development process, there were too many differences between the two files, and PROC COMPARE produced hundreds of pages of output. To highlight the major problems, we changed the PROC COMPARE settings.
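Another way to tame that volume of output (an alternative sketch, not the approach described in this paper) is to have PROC COMPARE write its findings to a data set with the OUT= option and suppress the printed report with NOPRINT. In the sketch, OUTNOEQUAL suppresses output observations in which all values are judged equal, and OUTBASE, OUTCOMP, and OUTPERCENT keep the DietSys value, the CHRFFQ value, and the percent difference as separate rows, labeled by the automatic _TYPE_ variable. The two small data sets are hypothetical stand-ins.

```sas
/* Hypothetical stand-ins for the two result files, sorted by ID */
data dietsys;
   input id cals;
   datalines;
101 1850
102 2104
;
run;

data chrffq;
   input id cals;
   datalines;
101 1850
102 2210
;
run;

/* Write differences to WORK.DIFFS instead of printing a report */
proc compare base=dietsys compare=chrffq
             out=diffs outnoequal outbase outcomp outpercent noprint;
   id id;
run;

/* Review only the percent-difference rows */
proc print data=diffs;
   where _type_ = 'PERCENT';
run;
```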

Instead of running every possible comparison at once (the PROC COMPARE default), we used a step-by-step approach. The first step was to confirm that both files had the same variables. To do this, you specify the LISTVAR option and the NOVALUES option. LISTVAR lists the variables that are found in only one of the two data sets, and NOVALUES suppresses the report of value comparisons:

    PROC COMPARE BASE=DIETSYS COMPARE=CHRFFQ LISTVAR NOVALUES;
       ID ID;
    RUN;

This run pointed out a couple of variables we had forgotten to compute in CHRFFQ. We added the variables and re-ran PROC COMPARE to make sure we had fixed the problem. This time PROC COMPARE produced no output (that is what PROC COMPARE does when the two files are identical for each of the comparisons specified in the option settings).

The next step of our testing was to make sure that both systems produced the same number of observations in the output file.
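The observation check can also be cross-checked outside PROC COMPARE. A minimal sketch (an alternative, not the authors' method) uses a DATA step MERGE with IN= data set options to write the IDs found in only one of the two files to separate data sets. The data values are hypothetical, and both inputs must be sorted by ID for the BY statement.

```sas
/* Hypothetical IDs kept by each system, already sorted by ID */
data dietsys;
   input id cals;
   datalines;
101 1850
102 2104
103 1977
;
run;

data chrffq;
   input id cals;
   datalines;
101 1850
103 1977
104 2302
;
run;

/* IN= flags show which input contributed each ID */
data only_dietsys only_chrffq;
   merge dietsys(in=in_d keep=id) chrffq(in=in_c keep=id);
   by id;
   if in_d and not in_c then output only_dietsys;      /* missing from CHRFFQ  */
   else if in_c and not in_d then output only_chrffq;  /* missing from DietSys */
run;
```

Here ONLY_DIETSYS holds ID 102 and ONLY_CHRFFQ holds ID 104, the same kind of information LISTOBS reports.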

To check the observations, you specify the LISTOBS option, which lists the observations found in only one of the two data sets:

    PROC COMPARE BASE=DIETSYS COMPARE=CHRFFQ LISTOBS NOVALUES;
       ID ID;
    RUN;

This run pointed out that the two systems were using different criteria to select invalid observations for deletion. By looking at the PROC COMPARE output, we could identify which records had been incorrectly dropped, figure out what caused them to be dropped, and correct the problem.

Once we had confirmed that the two files had the same variables and observations, we were ready to start comparing values in the two files. We removed the NOVALUES option to turn on the value comparison. Also, to limit the amount of output in each run, we made three additional changes to the option settings. First, we limited the comparison to ten observations by using the OBS= data set option on each of the two datasets. Ten observations were enough to spot major differences at this point. Second, we changed the comparison method to METHOD=PERCENT with CRITERION=20.

This means that only values that differed from the DietSys (base) value by more than 20% were reported. Third, we limited the comparison to one variable: total calories (CALS). Since this variable summarizes the respondent's diet, it was a useful tool for testing all facets of the program. The new run used the following code:

    PROC COMPARE BASE=DIETSYS(OBS=10) COMPARE=CHRFFQ(OBS=10)
                 METHOD=PERCENT CRITERION=20;
       ID ID;
       VAR CALS;
    RUN;

With this setup, PROC COMPARE ran more quickly and produced a much smaller report showing the major differences between the two files. By looking at the report, we were able to identify which part of the program was producing incorrect results. We then began an iterative process of finding a problem, fixing the program, and running PROC COMPARE again, each time coming closer to matching the DietSys results. In most cases, differences between the two systems were caused by errors in our CHRFFQ program.
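The find-fix-rerun cycle lends itself to a small macro. The sketch below is hypothetical: the macro name CHECK_CALS, the global macro variable CALS_UNEQUAL, and the data values are invented for illustration and are not part of CHRFFQ. It reruns the CALS comparison without printing and reads the SYSINFO automatic macro variable, which PROC COMPARE sets to a bit-coded return code in which the value 4096 means at least one value comparison was unequal.

```sas
/* Hypothetical stand-ins for the two result files, sorted by ID */
data dietsys;
   input id cals;
   datalines;
101 2000
102 1800
;
run;

data chrffq;
   input id cals;
   datalines;
101 2010
102 2300
;
run;

%macro check_cals(crit=20);
   %global cals_unequal;
   proc compare base=dietsys(obs=10) compare=chrffq(obs=10)
                method=percent criterion=&crit noprint;
      id id;
      var cals;
   run;
   /* Read SYSINFO right away: the next step will overwrite it.   */
   /* BAND keeps only bit 4096, "a value comparison was unequal". */
   %let cals_unequal = %sysfunc(band(&sysinfo, 4096));
   %if &cals_unequal %then
      %put NOTE: CALS differs by more than &crit percent for at least one ID.;
   %else
      %put NOTE: No CALS difference exceeds &crit percent.;
%mend check_cals;

%check_cals(crit=20)
```

With these made-up values, ID 101 differs by 0.5% and is judged equal, while ID 102 differs by about 27.8% and sets the 4096 bit, so the macro reports a difference.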

