
SUGI 26: MISSOVER, TRUNCOVER, and PAD, OH …


Paper 9-26 MISSOVER, TRUNCOVER, and PAD, OH MY!! or Making Sense of the INFILE and INPUT Statements. Randall Cates, MPH, Technical Training Specialist


Transcription of SUGI 26: MISSOVER, TRUNCOVER, and PAD, OH …

ABSTRACT

The SAS System has many powerful tools to store, analyze and present data. However, first programmers need to get the data into SAS datasets. This presentation will delve into the intricacies of reading data from sequential (text) files using the DATA step and INFILE and INPUT statements. Discussion will focus on the different options available when reading different types of text files. For example, when should you use the MISSOVER option and when is the TRUNCOVER option more appropriate? This paper assumes the audience has basic knowledge of reading text files using the DATA step (Base SAS) and is appropriate for users on any Operating System, although some options may be [...] and understanding the SAS documentation can sometimes be a challenge.

This is evident in the INFILE statement. There are no fewer than 34 different options available for this particular statement. This can get very sticky when the data file you need to read differs from the safe, easy columnar data [...] how can we make sense of the plethora of options? This paper will attempt to clarify some of the confusion. Three situations are explored. First, variable-length records: both shorter values and missing data points. Next, reading in multiple files at once. Finally, obtaining data from both remote OS's and Web sites using the [...]

SO LITTLE TIME, SO MANY OPTIONS

When the data lines aren't complete, what option will read the data correctly and completely? INFILE has a number of options available:

FLOWOVER   The default. Causes the INPUT statement to jump to the next record if it doesn't find values for all variables.
MISSOVER   Sets all empty vars to missing when reading a short line. However, it can also skip [...]
STOPOVER   Stops the DATA step when it reads a short line.
TRUNCOVER  Forces the INPUT statement to stop reading when it gets to the end of a short line. This option will not skip [...]
SCANOVER   Causes the INPUT statement to search the data lines for a character string specified in the INPUT statement.
PAD        Pads short lines with blanks to the length of the LRECL= option.

Note: SCANOVER and STOPOVER will not be [...]

The following text file was created with MS-Notepad on Windows-NT, then read into a SAS dataset using INFILE and INPUT statements. Each line should contain 4 data points: Last and First names, Employee ID and Job title. The grayed-out area denotes actual line lengths.

(Note: Most word processors on Windows and UNIX create variable-length lines, whereas mainframe computers create files with lines of uniform length, filled in by blanks.)

LANGKAMM SARAH E0045 Mechanic
TORRES JAN E0029 Pilot
SMITH MICHAEL E0065
LEISTNER COLIN E0116 Mechanic
TOMAS HARALD
WADE KIRSTEN E0126 Pilot
WAUGH TIM E0204 Pilot

Then two sets of code were submitted using different options on the INFILE statement. First the lines were read in with Column input:

DATA test;
   INFILE "d:\infile\ " <OPTIONS>;
   INPUT lastn $ 1-21 Firstn $ 22-31 Empid $ 32-36 Jobcode $ 37-45;
RUN;

Then List input was used:

DATA test;
   INFILE "d:\infile\ ";
   INPUT lastn $ Firstn $ Empid $ Jobcode $;
RUN;

FLOWOVER:

The FLOWOVER option is the default option on INFILE.

Here, when the INPUT statement reaches the end of non-blank characters without having filled all variables, a new line is read into the Input Buffer and INPUT attempts to fill the rest of the variables starting from column 1. The next time an INPUT statement is executed, a new line is brought into the Input Buffer. The results (printed with PROC PRINT) are shown below.

Column input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  SMITH
  3  LEISTNER  COLIN    E0116  Mechanic
  4  TOMAS     HARALD   WADE   WAUGH

In the second line, since the value PILOT did not extend to the required number of columns for Jobcode (37-45), the INPUT statement jumped to the next line to complete [...], for the fifth line read in, the INPUT statement first jumped to the sixth line to read Empid, then to the seventh line to read Jobcode.

List input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  Pilot
  3  SMITH     MICHAEL  E0065  LEISTNER
  4  TOMAS     HARALD   WADE   KIRSTEN
  5  WAUGH     TIM      E0204  Pilot

In this example the Pilot values are placed in the appropriate places, but the INPUT statement still loops to the next line when unable to fill all variables.

MISSOVER:

When the MISSOVER option is used on the INFILE statement, the INPUT statement does not jump to the next line when reading a short line. Instead, MISSOVER sets variables to missing.

Column input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029
  3  SMITH     MICHAEL  E0065
  4  LEISTNER  COLIN    E0116  Mechanic
  5  TOMAS     HARALD
  6  WADE      KIRSTEN  E0126
  7  WAUGH     TIM      E0204

All lines are read in as separate records. Notice however, that the PILOT Jobcodes are still missing. When MISSOVER encounters the End-Of-Line mark, and has not read all required columns for a particular variable, then that variable is set to missing.

This is better, but still not [...]

List input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  Pilot
  3  SMITH     MICHAEL  E0065
  4  LEISTNER  COLIN    E0116  Mechanic
  5  TOMAS     HARALD
  6  WADE      KIRSTEN  E0126  Pilot
  7  WAUGH     TIM      E0204  Pilot

Since List input doesn't specify explicit columns, these data lines can be correctly read using the MISSOVER option.

TRUNCOVER:

The TRUNCOVER option acts similarly to MISSOVER, and in addition, will take partial values to fill the first unfilled variable.

Column input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  Pilot
  3  SMITH     MICHAEL  E0065
  4  LEISTNER  COLIN    E0116  Mechanic
  5  TOMAS     HARALD
  6  WADE      KIRSTEN  E0126  Pilot
  7  WAUGH     TIM      E0204  Pilot

Here TRUNCOVER successfully reads the short lines, apportioning out the values to the correct places.
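The difference between MISSOVER and TRUNCOVER under Column input can be tried on a small scratch file. The following is a minimal sketch, not the paper's own code: the fileref shortln and the data set names with_missover and with_truncover are hypothetical, the column ranges are borrowed from the paper's DATALINES example, and the raw file is written with a DATA _NULL_ step so that its second record really does end right after "Pilot".

```sas
/* Hypothetical fileref; the TEMP device asks SAS for a scratch file. */
filename shortln temp;

/* Write two variable-length records. The second one stops at column 41, */
/* so it is shorter than the 37-44 range requested for jobcode below.    */
data _null_;
   file shortln;
   put @1 'LANGKAMM' @21 'SARAH' @31 'E0045' @37 'Mechanic';
   put @1 'TORRES'   @21 'JAN'   @31 'E0029' @37 'Pilot';
run;

/* MISSOVER: a field cut off by the end of the line is set to missing. */
data with_missover;
   infile shortln missover;
   input lastn $ 1-20 firstn $ 21-30 empid $ 31-35 jobcode $ 37-44;
run;

/* TRUNCOVER: the partial field is kept instead of being discarded. */
data with_truncover;
   infile shortln truncover;
   input lastn $ 1-20 firstn $ 21-30 empid $ 31-35 jobcode $ 37-44;
run;

proc print data=with_missover;  run;
proc print data=with_truncover; run;
```

Going by the results reported above, one would expect the jobcode for TORRES to be missing in the first listing and to read Pilot in the second. The file is written to disk rather than supplied in-stream because in-stream data lines are typically padded to a fixed length, which can hide the short-line behavior.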

When the INPUT statement reached a foreshortened line, the TRUNCOVER option takes what's left ("Pilot") and assigns it to the appropriate value. Other variables are set to missing where [...]

List input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  Pilot
  3  SMITH     MICHAEL  E0065
  4  LEISTNER  COLIN    E0116  Mechanic
  5  TOMAS     HARALD
  6  WADE      KIRSTEN  E0126  Pilot
  7  WAUGH     TIM      E0204  Pilot

Since List input reads from delimiter to delimiter, TRUNCOVER can [...]

PAD:

The PAD option does not replace the FLOWOVER option. Instead, the PAD option adds blanks to short lines out to the logical record length (LRECL). In this case, PAD takes the LRECL from the file information, but you can specify LRECL= in the INFILE statement.

Column input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  Pilot
  3  SMITH     MICHAEL  E0065
  4  LEISTNER  COLIN    E0116  Mechanic
  5  TOMAS     HARALD
  6  WADE      KIRSTEN  E0126  Pilot
  7  WAUGH     TIM      E0204  Pilot

When reading in data with Column Input, SAS reads "just the columns, Ma'am".
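As an illustration of PAD combined with an explicit LRECL=, here is a minimal sketch under stated assumptions: the fileref shortln and the data set name with_pad are hypothetical, the scratch file is written by the example itself, and LRECL=44 is chosen only because it matches the last column requested by the INPUT statement.

```sas
/* Hypothetical fileref; the TEMP device asks SAS for a scratch file. */
filename shortln temp;

/* Two short records: one ends after "Pilot", one has no job title at all. */
data _null_;
   file shortln;
   put @1 'TORRES' @21 'JAN'     @31 'E0029' @37 'Pilot';
   put @1 'SMITH'  @21 'MICHAEL' @31 'E0065';
run;

/* PAD extends each record with blanks to LRECL=44, so every column range */
/* in the INPUT statement can be read without running off the record.     */
data with_pad;
   infile shortln pad lrecl=44;
   input lastn $ 1-20 firstn $ 21-30 empid $ 31-35 jobcode $ 37-44;
run;

proc print data=with_pad; run;
```

Because every record is padded to the full width, the default FLOWOVER behavior has no reason to move to the next line under Column input; a jobcode that is absent from the file is simply read as blanks.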

Since the PAD option adds blanks, SAS can read the appropriate columns without hitting the End-of-Line mark. So the data is read [...]

List input:

Obs  Lastn     Firstn   Empid  Jobcode
  1  LANGKAMM  SARAH    E0045  Mechanic
  2  TORRES    JAN      E0029  Pilot
  3  SMITH     MICHAEL  E0065  LEISTNER
  4  TOMAS     HARALD   WADE   KIRSTEN
  5  WAUGH     TIM      E0204  Pilot

List input reads data from delimiter to delimiter. The default delimiter character is a blank. Multiple delimiters are treated as one. So with the PAD option in effect, and FLOWOVER still in effect, the INPUT statement must look to the next line to fill the remaining variables.

[...]:

Reading files with variable line lengths can be frustrating, especially when one doesn't fully understand how each option does, and doesn't, work. The default option of FLOWOVER expects to fill all variables, and uses multiple lines [...] MISSOVER was originally created to be used in conjunction with PAD and works effectively and well in most situations.

However, this can be a CPU intensive process when reading an extremely large file. STOPOVER is a good tool for checking code and raw data when dealing with large, potentially messy files, since it forces the DATA step to stop the first time it finds a short line. TRUNCOVER was developed later than the MISSOVER and PAD options, and deals admirably with not only short lines but with short values. It is also more efficient since it doesn't require the extra "padding".

One more point about variable-length files. It is possible to copy a subset of any raw data file into the DATA step and run these options on the subset. Use an INFILE DATALINES; statement, and add whichever options are appropriate. For example:

DATA test;
   INFILE datalines truncover;
   INPUT lastn $ 1-20 firstn $ 21-30 empid $ 31-35 jobcode $ 37-44;
DATALINES;
"add a number of data lines here sans semicolons"
;
RUN;

ALL THE FILES, PLEASE

Another situation that might come up is where the raw data exists in numerous multiple files.

