Example: confidence

Using Excel to Clean and Prepare Data for Analysis

The Industrial-Organizational Psychologist89 Using Excel to Clean and Prepare data for AnalysisR. Jason WeissDevelopment Dimensions InternationalRobert J. TownsendCalifornia State University FullertonCleaning data and preparing it for Analysis is one of thosethankless jobs that is dull, laborious, and painstaking, no matter which way youslice it. The cost of a mistake is considerable, too, as you will discover if youtry to report an observed Fof 317. We think the burden can be greatly reducedwith some help from our old friend, Excel . It s true that many of us already useExcel to Clean and Prepare data for Analysis , but our sense is that few peopleleverage Excel s considerable strengths in a systematic way. In this article, wedescribe a power user s approach to cleaning and preparing your data withExcel.

Excel to clean and prepare data for analysis, but our sense is that few people leverage Excel’s considerable strengths in a systematic way. In this article, we ... An easy way to do this is to use Excel’s Find/Replacefunctionality. Start by highlighting Column F, which contains

Tags:

  Analysis, Data, Excel, Clean, Prepares, Use excel, Excel to clean and prepare data for analysis

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Using Excel to Clean and Prepare Data for Analysis

1 The Industrial-Organizational Psychologist89 Using Excel to Clean and Prepare data for AnalysisR. Jason WeissDevelopment Dimensions InternationalRobert J. TownsendCalifornia State University FullertonCleaning data and preparing it for Analysis is one of thosethankless jobs that is dull, laborious, and painstaking, no matter which way youslice it. The cost of a mistake is considerable, too, as you will discover if youtry to report an observed Fof 317. We think the burden can be greatly reducedwith some help from our old friend, Excel . It s true that many of us already useExcel to Clean and Prepare data for Analysis , but our sense is that few peopleleverage Excel s considerable strengths in a systematic way. In this article, wedescribe a power user s approach to cleaning and preparing your data withExcel.

2 We suggest a phased approach that produces Analysis -ready data with-out destroying the original dataset. We ll also look at ways to document yourdataset so that it will make sense when reviewed at a later point, or by other peo-ple. We conclude with a note about a presentation at the upcoming SIOP con-ference that needs your input!APhased Approach to data PreparationWhy is cleaning and preparing data such a pain? Part of the problem isthe lack of an easy, sensible, and common process. Another is the fact thatpeople rarely document their datasets effectively how often have you lookedat a folder with three slightly different files, all named some variant of FinalProject ,and wondered which one was the realdataset you used foryour analyses 6 months ago? A third source of frustration, which follows fromthe second, is that it s easy to lose your place if you get distracted or have tocorrect a mistake you made several changes earlier.

3 Oh Did I just undothe recoding of those reverse-coded items? Better start over, just to be and your data deserve better. We certainly won t say that the processwe propose is ideal or necessarily suitable for all circumstances. We do feel,though, that it reflects some of the best practices we ve discovered over manycombined years of working with data in Excel . Plus, it s a process you canuse consistently, which helps you in two ways: First, it replaces the need toreinvent the wheel every time you work with data . Second, once you knowthe process, you can quickly understand any data file created with it. Let sstart with a brief look at the main steps in the process:1. Create the data file. We will use several worksheets within a singleExcel file to represent our data at each major stage of the process, from ourinitial raw data through several stages of transformation to the final, Analysis -ready Clean the this stage, we remove any elements we don t wantto leave in our dataset, such as duplicate entries, out-of-range data , and extra-neous characters.

4 The outcome is a Clean set of raw Process the data . The processing stage is where we Prepare thecleaned raw data for Analysis through parsing, recoding, reformatting, andother Create an Analysis -ready copy of the data . Here, we copy the finalset of data for import into a statistics package. 5. Document the data . Finally, we add any necessary documentation tothe data file so that the actions taken on the data are clear when the file isrevisited by others or at a later 1. Create the data FileWe recommend creating separate worksheets in an Excel data file foreach logical step in the data cleaning process. This has a number of , the original data and all transformations are preserved, so it does notrequire much effort to back up a step. Second, the worksheet labels makeclear the main differences between the worksheets.

5 Finally, you never haveto play detective to figure out the differences between multiple files con-taining what look like the same data . Following are the worksheets we will use:Original worksheet contains the data as originally capturedor entered. No actions will be taken on this worksheet except to copy the datato the next sheet, where we will Clean and process it. This worksheet, then,exists solely to maintain a pristine copy of the base data . This starts out as a copy of the original data , which is thencleaned and processed to produce Analysis -ready data . Final sheet contains a literal copy of the columns and rowsof data you plan to use in your analyses. We discuss below why it is neces-sary to have a separate worksheet for your final up your worksheets is simple. By default, Excel opens with threeworksheets available.

6 Additional worksheets can be added to the workbookby selecting Insert | Worksheet. The default names for the worksheets areSheet1, Sheet2, and so forth. You can change these by doubleclicking on theworksheet names on the tabs at the bottom of the screen, or by clicking For-mat | Sheet | Rename. Note that Excel sometimes abbreviates worksheetas simply sheet. Don t worry, the two terms are 2005 Volume 42 Number 3A quick word of advice on naming your worksheets: Excel permits spacesin worksheet names, but these become onerous in functions that refer to cellsacross worksheets. We suggest following a convention of Using upper- andlowercase letters to suggest word separation in worksheet names. For exam-ple, OriginalDatais a perfectly legible worksheet name. Alternately, useunderscore characters for spaces, for example, Processed_Data.

7 With thatsaid, we will maintain spaces within worksheet names as a means of main-taining readability through the remainder of this will use the example of a standard data cleaning task in which wemanipulate a single data file, displayed as Exhibit 1. The file we are Using isbased on fictional data culled from the equally fictional Weiss Circus ClownSelection Test-Revised, which has swelled to 10 items, two of which arereverse coded (see Weiss, 2004b for more information on the originalWCCST). The file is available for download at if you wish to follow along with the dataset. Exhibit 1. Sample 2. Clean the DataThe goal of the data cleaning process is to preserve meaningful datawhile removing elements that may impede our ability to run the analyseseffectively or otherwise affect the quality of the statistics that result.

8 Candi-dates for removal include duplicate records, extraneous characters withincells, or out-of-range values. Note that we will be acting directly on thedata during this phase, though we can always hit the undo button if wemake a mistake. The first step is to copy the data over from theOriginal-Datasheet to the InterimDatasheet. To do so, highlight all cells in theOriginalDatasheet directly or by typing CTRL+Aon your the data and paste it into the InterimDatasheet. Before we start cleaning, we will assign each row in the spreadsheet anID number. This way, if we delete a row (presumably because it s a dupli-cate), we can tell from the gaps in ID numbers where the deleted rows Industrial-Organizational Psychologist91It s a pretty simple process to create an ID: Insert a column to the left of col-umn A and input the number 1 into the cell next to the first row of data (cellA2).

9 Highlight the cells that need to be numbered, click Edit | Fill |Series, input a step value of 1, and hit OK. In our example, the cells will nowbe numbered from 1 to 11. Now we re ready to proceed with the duplicate common step in preparing data for a statsprogram is searching for and removing duplicate entries. Our strategy is tocreate a key that uniquely identifies each person in the dataset; we will thensort the data based on this key and check if the key shows up in adjacentrows. We create the key by copying the values of the First Name, MI, LastName, and Street Addresscells and concatenating them into one cell. Happi-ly, Excel s CONCATENATE()function does all of the hard work for us. InRow 2 of the first empty column, enter the following formula: =CON-CATENATE(A2,B2,C2,D2). Now that we have our key, we need to searchfor duplicates.

10 First, we need to sort by our keys. Click on data | Sortandselect the column containing the key (Column R in our example). With thedata sorted, we can then proceed to check if the key shows up in adjacentrows. In Row 2 of the first empty column, enter =EXACT(R2,R3)and copythe formula down to the remaining cells. The EXACT()function returnsTRUEif the values it is given are identical, and FALSE otherwise. Here s a power user s hint. In a large dataset, it can become tiresome tolocate all the TRUE cells. A simple way around this is to automatically for-mat all the TRUE cells to a different color. Select the test column and clickon Format | Conditional Formatting. Arrange the drop down boxes toread Cell Value Is Equal To TRUE . Select the Format button, fol-lowed by the Patternstab, then choose the color to highlight the cell if thefunction value is true.


Related search queries