Transcription of seven - dssbooks.com
1 147sevenStatistical Analysiswith in 2/22/07 11:37 AM Page chapter illustrates the tools available in Excel for performing statistical analysis. These toolsinclude some new functions, the Data Analysis Toolpack, and some new chart features. Thischapter is not intended to teach the statistical concepts which can be used in Excel s analysis, butrather demonstrate to the reader that several tools are available in Excel to perform these statis-tical functions. Statistical analysis is used often in DSS applications for analyzing input and dis-playing conclusive output. These tools will be used especially in applications involvingsimulation. Some examples of such DSS applications include the Birthday Simulation and PokerSimulation cases in Part III of the text.
2 Other applications which rely on statistical analysis arethe Queuing cases and the Reliability Analysis case. A user may want to analyze historical datafor forecasting purposes, analyze the performance of a simulation to test the quality of theirmodel and parameters, or understand the probability of some future results in order to aid indecision making. We discuss the application of statistical analysis in simulation in Chapter 9 andagain in Chapter 20 with this chapter, the reader will learn how to: Perform basic statistical analysis of data using Excel functions. Use some of the statistical features of the Data Analysis Toolpack such as Descriptive Sta-tistics and Histograms. Work with trend curves to analyze data patterns.
3 Perform basic linear regression techniques in Excel. Work with several different distribution functions in DataStatistical analysis provides an understanding of a set of data. Using statistics, we can determinean average value, a variation of the data from this average, a range of data values, and performother interesting analysis. We begin this analysis by using statistical Excel of the basic statistical calculations to perform is finding the meanof a set of numbers;the mean is simply the average, which we learned how to calculate with the AVERAGE functionin Chapter 4:=AVERAGE(range or range_name)Figure displays a table of family incomes for a given year. We first name this range ofdata, cells B4:B31, as FamIncome. We can now find the average, or mean, family income forthat year using the AVERAGE function as follows (see Figure ):=AVERAGE(FamIncome)Similar to the mean, the mediancan also be considered the middle value of a set of num-bers.
4 The median is the middle number in a list of sorted data. To find the median, we use theMEDIAN function, which takes a range of data as its parameter:=MEDIAN(range or range_name)148 CHAPTER 7 Statistical Analysis with Excelch07_4769 2/22/07 11:37 AM Page 148 Figure incomes for a given the mean, or average, of all family incomes using the AVERAGE determine the median of the above family incomes, we enter the MEDIAN function asfollows:=MEDIAN(FamIncome)We can check whether or not this function has returned the correct result by sorting thedata and finding the middle number (refer to Chapter 10 for details on sorting). Since there arean even number of family incomes recorded in the table, we must average the two middle num-bers.
5 The result is the same (see Figure ).SECTION Understanding Data149ch07_4769 2/22/07 11:37 AM Page 149 Figure the MEDIAN function and verifying the result by sorting the data and finding themiddle important value, standard deviation, is the square root of the variance, whichmeasures the difference between the mean of the data set and the individual values. Finding thestandard deviation is simple with the STDEV function. The parameter for this function is alsojust the range of data for which we are calculating the standard deviation:=STDEV(range or range_name)In Figure , we calculate the standard deviation of the family income data using the fol-lowing function:=STDEV(FamIncome)Figure the STDEV function. 150 CHAPTER 7 Statistical Analysis with Excelch07_4769 2/22/07 11:37 AM Page 150 SECTION Understanding Data151 SummaryStatistical Functions:AVERAGEF inds the mean of a set of the median of a set of the standard deviation of a set of Analysis Toolpackprovides an additional method by which to perform statisticalanalysis.
6 This Excel Add-In includes statistical analysis techniques such as Descriptive Statistics,Histograms, Exponential Smoothing, Correlation, Covariance, Moving Average, and others (seeFigure ). These tools automate a sequence of calculations that require much data manipula-tion if only Excel functions are being used. We will now discuss how to use Descriptive Statisticsand Histogramsin the Analysis Toolpack. (Refer to Appendix A for more discussion on ExcelAdd-Ins.)(Note: Before using the Analysis Toolpack, we must ensure that it is an active Add-in. To do so, chooseTools > Add-insfrom the Excel menu and select Analysis Toolpackfrom the list. If you do not see it onthe list, you may need to update your installation of Excel on your computer.)
7 After you have checkedAnalysis Toolpackon the Add-inslist, you should find the Data Analysisoption under the Toolsmenuoption.) StatisticsThe Descriptive Statisticsoption provides a list of statistical information about our data set, in-cluding the mean, median, standard deviation, and variance. To use Descriptive Statistics, we goto Tools > Data Analysis > Descriptive Statistics. Choosing the Descriptive Statisticsoption fromthe Data Analysiswindow (shown in Figure ) displays a new window (shown in Figure ). Figure Data Analysis dialog box provides alist of analytical Descriptive Statistics dialog box ap-pears after it is chosen from the Data Analysis 2/22/07 11:37 AM Page 151 The Input Rangerefers to the location of the data set.
8 We can check whether our data isGrouped By Columnsor Rows. If there are labels in the first row of each column of data, then wecheck the Labels in First Rowbox. The Output Rangerefers to where we want the results of theanalysis to be displayed in the current worksheet. We could also place the analysis output in anew worksheet or a new workbook. The Summary Statisticsbox calculates the most commonlyused statistics from our data. We will discuss the last three options, Confidence Level for Mean,Kth Largest, and Kth Smallest, later in the us now consider an example in order to appreciate the benefit of this tool. In Figure , there is a table containing quarterly stock returns for three different companies. We wantto determine the average stock return, the variability of stock returns, and which quarters hadthe highest and lowest stock returns for each company.
9 This information could be very useful forselecting a company in which to use the Descriptive Statistics tool to answer these questions. In the Descriptive Statisticsdialog box (see Figure ), we enter the range B3:D27for the Input Range. (Notice that we donot select the first column, Date, since we are not interested in a statistical analysis of these val-ues.) Next, we check that our data is Grouped By Columns; since we do have labels in the first rowof each column of data, we check the Labels in First Rowbox. We now specify G3as the locationof the output in the Output Rangeoption. After checking Summary Statistics, we press OK(with-out checking any of the last three options) to observe the results shown below in Figure 7 Statistical Analysis with ExcelFigure stock returns for three the Descriptive Statistics dialog boxfor the above example , let us become familiar with the Mean, Median, and Mode.
10 As already mentioned, theMeanis simply the average of all values in a data set, or all observations in a sample. We have al-ready observed that without the Analysis Toolpack, the mean value can be found with the AV-ch07_4769 2/22/07 11:37 AM Page 152 ERAGE function in Excel. The Medianis the middle observation when the data is sorted in as-cending order. If there is an odd number of values, then the median is truly the middle value. Ifthere is an even number of values, then it is the average of the two middle values. Figure results of the Descriptive Statistics analysis for the example Modeis the most frequently occurring value. If there is no repeated value in the data set,then there is no Modevalue, as in this example (considering all decimal values).