
MAT 211 Introduction to Business Statistics I Lecture Notes

Muhammad El-Taha
Department of Mathematics and Statistics
University of Southern Maine
96 Falmouth Street
Portland, ME 04104-9300

MAT 211, Spring 97, revised Fall 97, revised Spring 98

Course Content
Topic 1: Data Analysis
Topic 2: Probability
Topic 3: Random Variables and Discrete Distributions
Topic 4: Continuous Probability Distributions
Topic 5: Sampling Distributions
Topic 6: Point and Interval Estimation

Contents
1 Data Analysis
  2 Graphical Methods
  5 Sample Mean and Variance For Grouped Data
2 Probability
  1 Sample Space and Events
  2 Probability of an …
  3 Laws of …
  4 Counting Sample Points
  5 Random Sampling
3 Discrete Random Variables
  2 Expected Value and Variance
  4 Markov Chains


Contents (continued)
4 Continuous Distributions
  3 Uniform: U[a,b]
  4 Exponential
5 Sampling Distributions
  1 The Central Limit Theorem (CLT)
6 Large Sample Estimation
  2 Point Estimators and Their Properties
  3 Single Quantitative Population
  4 Single Binomial Population

Chapter 1 Data Analysis

Problems
Descriptive Statistics
  Graphical Methods
    Frequency Distributions (Histograms)
    Other Methods
  Numerical methods
    Measures of Central Tendency
    Measures of Variability
    Empirical Rule
    Percentiles

1 Introduction

Statistical Problems
1. A market analyst wants to know the effectiveness of a new …
2. A pharmaceutical Co. wants to know if a new drug is superior to already existing drugs, or has possible side effects.
3. How fuel efficient is a certain car model?
4. Is there any relationship between your GPA and employment …?
5. If you answer all questions on a (T,F) (or multiple choice) examination completely randomly, what are your chances of passing?

6. What is the effect of package designs on …?
7. How to interpret polls. How many individuals do you need to sample for your inferences to be acceptable? What is meant by the margin of error?
8. What is the effect of market strategy on market share?
9. How to pick the stocks to invest in?

I. Definitions
Probability: A game of chance
Statistics: Branch of science that deals with data analysis
Course objective: To make decisions in the presence of uncertainty

Terminology
Data: Any recorded event (e.g., times to assemble a product)
Information: Any acquired data (e.g., a collection of numbers (data))
Knowledge: Useful data
Population: The set of all measurements of interest (e.g., all registered voters, all freshman students at the university)
Sample: A subset of measurements selected from the population of interest
Variable: A property of an individual population unit (e.g., major, height, weight of freshman students)
Descriptive Statistics: deals with procedures used to summarize the information contained in a set of measurements.
Inferential Statistics: deals with procedures used to make inferences (predictions) about a population parameter from information contained in a sample.

Elements of a statistical problem:

(i) A clear definition of the population and variable of interest.
(ii) A design of the experiment or sampling procedure.
(iii) Collection and analysis of data (gathering and summarizing data).
(iv) Procedure for making predictions about the population based on sample information.
(v) A measure of goodness or reliability for the procedure.

Objective (better statement): To make inferences (predictions, decisions) about certain characteristics of a population based on information contained in a sample.

Types of data: qualitative vs quantitative OR discrete vs continuous

Descriptive statistics: graphical vs numerical methods

2 Graphical Methods

Frequency and relative frequency distributions (Histograms):

Example: Weight Loss Data
The goal is to provide a useful summary of the available information. To do so, construct a statistical graph called a histogram (or frequency distribution).

Weight Loss Data
Table columns: class boundaries, tally, frequency f, relative frequency f/n.

Relative frequencies of the six classes: (.12) (.20) (.28) (.24) (.12) (.04)

Notation:
k = # of classes
max = largest measurement
min = smallest measurement
n = sample size
w = class width

Rule of thumb:
- The number of classes chosen is usually between 5 and 20. (Most of the time between 7 and 13.)
- The more data one has, the larger is the number of classes.

Formulas: $k = 1 + 3.3\log_{10}(n)$; $w = \frac{\max - \min}{k}$

Example: for the weight loss data, the class width actually used was $w = \frac{29 - 5}{6} = 4.0$ rather than the value given by the formula (why?)

Graphs: Graph the frequency and relative frequency distributions.

Exercise: Repeat the above example using 12 and 4 classes respectively. Comment on the usefulness of each, including the choice of k used above.

Steps in Constructing a Frequency Distribution (Histogram)
1. Determine the number of classes
2. Determine the class width
3. Locate class boundaries
4. Proceed as above

Possible shapes of frequency distributions
1. Normal distribution (Bell shape)
2. Exponential
3. Uniform
4. Binomial, Poisson (discrete variables)

Important
- The normal distribution is the most popular, most useful, easiest to handle
- It occurs naturally in practical applications
- It lends itself easily to more in depth analysis

Other Graphical Methods
- Statistical Table: Comparing different populations
- Bar Charts
- Line Charts
- Pie-Charts
- Cheating with Charts

3 Numerical methods

Numerical methods consist of measures of central tendency and measures of dispersion (variability).

Measures of central tendency:
1. Sample mean
2. Sample median
3. Sample mode

Measures of dispersion (variability):
1. Range
2. Mean Absolute Deviation (MAD)
3. Sample Variance
4. Sample Standard Deviation

I. Measures of Central Tendency

Given a sample of measurements $(x_1, x_2, \ldots, x_n)$ where
n = sample size
$x_i$ = value of the ith observation in the sample

1. Sample Mean (arithmetic average)

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ or $\bar{x} = \frac{\sum x}{n}$

Example 1: Given a sample of 5 test grades (90, 95, 80, 60, 75), then
$\sum x = 90 + 95 + 80 + 60 + 75 = 400$
$\bar{x} = \frac{\sum x}{n} = \frac{400}{5} = 80$

Example 2: Let x = age of a randomly selected student. Sample: (20, 18, 22, 29, 21, 19)
$\sum x = 20 + 18 + 22 + 29 + 21 + 19 = 129$
$\bar{x} = \frac{\sum x}{n} = \frac{129}{6} = 21.5$

2. Sample Median

The median of a sample (data set) is the middle number when the measurements are arranged in ascending order.

Note:
If n is odd, the median is the middle number.
If n is even, the median is the average of the middle two numbers.

Example 1: Sample (9, 2, 7, 11, 14), n = 5
Step 1: arrange in ascending order: 2, 7, 9, 11, 14
Step 2: med = 9

Example 2: Sample (9, 2, 7, 11, 6, 14), n = 6
Step 1: 2, 6, 7, 9, 11, 14
Step 2: med = $\frac{7 + 9}{2} = 8$

Remarks:
(i) $\bar{x}$ is sensitive to extreme values
(ii) the median is insensitive to extreme values (because the median is a measure of location or position).
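The mean and median calculations above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `sample_mean` and `sample_median` are made up for this example and are not part of the lecture notes.

```python
def sample_mean(xs):
    """Arithmetic average: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the sorted data (n odd), or the average of the
    two middle values (n even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


grades = [90, 95, 80, 60, 75]
ages = [20, 18, 22, 29, 21, 19]

print(sample_mean(grades))                   # 80.0
print(sample_mean(ages))                     # 21.5
print(sample_median([9, 2, 7, 11, 14]))      # 9
print(sample_median([9, 2, 7, 11, 6, 14]))   # 8.0

# Sensitivity to extreme values: replace 14 by 140 (a made-up outlier).
with_outlier = [9, 2, 7, 11, 140]
print(sample_mean(with_outlier))    # the mean is pulled up by the outlier
print(sample_median(with_outlier))  # the median is still 9
```

The last two lines illustrate the remarks: one extreme value moves the mean a long way but leaves the median where it was.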

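3. Sample Mode

The value of x (observation) that occurs with the greatest frequency.

Example: Sample: (9, 2, 7, 11, 14, 7, 2, 7), mode = 7

Effect of $\bar{x}$, median and mode on the relative frequency distribution.

II. Measures of Variability

Given: a sample of size n, sample: $(x_1, x_2, \ldots, x_n)$

1. Range:
Range = largest measurement - smallest measurement
or Range = max - min

Example 1: Sample (90, 85, 65, 75, 70, 95)
Range = max - min = 95 - 65 = 30

2. Mean Absolute Deviation (MAD) (not in textbook)

$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$

Example 2: Same sample, $\bar{x} = \frac{\sum x}{n} = 80$

x        x - x̄    |x - x̄|
90        10       10
85         5        5
65       -15       15
75        -5        5
70       -10       10
95        15       15
Totals 480   0     60

$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{60}{6} = 10$

Remarks:
(i) MAD is a good measure of variability
(ii) It is difficult to manipulate mathematically

3. Sample Variance, $s^2$

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

4. Sample Standard Deviation, s

$s = \sqrt{s^2}$ or $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example: Same sample as before ($\bar{x} = 80$)

x        x - x̄    (x - x̄)²
90        10      100
85         5       25
65       -15      225
75        -5       25
70       -10      100
95        15      225
Totals 480   0    700

Therefore
$\bar{x} = \frac{\sum x}{n} = \frac{480}{6} = 80$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{700}{5} = 140$
$s = \sqrt{s^2} = \sqrt{140} = 11.83$

Shortcut Formula for Calculating $s^2$ and s

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$ (or $s = \sqrt{s^2}$).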
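As a rough check of the variability formulas above, the following Python sketch computes the range, MAD, sample variance (by the definition and by the shortcut formula) and standard deviation for the sample (90, 85, 65, 75, 70, 95). The function names are hypothetical and chosen only for this illustration.

```python
import math


def mad(xs):
    """Mean absolute deviation: sum of |x - mean| divided by n."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def sample_variance(xs):
    """Definition: sum of squared deviations divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    """Shortcut: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


sample = [90, 85, 65, 75, 70, 95]

print(max(sample) - min(sample))                    # range: 30
print(mad(sample))                                  # 10.0
print(sample_variance(sample))                      # 140.0
print(sample_variance_shortcut(sample))             # 140.0
print(round(math.sqrt(sample_variance(sample)), 2)) # 11.83
```

Both variance functions give the same answer on this sample; the shortcut form only needs the two running totals, the sum of x and the sum of x squared.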

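Example: Same sample

x        x²
90       8100
85       7225
65       4225
75       5625
70       4900
95       9025
Totals 480   39,100

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{39{,}100 - \frac{(480)^2}{6}}{5} = \frac{39{,}100 - 38{,}400}{5} = \frac{700}{5} = 140$

$s = \sqrt{s^2} = \sqrt{140} = 11.83$

Numerical methods (Summary)

Data: $\{x_1, x_2, \ldots, x_n\}$

(i) Measures of central tendency
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample median: the middle number when the measurements are arranged in ascending order
Sample mode: most frequently occurring value

(ii) Measures of variability
Range: r = max - min
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$

Exercise: Find all the measures of central tendency and measures of variability for the weight loss example.

Interpretation of the Variance: Finite Populations

Let N = population size.
Data: $\{x_1, x_2, \ldots, x_N\}$
Population mean: $\mu = \frac{\sum x_i}{N}$
Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$, i.e., $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Population parameters vs sample statistics
Sample statistics: $\bar{x}$, $s^2$, $s$.
Population parameters: $\mu$, $\sigma^2$, $\sigma$.

Practical Significance of the standard deviation

Chebyshev's Inequality.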

(Regardless of the shape of the frequency distribution)
Given a number $k \geq 1$ and a set of measurements $x_1, x_2, \ldots, x_n$, at least $(1 - \frac{1}{k^2})$ of the measurements lie within k standard deviations of their sample mean.

Restated: at least a fraction $(1 - \frac{1}{k^2})$ of the observations lie in the interval $(\bar{x} - ks, \bar{x} + ks)$.

Example: A set of grades has $\bar{x} = 75$, $s = 6$. Then
(i) (k = 1): at least 0% of all grades lie in [69, 81]
(ii) (k = 2): at least 75% of all grades lie in [63, 87]
(iii) (k = 3): at least 88% of all grades lie in [57, 93]
(iv) (k = 4): at least ?% of all grades lie in [?, ?]
(v) (k = 5): at least ?% of all grades lie in [?, ?]

Suppose that you are told that the frequency distribution is bell shaped. Can you improve the estimates in Chebyshev's Inequality?

Empirical Rule
Given a set of measurements $x_1, x_2, \ldots, x_n$ whose frequency distribution is bell shaped. Then
(i) approximately 68% of the measurements lie within one standard deviation of their sample mean, $(\bar{x} - s, \bar{x} + s)$
(ii) approximately 95% of the measurements lie within two standard deviations of their sample mean, $(\bar{x} - 2s, \bar{x} + 2s)$
(iii) at least 99% (almost all) of the measurements lie within three standard deviations of their sample mean, $(\bar{x} - 3s, \bar{x} + 3s)$

Example: A data set has $\bar{x} = 75$, $s = 6$.

The frequency distribution is known to be normal (bell shaped). Then
(i) (69, 81) contains approximately 68% of the observations
(ii) (63, 87) contains approximately 95% of the observations
(iii) (57, 93) contains at least 99% (almost all) of the observations

Comments.
(i) The empirical rule works better if the sample size is large
(ii) In your calculations always keep 6 significant digits
(iii) Approximation: $s \approx \frac{\text{range}}{4}$
(iv) Coefficient of variation = $\frac{s}{\bar{x}}$

4 Percentiles

Using percentiles is useful if the data is badly skewed.

Let $x_1, x_2, \ldots, x_n$ be a set of measurements arranged in increasing order.

Definition: Let 0 < p < 100. The pth percentile is a number x such that p% of all measurements fall below the pth percentile and (100 - p)% fall above it.

Example. Data: 2, 5, 8, 10, 11, 14, 17, 20.
(i) Find the 30th percentile.
(S1) position = .3(n + 1) = .3(9) = 2.7
(S2) 30th percentile = 5 + .7(8 - 5) = 5 + 2.1 = 7.1

Lower Quartile (25th percentile)
Example.
(S1) position = .25(n + 1) = .25(9) = 2.25
(S2) $Q_1$ = 5 + .25(8 - 5) = 5.75
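The two-step percentile rule used in the examples above (find the position p(n + 1)/100, then interpolate between the neighbouring ordered values) might be sketched in Python as follows. The function name `percentile` and the handling of positions that fall before the first or after the last observation are assumptions made for this illustration; the notes do not say how to treat those cases.

```python
def percentile(sorted_xs, p):
    """pth percentile by the position rule p/100 * (n + 1), with linear
    interpolation between the two neighbouring ordered values.

    Assumes sorted_xs is already in increasing order and 0 < p < 100.
    """
    n = len(sorted_xs)
    pos = p / 100 * (n + 1)  # 1-based position, e.g. 2.7
    whole = int(pos)         # index of the lower neighbour (1-based)
    frac = pos - whole       # how far to move toward the upper neighbour
    # Assumption: clamp to the smallest or largest value when the
    # position falls outside the data.
    if whole < 1:
        return sorted_xs[0]
    if whole >= n:
        return sorted_xs[-1]
    lower = sorted_xs[whole - 1]
    upper = sorted_xs[whole]
    return lower + frac * (upper - lower)


data = [2, 5, 8, 10, 11, 14, 17, 20]

print(round(percentile(data, 30), 2))  # 7.1
print(round(percentile(data, 25), 2))  # 5.75
```

Statistical software often uses slightly different interpolation conventions, so values from other tools may not match this hand rule exactly.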

