
Think Bayes - Green Tea Press


Think Bayes Bayesian Statistics Made Simple Version 1.0.9 Allen B. Downey Green Tea Press Needham, Massachusetts


Transcription of Think Bayes - Green Tea Press

Copyright 2012 Allen B. Downey.

Green Tea Press, 9 Washburn Ave, Needham MA 02492

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Unported License.

My theory, which is mine

The premise of this book, and the other books in the Think X series, is that if you know how to program, you can use that skill to learn other topics.

Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.

I think this presentation is easier to understand, at least for people with programming skills.
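To make the point about integrals becoming summations concrete, here is a minimal sketch. It is not taken from the book's code: the function names, the exponential density, the rate, and the grid size are all assumptions chosen for illustration. The mean of a continuous distribution, normally an integral of x times the density, is computed with a simple loop over a discrete grid.

```python
import math


def discrete_pmf(density, low, high, num_points):
    """Evaluate a density on an evenly spaced grid and normalize it to sum to 1."""
    step = (high - low) / (num_points - 1)
    xs = [low + i * step for i in range(num_points)]
    weights = [density(x) for x in xs]
    total = sum(weights)
    return {x: w / total for x, w in zip(xs, weights)}


def pmf_mean(pmf):
    """The integral of x * f(x) dx becomes a sum of x * p over the grid."""
    total = 0.0
    for x, p in pmf.items():
        total += x * p
    return total


def exponential_density(x, rate=2.0):
    """Density of an exponential distribution (illustrative choice, rate assumed)."""
    return rate * math.exp(-rate * x)


# Approximate the distribution on [0, 20] with 2001 grid points.
pmf = discrete_pmf(exponential_density, 0.0, 20.0, 2001)
approx_mean = pmf_mean(pmf)
```

The exact mean of this exponential distribution is 1/rate = 0.5. The discrete result is close to that but not identical, and a finer grid reduces the gap.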

This presentation is also more general, because when we make modeling decisions, we can choose the most appropriate model without worrying too much about whether the model lends itself to conventional analysis. Also, it provides a smooth development path from simple examples to real-world problems.

Chapter 3 is a good example. It starts with a simple example involving dice, one of the staples of basic probability. From there it proceeds in small steps to the locomotive problem, which I borrowed from Mosteller's Fifty Challenging Problems in Probability with Solutions, and from there to the German tank problem, a famously successful application of Bayesian methods during World War II.

Modeling and approximation

Most chapters in this book are motivated by a real-world problem, so they involve some degree of modeling.

Before we can apply Bayesian methods (or any other analysis), we have to make decisions about which parts of the real-world system to include in the model and which details we can abstract away.

For example, in Chapter 7, the motivating problem is to predict the winner of a hockey game. I model goal-scoring as a Poisson process, which implies that a goal is equally likely at any point in the game. That is not exactly true, but it is probably a good enough model for most purposes.

In Chapter 12 the motivating problem is interpreting SAT scores (the SAT is a standardized test used for college admissions in the United States). I start with a simple model that assumes that all SAT questions are equally difficult, but in fact the designers of the SAT deliberately include some questions that are relatively easy and some that are relatively hard.

I present a second model that accounts for this aspect of the design, and show that it doesn't have a big effect on the results after all.

I think it is important to include modeling as an explicit part of problem solving because it reminds us to think about modeling errors (that is, errors due to simplifications and assumptions of the model).

Many of the methods in this book are based on discrete distributions, which makes some people worry about numerical errors. But for real-world problems, numerical errors are almost always smaller than modeling errors.

Furthermore, the discrete approach often allows better modeling decisions, and I would rather have an approximate solution to a good model than an exact solution to a bad model.

On the other hand, continuous methods sometimes yield performance advantages, for example by replacing a linear- or quadratic-time computation with a constant-time solution.

So I recommend a general process with these steps:

1. While you are exploring a problem, start with simple models and implement them in code that is clear, readable, and demonstrably correct. Focus your attention on good modeling decisions, not optimization.

2. Once you have a simple model working, identify the biggest sources of error. You might need to increase the number of values in a discrete approximation, or increase the number of iterations in a Monte Carlo simulation, or add details to the model.

3. If the performance of your solution is good enough for your application, you might not have to do any optimization. But if you do, there are two approaches to consider. You can review your code and look for optimizations; for example, if you cache previously computed results you might be able to avoid redundant computation. Or you can look for analytic methods that yield computational shortcuts.

One benefit of this process is that Steps 1 and 2 tend to be fast, so you can explore several alternative models before investing heavily in any of them.

Another benefit is that if you get to Step 3, you will be starting with a reference implementation that is likely to be correct, which you can use for regression testing (that is, checking that the optimized code yields the same results, at least approximately).
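The process above can be sketched with a small dice example. This is an illustration under stated assumptions, not code from the book: the function names and the choice of problem (the probability that several dice sum to a target) are hypothetical. The first function plays the role of the clear reference implementation from Step 1, the second is a Step 3 optimization that caches previously computed results, and the third is a regression test that checks that the two agree, at least approximately.

```python
import math
from functools import lru_cache
from itertools import product


def prob_sum_reference(num_dice, target, sides=6):
    """Step 1: brute-force reference that enumerates every possible roll."""
    outcomes = product(range(1, sides + 1), repeat=num_dice)
    hits = sum(1 for rolls in outcomes if sum(rolls) == target)
    return hits / sides ** num_dice


@lru_cache(maxsize=None)
def prob_sum_cached(num_dice, target, sides=6):
    """Step 3: the same quantity, computed recursively with cached subresults."""
    if num_dice == 0:
        return 1.0 if target == 0 else 0.0
    return sum(
        prob_sum_cached(num_dice - 1, target - face, sides)
        for face in range(1, sides + 1)
    ) / sides


def check_against_reference(max_dice=4, sides=6):
    """Regression test: the optimized code must match the reference."""
    for n in range(1, max_dice + 1):
        for target in range(n, n * sides + 1):
            expected = prob_sum_reference(n, target, sides)
            actual = prob_sum_cached(n, target, sides)
            assert math.isclose(expected, actual, rel_tol=1e-9, abs_tol=1e-12)
    return True
```

Calling `check_against_reference()` after each optimization is one way to confirm that the faster version has not drifted from the reference. The tolerance reflects the "at least approximately" caveat, since the two versions accumulate floating-point error differently.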

Working with the code

The code and sound samples used in this book are available from my repository on GitHub. Git is a version control system that allows you to keep track of the files that make up a project. A collection of files under Git's control is called a repository. GitHub is a hosting service that provides storage for Git repositories and a convenient web interface.

The GitHub homepage for my repository provides several ways to work with the code:

- You can create a copy of my repository on GitHub by pressing the Fork button. If you don't already have a GitHub account, you'll need to create one. After forking, you'll have your own repository on GitHub that you can use to keep track of code you write while working on this book. Then you can clone the repo, which means that you copy the files to your computer.

- Or you could clone my repository. You don't need a GitHub account to do this, but you won't be able to write your changes back to GitHub.

- If you don't want to use Git at all, you can download the files in a Zip file using the button in the lower-right corner of the GitHub page.

The code for the first edition of the book works with Python 2. If you are using Python 3, you might want to use the updated code.

I developed this book using Anaconda from Continuum Analytics, which is a free Python distribution that includes all the packages you'll need to run the code (and lots more). I found Anaconda easy to install. By default it does a user-level installation, not system-level, so you don't need administrative privileges. You can download Anaconda from Continuum Analytics.

If you don't want to use Anaconda, you will need the following packages:

- NumPy for basic numerical computation;

- SciPy for scientific computation;

- Matplotlib for visualization.

Although these are commonly used packages, they are not included with all Python installations, and they can be hard to install in some environments. If you have trouble installing them, I recommend using Anaconda or one of the other Python distributions that include these packages.

Many of the examples in this book use classes and functions defined in a module that accompanies the book. Some of them also use a second module, which provides wrappers for some of the functions in pyplot, which is part of Matplotlib.

Code style

Experienced Python programmers will notice that the code in this book does not comply with PEP 8, which is the most common style guide for Python.

Specifically, PEP 8 calls for lowercase function names with underscores between words, like_this. In this book and the accompanying code, function and method names begin with a capital letter and use camel case, LikeThis.

I broke this rule because I developed some of the code while I was a Visiting Scientist at Google, so I followed the Google style guide, which deviates from PEP 8 in a few places.
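For readers who have not seen the two naming conventions side by side, the following sketch defines the same function twice. Both function names are hypothetical and chosen only to contrast the styles; neither comes from the book's code.

```python
def normalize_weights(weights):
    """PEP 8 style: lowercase words separated by underscores."""
    total = sum(weights)
    return [w / total for w in weights]


def NormalizeWeights(weights):
    """Style used in this book: capitalized camel case."""
    total = sum(weights)
    return [w / total for w in weights]
```

The two functions behave identically. The difference is purely one of convention, which is why a codebase usually picks one style and applies it throughout.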

Once I got used to Google style, I found that I liked it. And at this point, it would be too much trouble to change.

Also on the topic of style, I write "Bayes's theorem" with an s after the apostrophe, which is preferred in some style guides and deprecated in others. I don't have a strong preference. I had to choose one, and this is the one I chose.

And finally one typographical note: throughout the book, I use PMF and CDF for the mathematical concept of a probability mass function or cumulative distribution function, and Pmf and Cdf to refer to the Python objects I use to represent them.

Prerequisites

There are several excellent modules for doing Bayesian statistics in Python, including pymc and OpenBUGS. I chose not to use them for this book because you need a fair amount of background knowledge to get started with these modules, and I want to keep the prerequisites minimal.

If you know Python and a little bit about probability, you are ready to start this book.

Chapter 1 is about probability and Bayes's theorem; it has no code. Chapter 2 introduces Pmf, a thinly disguised Python dictionary I use to represent a probability mass function (PMF). Then Chapter 3 introduces Suite, a kind of Pmf that provides a framework for doing Bayesian updates.

In some of the later chapters, I use analytic distributions including the Gaussian (normal) distribution, the exponential and Poisson distributions, and the beta distribution. In Chapter 15 I break out the less-common Dirichlet distribution, but I explain it as I go along. If you are not familiar with these distributions, you can read about them on Wikipedia. You could also read the companion to this book, Think Stats, or an introductory statistics book (although I'm afraid most of them take a mathematical approach that is not particularly helpful for practical purposes).
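To give a rough idea of what "a thinly disguised Python dictionary" with a Bayesian update might look like, here is a minimal sketch. It is not the book's implementation: the class names `DictPmf` and `DiceSuite`, the method names, and the dice scenario are illustrative assumptions. The hypotheses are dice with different numbers of sides, the prior is uniform, and the data is a single observed roll.

```python
class DictPmf(dict):
    """Maps each value to its probability (a hypothetical stand-in for a Pmf)."""

    def Normalize(self):
        """Rescale the probabilities so they sum to 1; return the old total."""
        total = sum(self.values())
        for value in self:
            self[value] /= total
        return total


class DiceSuite(DictPmf):
    """A suite of hypotheses; each hypothesis is the number of sides on a die."""

    def __init__(self, hypotheses):
        # Start from a uniform prior over the hypotheses.
        super().__init__({hypo: 1.0 for hypo in hypotheses})
        self.Normalize()

    def Likelihood(self, data, hypo):
        """Probability of rolling `data` on a die with `hypo` sides."""
        return 0.0 if data > hypo else 1.0 / hypo

    def Update(self, data):
        """Multiply each prior by its likelihood, then renormalize."""
        for hypo in self:
            self[hypo] *= self.Likelihood(data, hypo)
        self.Normalize()


suite = DiceSuite([4, 6, 8, 12, 20])
suite.Update(6)  # observe a roll of 6
```

After the update, the 4-sided die has probability 0 because it cannot produce a 6, and the 6-sided die is the most probable hypothesis because it assigns the highest likelihood to the observed roll.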

