
Biostatistics 140.754 Advanced Methods in Biostatistics IV

Jeffrey Leek, Assistant Professor, Department of [...]

Course Information
- The primary focus of this course is regression modeling, along with other more modern approaches for estimating or predicting the relationship between random variables.
- The prerequisites for this course are [...].
- Learning outcomes, syllabus, motivation, grading, etc. are available from the course [...] ~jleek/teaching/2011/574
- Lecture notes will be posted the night before [...].
- Evaluation will consist of a weekly reading assignment, a biweekly homework assignment, and a final [...].

Course Information - Credits
- Ken Rice (UW) - (slides with a [...] are directly lifted from him)
- Jon Wakefield (UW)
- Brian Caffo (JHU)
- + Assorted others as mentioned in the text.




Any mistakes, typos, or otherwise misleading information is measurement error due [...]

What's So Great About Applied Statistics?
"I keep saying the sexy job in the next ten years will be statisticians. People think I'm joking, but who would've guessed that computer engineers would've been the sexy job of the 1990s?" - Hal Varian (Google Chief Economist)

Applied Statisticians
- Eric Lander, Director, Broad
- Steven Levitt, Freakonomics
- Nate Silver
- Daryl Morey, Houston Rockets GM

Jobs For Applied Statisticians

Course Information - How does 574 fit in?
574 is an advanced-level course. The following are assumed:
- Linear algebra; expressions like $(X^T X)^{-1} X^T Y$ should make sense to you
- Probability; manipulation of distributions, Central Limit Theorem, Laws of Large Numbers, some likelihood theory
- Introductory regression; some familiarity with multiple regression will be helpful
- The R language; sufficient to implement the material above (and look up new stuff in help files)
Please note: much of 574 will interpret regression from a non-parametric point of view.

This is a modern approach, and may differ from classical material you have seen [...].

Course Information - How does 574 fit in?
574 is a methods course.
- The main aim is to understand how/why methods work and in what practical situations they will be most [...].
- Math will be limited in the lecture notes (unlike in 673-674, 771-772), so expect some hand-waving ("...under mild regularity conditions").
- Many of the course examples will be short/stylized. However, the goal of the course is to provide both understanding of specific methods and their implementation/[...].

Course Information - How does 574 fit in?
The term "methods" is somewhat open to interpretation - this is one potential way to break journals down to give some insight:
- Theory: Annals of Statistics, JRSSB, Statistica Sinica
- Data Analysis: JASA A&CS, JRSSC, Nature, NEJM, JAMA, Neuroimage, Genome Biology
- Methods: Biometrics, Annals of Applied Statistics, Biostatistics, Statistics in Medicine, Neuroimage, Genome Biology
Modern methods papers use simulation studies to illustrate statistical properties; we will often do the same. Most PhD theses "resemble" methods papers, and contain material similar to that discussed in 574.

A focus of this course will be reading, understanding, and learning to construct academic [...].

Course Info - Textbooks
There is no fixed textbook for this course. A couple of useful books may be:
- Modern Applied Statistics with S
- Generalized Linear Models
Research papers will be featured, for more recent topics - 574 is more cutting edge than some other courses we [...].

Course Info - Textbooks
Another couple of classics applied statisticians should have access to:
- Elements of Statistical Learning
- Analysis of Longitudinal Data
- An Introduction to the Bootstrap

More Ridiculously Useful Books
Another couple of really useful books - not 100% related to course content, but highly recommended:
- A Course in Large Sample Theory (instructor's favorite statistics book)
- The Elements of [...]
~jleek/teaching/2011/754/

Course Info - Course Content
- Review of ideas behind regression
- Non-parametric inference (generalized method of moments)
- Likelihood + Quasi-Likelihood inference
- Bayesian inference
- Analysis of correlated data - generalized estimating equations
- Bootstrapping
- Model selection/shrinkage (Lasso, etc.)

- Factor analysis/principal components analysis
- Interaction-based approaches to prediction/association ([...])
- Multiple testing

Outline of Today's Lecture
- Background (randomness, parameters, regression)
- Regression with estimating equations
- Sandwich estimators of variance

Terminology
- The response variable will be termed the outcome. Usually we wish to relate the outcome to covariates.

                           Y            X (or Z, U)
  Preferred name (by me)   Outcome      Covariate(s)
  Other names              Response     Regressors, Predictors
                           Output       Input
                           Endpoint     Explanatory Variable
  Confusing name           Dependent    Independent

- "Predictor" has causal connotations. "[In]dependent" is a poor choice (the covariates need not be independent of each other - and may be fixed, by an experimenter)
- In 574 we consider Y and X which are continuous, categorical, or counts; later in the course multivariate outcomes are briefly considered (more on that in 755/56).

Outcomes which are censored or mixed (e.g. alcohol consumption) are also possible. Categorical variables may be nominal or ordinal.

What is Randomness?
You may be used to thinking of the stochastic parts of random variables as "just chance". In very select situations this is fine; radioactive decay really does appear to be "just chance" ([...] ask Brian Caffo about [...]).
However, this is not what random variables actually represent in most applications, and it can be a misleading simplification to think that it's "just chance" that prevents us knowing the truth. To see this, consider the following thought experiment.

What is Randomness?
Recall high school physics. For two resistors "in series", the resistances are added to give a total (Y, measured in Ohms, $\Omega$) which we record without error. We know the number of gold stripes (X) and silver stripes (Z).

We also know that each resistance is proportional to the number of stripes. How much resistance do stripes of each color correspond to?

What is Randomness?
Thought experiment #1. Note that in this situation there is no measurement error or noise, and nothing random is going on. What is the value of each gold stripe?

What is Randomness?
Thought experiment #1. Note that in this situation there is no measurement error or noise, and nothing random is going on. What is the difference between X and X+1?

Thought Experiment Math
Here's the truth:

$$Y_{n \times 1} = \gamma_0 1_{n \times 1} + \gamma_1 X_{n \times 1} + \gamma_2 Z_{n \times 1}$$

where $n$ is evenly distributed between all [...]. Knowing $X$, not knowing $Z$, we will fit the relationship

$$Y \approx \beta_0 1 + \beta_1 X$$

Here "fit" means that we will find $e$ orthogonal to $1$ and $X$ such that

$$Y = \beta_0 1 + \beta_1 X + e$$

By linear algebra (projection onto $1$ and $X$) we must have

$$e = Y - \left( \frac{Y \cdot 1}{n} - \frac{Y \cdot (X - \bar{X} 1)}{(X - \bar{X} 1) \cdot (X - \bar{X} 1)} \bar{X} \right) 1 - \left( \frac{Y \cdot (X - \bar{X} 1)}{(X - \bar{X} 1) \cdot (X - \bar{X} 1)} \right) X$$

where $\bar{X} = X \cdot 1 / (1 \cdot 1) = X \cdot 1 / n$, the mean of $X$ - a scalar.
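As a rough numerical check of the projection above, the following Python sketch builds a small noise-free design in the spirit of thought experiment #1 and fits Y on 1 and X. The stripe counts, the coefficient values (`GAMMA0`, `GAMMA1`, `GAMMA2`) and the helper names `dot` and `fit_line` are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fit_line(x, y):
    """Project y onto span{1, x}; return (intercept, slope, residuals)."""
    n = len(x)
    ones = [1.0] * n
    x_bar = dot(x, ones) / n                  # X.1 / n, the mean of X
    xc = [xi - x_bar for xi in x]             # X - x_bar * 1
    slope = dot(y, xc) / dot(xc, xc)
    intercept = dot(y, ones) / n - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


# Hypothetical coefficient values, chosen only for illustration.
GAMMA0, GAMMA1, GAMMA2 = 0.0, 2.0, 5.0

# Assumed balanced design: every (gold, silver) stripe combination appears once.
design = list(product(range(4), range(4)))
x = [float(gold) for gold, _ in design]
z = [float(silver) for _, silver in design]
y = [GAMMA0 + GAMMA1 * xi + GAMMA2 * zi for xi, zi in zip(x, z)]  # no noise

beta0, beta1, e = fit_line(x, y)
ones = [1.0] * len(x)
print("fitted slope beta1:", beta1)
print("e . 1 =", dot(e, ones))
print("e . X =", dot(e, x))
print("distinct residual values:", sorted(set(e)))
```

Nothing in this construction is random: the residuals take a handful of distinct values because they are the centered contribution of the unobserved Z, and under the balanced design assumed here the fitted slope should coincide with `GAMMA1`.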

The fitted line, with $e$. Note the orthogonality to $1$ and $X$. What's the slope of the line?

Thought Experiment Math?
What to remember (in real experiments too):
- The errors represent everything that we didn't measure. Nothing is random here - we just have imperfect information
- If you are never going to know Z (or can't assume you know a lot about it) this sort of marginal relationship is all that can be learned
What you didn't measure can't be [...]

Thought Experiment #2
A different design. What is going on?

Thought Experiment #2
Plotting Y against X .. and not knowing Z

Thought Experiment #2
Here's the fitted line .. what's the slope? What would you conclude?

Thought Experiment #2
Here's the truth, for both Y and Z:

$$Y = \gamma_0 1 + \gamma_1 X + \gamma_2 Z$$
$$Z = \theta_0 1 + \theta_1 X + \epsilon$$

where $\epsilon$ is orthogonal to $1$, $X$.

Therefore,

$$\begin{aligned}
Y &= \gamma_0 1 + \gamma_1 X + \gamma_2 (\theta_0 1 + \theta_1 X + \epsilon) \\
  &= (\gamma_0 + \gamma_2 \theta_0) 1 + (\gamma_1 + \gamma_2 \theta_1) X + \gamma_2 \epsilon \\
  &\equiv \beta_0 1 + \beta_1 X + e
\end{aligned}$$

and we get $\beta_1 = \gamma_1$ if (and only if) there's nothing going on between Z and X. The change we saw in the Y-X slope (from #1 to #2) follows exactly this [...].

Thought Experiment #2
- The marginal slope $\beta_1$ is not the "wrong" answer, but it may not be the same as $\gamma_1$. Which do you want? The Y-X slope if Z is fixed or if Z varies with X in the same way it did in your experiment?
- No one needs to know that Y is being measured for $\beta_1 \neq \gamma_1$ to [...]
- The observed $e$ are actually $\gamma_2 \epsilon$ here, so the "noise" doesn't simply reflect the Z-X relationship alone
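The slope relationship for thought experiment #2 can be checked numerically with a small constructed design. This is only a sketch: the coefficient values, the design points, the choice of a perturbation `eps` that is exactly orthogonal to 1 and X, and the helper names `dot` and `fit_line` are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fit_line(x, y):
    """Project y onto span{1, x}; return (intercept, slope, residuals)."""
    n = len(x)
    ones = [1.0] * n
    x_bar = dot(x, ones) / n
    xc = [xi - x_bar for xi in x]
    slope = dot(y, xc) / dot(xc, xc)
    intercept = dot(y, ones) / n - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


# Hypothetical coefficients for Y = g0 + g1 X + g2 Z and Z = t0 + t1 X + eps.
GAMMA0, GAMMA1, GAMMA2 = 1.0, 2.0, 4.0
THETA0, THETA1 = 0.5, 1.5

# Two observations at each X value; eps alternates +1 / -1 within each pair,
# which makes it orthogonal to both 1 and X by construction.
x = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
eps = [1.0, -1.0] * 4
z = [THETA0 + THETA1 * xi + ei for xi, ei in zip(x, eps)]
y = [GAMMA0 + GAMMA1 * xi + GAMMA2 * zi for xi, zi in zip(x, z)]

beta0, beta1, e = fit_line(x, y)
print("marginal slope beta1:      ", beta1)
print("gamma1 + gamma2 * theta1:  ", GAMMA1 + GAMMA2 * THETA1)
print("marginal intercept beta0:  ", beta0)
print("gamma0 + gamma2 * theta0:  ", GAMMA0 + GAMMA2 * THETA0)
print("residuals e:               ", e)
```

In this construction the marginal slope differs from `GAMMA1` only because `THETA1` is non-zero; setting `THETA1 = 0.0` should bring the two back into agreement, while the residuals remain `GAMMA2 * eps`.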

Thought Experiment #3
A final design .. a real mess! (Starts to look like real data!)

Thought Experiment #3
- Z and X were orthogonal - what happened to the slope?
- But the variability of Z depended on X. What happened to $e$, compared to #1 and #2?
We can extend all these arguments to $X_{n \times p}$ and $Z_{n \times q}$ - see Jon Wakefield's book for more. Reality also tends to have >1 "un-pretty" phenomena per situation!
In general, the nature of what we call "randomness" depends heavily on what is going on unobserved. It's only in extremely simple situations (which probably don't require a PhD statistician) that unobserved patterns can be dismissed without careful thought. In some complex situations they can be dismissed, but only after careful thought.

Reality Check
This is a realistically-complex "system" you might see in practice. Your X might be time (developmental) and Y expression of a particular gene. Knowing the Y-X relationship is clearly useful, but pretending that all the Z-X relationships are pretty is naïve (at best).

Reality Check
With reasonable sample size $n$, inference ("learning about $\beta$") is possible without making strong assumptions about the distribution of Y, and how it varies with X.
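Returning to thought experiment #3, the two questions it poses (what happens to the slope, and what happens to e) can be explored with a constructed design in which Z is orthogonal to 1 and X but its spread grows with X. The design points, coefficient values and helper names below are assumptions for illustration only; the actual design behind the slide's figure is not recoverable from the text.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fit_line(x, y):
    """Project y onto span{1, x}; return (intercept, slope, residuals)."""
    n = len(x)
    ones = [1.0] * n
    x_bar = dot(x, ones) / n
    xc = [xi - x_bar for xi in x]
    slope = dot(y, xc) / dot(xc, xc)
    intercept = dot(y, ones) / n - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


# Hypothetical coefficients for Y = g0 + g1 X + g2 Z.
GAMMA0, GAMMA1, GAMMA2 = 1.0, 2.0, 4.0

# Two observations per X value. Z is +/-(1 + X): it sums to zero within each
# pair (so Z is orthogonal to 1 and X), but its magnitude depends on X.
x = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
signs = [1.0, -1.0] * 4
z = [s * (1.0 + xi) for xi, s in zip(x, signs)]
y = [GAMMA0 + GAMMA1 * xi + GAMMA2 * zi for xi, zi in zip(x, z)]

beta0, beta1, e = fit_line(x, y)

# Largest absolute residual observed at each X value.
spread = {}
for xi, ei in zip(x, e):
    spread[xi] = max(spread.get(xi, 0.0), abs(ei))

print("Z . 1 =", sum(z), " Z . X =", dot(z, x))
print("fitted slope beta1:", beta1, "(gamma1 =", GAMMA1, ")")
print("max |e| at each X:", spread)
```

Under these assumptions the fitted slope matches `GAMMA1`, as in experiment #1, but the size of the residuals now changes with X, which is the kind of pattern that gets labelled "non-constant variance" when Z is unobserved.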

