Hierarchical Bayesian Modeling

Hierarchical modeling is a statistically rigorous way to make scientific inferences about a population (or a specific object) based on many individuals (or observations). Frequentist multi-level modeling techniques exist, but we will discuss the Bayesian approach today. In short: frequentist methods quantify the variability of the sample, while Bayesian methods quantify the uncertainty of the inference.


Transcription of Hierarchical Bayesian Modeling

1. Hierarchical Bayesian Modeling
Angie Wolfgang, NSF Postdoctoral Fellow, Penn State

Making scientific inferences about a population based on many individuals.

Astronomical populations (Schawinski et al. 2014; Lissauer, Dawson, & Tremaine 2014): once we discover an object, we look for more, to characterize their properties and understand them as a population. Or we use many (often noisy) observations of a single object to gain insight into its physics.

Hierarchical modeling is a statistically rigorous way to make scientific inferences about a population (or specific object) based on many individuals (or observations). Frequentist multi-level modeling techniques exist, but we will discuss the Bayesian approach.

Frequentist: variability of the sample. (If θ is the true value, what fraction of many hypothetical datasets would be as or more discrepant from θ as the observed one?)

2. Bayesian: uncertainty of the inference. (What is the probability that θ is the true value, given the current data?)

Understanding Bayes:
x = the data
θ = the parameters of a model that can produce the data
p(·) = a probability density distribution
| = "conditional on," or "given"
p(θ) = prior probability (How probable are the possible values of θ in nature?)
p(x|θ) = likelihood, or sampling distribution (ties your model to the data probabilistically: how likely is the data you observed given specific values of θ?)
p(θ|x) = posterior probability (a new prior distribution, updated with the information contained in the data: what is the probability of different θ values given the data and your model?)

Bayes' theorem (straight out of conditional probability):
p(θ|x) ∝ p(x|θ) p(θ)
posterior ∝ likelihood × prior

(We just learned how to evaluate p(θ|x) numerically to infer θ from x.)
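To make the numerical evaluation of p(θ|x) concrete, here is a minimal grid-based sketch (a toy Gaussian-mean problem, not an example from the talk): multiply likelihood by prior on a grid of θ values and normalize.

```python
import numpy as np

# Toy problem (not from the slides): infer the mean theta of Gaussian
# data with known noise sigma = 1, using a flat prior.
rng = np.random.default_rng(42)
true_theta = 2.0
x = rng.normal(true_theta, 1.0, size=20)          # observed data

theta = np.linspace(-5, 10, 1001)                 # grid of parameter values
log_prior = np.zeros_like(theta)                  # flat (uninformative) prior
# log p(x|theta): sum of Gaussian log-densities over the data points
log_like = -0.5 * ((x[:, None] - theta[None, :]) ** 2).sum(axis=0)

log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())          # unnormalized posterior
post /= post.sum() * (theta[1] - theta[0])        # normalize to integrate to 1

theta_map = theta[np.argmax(post)]                # posterior mode
```

With a flat prior the posterior mode coincides (to within a grid step) with the maximum-likelihood estimate, here the sample mean.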

3. (But let's get a better intuition for the statistical model itself.)

Applying Bayes: p(θ|x) ∝ p(x|θ) p(θ), i.e. posterior ∝ likelihood × prior.

Example (1-D): fitting an SED to photometry (Nelson et al. 2014).
x = 17 measurements of Lν
θ = age of the stellar population, star-formation timescale τ, dust content AV, metallicity, redshift, choice of IMF, choice of dust-reddening law
Model: stellar population synthesis. The model can be summarized as f(x|θ): it maps θ onto x. But this is NOT p(x|θ), because f(x|θ) is not a probability distribution!

4. If you use χ² for fitting, then you are implicitly assuming that
p(xᵢ|θ) = Normal(μ, σ), where μ = f(xᵢ|θ) and σ = the statistical measurement error;
that is, you are assuming Gaussian noise (if you could redo a specific measurement xᵢ the same way many times, you'd find values scattered like a Gaussian around the model prediction).

Applying Bayes: p(θ|x) ∝ p(x|θ) p(θ).

Example (2-D): fitting a PSF to an image.
x = matrix of pixel brightnesses
θ = μ, σ of a Gaussian (location and FWHM of the PSF)
f(x|θ) = 2-D Gaussian
p(x|θ) = Normal(μ, σ), where μ = f(x|θ) and σ = the noise (possibly spatially correlated)
Both the likelihood and the model are Gaussian!
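The χ²-to-Gaussian-likelihood connection can be checked numerically. This sketch uses a made-up linear forward model (not from the slides) to show that the Gaussian log-likelihood equals −χ²/2 plus a θ-independent constant, so minimizing χ² is the same as maximizing the likelihood.

```python
import numpy as np

# Hypothetical forward model f(t; theta): a straight line (not from the talk).
def f(t, theta):
    slope, intercept = theta
    return slope * t + intercept

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 25)
sigma = 0.5                                   # known Gaussian measurement error
x = f(t, (1.5, -2.0)) + rng.normal(0, sigma, t.size)

def chi2(theta):
    return np.sum(((x - f(t, theta)) / sigma) ** 2)

def log_likelihood(theta):
    # log p(x|theta) for independent Gaussian noise
    return np.sum(-0.5 * ((x - f(t, theta)) / sigma) ** 2
                  - 0.5 * np.log(2 * np.pi * sigma ** 2))

# log L(theta) + chi2(theta)/2 is the same constant for every theta.
const = log_likelihood((1.5, -2.0)) + 0.5 * chi2((1.5, -2.0))
```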

5. OK, now we know one way to write p(x|θ). What about p(θ)?
1) If we have a previous measurement/inference of that object's metallicity, redshift, etc., use it with its error bars as p(θ). (Usually measured via χ², so p(θ) is Gaussian with μ = measurement and σ = error. BUT the full posterior from the previous analysis is better.)
2) Choose wide, uninformative distributions for all the parameters we don't know well.
3) Use distributions in nature from previous observations of similar objects.

Going hierarchical. Option #3 for p(θ): use distributions in nature from previous observations of similar objects:
p(θ) = n(θ|α) / ∫ n(θ|α) dθ = p(θ|α)
Histograms of population properties, when normalized, can be interpreted as probability distributions for individual parameters, where n(θ|α) is the function with parameters α that was fit to the histogram (or even the histogram itself, if you want to deal with a piecewise function!).
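As a sketch of option #3 with toy data (not from the talk): a normalized histogram of a previously observed population can serve directly as a piecewise-constant prior p(θ|α).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical previously-measured population values of some parameter theta
population = rng.lognormal(mean=0.0, sigma=0.5, size=5000)

counts, edges = np.histogram(population, bins=40)
widths = np.diff(edges)
# Normalize so the histogram integrates to 1: n(theta|alpha) / int n dtheta
prior_density = counts / (counts.sum() * widths)

def prior(theta_val):
    # Piecewise-constant p(theta|alpha); zero outside the observed range.
    i = np.searchsorted(edges, theta_val, side="right") - 1
    if 0 <= i < len(counts):
        return prior_density[i]
    return 0.0
```

Fitting a smooth parametric form n(θ|α) to the histogram instead avoids the piecewise function, at the cost of choosing a functional form.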

6. For example (Ilbert et al. 2009), redshift was part of the θ for SED fitting. One could use the red lines (the parametric form fit to the redshift histograms) as the prior:
p(z) = p(z|α) = n(z|α) / ∫ n(z|α) dz, with α = {a, b, c}.
But BE CAREFUL of detection bias, selection effects, upper limits, etc.!

The population helps make inferences about the individual.

7. Abstracting again... Going hierarchical.

Regular Bayes:
p(θ|x) ∝ p(x|θ) p(θ)   [posterior ∝ likelihood × prior]
With a population-based prior (almost there!):
p(θ|x) ∝ p(x|θ) p(θ|α)
...but what if we want to use the individuals to infer things (the α's) about the population?

Hierarchical Bayes:
p(θ, α|x) ∝ p(x|θ, α) p(θ|α) p(α)

If you truly don't care about the parameters of the individual objects, then you can marginalize over them:
p(α|x) ∝ [ ∫ p(x|θ, α) p(θ|α) dθ ] p(α) = p(x|α) p(α)

Often, p(θ|α) contains some interesting physics, and getting values for α given the data can help us understand it.

Graphically:
Regular Bayes: Parameters → Observables.
Hierarchical Bayes: Population Parameters → Individual Parameters → Observables.
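A minimal generative sketch of this hierarchical structure (a hypothetical normal-normal model, not the model from the talk): population parameters α = (μ_pop, σ_pop) generate individual parameters θᵢ, which generate noisy observables xᵢ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Population-level parameters alpha (fixed here for the simulation)
mu_pop, sigma_pop = 5.0, 2.0
sigma_obs = 1.0                     # per-object measurement noise
n_objects = 1000

# p(theta|alpha): each object's true parameter drawn from the population
theta = rng.normal(mu_pop, sigma_pop, n_objects)
# p(x|theta): one noisy observation per object (conditionally independent)
x = rng.normal(theta, sigma_obs)

# Marginalizing theta analytically, p(x|alpha) is Normal with variance
# sigma_pop**2 + sigma_obs**2: observed scatter exceeds the intrinsic one.
predicted_var = sigma_pop ** 2 + sigma_obs ** 2
```

This is the simulate-from-the-model direction; inference runs the other way, from x back to θ and α.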

8. Physics enters at both levels, and the individuals are conditionally independent given the population parameters. Even for an individual object, the connection between parameters and observables can involve several layers of latent variables (example: measuring the mass of a planet: Mpl → RVs → spectra).

HBM in action (exoplanet compositions: Wolfgang & Lopez 2015). Internal-structure models, population-wide distributions, and the likelihood are combined to understand BOTH:
- the compositions of individual super-Earths (fraction of mass in a gaseous envelope, fenv), and
- the distribution of this composition parameter over the Kepler population (μ, σ).

Results: posteriors on the population parameters and the marginal composition distribution (the width of this distribution had not been previously constrained), as well as posteriors on the composition parameter fenv for the individual planets.

A note about shrinkage: hierarchical models pool the information in the individual data ...

9. ... which shrinks individual estimates toward the population mean and lowers the overall RMS error. (A key feature of any multi-level modeling!) The uncertainty in x₁ when analyzed in the hierarchical model is smaller than the uncertainty in x₁ when analyzed by itself, and the estimate is pulled toward the mean of the distribution of x's. Shrinkage in action (Wolfgang, Rogers, & Ford 2016): gray = data, red = posteriors.

Practical considerations:
1) Pay attention to the structure of your model! Did you capture the important dependencies and correlations? Did you balance realism with a small number of population-level parameters?
2) Evaluating your model with the data (performing hierarchical MCMC): JAGS (can use the stand-alone binary or interface with R); Stan (interfaces with R, Python, Julia, MATLAB); or write your own hierarchical MCMC code.
3) Spend some time testing the robustness of your model: if you generate hypothetical datasets using your HBM and then run the MCMC on those datasets, how close do the inferences lie to the "truth"?
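Shrinkage can be seen in closed form for a normal-normal model (a toy sketch, not the model from the paper): the conjugate posterior mean for each object is a precision-weighted average of its own measurement and the population mean, and pooling lowers the overall RMS error.

```python
import numpy as np

rng = np.random.default_rng(3)
mu_pop, sigma_pop = 0.0, 1.0        # population distribution (assumed known)
sigma_obs = 2.0                     # large measurement noise
n = 500

theta_true = rng.normal(mu_pop, sigma_pop, n)   # true individual parameters
x = rng.normal(theta_true, sigma_obs)           # noisy measurements

# Conjugate normal-normal posterior mean for each object:
# a precision-weighted average of the measurement and the population mean.
w = (1 / sigma_obs ** 2) / (1 / sigma_obs ** 2 + 1 / sigma_pop ** 2)
theta_shrunk = w * x + (1 - w) * mu_pop

rms_raw = np.sqrt(np.mean((x - theta_true) ** 2))         # using x alone
rms_shrunk = np.sqrt(np.mean((theta_shrunk - theta_true) ** 2))
```

With σ_obs much larger than σ_pop, w is small and the estimates are pulled strongly toward μ_pop, exactly the behavior in the gray-vs-red figure.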

10. In sum, why HBM?
- Obtain simultaneous posteriors on individual and population parameters: self-consistent constraints on the physics.
- Readily quantify uncertainty in those parameters.
- Naturally deals with large measurement uncertainties and upper limits (censoring).
- Similarly, can account for selection effects *within* the model, simultaneously with the inference.
- Enables direct, probabilistic relationships between theory and observations.
- Provides a framework for model comparison.

Further reading:
- DeGroot & Schervish, Probability and Statistics (solid fundamentals)
- Gelman, Carlin, Stern, & Rubin, Bayesian Data Analysis (in-depth; advanced topics)
- Loredo 2013 (few-page intro/overview of multi-level modeling in astronomy)
- Kelly 2007 (HBM for linear regression, also applied to quasars)
- Loredo & Wasserman 1998 (multi-level model for the luminosity distribution of gamma-ray bursts)
- Mandel et al.

