Transcription of Introduction to Stan - University of Colorado Boulder
1 IntroductionBayesian StatsAbout StanExamplesTips and TricksIntroduction to StanCameron BrackenUniversity of Colorado BoulderFebruary 2015 IntroductionBayesian StatsAbout StanExamplesTips and TricksWhat is Stan? A probabilistic programming language implementing fullBayesian statistical inference with MCMC sampling(NUTS, HMC) and penalized maximum likelihoodestimation with Optimization (L-BFGS) !"#$%&'()*+,!"#$%&'( )$"( "%#*+"')( , "1)",( /'"( -2( #%1,-0( '%0&*+13( )-( '-*4"( @/"()-)$"( &#-5*"0'( )$"F( 6"#"( 6-#O+13( -1<( ( K*%0( &-+1)",( -/)( )$"( /'"( -2( "*".)#-0".$%1+.%*.-0&/)"#'( )-( -4"#.-0"( )$"( *-13( %1,)",+-/'( 1%)/#"( -2( )$"( .%*./*%)+-1'?( 5-##-6",( 0-1"F( 2#-0( #"*%)+4"'5".%/'"($"(ST/')($%,()-(3-()-(N -1)"(7%#*-QU)$"(3%05*+13(.%'+1-V<W1( N%#.$( 99?( 9AX:?( L-$1( 4-1M"/0%11('"1)(%(*"))"#(UY+.$)0F"#?(9AX :V()-)$"( Z$"-#")+.%*( [+4+'+-1( *"%,"#( &#-&-'+13)$"(/'"(-2()$+'()".$1+@/"(-1(DM =H7()-('-*4"1"/)#-1( ,+22/'+-1( %1,( 0/*)+&*+.))))))))))))))))))))))))))))))) )))))]
2 %)+-1&#-5*"0'<( ( Z$+'( 6%'( )$"( 2+#')( &#-&-'%*( )-/'"( )$"( N-1)"( 7%#*-( )".$1+@/"( -1( %1"*".)#-1+.( ,+3+)%*( .-0&/)"#<( ( H*'-( +1( 9AX:?D1#+.-( E"#0+( $%,( EDYN=H7( UE+3/#"( 9V?( %0".$%1+.%*(%1%*-3(.-0&/)"#?(&#-3#%00",)-(#/1(N-1)"(7%#*-(&#-5*"0'<((=1(9AX\?()$"2+#')( #/1'( -1( %,+3+)%*( .-0&/)"#)--O( &*%."( -1DM=H7( UE+3/#"( ;V<=1( )$"( *%)"( 9 AXC'%1,( "%#*F( 9A]C'?0%1F(&%&"#'(6"#"6#+))"1( ,"'.#+5+13)$"( N-1)"( 7%#*-0")$-,( %1,( +)'/'"( +1( '-*4+13&#-5*"0'(+1#%,+%)+-1(%1,&%#)+.*"( )#%1'&-#)%1,( -)$"#( %#"%'<Z$"( 2+#')( -&"1N-1)"(7%#* "#"1."( 6%'$"*,( %)( K7GH( +1)$"( '/00"#( -29 AXA<( ( N%1F( -2( )$-'"( 0")$-,'( %#"( ')+**( +1( /'"( )-,%F( +1.*/,+13( )$"( #%1,-0( 1/05"#3"1"#%)+-1(0")$-,(/'",(+1(N7M!<-'./+0%12%%3#45"-'./+0%62%%7)89%:;8<%&*;=' >!45" Stanislaw Ulam, namesake ofStan and co-inventor MonteCarlo methods shown hereholding the Fermiac, EnricoFermi s physical Monte Carlosimulator for neutron diffusion. (image from the Stan manual)IntroductionBayesian StatsAbout StanExamplesTips and TricksBayesian StatisticsBy Bayesian data analysis, we mean practical methods formaking inferences from data using probability models forquantities we observe and about which we wish to essential characteristic of Bayesian methods is theirexplict use of probability for quantifying uncertainty ininferences based on statistical analysis.))))))))))))))))))))))))))))))) ))))))))))))))))
3 [Gelman et al., Bayesian Data Analysis, 3rd edition, 2013]IntroductionBayesian StatsAbout StanExamplesTips and TricksBackground on Bayesian StatisticsFrom Bayes rule, supposing the data is fixed (observed):p( |y)=p(y, )p(y)=p(y| )p( )p(y)=p(y| )p( ) p(y, )d =p(y| )p( ) p(y| )p( )d p( |y) p(y| )p( )=p(y, )IEveryone: Model data as randomIBayesians: Model parameters as randomIntroductionBayesian StatsAbout StanExamplesTips and TricksBackground on Bayesian StatisticsFrom Bayes rule:p( |y,x) Posterior p(y| ,x) Liklihoodp( ,x) Prior ParametersyDependent data (response)xIndependent data (covariates/predictors/constants)Posteri or: The answer, probability distributions of parametersLiklihood: A computable function of the parameters, modelspecificPrior: "Initial guess", incorporates existing knowledge of thesystemThe key to building Bayesian models is specifying the likelihoodfunction, same as StatsAbout StanExamplesTips and TricksMonte Carlo Markov Chain (MCMC) in a nutshellIWe want to generate random draws from a target distribution(the posterior).
4 We then identify a way to construct a nice markov chain such that its equilibrium probability distributionis our target we can construct such a chain then we arbitrarily start fromsome point and iterate the markov chain many times (like howwe forecasted the weather n times). Eventually, the draws wegenerate would appear as if they are coming from our are several ways to construct nice markov chains ( ,gibbs sampler, Metropolis-Hastings algorithm).(explanation from Cross Validated)- MCMC is really a way to solve integrals that are impossible tosolve StatsAbout StanExamplesTips and TricksWhat does stan do?ISamples from the posterior distribution (if your model isspecified correctly)I"Fits" bayesian modelsIEmpowers you to write your own Bayesian models, it s mucheasier than you think!No U-Tu r nSamplerAutomatic Step Size andNumber AdaptationIntroductionBayesian StatsAbout StanExamplesTips and TricksWhy Stan?There are tons of other black-box MCMC samplers out there(BUGS, JAGS, Church, PyMC, many many more, )IStan is open sourceIBuilt to be fast (about 10 times faster then BUGS accordingGelman)I Stan can handle problems that choke BUGS and JAGS Andrew GelmanIntroductionBayesian StatsAbout StanExamplesTips and TricksUsing StanStan is a library with a number of interfaces, we will use the Rinterface called Model Code (.)
5 Stan) Setup script (.R) RStan Stan C++ code Compiler Model Run Data Results IntroductionBayesian StatsAbout StanExamplesTips and TricksExample 1 Fit Normal Distribution Model codeDownload from: ~bracken/ files in1- {int<lower=0> N;// error checking for Nvector[N] y;}parameters{real<lower=0> sigma;realmu;}model{y ~ normal(mu, sigma);//vectorized}IntroductionBayesian StatsAbout StanExamplesTips and TricksExample 1 Fit Normal Distribution RStan codelibrary(RStan)model_file=' 'iterations= 500N = 1000mu = 100sigma= 10y = rnorm(N, mu,sigma)# simulate datastan_data= list(N=N, y=y)# data passed to stan# set up the modelstan_model= stan(model_file, data= stan_data, chains= 0)stanfit= stan(fit= stan_model, data= stan_data,iter=iterations)# run the modelprint(stanfit,digits=2) Introduction Bayesian StatsAbout StanExamplesTips and TricksDiagnostics - Text outputInference for Stan model: chains, each with iter=500; warmup=250; thin=1.
6 Post-warmup draws per chain=250, total post-warmup draws= se_mean sd 25% 50% 75% n_eff Rhatsigma 366 617 395 were drawn using NUTS(diag_e) at Thu Feb 19 21:04:56 each parameter, n_eff is a crude measure of effective sample size,and Rhat is the potential scale reduction factor on split chains (atconvergence, Rhat=1).Ithin, warmupIneffI RIlp__IntroductionBayesian StatsAbout StanExamplesTips and TricksDiagnostics - Traceplots010020030040050002000400060008 000 Trace of sigmaIterations0100200300400500050100150 Trace of muIterations0100200300400500-2000000-100 00000 Trace of lp__IterationsIntroductionBayesian StatsAbout StanExamplesTips and TricksDiagnostics - Posterior plotsHistogram of extract(stanfit)$muextract(stanfit)$ of extract(stanfit)$sigmaextract(stanfit)$ of extract(stanfit)$lp__extract(stanfit)$lp __Frequency-2791-2789-2787-2785010020030 0400 IntroductionBayesian StatsAbout StanExamplesTips and TricksDiagnostics - shinyStanOld package (don t use it): to: ("shinyStan")launch_shinystan(stanfit)In troductionBayesian StatsAbout StanExamplesTips and TricksConvergence - TraceplotsConverged.
7 0500100015002000050100150 Trace of beta_space_loc[1]Iterations0500100015002 00014151617 Trace of beta_space_loc[2]Iterations0500100015002 000 of beta_space_loc[3]Iterations0500100015002 000 of beta_space_loc[4]Iterations0500100015002 000 of beta_space_loc[5]IterationsNot Converged:02004006008001000 200 100050 Trace of gp_knot_value[1]Iterations02004006008001 000 200 100050 Trace of gp_knot_value[2]Iterations02004006008001 000 250 150 50 Trace of gp_knot_value[3]Iterations02004006008001 000 250 150 50 Trace of gp_knot_value[4]Iterations02004006008001 000 150 5050150 Trace of gp_knot_value[5]Iterations02004006008001 000 200 100050 Trace of gp_knot_value[6]Iterations02004006008001 000 300 150 50 Trace of gp_knot_value[7]Iterations02004006008001 000 200 1000 Trace of gp_knot_value[8]IterationsIntroductionBa yesian StatsAbout StanExamplesTips and TricksConvergence - Posterior plotsConverged:beta_space_loc_01beta_spa ce_loc_02beta_space_loc_03beta_space_loc _04beta_space_loc_0502004000200400010020 0300400020040060002004000501001501415161 7 Converged:beta_space_loc_01beta_space_lo c_02beta_space_loc_03beta_space_loc_04be ta_space_loc_050500100015000500100015002 0000500100015002000050010001500050010001 500 StatsAbout StanExamplesTips and TricksExample 2 Fit normal distribution (fancier)yn Normal( , )The likelihood of observing a normally distributed data value is thenormal density of that point given the parameter files in2- valuesIntroductionBayesian StatsAbout StanExamplesTips and TricksExample 3 fit GEV distributionGeneralized extreme value distributiony GEV( , , ) : location, : scale.
8 ShapeExample files in3- seedIconstraintsIvariable constraintsIInitial valuesIntroductionBayesian StatsAbout StanExamplesTips and TricksExample 4 - multiple linear regressionyn= + xn+ nwhere n Normal(0, ), which can be written:yn ( + xn) Normal(0, )yn Normal( + xn, )The likelihood of observing a givenynis just the normal densitywith mean + xnand standard deviation .Example files in4-multiple-linear- StatsAbout StanExamplesTips and TricksExample 5 - Binomial regression (glm)yn= + g 1(xn)+ nyn Normal(g 1(xn), 1)Example files in5-binomial- StatsAbout StanExamplesTips and TricksStan tips and tricks#1 tip: Read the Manual! It is excellentOther things we didn t really talk about:ILocal variables in the model block, can be used to storeintermediate resultsIMatrices vs arrays, Column vector vs row vectorIConstrained data typesITransformed parametersIFunctionsILogical operations/Other types of loopingIElementwise operatorsIBuilt-in functionsIPrint statementsIMissing dataIPredictionIDiscrete variablesIntroductionBayesian StatsAbout StanExamplesTips and TricksStan tips and tricksNo need to truncate priors, do that in the parameter boundsIBAD: setting constraints on parameters but using a prior withother constraintsparameters{realalpha;//implie s no constraints}model{alpha~ uniform(0,1);}IGOOD::parameters{real<lower=0,upper=1> alpha;}model{#alpha ~ uniform(0,1).}
9 // default uniform priors}IntroductionBayesian StatsAbout StanExamplesTips and TricksStan tips and tricksINo need to use conjugate priorsIUnlike BUGS (or other Gibbs based samplers), avoid supervauge priors if you can, ( , )IWhen in doubt, use a normal prior, or google itIThe Stan mailing list is very activeIntroductionBayesian StatsAbout StanExamplesTips and TricksSpeeding up Stan modelsIAvoid repeated operations// 1/alpha is repeatedfor(n in 1:N)y[n]~ exponential(1/alpha* x[n]);IVectorization is always faster// not vectorizedfor(n in 1:N)y[n]~ normal(beta0+ beta1* x[n], sigma);//vectorizedy ~ normal(beta0+ beta1* x, sigma);IPriors: More informative the better (think better initialconditions), use MLE to get initial estimatesIParallization: can run multiple chains if you have multiplecores, but each chain is still serialIMore advanced: Accessincrement_log_probdirectly