Transcription of INTRODUCTION TO Machine Learning - Computer Science
Introduction to Machine Learning
Ethem Alpaydın, The MIT Press
Slides for Chapter 4: Parametric Methods
Lecture Notes for E. Alpaydın 2004, Introduction to Machine Learning, The MIT Press

Parametric Estimation
$\mathcal{X} = \{x^t\}_t$ where $x^t \sim p(x)$
Parametric estimation: Assume a form for $p(x \mid \theta)$ and estimate $\theta$, its sufficient statistics, using $\mathcal{X}$; for example, $\mathcal{N}(\mu, \sigma^2)$ where $\theta = \{\mu, \sigma^2\}$

Maximum Likelihood Estimation
Likelihood of $\theta$ given the sample $\mathcal{X}$: $l(\theta \mid \mathcal{X}) = p(\mathcal{X} \mid \theta) = \prod_t p(x^t \mid \theta)$
Log likelihood: $\mathcal{L}(\theta \mid \mathcal{X}) = \log l(\theta \mid \mathcal{X}) = \sum_t \log p(x^t \mid \theta)$
Maximum likelihood estimator (MLE): $\theta^* = \arg\max_\theta \mathcal{L}(\theta \mid \mathcal{X})$

Examples: Bernoulli/Multinomial
Bernoulli: Two states, failure/success, $x \in \{0, 1\}$
$P(x) = p_o^{x} (1 - p_o)^{(1 - x)}$
$\mathcal{L}(p_o \mid \mathcal{X}) = \log \prod_t p_o^{x^t} (1 - p_o)^{(1 - x^t)}$
MLE: $p_o = \sum_t x^t / N$
Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$
$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
$\mathcal{L}(p_1, p_2, \ldots, p_K \mid \mathcal{X}) = \log \prod_t \prod_i p_i^{x_i^t}$
MLE: $p_i = \sum_t x_i^t / N$

Gaussian (Normal) Distribution
$p(x) = \mathcal{N}(\mu, \sigma^2)$, that is, $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
MLE for $\mu$ and $\sigma^2$: $m = \frac{\sum_t x^t}{N}$, $s^2 = \frac{\sum_t (x^t - m)^2}{N}$

Bias and Variance
Unknown parameter $\theta$
Estimator $d_i = d(\mathcal{X}_i)$ on sample $\mathcal{X}_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E[(d - E[d])^2]$
Mean square error: $r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \text{Bias}^2 + \text{Variance}$

Bayes' Estimator
Treat $\theta$ as a random variable with prior $p(\theta)$
Bayes' rule: $p(\theta \mid \mathcal{X}) = p(\mathcal{X} \mid \theta)\, p(\theta) / p(\mathcal{X})$
Full: $p(x \mid \mathcal{X}) = \int p(x \mid \theta)\, p(\theta \mid \mathcal{X})\, d\theta$
Maximum a Posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta \mid \mathcal{X})$
Maximum Likelihood (ML): $\theta_{ML} = \arg\max_\theta p(\mathcal{X} \mid \theta)$
Bayes': $\theta_{Bayes'} = E[\theta \mid \mathcal{X}] = \int \theta\, p(\theta \mid \mathcal{X})\, d\theta$

Bayes' Estimator: Example
$x^t \sim \mathcal{N}(\theta, \sigma_o^2)$ and $\theta \sim \mathcal{N}(\mu, \sigma^2)$
$\theta_{ML} = m$
$\theta_{MAP} = \theta_{Bayes'} = E[\theta \mid \mathcal{X}] = \frac{N/\sigma_o^2}{N/\sigma_o^2 + 1/\sigma^2}\, m + \frac{1/\sigma^2}{N/\sigma_o^2 + 1/\sigma^2}\, \mu$

Parametric Classification
Given the sample, the ML estimates of the class priors, means, and variances are computed, and the discriminant is then written in terms of these estimates.
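The parametric classification slides above can be illustrated with a minimal sketch, assuming one-dimensional inputs and a Gaussian class-conditional density for each class. The function names (`fit_gaussian_classes`, `discriminant`, `classify`) and the toy data are hypothetical and chosen only for illustration. The discriminant used is the log of the Gaussian likelihood plus the log prior, with the ML estimates plugged in.

```python
import math

def fit_gaussian_classes(xs, labels):
    """Return {class: (prior, mean, variance)} using ML estimates per class."""
    n = len(xs)
    params = {}
    for c in sorted(set(labels)):
        members = [x for x, y in zip(xs, labels) if y == c]
        n_c = len(members)
        m = sum(members) / n_c                         # ML estimate of the mean
        s2 = sum((x - m) ** 2 for x in members) / n_c  # ML variance (divides by N_c, not N_c - 1)
        params[c] = (n_c / n, m, s2)                   # prior is the class frequency
    return params

def discriminant(x, prior, m, s2):
    """g(x) = -0.5 log(2 pi) - log s - (x - m)^2 / (2 s^2) + log prior."""
    # Assumes s2 > 0, i.e. each class has at least two distinct values.
    return (-0.5 * math.log(2 * math.pi) - 0.5 * math.log(s2)
            - (x - m) ** 2 / (2 * s2) + math.log(prior))

def classify(x, params):
    """Pick the class with the largest discriminant value."""
    return max(params, key=lambda c: discriminant(x, *params[c]))

# Toy data (an assumption): two classes with equal priors and equal variances.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_classes(xs, labels)
print(params)
print(classify(4.9, params), classify(5.1, params))
```

In this toy data the two classes have the same estimated variance and prior, so the decision boundary falls halfway between the estimated means 2 and 8, which matches the equal-variance case described on the next slide.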
Equal variances: Single boundary at halfway between means

Variances are different: Two boundaries

Regression

Regression: From LogL to Error

Linear Regression

Polynomial Regression

Other Error Measures
Square Error: $E(\theta \mid \mathcal{X}) = \frac{1}{2} \sum_t \left[r^t - g(x^t \mid \theta)\right]^2$
Relative Square Error: $E(\theta \mid \mathcal{X}) = \frac{\sum_t \left[r^t - g(x^t \mid \theta)\right]^2}{\sum_t \left[r^t - \bar{r}\right]^2}$
Absolute Error: $E(\theta \mid \mathcal{X}) = \sum_t \left|r^t - g(x^t \mid \theta)\right|$
$\varepsilon$-sensitive Error: $E(\theta \mid \mathcal{X}) = \sum_t 1\left(\left|r^t - g(x^t \mid \theta)\right| > \varepsilon\right) \left(\left|r^t - g(x^t \mid \theta)\right| - \varepsilon\right)$

Bias and Variance

Estimating Bias and Variance
$M$ samples are used to fit $g_i(x)$, $i = 1, \ldots, M$

Bias/Variance Dilemma
Example: a constant fit, $g_i(x) = 2$, has no variance and high bias; the sample average, $g_i(x) = \sum_t r_i^t / N$, has lower bias with variance
As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)
Bias/Variance dilemma (Geman et al., 1992)

[Figure: bias and variance, with labels f, g_i, and g]

Polynomial Regression: Best fit, "min error"

Best fit, "elbow"

Model Selection
Cross-validation: Measure generalization accuracy by testing on data unused during training
Regularization: Penalize complex models
Akaike's information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, shortest description of data
Structural risk minimization (SRM)

Bayesian Model Selection
Prior on models, p(model)
Regularization, when prior favors simpler models
Bayes, MAP of the posterior, p(model|data)
Average over a number of models with high posterior (voting, ensembles: Chapter 15)
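The polynomial regression and cross-validation slides above can be tied together with a small sketch. It fits polynomials of increasing order by least squares and compares training error with k-fold validation error. Everything specific here is an assumption made for illustration: the target function 4x^3 - 3x, the noise level, the sample size, and the helper names (`solve`, `fit_poly`, `cv_error`). It uses only the standard library and solves the normal equations directly, which is adequate for low orders on [-1, 1] but is not a numerically robust general method.

```python
import random

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit_poly(xs, rs, order):
    """Least-squares polynomial fit via the normal equations (D^T D) w = D^T r."""
    k = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(r * x ** i for x, r in zip(xs, rs)) for i in range(k)]
    return solve(a, b)

def predict(w, x):
    return sum(wi * x ** i for i, wi in enumerate(w))

def mse(w, xs, rs):
    return sum((r - predict(w, x)) ** 2 for x, r in zip(xs, rs)) / len(xs)

def cv_error(xs, rs, order, folds=5):
    """Average validation error over interleaved folds (data unused in training)."""
    n = len(xs)
    total = 0.0
    for f in range(folds):
        val_idx = set(range(f, n, folds))
        tr = [(x, r) for i, (x, r) in enumerate(zip(xs, rs)) if i not in val_idx]
        va = [(x, r) for i, (x, r) in enumerate(zip(xs, rs)) if i in val_idx]
        w = fit_poly([x for x, _ in tr], [r for _, r in tr], order)
        total += mse(w, [x for x, _ in va], [r for _, r in va])
    return total / folds

# Hypothetical data: a cubic target plus Gaussian noise on evenly spaced inputs.
random.seed(0)
n = 40
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
rs = [4 * x ** 3 - 3 * x + random.gauss(0, 0.1) for x in xs]

orders = range(1, 6)
train_err = {k: mse(fit_poly(xs, rs, k), xs, rs) for k in orders}
val_err = {k: cv_error(xs, rs, k) for k in orders}
best_order = min(val_err, key=val_err.get)
for k in orders:
    print(k, round(train_err[k], 4), round(val_err[k], 4))
print("order with lowest validation error:", best_order)
```

Training error can only go down as the order grows because the models are nested, so it cannot be used to choose the order. The validation error is the quantity that reflects generalization, which is the point of the cross-validation bullet on the model selection slide.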