
Machine Learning Basics



Transcription of Machine Learning Basics

1 Machine Learning Basics. Lecture slides for chapter 5 of Deep Learning, Ian Goodfellow, 2016-09-26.

Linear Regression. [Figure: a linear regression problem, with a training set consisting of ten data points, each containing one feature. Because there is only one feature, the weight vector contains only a single parameter to learn, w1. The left panel plots y against x1; the right panel plots MSE(train) as a function of w1, showing the optimization of w.] (Goodfellow 2016)

Underfitting and Overfitting in Polynomial Estimation. A high-degree polynomial has more parameters than training examples. We have little chance of choosing a solution that generalizes well when so many wildly different solutions exist. In this example, the quadratic model is perfectly matched to the true structure of the task, so it generalizes well to new data.
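Where the slides show the closed-form fit, a minimal sketch of the same setup may help: ten one-feature points, a single weight found via the pseudoinverse (normal equations), and MSE(train) evaluated on the result. The data-generating values below are made-up assumptions for illustration, not the slides' data.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(10, 1))        # ten training points, one feature
    y = 1.5 * X[:, 0] + rng.normal(0, 0.5, 10)  # hypothetical slope 1.5 plus noise

    # Normal-equations solution w = (X^T X)^(-1) X^T y, via the pseudoinverse
    w = np.linalg.pinv(X) @ y
    mse_train = np.mean((X @ w - y) ** 2)
    print(f"w1 = {w[0]:.3f}, MSE(train) = {mse_train:.3f}")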

2 [Figure: we fit three models to this example training set; the three panels plot y against x0 and show underfitting, appropriate capacity, and overfitting.] (Goodfellow 2016)

Generalization and Capacity. [Figure: typical relationship between capacity and error. Training error and generalization error are plotted against capacity, with an underfitting zone to the left of the optimal capacity and an overfitting zone to its right; the generalization gap is the distance between the two curves. Training and test error behave differently: at the left end of the graph, the model is in the underfitting zone.] (Goodfellow 2016)

Training Set Size. [Figure: error (MSE) as a function of the number of training examples, from 10^0 to 10^5 on a log scale, with curves for train (quadratic), test (quadratic), train (optimal capacity), test (optimal capacity), and the Bayes error. A second panel shows the optimal capacity (polynomial degree), ranging from 0 to 20, growing with the number of training examples.] (Goodfellow 2016)
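As a hedged illustration of the capacity sweep in these figures, the sketch below fits polynomials of degree 1, 2, and 9 to a small training set drawn from a quadratic function and compares train and test MSE. The data, noise level, and degrees are assumptions chosen for illustration, not the slides' exact values.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 10)
    y_train = x**2 + rng.normal(0, 0.1, 10)      # quadratic ground truth + noise
    x_test = rng.uniform(-1, 1, 100)
    y_test = x_test**2 + rng.normal(0, 0.1, 100)

    for deg in (1, 2, 9):                        # underfit, well matched, overfit
        Phi = np.vander(x, deg + 1)              # polynomial design matrix
        w = np.linalg.pinv(Phi) @ y_train        # least-squares fit
        train_mse = np.mean((Phi @ w - y_train) ** 2)
        test_mse = np.mean((np.vander(x_test, deg + 1) @ w - y_test) ** 2)
        print(f"degree {deg}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")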

3 Weight Decay. This is an example of how we can control a model's tendency to overfit or underfit via weight decay: we can train a high-degree polynomial regression model with different values of the weight decay coefficient λ; see the figure for the results. [Figure: we fit a high-degree polynomial regression model to our example training set; the three panels show underfitting (excessive λ), appropriate weight decay (medium λ), and overfitting (λ approaching zero).] With minimal regularization, using the pseudoinverse to solve the underdetermined problem, the degree-9 polynomial overfits significantly, as we saw in figure 5.2. (Goodfellow 2016)

Bias and Variance. The MSE measures the expected deviation between the estimator and the true value of the parameter. As is clear from the decomposition MSE = Bias² + Variance, evaluating the MSE incorporates both the bias and the variance. Desirable estimators are those with small MSE, and these are estimators that manage to keep both their bias and variance somewhat in check. [Figure: as capacity increases (x-axis), bias (dotted) tends to decrease and variance (dashed) tends to increase, yielding another U-shaped curve for generalization error, with an underfitting zone below the optimal capacity and an overfitting zone above it.] (Goodfellow 2016)
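A hedged sketch of weight decay in closed form: minimizing J(w) = MSE(train) + λ w^T w for a degree-9 polynomial gives w = (Phi^T Phi + λI)^(-1) Phi^T y. The λ values below are illustrative stand-ins for the figure's excessive, medium, and near-zero settings, and the data is the same made-up quadratic example as above.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(-1, 1, 10)
    y = x**2 + rng.normal(0, 0.1, 10)
    Phi = np.vander(x, 10)                       # degree-9 design matrix

    for lam in (1e2, 1e-2, 1e-8):                # excessive, medium, near zero
        # Closed-form ridge solution: w = (Phi^T Phi + lam I)^(-1) Phi^T y
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(10), Phi.T @ y)
        mse = np.mean((Phi @ w - y) ** 2)
        print(f"lambda={lam:g}: train MSE {mse:.4f}, ||w|| = {np.linalg.norm(w):.2f}")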

4 Decision Trees. [Figure: diagrams of how a decision tree breaks up the input space into regions. Nodes are labeled with binary strings (0, 1, 00, 01, 10, 11, 010, 011, 110, 111, 1110, 1111), with each node's children extending its identifier by one bit.] (Goodfellow 2016)

Principal Components Analysis. [Figure: PCA learns a linear projection that aligns the direction of greatest variance with the axes of the new space. (Left) The original data consists of samples of x, plotted as x2 against x1. In this space, the variance might occur along directions that are not axis-aligned. (Right) The transformed data z = x^T W now varies most along the axis z1. The direction of second-most variance is now along z2.] (Goodfellow 2016)
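To make the projection concrete, here is a minimal PCA sketch under the same conventions: center the data, take the eigenvectors of the covariance as the columns of W (sorted by decreasing eigenvalue), and project z = x^T W. The correlated synthetic Gaussian below is an assumption for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    X = rng.normal(size=(500, 2)) @ A.T          # correlated, not axis-aligned
    X = X - X.mean(axis=0)                       # center the data

    cov = X.T @ X / len(X)                       # sample covariance
    eigvals, W = np.linalg.eigh(cov)             # eigenvalues in ascending order
    W = W[:, ::-1]                               # reorder to descending variance

    Z = X @ W                                    # rows are z = W^T x per sample
    print("variance along z1, z2:", Z.var(axis=0))  # z1 carries the most variance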

5 Curse of Dimensionality. [Figure: as the number of relevant dimensions of the data increases (from left to right), the number of configurations of interest may grow exponentially. (Left) In this one-dimensional example, we have one variable for which we only care to distinguish a small number of regions of interest. With enough examples falling within each of these regions (each region corresponds to a cell in the illustration), learning algorithms can easily generalize correctly. A straightforward way to generalize is to estimate the value of the target function within each region.] (Goodfellow 2016)

Nearest Neighbor. [Figure: illustration of how the nearest neighbor algorithm breaks up the input space into regions. An example (represented here by a circle) within each region defines the region's boundary.] (Goodfellow 2016)
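A minimal nearest-neighbor sketch, assuming Euclidean distance and plain 1-NN: each query takes the label of its closest training example, which implicitly carves the input space into the regions the figure shows. The toy points and labels are made up.

    import numpy as np

    def nearest_neighbor(X_train, y_train, X_query):
        # Label each query with its closest training example (Euclidean 1-NN)
        d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
        return y_train[d2.argmin(axis=1)]

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    y_train = np.array([0, 1, 1])
    queries = np.array([[0.2, 0.1], [0.9, 0.8]])
    print(nearest_neighbor(X_train, y_train, queries))   # -> [0 1]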

6 Manifold Learning. The dimensionality of a manifold can vary from one point to another. This often happens when a manifold intersects itself. For example, a figure eight is a manifold that has a single dimension in most places but two dimensions at the intersection at the center. [Figure: data sampled from a distribution in a two-dimensional space that is actually concentrated near a one-dimensional manifold, like a twisted string. The solid line indicates the underlying manifold that the learner should infer.] (Goodfellow 2016)

Uniformly Sampled Images. [Figure: images sampled uniformly at random.] (Goodfellow 2016)

QMUL Dataset. [Figure: training examples from the QMUL Face Dataset (Gong et al.), in which the subjects were asked to move in such a way as to cover the two-dimensional manifold corresponding to two angles of rotation. We would like learning algorithms to be able to discover and disentangle such manifold coordinates.] (Goodfellow 2016)
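As a hedged sketch of the kind of data such a figure depicts, the snippet below samples 2-D points concentrated near a one-dimensional manifold: a curve parameterized by a single coordinate t, plus small off-manifold noise. The specific curve is an arbitrary choice, not the figure's.

    import numpy as np

    rng = np.random.default_rng(4)
    t = rng.uniform(0, 4 * np.pi, 300)              # one-dimensional manifold coordinate
    curve = np.stack([t, np.sin(t)], axis=1)        # embedding of the manifold in 2-D
    data = curve + rng.normal(0, 0.05, curve.shape) # samples concentrated near it
    print(data.shape)                               # (300, 2): 2-D points, 1-D structure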


