
Gaussian Processes for Machine Learning


C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006, ISBN 026218253X. © 2006 Massachusetts Institute of Technology.


Transcription of Gaussian Processes for Machine Learning

3 Classification

In chapter 2 we have considered regression problems, where the targets are real valued. Another important class of problems is classification [1] problems, where we wish to assign an input pattern $\mathbf{x}$ to one of $C$ classes, $\mathcal{C}_1, \ldots, \mathcal{C}_C$. Examples of classification problems are handwritten digit recognition (where we wish to classify a digitized image of a handwritten digit into one of ten classes 0-9), and the classification of objects detected in astronomical sky surveys into stars or galaxies.

(Information on the distribution of galaxies in the universe is important for theories of the early universe.) These examples nicely illustrate that classification problems can either be binary (or two-class, $C = 2$) or multi-class ($C > 2$).

We will focus attention on probabilistic classification, where test predictions take the form of class probabilities; this contrasts with methods which provide only a guess at the class label, and this distinction is analogous to the difference between predictive distributions and point predictions in the regression setting. Since generalization to test cases inherently involves some level of uncertainty, it seems natural to attempt to make predictions in a way that reflects these uncertainties.

In a practical application one may well seek a class guess, which can be obtained as the solution to a decision problem, involving the predictive probabilities as well as a specification of the consequences of making specific predictions (the loss function).

Both classification and regression can be viewed as function approximation problems. Unfortunately, the solution of classification problems using Gaussian processes is rather more demanding than for the regression problems considered in chapter 2. This is because we assumed in the previous chapter that the likelihood function was Gaussian; a Gaussian process prior combined with a Gaussian likelihood gives rise to a posterior Gaussian process over functions, and everything remains analytically tractable.
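The decision problem mentioned above can be made concrete with a small sketch. It assumes that the class guess is the prediction with the smallest expected loss under the predictive class probabilities. The helper name `bayes_decision`, the probabilities and both loss matrices are hypothetical values chosen for illustration, not taken from the text.

```python
def bayes_decision(class_probs, loss):
    """Return (best prediction index, expected loss of every prediction).

    class_probs[c] : predictive probability that the true class is c
    loss[c][k]     : loss incurred by predicting k when the truth is c
    """
    n_predictions = len(loss[0])
    expected = [
        sum(p * loss[c][k] for c, p in enumerate(class_probs))
        for k in range(n_predictions)
    ]
    best = min(range(n_predictions), key=expected.__getitem__)
    return best, expected


# Hypothetical predictive probabilities for a binary problem.
probs = [0.7, 0.3]

# 0/1 loss: every mistake costs the same, so the most probable class wins.
zero_one = [[0.0, 1.0], [1.0, 0.0]]
# Asymmetric loss (assumed): missing class 1 costs five times more.
asymmetric = [[0.0, 1.0], [5.0, 0.0]]

guess_01, risk_01 = bayes_decision(probs, zero_one)        # guess_01 == 0
guess_asym, risk_asym = bayes_decision(probs, asymmetric)  # guess_asym == 1
```

With the same class probabilities the guess changes once the losses become asymmetric, which is why the probabilities, rather than a bare label, are the more useful model output.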

For classification models, where the targets are discrete class labels, the Gaussian likelihood is inappropriate; [2] in this chapter we treat methods of approximate inference for classification, where exact inference is not feasible. [3]

We begin with a general discussion of classification problems, and describe the generative and discriminative approaches to these problems. We saw earlier how Gaussian process regression (GPR) can be obtained by generalizing linear regression. We then describe an analogue of linear regression in the classification case, logistic regression, which is in turn generalized to yield Gaussian process classification (GPC) using again the ideas behind the generalization of linear regression to GPR. For GPR the combination of a GP prior with a Gaussian likelihood gives rise to a posterior which is again a Gaussian process. In the classification case the likelihood is non-Gaussian but the posterior process can be approximated by a GP. The Laplace approximation for GPC is described for binary classification and for multi-class classification, and the expectation propagation algorithm is described for binary classification. Both of these methods make use of a Gaussian approximation to the posterior. Experimental results for GPC are then given, followed by a discussion of these results.

[1] In the statistics literature classification is often called discrimination.
[2] One may choose to ignore the discreteness of the target values, and use a regression treatment, where all targets happen to be, say, ±1 for binary classification. This is known as least-squares classification, which is treated elsewhere in the book.
[3] Note that the important distinction is between Gaussian and non-Gaussian likelihoods; regression with a non-Gaussian likelihood requires a similar treatment, but since classification defines an important conceptual and application area, we have chosen to treat it in a separate chapter. Non-Gaussian likelihoods in general are treated elsewhere in the book.

Classification Problems

The natural starting point for discussing approaches to classification is the joint probability $p(y, \mathbf{x})$, where $y$ denotes the class label. Using Bayes' theorem this joint probability can be decomposed either as $p(y)\,p(\mathbf{x} \mid y)$ or as $p(\mathbf{x})\,p(y \mid \mathbf{x})$. This gives rise to two different approaches to classification problems. The first, which we call the generative approach, models the class-conditional distributions $p(\mathbf{x} \mid y)$ for $y = \mathcal{C}_1, \ldots, \mathcal{C}_C$ and also the prior probabilities of each class, and then computes the posterior probability for each class using

$$p(y \mid \mathbf{x}) = \frac{p(y)\,p(\mathbf{x} \mid y)}{\sum_{c=1}^{C} p(\mathcal{C}_c)\,p(\mathbf{x} \mid \mathcal{C}_c)}.$$

(For the generative approach, inference for $p(y)$ is generally straightforward, being estimation of a binomial probability in the binary case, or a multinomial probability in the multi-class case.)

The alternative approach, which we call the discriminative approach, focusses on modelling $p(y \mid \mathbf{x})$ directly. Dawid [1976] calls the generative and discriminative approaches the sampling and diagnostic paradigms, respectively.

To turn both the generative and discriminative approaches into practical methods we will need to create models for either $p(\mathbf{x} \mid y)$ or $p(y \mid \mathbf{x})$, respectively. These could either be of parametric form, or non-parametric models such as those based on nearest neighbours. For the generative case a simple, common choice would be to model the class-conditional densities with Gaussians: $p(\mathbf{x} \mid \mathcal{C}_c) = \mathcal{N}(\boldsymbol{\mu}_c, \Sigma_c)$. A Bayesian treatment can be obtained by placing appropriate priors on the mean and covariance of each of the Gaussians.
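As a minimal sketch of the generative approach with Gaussian class-conditional densities, the code below fits one Gaussian per class and applies Bayes' theorem to obtain class posteriors. Several simplifications are assumptions made here for brevity: the inputs are one-dimensional, the parameters are maximum-likelihood point estimates rather than the Bayesian treatment with priors, and the function names and toy data are hypothetical.

```python
import math


def fit_generative(xs, ys):
    """Estimate p(y) and a 1-D Gaussian p(x|y) for each class by maximum likelihood."""
    params = {}
    for c in sorted(set(ys)):
        xc = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(xc) / len(xc)
        var = sum((x - mean) ** 2 for x in xc) / len(xc)
        prior = len(xc) / len(xs)  # relative class frequency
        params[c] = (prior, mean, var)
    return params


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior(x, params):
    """p(y|x) = p(y) p(x|y) / sum_c p(C_c) p(x|C_c)."""
    joint = {
        c: prior * gaussian_pdf(x, mean, var)
        for c, (prior, mean, var) in params.items()
    }
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


# Toy data, made up for illustration: two well-separated classes.
xs = [-2.1, -1.9, -2.0, -1.6, 1.8, 2.2, 2.0, 2.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

params = fit_generative(xs, ys)
post_right = posterior(0.5, params)   # query point closer to class 1
post_left = posterior(-2.0, params)   # query point inside class 0
```

The normalizing sum in `posterior` is the denominator of the posterior formula above, so the returned values always sum to one. For extreme inputs a more careful version would work with log densities to avoid underflow.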

However, note that this Gaussian model makes a strong assumption on the form of class-conditional density and if this is inappropriate the model may perform poorly.

For the binary discriminative case one simple idea is to turn the output of a regression model into a class probability using a response function (the inverse of a link function), which squashes its argument, which can lie in the domain $(-\infty, \infty)$, into the range $[0, 1]$, guaranteeing a valid probabilistic interpretation. One example is the linear logistic regression model

$$p(\mathcal{C}_1 \mid \mathbf{x}) = \lambda(\mathbf{x}^\top \mathbf{w}), \quad \text{where} \quad \lambda(z) = \frac{1}{1 + \exp(-z)},$$

which combines the linear model with the logistic response function.
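The linear logistic regression model can be sketched in a few lines. The weight vector and inputs below are hypothetical, and the symbol for the logistic function is written here as `logistic`. The two-branch evaluation is an implementation choice made for this sketch so that `math.exp` is never called with a large positive argument.

```python
import math


def logistic(z):
    """Logistic response function 1 / (1 + exp(-z)), mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form exp(z) / (1 + exp(z)) to avoid overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, w):
    """p(C_1 | x) for the linear logistic regression model: logistic(x^T w)."""
    activation = sum(xi * wi for xi, wi in zip(x, w))
    return logistic(activation)


w = [2.0, -1.0]  # hypothetical weights, not fitted to any data

p_boundary = predict_proba([1.0, 2.0], w)  # x^T w = 0, so the probability is 0.5
p_far = predict_proba([3.0, 0.0], w)       # x^T w = 6, probability close to 1
p_other = 1.0 - p_far                      # probability of the other class
```

Inputs with $\mathbf{x}^\top \mathbf{w} = 0$ lie on the decision boundary, where both classes are equally probable; moving away from it along $\mathbf{w}$ pushes the probability towards 0 or 1.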

