
A Brief Introduction to Machine Learning for Engineers


Osvaldo Simeone (2018), "A Brief Introduction to Machine Learning for Engineers", Vol. XX, No. XX, pp. 1-231. DOI: XXXXXXXX. 17 May 2018. Department of Informatics, King's College London.

Contents

Part I: Basics
1 Introduction
   What is Machine Learning?
   When to Use Machine Learning?
   Goals and Outline
2 A Gentle Introduction through Linear Regression
   Supervised Learning
   Inference
   Frequentist Approach
   Bayesian Approach
   Minimum Description Length (MDL)
   Information-Theoretic Metrics
   Interpretation and Causality
   Summary
3 Probabilistic Models for Learning
   Preliminaries
   The Exponential Family
   Frequentist Learning
   Bayesian Learning
   Supervised Learning via Generalized Linear Models (GLM)
   Maximum Entropy Property
   Energy-based Models
   Some Advanced Topics
   Summary

Part II: Supervised Learning
4 Classification
   Preliminaries: Stochastic Gradient Descent
   Classification as a Supervised Learning Problem
   Discriminative Deterministic Models
   Discriminative Probabilistic Models: Generalized Linear Models
   Discriminative Probabilistic Models: Beyond GLM
   Generative Probabilistic Models
   Boosting
   Summary
5 Statistical Learning Theory
   A Formal Framework for Supervised Learning
   PAC Learnability and Sample Complexity
   PAC Learnability for Finite Hypothesis Classes
   VC Dimension and Fundamental Theorem of PAC Learning
   Summary

Part III: Unsupervised Learning
6 Unsupervised Learning
   Unsupervised Learning
   Clustering
   ML, ELBO and EM
   Directed Generative Models
   Undirected Generative Models
   Discriminative Models
   Autoencoders
   Ranking
   Summary

Part IV: Advanced Modelling and Inference
7 Probabilistic Graphical Models
   Introduction
   Bayesian Networks
   Markov Random Fields
   Bayesian Inference in Probabilistic Graphical Models
   Summary
8 Approximate Inference and Learning
   Monte Carlo Methods
   Variational Inference
   Monte Carlo-Based Variational Inference
   Approximate Learning
   Summary

Part V: Conclusions
9 Concluding Remarks

Appendices
A Appendix A: Information Measures
   Entropy
   Conditional Entropy and Mutual Information
   Divergence Measures
B Appendix B: KL Divergence and Exponential Family

Acknowledgements
References

A Brief Introduction to Machine Learning for Engineers
Osvaldo Simeone
Department of Informatics, King's College London

Abstract. This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. The material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with an engineering background in probability and linear algebra.

Notation

• Random variables or random vectors, both abbreviated as rvs, are represented using roman typeface, while their values and realizations are indicated by the corresponding standard font.

For instance, the equality $\mathrm{x} = x$ indicates that rv $\mathrm{x}$ takes value $x$.
• Matrices are indicated using uppercase fonts, with roman typeface used for random matrices. Vectors will be taken to be in column form.
• $X^T$ and $X^\dagger$ are the transpose and the pseudoinverse of matrix $X$, respectively.
• The distribution of an rv $\mathrm{x}$, either probability mass function (pmf) for a discrete rv or probability density function (pdf) for continuous rvs, is denoted as $p_{\mathrm{x}}$, $p_{\mathrm{x}}(x)$, or $p(x)$.
• The notation $\mathrm{x} \sim p_{\mathrm{x}}$ indicates that rv $\mathrm{x}$ is distributed according to $p_{\mathrm{x}}$.
• For jointly distributed rvs $(\mathrm{x}, \mathrm{y}) \sim p_{\mathrm{xy}}$, the conditional distribution of $\mathrm{x}$ given the observation $\mathrm{y} = y$ is indicated as $p_{\mathrm{x}|\mathrm{y}=y}$, $p_{\mathrm{x}|\mathrm{y}}(x|y)$, or $p(x|y)$.
• The notation $\mathrm{x}|\mathrm{y} = y \sim p_{\mathrm{x}|\mathrm{y}=y}$ indicates that rv $\mathrm{x}$ is drawn according to the conditional distribution $p_{\mathrm{x}|\mathrm{y}=y}$.
• The notation $\mathrm{E}_{\mathrm{x} \sim p_{\mathrm{x}}}[\cdot]$ indicates the expectation of the argument with respect to the distribution of the rv $\mathrm{x} \sim p_{\mathrm{x}}$.

Accordingly, we will also write $\mathrm{E}_{\mathrm{x} \sim p_{\mathrm{x}|\mathrm{y}}}[\cdot|y]$ for the conditional expectation with respect to the distribution $p_{\mathrm{x}|\mathrm{y}=y}$. When clear from the context, the distribution over which the expectation is computed may be omitted.
• The notation $\mathrm{Pr}_{\mathrm{x} \sim p_{\mathrm{x}}}[\cdot]$ indicates the probability of the argument event with respect to the distribution of the rv $\mathrm{x} \sim p_{\mathrm{x}}$. When clear from the context, the subscript is dropped.
• The notation $\log$ represents the logarithm in base two, while $\ln$ represents the natural logarithm.
• $\mathrm{x} \sim \mathcal{N}(\mu, \Sigma)$ indicates that random vector $\mathrm{x}$ is distributed according to a multivariate Gaussian pdf with mean vector $\mu$ and covariance matrix $\Sigma$. The multivariate Gaussian pdf is denoted as $\mathcal{N}(x|\mu, \Sigma)$ as a function of $x$ (see the short code sketch after this list for an illustration).
• $\mathrm{x} \sim \mathcal{U}(a, b)$ indicates that rv $\mathrm{x}$ is distributed according to a uniform distribution in the interval $[a, b]$. The corresponding uniform pdf is denoted as $\mathcal{U}(x|a, b)$.

• $\delta(x)$ denotes the Dirac delta function or the Kronecker delta function, as clear from the context.
• $\|a\|^2 = \sum_{i=1}^N a_i^2$ is the quadratic, or $\ell_2$, norm of a vector $a = [a_1, \ldots, a_N]^T$. We similarly define the $\ell_1$ norm as $\|a\|_1 = \sum_{i=1}^N |a_i|$, and the $\ell_0$ pseudo-norm $\|a\|_0$ as the number of non-zero entries of vector $a$.
• $I$ denotes the identity matrix, whose dimensions will be clear from the context. Similarly, $1$ represents a vector of all ones.
• $\mathbb{R}$ is the set of real numbers; $\mathbb{R}^+$ the set of non-negative real numbers; $\mathbb{R}^-$ the set of non-positive real numbers; and $\mathbb{R}^N$ is the set of all vectors of $N$ real numbers.
• $1(\cdot)$ is the indicator function: $1(x) = 1$ if $x$ is true, and $1(x) = 0$ otherwise.
• $|S|$ represents the cardinality of a set $S$.
• $\mathrm{x}_S$ represents a set of rvs $\mathrm{x}_k$ indexed by the integers $k \in S$.
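As a concrete illustration of some of this notation, the following Python/NumPy sketch, added here for illustration and not part of the monograph, draws samples $\mathrm{x} \sim \mathcal{N}(\mu, \Sigma)$, approximates an expectation $\mathrm{E}_{\mathrm{x} \sim p_{\mathrm{x}}}[f(\mathrm{x})]$ by a sample average, and computes the $\ell_2$, $\ell_1$, and $\ell_0$ norms of a vector. All parameter values are arbitrary example choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example parameters: x ~ N(mu, Sigma) in two dimensions.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw samples from the multivariate Gaussian pdf N(x | mu, Sigma).
samples = rng.multivariate_normal(mu, Sigma, size=100_000)

# Approximate the expectation E_{x ~ p_x}[f(x)] for f(x) = ||x||^2 by a
# sample average; the exact value is trace(Sigma) + ||mu||^2 = 5.
mc_estimate = np.mean(np.sum(samples**2, axis=1))
exact = np.trace(Sigma) + mu @ mu
print(f"Monte Carlo estimate: {mc_estimate:.3f} (exact: {exact:.3f})")

# Norms of a vector a = [a_1, ..., a_N]^T with some zero entries.
a = np.array([3.0, 0.0, -4.0, 0.0, 1.0])
l2_squared = np.sum(a**2)        # ||a||^2 = sum_i a_i^2         -> 26.0
l1_norm = np.sum(np.abs(a))      # ||a||_1 = sum_i |a_i|         -> 8.0
l0_pseudo = np.count_nonzero(a)  # ||a||_0 = # non-zero entries  -> 3
print(l2_squared, l1_norm, l0_pseudo)
```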

Acronyms

AI: Artificial Intelligence
AMP: Approximate Message Passing
BN: Bayesian Network
DAG: Directed Acyclic Graph
ELBO: Evidence Lower BOund
EM: Expectation Maximization
ERM: Empirical Risk Minimization
GAN: Generative Adversarial Network
GLM: Generalized Linear Model
HMM: Hidden Markov Model
i.i.d.: independent identically distributed
KL: Kullback-Leibler
LASSO: Least Absolute Shrinkage and Selection Operator
LBP: Loopy Belief Propagation
LL: Log-Likelihood
LLR: Log-Likelihood Ratio
LS: Least Squares
MC: Monte Carlo
MCMC: Markov Chain Monte Carlo
MDL: Minimum Description Length
MFVI: Mean Field Variational Inference
ML: Maximum Likelihood
MRF: Markov Random Field
NLL: Negative Log-Likelihood
PAC: Probably Approximately Correct
pdf: probability density function
pmf: probability mass function
PCA: Principal Component Analysis
PPCA: Probabilistic Principal Component Analysis
QDA: Quadratic Discriminant Analysis
RBM: Restricted Boltzmann Machine
SGD: Stochastic Gradient Descent
SVM: Support Vector Machine
rv: random variable or random vector (depending on the context)
s.t.: subject to
VAE: Variational AutoEncoder
VC: Vapnik-Chervonenkis
VI: Variational Inference

Part I: Basics

1 Introduction

Having taught courses on machine learning, I am often asked by colleagues and students with a background in engineering to suggest "the best place to start" to get into this subject. I typically respond with a list of books: for a general, but slightly outdated, introduction, read this book; for a detailed survey of methods based on probabilistic models, check this other reference; to learn about statistical learning, I found this text useful; and so on.

This answer strikes me, and most likely also my interlocutors, as quite unsatisfactory. This is especially so since the size of many of these books may be discouraging for busy professionals and students working on other projects. This monograph is an attempt to offer a basic and compact reference that describes key ideas and principles in simple terms and within a unified treatment, encompassing also more recent developments and pointers to the literature for further study.

1.1 What is Machine Learning?

A useful way to introduce the machine learning methodology is by means of a comparison with the conventional engineering design flow. This starts with an in-depth analysis of the problem domain, which culminates with the definition of a mathematical model. The mathematical model is meant to capture the key features of the problem under study, and is typically the result of the work of a number of experts. The mathematical model is finally leveraged to derive hand-crafted solutions to the problem.

For instance, consider the problem of defining a chemical process to produce a given molecule.

The conventional flow requires chemists to leverage their knowledge of models that predict the outcome of individual chemical reactions, in order to craft a sequence of suitable steps that synthesize the desired molecule. Another example is the design of speech translation or image/video compression algorithms. Both of these tasks involve the definition of models and algorithms by teams of experts, such as linguists, psychologists, and signal processing practitioners, not infrequently during the course of long standardization meetings.

The engineering design flow outlined above may be too costly and inefficient for problems in which faster or less expensive solutions are desirable. The machine learning alternative is to collect large data sets, e.g., of labelled speech, images or videos, and to use this information to train general-purpose learning machines to carry out the desired task.
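The following minimal sketch, added here as an illustration and not drawn from the monograph, shows the data-driven alternative in code: rather than hand-crafting a solution, one collects labelled examples and fits a general-purpose classifier. The dataset and model (scikit-learn's digits data and logistic regression) are arbitrary example choices.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labelled data: images of handwritten digits paired with their labels.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A general-purpose learning machine, trained from the data alone,
# with no hand-crafted rules about what digits look like.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```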

