
Linear Regression and Support Vector Regression



Transcription of Linear Regression and Support Vector Regression

Page 1

Linear Regression and Support Vector Regression. Paul, University of Adelaide, 24 October 2012.

Outline:
- Regression overview
- Linear regression
- Support vector regression
- Machine learning tools available

Regression overview:
- Clustering (e.g., k-means): group data based on their characteristics.
- Classification (decision trees, linear discriminant analysis, neural networks, support vector machines, boosting): separate data based on their labels.
- Regression, the topic of this talk (linear regression, support vector regression): find a model that can explain the output given the input.

Data processing flowchart (income prediction): raw data -> pre-processing (noise/outlier removal) -> processed data -> feature extraction and selection -> transformed data -> regression.

Example raw data:

Sex | Age | Height | Income
M   | 20  | 1.7    | 25,000
F   | 30  | 1.6    | 55,000
M   | ... | 1.8    | 30,000   (mis-entry in Age: should have been 25!)

Height and sex seem to be irrelevant, so the transformed data keeps only age and income.

Linear regression: given data with $n$-dimensional input variables and one real-valued target,
$\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}, \quad \mathbf{x}_i \in \mathbb{R}^n, \; y_i \in \mathbb{R},$
the objective is to find a function $f: \mathbb{R}^n \to \mathbb{R}$ that returns the best fit. (A minimal sketch of this data layout follows below.)
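The following is a minimal sketch, in Python with NumPy (my choice; the slides only mention "Machine Learning in Python" generically), of the data layout assumed above. The age and income values are made up for illustration.

```python
import numpy as np

# Income-prediction data after feature selection: one input variable
# (age) and one real-valued target (income). Values are illustrative.
X = np.array([[20.0], [25.0], [30.0], [40.0], [50.0]])       # shape (m, n), n = 1
y = np.array([25000.0, 32000.0, 55000.0, 58000.0, 70000.0])  # m real-valued targets

m, n = X.shape
print(f"m = {m} examples, n = {n} input dimension(s)")
```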

Page 2

Assume that the relationship between $X$ and $y$ is approximately linear. The model can be represented as
$f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b,$
where $\mathbf{w} = (w_1, \ldots, w_n)$ represents the coefficients and $b$ is an intercept.

Least-squares estimation: to find the best fit, we minimize the sum of squared errors,
$\min_{\mathbf{w}, b} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{m} \big(y_i - (\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big)^2.$
By taking the derivative of the above objective function and setting it to zero, the solution can be found in closed form:
$\mathbf{w} = (X^T X)^{-1} X^T Y.$
In MATLAB, the backslash operator computes a least-squares solution. (A numerical sketch follows this page.)

To avoid over-fitting, a regularization term can be introduced to minimize the magnitude of $\mathbf{w}$ (a second sketch follows):
- LASSO: $\min_{\mathbf{w}, b} \sum_{i=1}^{m} \big(y_i - (\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big)^2 + C \sum_{j=1}^{n} |w_j|$
- Ridge regression: $\min_{\mathbf{w}, b} \sum_{i=1}^{m} \big(y_i - (\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big)^2 + C \sum_{j=1}^{n} w_j^2$

Support vector regression: find a function $f(\mathbf{x})$ with at most $\varepsilon$-deviation from the target $y$. We do not care about errors as long as they are less than $\varepsilon$. The problem can be written as a convex optimization problem.
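Here is a NumPy sketch of the least-squares solution above. Appending a column of ones to $X$ folds the intercept $b$ into $\mathbf{w}$, so the closed form $(X^T X)^{-1} X^T Y$ covers both the weights and the intercept; the synthetic data is illustrative only.

```python
import numpy as np

# Least-squares fit of f(x) = <w, x> + b on synthetic age/income data.
rng = np.random.default_rng(0)
X = rng.uniform(20, 60, size=(50, 1))                  # ages
y = 1.5 * X[:, 0] + 5.0 + rng.normal(0, 2.0, size=50)  # noisy incomes (in $1000s)

Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add intercept column
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)       # normal equations
print("slope, intercept:", w)

# np.linalg.lstsq is the numerically safer analogue of MATLAB's backslash:
w_lstsq, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("lstsq solution  :", w_lstsq)
```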

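And a sketch of the two regularized variants, assuming scikit-learn is available (my choice of library, not named on the slides); scikit-learn's `alpha` plays the role of the regularization weight $C$ above, up to the library's internal scaling of the squared-error term.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the remaining three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives weights to exactly zero
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty: shrinks weights smoothly
print("LASSO coefficients:", lasso.coef_)  # irrelevant features ~ 0.0
print("Ridge coefficients:", ridge.coef_)  # irrelevant features small, nonzero
```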
Page 3

The hard $\varepsilon$-insensitive formulation:
$\min_{\mathbf{w}, b} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 \quad \text{s.t.} \quad y_i - \langle\mathbf{w},\mathbf{x}_i\rangle - b \le \varepsilon, \quad \langle\mathbf{w},\mathbf{x}_i\rangle + b - y_i \le \varepsilon.$
What if the problem is not feasible? We can introduce slack variables (similar to the soft-margin loss function), with $C$ trading off the complexity against the violations.

The $\varepsilon$-insensitive loss, assuming a linear parameterization $f(\mathbf{x}, \mathbf{w}) = \langle\mathbf{w},\mathbf{x}\rangle + b$:
$L(y, f(\mathbf{x}, \mathbf{w})) = \max\big(0, \; |y - f(\mathbf{x}, \mathbf{w})| - \varepsilon\big).$
Only the points outside the $\varepsilon$-region contribute to the final cost.

Soft margin: given training data $i = 1, \ldots, m$, minimize
$\frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$
under the constraints
$y_i - \langle\mathbf{w},\mathbf{x}_i\rangle - b \le \varepsilon + \xi_i, \quad \langle\mathbf{w},\mathbf{x}_i\rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, m.$
(A scikit-learn sketch of this formulation follows this page.)

How about a non-linear case? Linear versus non-linear SVR: in the non-linear case, we map the data into a higher-dimensional space, e.g. from $f: \text{age} \to \text{income}$ to $f: (\text{age}, \text{age}^2) \to \text{income}$.

Dual problem. Primal:
$\min \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - \langle\mathbf{w},\mathbf{x}_i\rangle - b \le \varepsilon + \xi_i, \; \langle\mathbf{w},\mathbf{x}_i\rangle + b - y_i \le \varepsilon + \xi_i^*, \; \xi_i, \xi_i^* \ge 0, \; i = 1, \ldots, m.$
Dual:
$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{m} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle\mathbf{x}_i, \mathbf{x}_j\rangle - \varepsilon \sum_{i=1}^{m} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{m} y_i (\alpha_i - \alpha_i^*)$
$\text{s.t.} \quad \sum_{i=1}^{m} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C.$
Primal variables: one $w$ per feature dimension, so the complexity scales with the dimension of the input space. Dual variables: $\alpha_i, \alpha_i^*$ for each data point, so the complexity scales with the number of support vectors. Points inside the $\varepsilon$-tube have $\alpha = 0$; points on or outside it have $\alpha > 0$ and become support vectors.

Kernel trick. Linear case: $\langle\mathbf{x}, \mathbf{y}\rangle$. Non-linear case: $K(\mathbf{x}, \mathbf{y}) = \langle\Phi(\mathbf{x}), \Phi(\mathbf{y})\rangle$. Note: there is no need to compute the mapping function $\Phi(\cdot)$ explicitly; instead, we use the kernel function (a worked demonstration follows below).
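A sketch of the soft-margin formulation above, again assuming scikit-learn: `epsilon` is the half-width of the insensitive tube and `C` trades model flatness against the slack variables $\xi_i, \xi_i^*$. The data is synthetic.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(80, 1))
y = 0.5 * X[:, 0] + 1.0 + rng.normal(0, 0.2, size=80)

svr = SVR(kernel="linear", C=10.0, epsilon=0.3).fit(X, y)
pred = svr.predict(X)

# epsilon-insensitive loss: only points outside the tube contribute.
eps_loss = np.maximum(0.0, np.abs(y - pred) - svr.epsilon)
print("points outside the tube  :", int(np.count_nonzero(eps_loss)))
print("number of support vectors:", len(svr.support_))
```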

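The kernel trick can be demonstrated numerically. For the degree-2 polynomial kernel $(\mathbf{x}^T\mathbf{y} + 1)^2$ on 2-D inputs, the explicit feature map $\Phi$ is known in closed form, so we can check that the kernel value equals the inner product in the mapped space; the helper names below are mine.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel (x.y + 1)^2
    on 2-D inputs. In practice this map is never computed explicitly."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1**2, x2**2, s * x1 * x2])

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x.y + 1)^d, evaluated in input space."""
    return (x @ y + 1.0) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(phi(x) @ phi(y))    # 4.0: inner product in the mapped space
print(poly_kernel(x, y))  # 4.0: same value, without ever computing phi
```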
Page 4

Commonly used kernels:
- Polynomial kernels: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T \mathbf{y} + 1)^d$
- Radial basis function (RBF) kernels: $K(\mathbf{x}, \mathbf{y}) = \exp\big(-\frac{1}{2\sigma^2}\lVert\mathbf{x} - \mathbf{y}\rVert^2\big)$
Note: for the RBF kernel, $\dim(\Phi(\cdot))$ is infinite.

Dual problem for the non-linear case. The primal becomes
$\min \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - \langle\mathbf{w},\Phi(\mathbf{x}_i)\rangle - b \le \varepsilon + \xi_i, \; \langle\mathbf{w},\Phi(\mathbf{x}_i)\rangle + b - y_i \le \varepsilon + \xi_i^*, \; \xi_i, \xi_i^* \ge 0,$
and the dual is the same as on page 3 with $\langle\Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j)\rangle = K(\mathbf{x}_i, \mathbf{x}_j)$. As before, the primal complexity scales with the feature dimension and the dual complexity with the number of support vectors.

SVR applications: optical character recognition (OCR) (see A. J. Smola and B. Schölkopf, "A Tutorial on Support Vector Regression", NeuroCOLT Technical Report TR-98-030) and stock price prediction. (A non-linear SVR sketch follows this page.)

SVR demo: WEKA and linear regression. The software can be downloaded from the WEKA website. Data set used in this experiment: Computer Hardware. The objective is to predict CPU performance based on these given attributes:
- Machine cycle time in nanoseconds (MYCT)
- Minimum main memory in kilobytes (MMIN)
- Maximum main memory in kilobytes (MMAX)
- Cache memory in kilobytes (CACH)
- Minimum channels in units (CHMIN)
- Maximum channels in units (CHMAX)
The output is expressed as a linear combination of the attributes.
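A sketch of non-linear SVR with the RBF kernel, again assuming scikit-learn: note that scikit-learn's `gamma` corresponds to $\frac{1}{2\sigma^2}$ in the slide's notation, and the sine-shaped toy data is mine.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(100, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=100)  # clearly non-linear target

# RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); gamma = 1/(2 sigma^2).
svr = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1).fit(X, y)
print("support vectors:", len(svr.support_), "of", len(X), "training points")
```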

Page 5

Each attribute has a specific weight:
$\text{Output} = w_1 a_1 + w_2 a_2 + \ldots + w_n a_n + b.$

Evaluation (a sketch computing both metrics appears at the end of this section):
- Root mean-squared error: $\sqrt{\dfrac{(\hat{y}_1 - y_1)^2 + \ldots + (\hat{y}_m - y_m)^2}{m}}$
- Mean absolute error: $\dfrac{|\hat{y}_1 - y_1| + \ldots + |\hat{y}_m - y_m|}{m}$

WEKA: data visualization; load the data and normalize each attribute to [0, 1].

WEKA (linear regression): the learned model has the form
Performance = ( × MYCT) + ( × MMIN) + ( × MMAX) + ( × CACH) + ( × CHMAX).
A large machine cycle time (MYCT) does not indicate the best performance; main memory plays a more important role in system performance.

WEKA (linear SVR): compared to linear regression, the learned model has the same form,
Performance = ( × MYCT) + ( × MMIN) + ( × MMAX) + ( × CACH) + ( × CHMAX).

WEKA (non-linear SVR): a list of support vectors.

WEKA (performance comparison):

Method                          | Mean absolute error | Root mean-squared error
Linear regression               |                     |
SVR (linear), C =               |                     |
SVR (RBF), C = , gamma =        |                     |

$C$ (for linear SVR) and $\langle C, \gamma \rangle$ (for non-linear SVR) need to be cross-validated for better results.

Machine learning tools:
- Shogun toolbox (C++)
- Shark Machine Learning Library (C++)
- Machine Learning in Python (Python)
- Machine Learning in OpenCV2
- LibSVM, LibLinear, etc.
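The two evaluation metrics from this page are straightforward to compute; here is a small NumPy sketch with made-up predictions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean-squared error, as defined on the slide."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, as defined on the slide."""
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print("RMSE:", rmse(y_true, y_pred))  # ~0.935
print("MAE :", mae(y_true, y_pred))   # 0.75
```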

