
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Foundations and Trends® in Machine Learning, Vol. 3, No. 1 (2010) 1–122. © 2011 S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein. DOI: 10.1561/2200000016

Stephen Boyd¹, Neal Parikh², Eric Chu³, Borja Peleato⁴ and Jonathan Eckstein⁵

¹ Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
² Computer Science Department, Stanford University, Stanford, CA 94305, USA
³ Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
⁴ Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
⁵ Management Science and Information Systems Department and RUTCOR, Rutgers University, Piscataway, NJ 08854, USA



Contents

1  Introduction
2  Precursors
   2.1  Dual Ascent
   2.2  Dual Decomposition
   2.3  Augmented Lagrangians and the Method of Multipliers
3  Alternating Direction Method of Multipliers
   3.1  Algorithm
   3.2  Convergence
   3.3  Optimality Conditions and Stopping Criterion
   3.4  Extensions and Variations
   3.5  Notes and References
4  General Patterns
   4.1  Proximity Operator
   4.2  Quadratic Objective Terms
   4.3  Smooth Objective Terms
   4.4  Decomposition
5  Constrained Convex Optimization
   5.1  Convex Feasibility
   5.2  Linear and Quadratic Programming
6  ℓ1-Norm Problems
   6.1  Least Absolute Deviations
   6.2  Basis Pursuit
   6.3  General ℓ1 Regularized Loss Minimization
   6.4  Lasso
   6.5  Sparse Inverse Covariance Selection
7  Consensus and Exchange
   7.1  Global Variable Consensus Optimization
   7.2  General Form Consensus Optimization
   7.3  Sharing
8  Distributed Model Fitting
   8.1  Examples
   8.2  Splitting across Examples
   8.3  Splitting across Features
9  Nonconvex Problems
   9.1  Nonconvex Constraints
   9.2  Bi-convex Problems
10  Implementation
   10.1  Abstract Implementation
   10.2  MPI
   10.3  Graph Computing Frameworks
   10.4  MapReduce
11  Numerical Examples
   11.1  Small Dense Lasso
   11.2  Distributed ℓ1 Regularized Logistic Regression
   11.3  Group Lasso with Feature Splitting
   11.4  Distributed Large-Scale Lasso with MPI
   11.5  Regressor Selection
12  Conclusions
Acknowledgments
A  Convergence Proof
References

Abstract

Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization.

Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others.

We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

1 Introduction

In all applied fields, it is now commonplace to attack problems through data analysis, particularly through the use of statistical and machine learning algorithms on what are often large datasets. In industry, this trend has been referred to as "Big Data", and it has had a significant impact in areas as varied as artificial intelligence, internet applications, computational biology, medicine, finance, marketing, journalism, network analysis, and logistics. Though these problems arise in diverse application domains, they share some key characteristics.

First, the datasets are often extremely large, consisting of hundreds of millions or billions of training examples; second, the data is often very high-dimensional, because it is now possible to measure and store very detailed information about each example; and third, because of the large scale of many applications, the data is often stored or even collected in a distributed manner. As a result, it has become of central importance to develop algorithms that are both rich enough to capture the complexity of modern data, and scalable enough to process huge datasets in a parallelized or fully decentralized fashion.

Indeed, some researchers [92] have suggested that even highly complex and structured problems may succumb most easily to relatively simple models trained on vast datasets. Many such problems can be posed in the framework of convex optimization. Given the significant work on decomposition methods and decentralized algorithms in the optimization community, it is natural to look to parallel optimization algorithms as a mechanism for solving large-scale statistical tasks. This approach also has the benefit that one algorithm could be flexible enough to solve many problems. This review discusses the alternating direction method of multipliers (ADMM), a simple but powerful algorithm that is well suited to distributed convex optimization, and in particular to problems arising in applied statistics and machine learning.

It takes the form of a decomposition-coordination procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. ADMM can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization, two earlier approaches that we review in Section 2. It turns out to be equivalent or closely related to many other algorithms as well, such as Douglas–Rachford splitting from numerical analysis, Spingarn's method of partial inverses, Dykstra's alternating projections method, Bregman iterative algorithms for ℓ1 problems in signal processing, proximal methods, and many others.
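To make the decomposition-coordination structure concrete, the standard form of the algorithm (developed in Section 3 of the review) can be stated in a few lines. ADMM solves

    \begin{align*}
      &\text{minimize}   && f(x) + g(z) \\
      &\text{subject to} && Ax + Bz = c
    \end{align*}

using the augmented Lagrangian

    \[
      L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c)
                        + (\rho/2)\,\|Ax + Bz - c\|_2^2,
    \]

with the iterations

    \begin{align*}
      x^{k+1} &:= \arg\min_x \, L_\rho(x, z^k, y^k) \\
      z^{k+1} &:= \arg\min_z \, L_\rho(x^{k+1}, z, y^k) \\
      y^{k+1} &:= y^k + \rho\,(Ax^{k+1} + Bz^{k+1} - c).
    \end{align*}

The x- and z-minimizations are the small local subproblems, and the dual update of y is the coordination step that drives the iterates toward satisfying the constraint.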

The fact that it has been re-invented in different fields over the decades underscores the intuitive appeal of the approach. It is worth emphasizing that the algorithm itself is not new, and that we do not present any new theoretical results. It was first introduced in the mid-1970s by Gabay, Mercier, Glowinski, and Marrocco, though similar ideas emerged as early as the mid-1950s. The algorithm was studied throughout the 1980s, and by the mid-1990s, almost all of the theoretical results mentioned here had been established. The fact that ADMM was developed so far in advance of the ready availability of large-scale distributed computing systems and massive optimization problems may account for why it is not as widely known today as we believe it should be. The main contributions of this review can be summarized as follows.

(1) We provide a simple, cohesive discussion of the extensive literature in a way that emphasizes and unifies the aspects of primary importance in applications.

(2) We show, through a number of examples, that the algorithm is well suited for a wide variety of large-scale distributed modern problems. We derive methods for decomposing a wide class of statistical problems by training examples and by features, which is not easily accomplished in general.

(3) We place a greater emphasis on practical large-scale implementation than most previous references. In particular, we discuss the implementation of the algorithm in cloud computing environments using standard frameworks and provide easily readable implementations of many of our examples.

Throughout, the focus is on applications rather than theory, and a main goal is to provide the reader with a kind of toolbox that can be applied in many situations to derive and implement a distributed algorithm of practical use.
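As a small illustration of the kind of toolbox entry the review builds up, the following is a minimal NumPy sketch of ADMM applied to the lasso, minimize (1/2)||Ax − b||² + λ||x||₁, using the splitting f(x) = (1/2)||Ax − b||², g(z) = λ||z||₁, with constraint x − z = 0. It is a sketch under simple assumptions (fixed penalty ρ, fixed iteration count, no stopping criterion); the function names, parameters, and toy data below are our own choices for illustration, not the monograph's reference implementation.

    import numpy as np

    def soft_threshold(v, kappa):
        # Proximal operator of kappa * ||.||_1 (elementwise soft thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
        # Minimize (1/2)||Ax - b||_2^2 + lam * ||x||_1 via ADMM with the
        # splitting f(x) = (1/2)||Ax - b||_2^2, g(z) = lam * ||z||_1, x - z = 0.
        m, n = A.shape
        x = np.zeros(n)
        z = np.zeros(n)
        u = np.zeros(n)  # scaled dual variable, u = y / rho
        # The x-update solves (A^T A + rho I) x = A^T b + rho (z - u);
        # factor the matrix once and reuse the factorization every iteration.
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
        Atb = A.T @ b
        for _ in range(n_iter):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            z = soft_threshold(x + u, lam / rho)   # z-update: prox of the l1 term
            u = u + x - z                          # dual (coordination) update
        return z

    # Toy usage on synthetic data:
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [1.5, -2.0, 0.7]
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    x_hat = lasso_admm(A, b, lam=0.1)

Note the design point this example is meant to convey: the expensive x-update reduces to a ridge-regression-like linear solve whose factorization is computed once and reused, while the z-update is a cheap closed-form proximal step, which is what makes the method practical at scale.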

