



Technical report, 12 Feb 2014

A Survey on Metric Learning for Feature Vectors and Structured Data

Aurélien Bellet (bellet@usc.edu)
Department of Computer Science
University of Southern California
Los Angeles, CA 90089, USA

Amaury Habrard and Marc Sebban
Laboratoire Hubert Curien UMR 5516
Université de Saint-Étienne
18 rue Benoit Lauras, 42000 St-Etienne, France

Abstract

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years.

This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.

Keywords: Metric Learning, Similarity Learning, Mahalanobis Distance, Edit Distance

1. Introduction

The notion of pairwise metric, used throughout this survey as a generic term for distance, similarity or dissimilarity function between data points, plays an important role in many machine learning, pattern recognition and data mining techniques. For instance, in classification, the k-Nearest Neighbor classifier (Cover and Hart, 1967) uses a metric to identify the nearest neighbors; many clustering algorithms, such as the prominent K-Means (Lloyd, 1982), rely on distance measurements between data points; in information retrieval, documents are often ranked according to their relevance to a given query based on similarity scores.

[Footnote: Most of the work in this paper was carried out while the author was affiliated with Laboratoire Hubert Curien UMR 5516, Université de Saint-Étienne, France. Metric-based learning methods were the focus of the recent SIMBAD European project (ICT 2008-FET, 2008-2011).]

Clearly, the performance of these methods depends on the quality of the metric: as in the saying "birds of a feather flock together", we hope that it identifies as similar (resp. dissimilar) the pairs of instances that are indeed semantically close (resp. different). General-purpose metrics exist (e.g., the Euclidean distance and the cosine similarity for feature vectors, or the Levenshtein distance for strings) but they often fail to capture the idiosyncrasies of the data of interest. Improved results are expected when the metric is designed specifically for the task at hand.

Since manual tuning is difficult and tedious, a lot of effort has gone into metric learning, the research topic devoted to automatically learning metrics from data.

1.1 Metric Learning in a Nutshell

Although its origins can be traced back to some earlier work (e.g., Short and Fukunaga, 1981; Fukunaga, 1990; Friedman, 1994; Hastie and Tibshirani, 1996; Baxter and Bartlett, 1997), metric learning really emerged in 2002 with the pioneering work of Xing et al. (2002) that formulates it as a convex optimization problem. It has since been a hot research topic, being the subject of tutorials at ICML 2010 and ECCV 2010, and workshops at ICCV 2011, NIPS 2011 and ICML 2012. The goal of metric learning is to adapt some pairwise real-valued metric function, say the Mahalanobis distance $d_M(x, x') = \sqrt{(x - x')^\top M (x - x')}$, to the problem of interest using the information brought by training examples.
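To make the definition concrete, here is a minimal NumPy sketch (ours, not from the survey; the function name is hypothetical) of this distance, assuming $M$ is positive semi-definite so that the quantity under the square root is non-negative:

```python
import numpy as np

def mahalanobis_distance(x, x_prime, M):
    """Compute d_M(x, x') = sqrt((x - x')^T M (x - x')).

    M is assumed to be positive semi-definite, which guarantees
    that the argument of the square root is non-negative.
    """
    diff = x - x_prime
    return np.sqrt(diff @ M @ diff)

# With M = I, d_M reduces to the ordinary Euclidean distance.
x = np.array([1.0, 2.0])
x_prime = np.array([4.0, 6.0])
print(mahalanobis_distance(x, x_prime, np.eye(2)))  # 5.0
```

Note that since $M \succeq 0$ can be factored as $M = L^\top L$, this distance is equivalent to the Euclidean distance computed after the linear transformation $x \mapsto Lx$.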

Most methods learn the metric (here, the positive semi-definite matrix $M$ in $d_M$) in a weakly-supervised way from pair or triplet-based constraints of the following form:

- Must-link / cannot-link constraints (sometimes called positive / negative pairs):
  $S = \{(x_i, x_j) : x_i \text{ and } x_j \text{ should be similar}\}$,
  $D = \{(x_i, x_j) : x_i \text{ and } x_j \text{ should be dissimilar}\}$.

- Relative constraints (sometimes called training triplets):
  $R = \{(x_i, x_j, x_k) : x_i \text{ should be more similar to } x_j \text{ than to } x_k\}$.

A metric learning algorithm basically aims at finding the parameters of the metric such that it best agrees with these constraints (see Figure 1 for an illustration), in an effort to approximate the underlying semantic metric.
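As a concrete illustration (our sketch, with a hypothetical helper name), such constraints are often derived from class labels when no side information is available: same-label pairs are placed in $S$ and different-label pairs in $D$:

```python
import numpy as np
from itertools import combinations

def pairs_from_labels(y):
    """Build must-link (S) and cannot-link (D) index pairs from labels.

    A common heuristic: same-label pairs are assumed similar,
    different-label pairs dissimilar.
    """
    S, D = [], []
    for i, j in combinations(range(len(y)), 2):
        (S if y[i] == y[j] else D).append((i, j))
    return S, D

y = np.array([0, 0, 1])
S, D = pairs_from_labels(y)
print(S)  # [(0, 1)]
print(D)  # [(0, 2), (1, 2)]
```

In genuinely weakly-supervised settings, the pairs would instead come from side information (e.g., user feedback) rather than full labels.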

This is typically formulated as an optimization problem of the following general form:

$$\min_M\; \ell(M, S, D, R) + \lambda R(M)$$

where $\ell(M, S, D, R)$ is a loss function that incurs a penalty when training constraints are violated, $R(M)$ is some regularizer on the parameters $M$ of the learned metric, and $\lambda \geq 0$ is the regularization parameter. As we will see in this survey, state-of-the-art metric learning formulations essentially differ by their choice of metric, constraints, loss function and regularizer.

[Figure 1: Illustration of metric learning applied to a face recognition task. For simplicity, images are represented as points in 2 dimensions. Pairwise constraints, shown in the left pane, are composed of images representing the same person (must-link, shown in green) or different persons (cannot-link, shown in red). We wish to adapt the metric so that there are fewer constraint violations (right pane). Images are taken from the Caltech Faces dataset.]
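To give a feel for this template, the following is a didactic sketch (ours, not any published algorithm; all names and parameter values are illustrative): the loss is a hinge penalty that pushes squared distances of similar pairs below a margin u and those of dissimilar pairs above a larger margin v, the regularizer is the squared Frobenius norm, and the positive semi-definiteness of $M$ is enforced by projection after each gradient step.

```python
import numpy as np

def learn_mahalanobis(X, S, D, n_iter=100, lr=0.01, lam=0.1, u=1.0, v=2.0):
    """Minimize a pairwise hinge loss plus lam * ||M||_F^2 over PSD matrices
    by projected gradient descent (a didactic sketch of the general form)."""
    M = np.eye(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * lam * M                    # gradient of the regularizer
        for i, j in S:
            diff = X[i] - X[j]
            if diff @ M @ diff > u:             # similar pair too far apart
                grad += np.outer(diff, diff)
        for i, j in D:
            diff = X[i] - X[j]
            if diff @ M @ diff < v:             # dissimilar pair too close
                grad -= np.outer(diff, diff)
        M -= lr * grad
        # Project back onto the PSD cone by zeroing negative eigenvalues.
        w, V = np.linalg.eigh(M)
        M = (V * np.clip(w, 0.0, None)) @ V.T
    return M
```

Convex formulations such as that of Xing et al. (2002) solve related problems with stronger guarantees; this sketch only illustrates the loss-plus-regularizer structure under a PSD constraint.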

[Figure 2: The common process in metric learning. A metric is learned from training data and plugged into an algorithm that outputs a predictor (e.g., a classifier, a regressor, a recommender system) which hopefully performs better than a predictor induced by a standard (non-learned) metric.]

After the metric learning phase, the resulting function is used to improve the performance of a metric-based algorithm, which is most often k-Nearest Neighbors (k-NN), but may also be a clustering algorithm such as K-Means, a ranking algorithm, etc. The common process in metric learning is summarized in Figure 2.
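As a minimal sketch of this pipeline (assuming a matrix M produced by some metric learning step, such as the sketch above), here is a k-NN classifier that votes among the neighbors found under the learned squared Mahalanobis distance:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, M, k=3):
    """Majority vote among the k nearest training points under the
    learned squared Mahalanobis distance (a didactic sketch)."""
    diff = X_train - x_query                        # shape (n, d)
    d2 = np.einsum('nd,de,ne->n', diff, M, diff)    # squared distances
    nearest = np.argsort(d2)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```

Example usage: `y_hat = knn_predict(X, y, x_new, learn_mahalanobis(X, S, D))`. Equivalently, one can factor $M = L^\top L$, transform the data once and reuse any off-the-shelf Euclidean k-NN implementation.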

1.2 Applications

Metric learning can potentially be beneficial whenever the notion of metric between instances plays an important role. Recently, it has been applied to problems as diverse as link prediction in networks (Shaw et al., 2011), state representation in reinforcement learning (Taylor et al., 2011), music recommendation (McFee et al., 2012), partitioning problems (Lajugie et al., 2014), identity verification (Ben et al., 2012), webpage archiving (Law et al., 2012), cartoon synthesis (Yu et al., 2012) and even assessing the efficacy of acupuncture (Liang et al., 2012), to name a few.

In the following, we list three large fields of application where metric learning has been shown to be very useful.

Computer vision. There is a great need for appropriate metrics in computer vision, not only to compare images or videos in ad-hoc representations such as bags-of-visual-words (Li and Perona, 2005) but also in the pre-processing step consisting in building this very representation (for instance, visual words are usually obtained by means of clustering). For this reason, there exists a large body of metric learning literature dealing specifically with computer vision problems, such as image classification (Mensink et al., 2012), object recognition (Frome et al.)

