Model Inversion Attacks that Exploit Conﬁdence …

Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures

Matt Fredrikson, Carnegie Mellon University
Somesh Jha, University of Wisconsin-Madison
Thomas Ristenpart, Cornell Tech

ABSTRACT

Machine-learning (ML) algorithms are increasingly utilized in privacy-sensitive applications such as predicting lifestyle choices, making medical diagnoses, and facial recognition. In a model inversion attack, recently introduced in a case study of linear classifiers in personalized medicine by Fredrikson et al. [13], adversarial access to an ML model is abused to learn sensitive genomic information about individuals. Whether model inversion attacks apply to settings outside theirs, however, is unknown. We develop a new class of model inversion attack that exploits confidence values revealed along with predictions. Our new attacks are applicable in a variety of settings, and we explore two in depth: decision trees for lifestyle surveys as used on machine-learning-as-a-service systems and neural networks for facial recognition.

In both cases confidence values are revealed to those with the ability to make prediction queries to models. We experimentally show attacks that are able to estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other and, in the other context, show how to recover recognizable images of people's faces given only their name and access to the ML model. We also initiate experimental exploration of natural countermeasures, investigating a privacy-aware decision tree training algorithm that is a simple variant of CART learning, as well as revealing only rounded confidence values. The lesson that emerges is that one can avoid these kinds of MI attacks with negligible degradation to utility.

INTRODUCTION

Computing systems increasingly incorporate machine learning (ML) algorithms in order to provide predictions of lifestyle choices [6], medical diagnoses [20], facial recognition [1], and more.

The need for easy push-button ML has even prompted a number of companies to build ML-as-a-service cloud systems, wherein customers can upload data sets, train classifiers or regression models, and then obtain access to perform prediction queries using the trained model, all over easy-to-use public HTTP interfaces. The features used by these models, and queried via APIs to make predictions, often represent sensitive information. In facial recognition, the features are the individual pixels of a picture of a person's face. In lifestyle surveys, features may contain sensitive information, such as the sexual habits of respondents.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from … 15, October 12-16, 2015, Denver, Colorado. Copyright is held by the owner/author(s). Publication rights licensed to … 978-1-4503-3832-5/15/10 …$ …

In the context of these services, a clear threat is that providers might be poor stewards of sensitive data, allowing training data or query logs to fall prey to insider attacks or exposure via system compromises. A number of works have focused on attacks that result from access to (even anonymized) data [18,29,32,38]. A perhaps more subtle concern is that the ability to make prediction queries might enable adversarial clients to back out sensitive data. Recent work by Fredrikson et al. [13] in the context of genomic privacy shows a model inversion attack that is able to use black-box access to prediction models in order to estimate aspects of someone's genotype. Their attack works for any setting in which the sensitive feature being inferred is drawn from a small set. They only evaluated it in a single setting, and it remains unclear if inversion attacks pose a broader risk.

In this paper we investigate commercial ML-as-a-service APIs. We start by showing that the Fredrikson et al. attack, even when it is computationally tractable to mount, is not particularly effective in our new settings. We therefore introduce new attacks that infer sensitive features used as inputs to decision tree models, as well as attacks that recover images from API access to facial recognition services. The key enabling insight across both situations is that we can build attack algorithms that exploit confidence values exposed by the APIs.

One example from our facial recognition attacks is depicted in Figure 1: an attacker can produce a recognizable image of a person, given only API access to a facial recognition system and the name of the person whose face is recognized by it.

Figure 1: An image recovered using a new model inversion attack (left) and a training set image of the victim (right). The attacker is given only the person's name and access to a facial recognition system that returns a class confidence score.

ML APIs and model inversion. We provide an overview of contemporary ML services in Section 2, but for the purposes of discussion here we roughly classify client-side access as being either black-box or white-box. In a black-box setting, an adversarial client can make prediction queries against a model, but not actually download the model description. In a white-box setting, clients are allowed to download a description of the model. The new generation of ML-as-a-service systems, including general-purpose ones such as BigML [4] and Microsoft Azure Learning [31], allow data owners to specify whether APIs should allow white-box or black-box access to their models.

Consider a model defining a function f that takes as input a feature vector x1,...,xd for some feature dimension d and outputs a prediction y = f(x1,...,xd). In the model inversion attack of Fredrikson et al. [13], an adversarial client uses black-box access to f to infer a sensitive feature, say x1, given some knowledge about the other features and the dependent value y, error statistics regarding the model, and marginal priors for individual variables. Their algorithm is a maximum a posteriori (MAP) estimator that picks the value for x1 which maximizes the probability of having observed the known values (under some seemingly reasonable independence assumptions). Doing so, however, requires computing f(x1,...,xd) for every possible value of x1 (and any other unknown features).
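The enumeration behind such a MAP estimator can be sketched in a few lines. This is a simplified illustration rather than the algorithm of [13]: the names `map_invert`, `toy_model`, `prior`, and `err` are hypothetical, the score is assumed to be the marginal prior of the candidate value multiplied by an error statistic for the model's response, and the toy numbers are invented for the example.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def map_invert(
    f: Callable[[Tuple], Hashable],
    candidates: Sequence[Hashable],
    known: Tuple,
    y: Hashable,
    prior: Dict[Hashable, float],
    err: Dict[Tuple[Hashable, Hashable], float],
) -> Hashable:
    """Return the candidate value of x1 with the highest score.

    Assumption (not from the paper): score(v) = prior[v] * err[(y, f(v, known))],
    where err[(y, y_pred)] estimates how likely the true response is y
    when the model outputs y_pred.
    """
    best_value, best_score = None, float("-inf")
    for v in candidates:
        y_pred = f((v, *known))  # one black-box query per candidate value
        score = prior[v] * err[(y, y_pred)]
        if score > best_score:
            best_value, best_score = v, score
    return best_value


def toy_model(x: Tuple) -> str:
    """Hypothetical stand-in for the black-box model f."""
    x1, x2 = x
    return "high" if x1 == 1 and x2 > 0.5 else "low"


# Invented marginal prior for x1 and invented error statistics.
prior = {0: 0.7, 1: 0.3}
err = {
    ("high", "high"): 0.9,
    ("high", "low"): 0.1,
    ("low", "low"): 0.8,
    ("low", "high"): 0.2,
}

guess = map_invert(toy_model, [0, 1], (0.9,), "high", prior, err)
print(guess)
```

The loop issues one query per candidate value, so the cost grows with the size of the domain of x1, and with the product of the domain sizes if several features are unknown.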

This need to enumerate limits the attack's applicability to settings where x1 takes on only a limited set of possible values.

Our first contribution is evaluating their MAP estimator in a new context. We perform a case study showing that it provides only limited effectiveness in estimating sensitive features (marital infidelity and pornographic viewing habits) in decision-tree models currently hosted on BigML's model gallery [4]. In particular the false positive rate is too high: our experiments show that the Fredrikson et al. algorithm would incorrectly conclude, for example, that a person (known to be in the training set) watched pornographic videos in the past year almost 60% of the time. This might suggest that inversion is not a significant risk, but in fact we show new attacks that can significantly improve inversion efficacy.

White-box decision tree attacks. Investigating the actual data available via the BigML service APIs, one sees that model descriptions include more information than leveraged in the black-box attack.

In particular, they provide the count of instances from the training set that match each path in the decision tree. Dividing by the total number of instances gives a confidence in the classification. While a priori this additional information may seem innocuous, we show that it can in fact be exploited.

We give a new MAP estimator that uses the confidence information in the white-box setting to infer sensitive information with no false positives when tested against two different BigML decision tree models. This high precision holds for target subjects who are known to be in the training data, while the estimator's precision is significantly worse for those not in the training data set. This demonstrates that publishing these models poses a privacy risk for those contributing to the training data.

Our new estimator, as well as the Fredrikson et al. one, query or run predictions a number of times that is linear in the number of possible values of the target sensitive feature(s). Thus they do not extend to settings where features have exponentially large domains, or when we want to invert a large number of features from small domains.

Extracting faces from neural networks. An example of a tricky setting with large-dimension, large-domain data is facial recognition: features are vectors of floating-point pixel data. In theory, a solution to this large-domain inversion problem might enable, for example, an attacker to use a facial recognition API to recover an image of a person given just their name (the class label). Of course this would seem impossible in the black-box setting if the API returns answers to queries that are just a class label. Inspecting facial recognition APIs, it turns out that it is common to give floating-point confidence measures along with the class label (person's name).
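Returning to the decision-tree case, the confidence computation described earlier (per-path training-instance counts divided by the total number of instances) can be illustrated with a minimal sketch. The function name `path_confidences`, the path labels, and the counts below are hypothetical and are not taken from any BigML model description.

```python
from typing import Dict


def path_confidences(path_counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each path's training-instance count by the total instance count."""
    total = sum(path_counts.values())
    return {path: count / total for path, count in path_counts.items()}


# Invented counts of training instances that reach each leaf of a small tree.
counts = {
    "age<30 & married": 40,
    "age<30 & single": 10,
    "age>=30": 50,
}

conf = path_confidences(counts)
for path, value in conf.items():
    print(f"{path}: {value:.2f}")
```

Because these values are derived directly from the training set, a white-box model description that includes them reveals how many training records followed each path, which is the extra signal the white-box estimator relies on.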
