Practical Secure Aggregation for Privacy-Preserving Machine Learning

Keith Bonawitz*, Vladimir Ivanov*, Ben Kreuter*, Antonio Marcedone†, H. Brendan McMahan*, Sarvar Patel*, Daniel Ramage*, Aaron Segal*, and Karn Seth*

*Google, Mountain View, CA 94043
†Cornell Tech, 2 West Loop Rd., New York, NY 10044. Research performed during an internship at Google.

1 INTRODUCTION

Machine learning models trained on sensitive real-world data promise improvements to everything from medical screening [48] to disease outbreak discovery [39]. And the widespread use of mobile devices means even richer and more sensitive data is becoming available [37]. However, large-scale collection of sensitive data entails risks. A particularly high-profile example of the consequences of mishandling sensitive data occurred in 1988, when the video rental history of a nominee for the US Supreme Court was published without his consent [4].

The law passed in response to that incident remains relevant today, limiting how online video streaming services can use their user data [44].

This work outlines an approach to advancing privacy-preserving machine learning by leveraging secure multiparty computation (MPC) to compute sums of model parameter updates from individual users' devices in a secure manner. The problem of computing a multiparty sum where no party reveals its update in the clear, even to the aggregator, is referred to as Secure Aggregation. As described in Section 2, the Secure Aggregation primitive can be used to privately combine the outputs of local machine learning on user devices, in order to update a global model. Training models in this way offers tangible privacy benefits: a user's device can share an update knowing that the service provider will only see that update after it has been averaged with those of other users.

The Secure Aggregation problem has been a rich area of research: different approaches include works based on generic secure multi-party computation protocols, works based on DC-nets, works based on partially- or fully-homomorphic threshold encryption, and works based on pairwise masking. We discuss these existing works in more detail in Section 9, and compare them to our approach. We are particularly focused on the setting of mobile devices, where communication is extremely expensive, and dropouts are common.

Given these constraints, we would like our protocol to incur no more than twice as much communication as sending the data vector to be aggregated in the clear, and would also like the protocol to be fully robust to users dropping out at any point. We believe that previous works do not address this mixture of constraints, which is what motivates our work.

1.1 Our Results

We present a protocol for securely computing sums of vectors, which has a constant number of rounds, low communication overhead, robustness to failures, and which requires only one server with limited trust. In our design the server has two roles: it routes messages between the other parties, and it computes the final result. We present two variants of the protocol: one is more efficient and can be proven secure against honest-but-curious adversaries, in the plain model.

The other guarantees privacy against active adversaries (including an actively adversarial server), but requires an extra round, and is proven secure in the random oracle model. In both cases, we can show formally that the server only learns users' inputs in aggregate, using a simulation-based proof as is standard for MPC protocols. Both variants we present are practical, and we present benchmark results from our prototype implementation.

1.2 Organization

In Section 2 we describe the machine learning application that motivates this work. In Section 3 we review the cryptographic primitives we use in our protocol. We then proceed to give a high-level overview of our protocol design in Section 4, followed by a formal protocol description in Section 5. In Section 6 we prove security against honest-but-curious (passive) adversaries and include a high-level discussion of privacy against active adversaries. In Section 7, we give performance numbers based both on theoretical analysis as well as on a prototype implementation.

Finally, we discuss some issues surrounding practical deployments and future work in Section 8 and conclude with a discussion of related work in Section 9.

Figure 1: Left (Mobile Intelligence): In the cloud-centric approach to machine intelligence, user devices interact with cloud-hosted models, generating logs that can be used as training examples. The logs from many users are combined and used to improve the model, which is then used to serve future user queries. Middle (Federated Learning): In Federated Learning, machine intelligence models are shipped to users' devices where they are both evaluated and trained locally. Summaries of improved models are shared with the server, where they are aggregated into a new model and deployed to user devices. Right (Federated Learning with Secure Aggregation): When Secure Aggregation is added to Federated Learning, the aggregation of model updates is logically performed by the virtual, incorruptible third party induced by the secure multiparty communication, so that the cloud provider learns only the aggregated model update.

2 SECURE AGGREGATION FOR FEDERATED LEARNING

Consider training a deep neural network to predict the next word that a user will type as she composes a text message. Such models are commonly used to improve typing efficacy for a phone's on-screen keyboard [31].

A modeler may wish to train such a model on all text messages across a large population of users. However, text messages frequently contain sensitive information; users may be reluctant to upload a copy of them to the modeler's servers. Instead, we consider training such a model in a Federated Learning setting, wherein each user maintains a private database of her text messages securely on her own mobile device, and a shared global model is trained under the coordination of a central server based upon highly processed, minimally scoped, ephemeral updates from users [45, 52]. These updates are high-dimensional vectors based on information from the user's private database. Training a neural net is typically done by repeatedly iterating over these updates using a variant of a mini-batch stochastic gradient descent rule [16, 30]. (See Appendix C for details.)
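To make the round structure concrete, the following is a minimal Python sketch of one Federated Learning round. It makes simplifying assumptions not found in the paper: a linear model with squared-error loss, unweighted averaging, and illustrative helper names (`local_update`, `server_round`).

```python
import numpy as np

# A minimal sketch of one Federated Learning round. Assumptions for
# illustration only: linear model, squared-error loss, unweighted average.

def local_update(model, examples, lr=0.1):
    """On-device step: one pass of SGD over the user's private examples,
    returning only the model delta (the 'update' sent for aggregation)."""
    local = model.copy()
    for x, y in examples:
        grad = 2 * (local @ x - y) * x  # gradient of (w . x - y)^2
        local -= lr * grad
    return local - model

def server_round(model, updates):
    """Server step: fold the average update into the global model. With
    Secure Aggregation, the server sees only this average, not each update."""
    return model + np.mean(updates, axis=0)

model = np.zeros(3)
user_data = [[(np.array([1.0, 0.0, 0.0]), 1.0)],
             [(np.array([0.0, 1.0, 0.0]), 2.0)]]
updates = [local_update(model, d) for d in user_data]
model = server_round(model, updates)
print(model)  # each coordinate nudged toward its user's data
```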

Although each update is ephemeral and contains no more (and typically significantly less) information than the user's private database, a user might still be concerned about what information remains. In some circumstances, it is possible to learn individual words that a user has typed by inspecting that user's most recent update. However, in the Federated Learning setting, the server does not need to access any individual user's update in order to perform stochastic gradient descent; it requires only the element-wise weighted averages of the update vectors, taken over a random subset of users. Using a Secure Aggregation protocol to compute these weighted averages would ensure that the server may learn only that one or more users in this randomly selected subset wrote a given word, but not which users. (Computing a secure weighted average given a secure sum operation is straightforward; for details, see the appendix.)

Federated Learning systems face several practical challenges.
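The reduction mentioned in the parenthetical above can be sketched in a few lines: run the sum protocol twice, once over the weighted update vectors and once over the weights themselves, then divide. Here `secure_sum` is only a stand-in that sums in the clear; in the real protocol the server would learn nothing but the sum.

```python
import numpy as np

# Sketch: a secure weighted average from two invocations of a secure sum.
# secure_sum is a placeholder for an actual secure aggregation protocol.

def secure_sum(vectors):
    return np.sum(vectors, axis=0)  # stand-in: server would see only this

def secure_weighted_average(updates, weights):
    weighted_sum = secure_sum([w * x for w, x in zip(weights, updates)])
    total_weight = secure_sum([np.array([w]) for w in weights])
    return weighted_sum / total_weight  # element-wise weighted average

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
weights = [2.0, 1.0]
print(secure_weighted_average(updates, weights))  # [1.6667 2.6667]
```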

Mobile devices have only sporadic access to power and network connectivity, so the set of users participating in each update step is unpredictable and the system must be robust to users dropping out. Because the neural network may be parameterized by millions of values, updates may be large, representing a direct cost to users on metered network plans. Mobile devices also generally cannot establish direct communication channels with other mobile devices (relying on a server or service provider to mediate such communication), nor can they natively authenticate other mobile devices. Thus, Federated Learning motivates a need for a Secure Aggregation protocol that:

(1) operates on high-dimensional vectors
(2) is highly communication efficient, even with a novel set of users on each instantiation
(3) is robust to users dropping out
(4) provides the strongest possible security under the constraints of a server-mediated, unauthenticated network model

3 CRYPTOGRAPHIC PRIMITIVES

In this section, we discuss the cryptographic primitives and assumptions needed for our protocol.

3.1 Secret Sharing

We rely on Shamir's $t$-out-of-$n$ Secret Sharing [50], which allows a user to split a secret $s$ into $n$ shares, such that any $t$ shares can be used to reconstruct $s$, but any set of at most $t-1$ shares gives no information about $s$. The scheme is parameterized over a finite field $\mathbb{F}$ of size at least $l > 2^k$ (where $k$ is the security parameter of the scheme), e.g. $\mathbb{F}_p$ for some large public prime $p$.

We note that such a large field size is needed because our scheme requires clients to secret share their secret keys (whose length must be proportional to the security parameter for the security proof to go through). We also assume that integers $1, \ldots, n$ (which will be used to denote users in the protocol) can be identified with distinct field elements in $\mathbb{F}$. Given these parameters, the scheme consists of two algorithms. The sharing algorithm $\textbf{SS.share}(s, t, \mathcal{U}) \rightarrow \{(u, s_u)\}_{u \in \mathcal{U}}$ takes as input a secret $s$, a set $\mathcal{U}$ of $n$ field elements representing user IDs, and a threshold $t \le |\mathcal{U}|$; it produces a set of shares $s_u$, each of which is associated with a different $u \in \mathcal{U}$. The reconstruction algorithm $\textbf{SS.recon}(\{(u, s_u)\}_{u \in \mathcal{V}}, t) \rightarrow s$ takes as input the threshold $t$ and the shares corresponding to a subset $\mathcal{V} \subseteq \mathcal{U}$ such that $|\mathcal{V}| \ge t$, and outputs a field element. Correctness requires that $\forall s \in \mathbb{F}$, $\forall t, n$ with $1 \le t \le n$, and $\forall \mathcal{U} \subseteq \mathbb{F}$ where $|\mathcal{U}| = n$: if $\{(u, s_u)\}_{u \in \mathcal{U}} \leftarrow \textbf{SS.share}(s, t, \mathcal{U})$, $\mathcal{V} \subseteq \mathcal{U}$, and $|\mathcal{V}| \ge t$, then $\textbf{SS.recon}(\{(u, s_u)\}_{u \in \mathcal{V}}, t) = s$.
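As a concrete illustration, here is a minimal Python sketch of $\textbf{SS.share}$ and $\textbf{SS.recon}$ over $\mathbb{F}_p$. The prime below is for demonstration only; the scheme requires $|\mathbb{F}| > 2^k$ for security parameter $k$.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime standing in for the large public prime p

def ss_share(s, t, users):
    """SS.share(s, t, U): evaluate a random degree-(t-1) polynomial with
    constant term s at each user's (nonzero) field element u."""
    coeffs = [s % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    def f(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % P
        return y
    return {u: f(u) for u in users}

def ss_recon(shares, t):
    """SS.recon({(u, s_u)}, t): Lagrange interpolation at x = 0,
    using exactly t of the provided shares."""
    pts = list(shares.items())[:t]
    s = 0
    for i, (u_i, y_i) in enumerate(pts):
        num, den = 1, 1
        for j, (u_j, _) in enumerate(pts):
            if j != i:
                num = (num * -u_j) % P
                den = (den * (u_i - u_j)) % P
        s = (s + y_i * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return s

# Any t = 3 of the n = 5 shares reconstruct s; fewer reveal nothing about s.
shares = ss_share(123456789, t=3, users=[1, 2, 3, 4, 5])
assert ss_recon({u: shares[u] for u in (2, 4, 5)}, t=3) == 123456789
```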

Security requires that $\forall s, s' \in \mathbb{F}$ and any $\mathcal{V} \subseteq \mathcal{U}$ such that $|\mathcal{V}| < t$:

$$\{\{(u, s_u)\}_{u \in \mathcal{U}} \leftarrow \textbf{SS.share}(s, t, \mathcal{U}) : \{(u, s_u)\}_{u \in \mathcal{V}}\} \equiv \{\{(u, s_u)\}_{u \in \mathcal{U}} \leftarrow \textbf{SS.share}(s', t, \mathcal{U}) : \{(u, s_u)\}_{u \in \mathcal{V}}\}$$

where $\equiv$ denotes that the two distributions are identical.

3.2 Key Agreement

Key Agreement consists of a tuple of algorithms $(\textbf{KA.param}, \textbf{KA.gen}, \textbf{KA.agree})$. The algorithm $\textbf{KA.param}(k) \rightarrow pp$ produces some public parameters (over which our scheme will be parameterized). $\textbf{KA.gen}(pp) \rightarrow (s_u^{SK}, s_u^{PK})$ allows any party $u$ to generate a private-public key pair. $\textbf{KA.agree}(s_u^{SK}, s_v^{PK}) \rightarrow s_{u,v}$ allows any user $u$ to combine their private key $s_u^{SK}$ with the public key $s_v^{PK}$ for any $v$ (generated using the same $pp$), to obtain a private shared key $s_{u,v}$ between $u$ and $v$.

The specific Key Agreement scheme we will use is Diffie-Hellman key agreement [20], composed with a hash function. More specifically, $\textbf{KA.param}(k) \rightarrow (\mathbb{G}', g, q, H)$ samples a group $\mathbb{G}'$ of prime order $q$, along with a generator $g$ and a hash function $H$; $\textbf{KA.gen}(\mathbb{G}', g, q, H) \rightarrow (x, g^x)$ samples a random $x \leftarrow \mathbb{Z}_q$ as the secret key $s_u^{SK}$, and $g^x$ as the public key $s_u^{PK}$; $\textbf{KA.agree}(x_u, g^{x_v}) \rightarrow s_{u,v}$ outputs $s_{u,v} = H((g^{x_v})^{x_u})$.
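The construction above can be sketched in a few lines of Python. The 23-element group here is a toy for illustration only; a real $\textbf{KA.param}$ would sample a cryptographically large prime-order group, and the choice of SHA-256 as $H$ is an assumption, not the paper's.

```python
import hashlib
import secrets

def ka_param():
    """KA.param(k) -> (p, g, q, H): g generates the order-q subgroup of Z_p*.
    Toy safe-prime group (p = 2q + 1); far too small for real use."""
    p, q, g = 23, 11, 4
    H = lambda elem: hashlib.sha256(elem.to_bytes(32, "big")).digest()
    return p, g, q, H

def ka_gen(pp):
    """KA.gen(pp) -> (x, g^x): a private/public key pair."""
    p, g, q, _ = pp
    x = secrets.randbelow(q - 1) + 1  # random nonzero exponent in Z_q
    return x, pow(g, x, p)

def ka_agree(pp, sk_u, pk_v):
    """KA.agree(x_u, g^{x_v}) -> s_{u,v} = H((g^{x_v})^{x_u})."""
    p, _, _, H = pp
    return H(pow(pk_v, sk_u, p))

pp = ka_param()
sk_u, pk_u = ka_gen(pp)
sk_v, pk_v = ka_gen(pp)
assert ka_agree(pp, sk_u, pk_v) == ka_agree(pp, sk_v, pk_u)  # same shared key
```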

