
De-anonymizing Web Browsing Data with Social …


De-anonymizing Web Browsing Data with Social Networks
Jessica Su, Stanford University, jtysu@stanford.edu
Ansh Shukla, Stanford University, anshukla@stanford.edu



Jessica Su (Stanford), Ansh Shukla (Stanford), Goel (Stanford), Narayanan (Princeton)

Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show theoretically, via simulation, and through experiments on real user data that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one's feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user's social profile.

We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time. To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is to our knowledge the largest-scale demonstrated de-anonymization to date.

CCS Concepts: Security and privacy → Pseudonymity, anonymity and untraceability; Information systems → Online advertising; Social networks

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from … 2017 ACM. ISBN: TBD

1. INTRODUCTION

Online anonymity protects civil liberties. At an abstract level, it enables intellectual freedom: research shows that users change their behavior when they know they are being surveilled online [23], resulting in a chilling effect [32].

Concretely, users who have their anonymity compromised may suffer harms ranging from persecution by governments to targeted frauds that threaten public exposure of online activities [6].

The online advertising industry builds browsing histories of individuals via third-party trackers embedded on web pages. While a small number of companies admit to attaching user identities to these browsing-history datasets, most companies promise users that the histories are pseudonymous and not linked to identity. Privacy advocates have argued that such data can be de-anonymized, but we lack conclusive evidence. It has remained unclear what type of identified auxiliary information could be used in a de-anonymization attack, whether an attack could work at the scale of millions of users, and what the success rate of such an attack would be.

In this paper we show that browsing histories can be linked to social media profiles such as Twitter, Facebook, or Reddit accounts.

We begin by observing that most users subscribe to a distinctive set of other users on a service. Since users are more likely to click on links posted by accounts that they follow, these distinctive patterns persist in their browsing history. An adversary can thus de-anonymize a given browsing history by finding the social media profile whose feed shares the history's idiosyncratic characteristics.

Such an attack is feasible for any adversary with access to browsing histories. This includes third-party trackers and others with access to their data (either via intrusion or a lawful request). Network adversaries, including government surveillance agencies, Internet service providers, and coffee shop eavesdroppers, also see URLs of unencrypted web traffic. The adversary may also be a cross-device tracking company aiming to link two different browsing histories (e.g., histories generated by the same user on different devices).

For such an adversary, linking to social media profiles is a stepping stone.

We make three key contributions. First, we develop a general theoretical framework for de-anonymization. We assume there is a background probability of clicking on links, and that a link appearing in a user's feed increases its probability of appearing in their browsing history by a user-specific factor. We then derive a maximum likelihood estimate, which lets us identify the feed in the system most likely to have generated the observed history. This general framing applies to a variety of other de-anonymization attacks (Section 8).

Footnote 1: A user's feed or timeline contains the aggregated content posted by all accounts to which the user subscribes.

Our second contribution is implementing and evaluating this technique. We chose Twitter as the source of auxiliary information for several reasons: its real-time API, which avoids the need for large-scale web-crawling; the fact that most activity is public; and finally, the fact that links are wrapped in …, which simplifies details of our attack.

We assume that either due to the referer header or by exploiting timing information, the adversary knows which links in the user's history resulted from clicks on Twitter. By employing a variety of caching and approximation techniques, we built a system capable of de-anonymizing web browsing histories in real time, typically in under one minute. To test the performance of this system, we picked 60 active Twitter users at random, obtained their feeds, and simulated browsing histories using a simple behavioral model. Given a synthetic history containing 30 Twitter links, we identified the correct Twitter profile out of over 300 million active Twitter users over 50% of the time. We show that our maximum likelihood estimate achieves better accuracy than intersection size and Jaccard similarity, two approaches that have been previously studied in the context of similar de-anonymization tasks [15, 35].
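The three ranking approaches named above can be compared on a toy example. The following is a minimal sketch, not the system described in the paper: the exact estimator is not reproduced in this section, so the code assumes a simplified model in which every history entry is drawn independently, a link in the candidate's feed is boosted by a single factor r >= 1, and the likelihood is maximized over r. Under that assumption the score reduces to n times the KL divergence between q (the fraction of the history found in the feed) and p (the background probability mass of the feed). All names (`mle_score`, `rank`, the users `alice` and `hub`) and the uniform background probabilities are hypothetical.

```python
import math

def intersection_size(history, feed):
    """Baseline: number of distinct visited links that appear in the feed."""
    return len(set(history) & feed)

def jaccard(history, feed):
    """Baseline: Jaccard similarity between the visited links and the feed."""
    visited = set(history)
    union = visited | feed
    return len(visited & feed) / len(union) if union else 0.0

def mle_score(history, feed, background):
    """Log-likelihood gain of `feed` over the background-only model.

    Assumed model (illustrative): a link in the feed is r >= 1 times more
    likely to be visited than its background probability suggests.
    Maximizing over r gives n * KL(q || p). `background` must assign a
    positive probability to every link.
    """
    n = len(history)
    if n == 0:
        return 0.0
    q = sum(1 for link in history if link in feed) / n
    p = sum(background[link] for link in feed)
    if q <= p:
        return 0.0  # the best boost is r = 1, so there is no gain
    score = q * math.log(q / p)
    if q < 1.0:
        score += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return n * score

def rank(history, feeds, score):
    """Return the candidate user whose feed maximizes `score`."""
    return max(feeds, key=lambda user: score(history, feeds[user]))

# Toy data: 20 links with uniform background probability.
links = [f"link{i}" for i in range(20)]
background = {link: 1 / 20 for link in links}
feeds = {
    "alice": set(links[:4]),   # small, distinctive feed
    "hub": set(links[:14]),    # very large feed that overlaps everything
}
history = [links[0], links[1], links[2], links[5], links[15]]

by_overlap = rank(history, feeds, intersection_size)
by_jaccard = rank(history, feeds, jaccard)
by_mle = rank(history, feeds, lambda h, f: mle_score(h, f, background))
print(by_overlap, by_jaccard, by_mle)
```

In this toy setup, intersection size favors the large feed simply because it covers more links, while the likelihood-based score discounts the overlap by how much overlap the feed would produce by chance. The example says nothing about accuracy on real data.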

Finally, our third contribution is creating an experiment to test this attack on real browsing histories. We built an online tool to allow users to donate their browsing history, upon which we executed our attack and showed the result to the user so they could confirm or deny it. The attack worked correctly for 72% of the 374 users who completed the experiment. We present these results as a proof of concept, noting that our sample of users is not representative.

There are many ways in which users may be de-anonymized when browsing the web (see Section 2). However, our attack is notable for its generality and for the variety of adversaries who may employ it. Any social media site can be used for such an attack, provided that a list of each user's subscriptions can be inferred, the content is public, and the user visits sufficiently many links from the site. For example, on Facebook subscriptions can be inferred based on likes, and on Reddit based on comments, albeit incompletely and with some error.

Further, it is inherent in the web's design and users' behavior, and is not due to specific, fixable vulnerabilities in browsers or websites, unlike previous de-anonymization attacks. It simultaneously confirms the fingerprintability of browsing profiles and the easy availability of auxiliary information. Application-layer de-anonymization has long been considered the Achilles' heel of Tor and other anonymity systems, and our work provides another reason why that is the case.

The increasing adoption of HTTPS on the web diminishes the strength of an attack by network adversaries, but not by third-party trackers. However, network adversaries still see the domain of encrypted requests, even if the URL is hidden. We hypothesize that the attack will still work in this scenario but will require a greater number of links per user. Users can mitigate attacks by installing tracker-blocking tools such as Ghostery, uBlock Origin, or Privacy Badger, as well as HTTPS Everywhere to increase the use of encryption. Of course, not revealing one's real-world identity on social media profiles also makes it harder for the adversary to identify the user, even if the linking is successful. Nascent projects such as Contextual Identity containers for Firefox help users more easily manage their identity online [5]. None of these solutions is perfect; ultimately, protecting anonymity online requires vigilance and awareness of potential attacks.

Footnote 2: This experiment was approved by Stanford University's Institutional Review Board (Protocol No. 34095).

2. RELATED WORK

The de-anonymization literature is vast, but linkage attacks (and demonstrations of uniqueness) based on behavior are especially relevant to our work. These include transactional records of movie viewing [28], location traces [7, 22], credit-card metadata [8], and writing style [27]. Attacks on anonymous communication systems such as long-term intersection attacks and statistical disclosure attacks employ similar principles [24].

