
Online Tracking: A 1-million-site Measurement and Analysis

Steven Englehardt, Princeton University, ste@cs.princeton.edu
Arvind Narayanan, Princeton University

This is an extended version of our paper that appeared at ACM CCS.

We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing"). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild.

This measurement is made possible by our open-source web privacy measurement tool, OpenWPM¹, which uses an automated version of a full-fledged consumer browser.


It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and comprehensive browser instrumentation. We demonstrate our platform's strength in enabling researchers to rapidly detect, quantify, and characterize emerging online tracking.

INTRODUCTION

Web privacy measurement (observing websites and services to detect, characterize, and quantify privacy-impacting behaviors) has repeatedly forced companies to improve their privacy practices due to public pressure, press coverage, and regulatory action [5, 15]. On the other hand, web privacy measurement presents formidable engineering and methodological challenges. In the absence of a generic tool, it has been largely confined to a niche community.

We seek to transform web privacy measurement into a widespread practice by creating a tool that is useful not just to our colleagues but also to regulators, self-regulators, the press, activists, and website operators, who are often in the dark about third-party tracking on their own domains.

We also seek to lessen the burden of continual oversight of web tracking and privacy by developing a robust and modular platform for repeated studies.

OpenWPM (Section 3) solves three key systems challenges faced by the web privacy measurement community. It does so by building on the strengths of past work while avoiding the pitfalls made apparent in previous engineering efforts. (1) We achieve scale through parallelism and robustness by utilizing isolated measurement processes similar to FPDetective's platform [2], while still supporting stateful measurements. We are able to scale to 1 million sites without having to resort to a stripped-down browser [31] (a limitation we explore in detail in Section ). (2) We provide comprehensive instrumentation by expanding on the rich browser extension instrumentation of FourthParty [33], without requiring the researcher to write their own automation code. (3) We reduce duplication of work by providing a modular architecture to enable code re-use.

Solving these problems is hard because the web is not designed for automation or instrumentation. Selenium², the main tool for automated browsing through a full-fledged browser, is intended for developers to test their own websites. As a result, it performs poorly on websites not controlled by the user and breaks frequently if used for large-scale measurements. Browsers themselves tend to suffer memory leaks over long sessions. In addition, instrumenting the browser to collect a variety of data for later analysis presents formidable challenges. For full coverage, we have found it necessary to have three separate measurement points: a network proxy, a browser extension, and a disk state monitor.
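The breakage and memory-leak problems described above suggest a supervisor that restarts the browser when it crashes and also restarts it periodically as a precaution. The Python sketch below illustrates that pattern under simplifying assumptions; it is not OpenWPM's implementation. `BrowserCrashed`, `FlakyBrowser`, and `BrowserManager` are hypothetical names, and the simulated browser is a plain object rather than a real browser process driven by Selenium.

```python
from __future__ import annotations

from typing import Callable, Optional


class BrowserCrashed(Exception):
    """Hypothetical error signalling that the simulated browser process died."""


class FlakyBrowser:
    """Hypothetical browser that crashes the first time it visits a site in `crash_on`."""

    def __init__(self, crash_on: set[str]) -> None:
        self.crash_on = crash_on  # shared across restarts, so each site crashes only once
        self.visits = 0

    def visit(self, site: str) -> str:
        if site in self.crash_on:
            self.crash_on.discard(site)
            raise BrowserCrashed(site)
        self.visits += 1
        return f"ok:{site}"


class BrowserManager:
    """Restarts the browser after a crash and after every `max_visits` successful visits."""

    def __init__(self, factory: Callable[[], FlakyBrowser], max_visits: int = 100, max_retries: int = 2) -> None:
        self.factory = factory
        self.max_visits = max_visits
        self.max_retries = max_retries
        self.restarts = 0
        self.browser = factory()

    def restart(self) -> None:
        # A real platform would kill the browser process and relaunch it with its profile.
        self.restarts += 1
        self.browser = self.factory()

    def visit(self, site: str) -> Optional[str]:
        if self.browser.visits >= self.max_visits:
            self.restart()  # proactive restart to bound memory growth over long sessions
        for _ in range(self.max_retries + 1):
            try:
                return self.browser.visit(site)
            except BrowserCrashed:
                self.restart()
        return None  # skip this site instead of halting the whole crawl
```

In a parallel crawl, each worker would own its own manager, so a crash in one isolated browser does not disturb the others.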

Further, we must link data collected from these disparate points into a uniform schema, duplicating much of the browser's own internal parsing logic.

A large-scale view of web tracking and privacy. In this paper we report results from a January 2016 measurement of the top 1 million sites (Section 4). Our scale enables a variety of new insights. We observe for the first time that online tracking has a "long tail," but we find a surprisingly quick drop-off in the scale of individual trackers: trackers in the tail are found on very few sites (Section ). Using a new metric for quantifying tracking (Section ), we find that the tracking-protection tool Ghostery is effective, with some caveats (Section ). We quantify the impact of trackers and third parties on HTTPS deployment (Section ) and show that cookie syncing is pervasive (Section ).
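One way to detect cookie syncing, in which one tracker passes its identifier for a user to another tracker, is to look for known ID-cookie values inside the URLs of requests sent to other domains. The Python sketch below shows a simplified version of that idea; it is an illustrative heuristic, not the exact detection procedure used in this study. The function name, input format, and example domains are hypothetical, and matching on full hostnames rather than registrable domains is a simplification.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlsplit


def find_cookie_syncs(id_cookies: dict[str, set[str]], request_urls: list[str]) -> set[tuple[str, str]]:
    """Return (sender, receiver) host pairs where an ID cookie value set by the
    sender appears as a query-parameter value in a request to a different host.

    Assumed simplification: only exact matches in query strings are checked, so
    IDs embedded in paths or encoded/hashed values are missed.
    """
    # Invert the mapping: cookie value -> host(s) that set it as an identifier.
    owners: dict[str, set[str]] = {}
    for host, values in id_cookies.items():
        for value in values:
            owners.setdefault(value, set()).add(host)

    syncs: set[tuple[str, str]] = set()
    for url in request_urls:
        parts = urlsplit(url)
        receiver = parts.hostname or ""
        for _, value in parse_qsl(parts.query):
            for sender in owners.get(value, set()):
                if sender != receiver:  # a tracker reading its own ID is not a sync
                    syncs.add((sender, receiver))
    return syncs
```

For example, if `tracker-a.example` holds the ID `u123abc` and a request goes to `https://tracker-b.example/sync?uid=u123abc`, the pair `("tracker-a.example", "tracker-b.example")` is reported. A production detector would also need to discard values that are unlikely to be identifiers, such as short or common strings, before counting a match.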

Turning to browser fingerprinting, we revisit an influential 2014 study on canvas fingerprinting [1] with updated and improved methodology (Section ). Next, we report on several types of fingerprinting never before measured at scale: font fingerprinting using canvas (which is distinct from canvas fingerprinting; Section ), and fingerprinting by abusing the WebRTC API (Section ), the Audio API (Section ), and the Battery Status API (Section ). Finally, we show that, in contrast to our results in Section , existing privacy tools are not effective at detecting these newer and more obscure forms of fingerprinting.

Our results show cause for concern, but also encouraging signs. In particular, several of our results suggest that while online tracking presents few barriers to entry, trackers in the tail of the distribution are found on very few sites and are far less likely to be encountered by the average user.
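The head-versus-tail distinction can be made concrete by counting, for each third party, the number of distinct first-party sites on which it appears. The Python sketch below does this for a list of (site, third party) pairs. The function names, the input format, and the `max_sites` cut-off that defines the "tail" are assumptions made for illustration; this is not the tracking metric defined in this paper.

```python
from __future__ import annotations

from collections import defaultdict


def tracker_prevalence(observations: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Count distinct first-party sites per third party, most prevalent first."""
    sites_by_tracker: dict[str, set[str]] = defaultdict(set)
    for site, third_party in observations:
        sites_by_tracker[third_party].add(site)  # a set ignores repeat embeds on one site
    counts = [(tracker, len(sites)) for tracker, sites in sites_by_tracker.items()]
    # Sort by site count descending, then by name for a deterministic order.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def tail_fraction(prevalence: list[tuple[str, int]], max_sites: int) -> float:
    """Fraction of third parties seen on at most `max_sites` sites (the assumed tail)."""
    if not prevalence:
        return 0.0
    return sum(1 for _, n in prevalence if n <= max_sites) / len(prevalence)
```

On a real crawl, a large tail fraction combined with very small counts for those trackers is what "a long tail with a quick drop-off" looks like numerically.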

Those at the head of the distribution, on the other hand, are owned by relatively few companies and are responsive to the scrutiny resulting from privacy studies.

We envision a future where measurement provides a key layer of oversight of online privacy. This will be especially important given that perfectly anticipating and preventing all possible privacy problems (whether through blocking tools or careful engineering of web APIs) has proved infeasible. To enable such oversight, we plan to make all of our data publicly available (OpenWPM is already open-source). We expect that measurement will be useful to developers of privacy tools, to regulators and policy makers, journalists, and many others.

BACKGROUND AND RELATED WORK

Background: third-party online tracking. As users browse and interact with websites, they are observed by both first parties, which are the sites the user visits directly, and third parties, which are typically hidden trackers such as ad networks embedded on most web pages.
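A third party embedded on many pages receives a request from each of them, and each request can carry both the third party's own identifying cookie and a Referer header naming the embedding page. The Python sketch below groups such requests by cookie to rebuild per-user browsing histories. The log format (dictionaries with `cookie` and `referer` keys) and the function name are hypothetical; real browsers may also trim or omit the Referer depending on the page's referrer policy.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit


def reconstruct_histories(requests: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group first-party hosts by tracking cookie from one third party's request log.

    Each request is assumed to carry the third party's own ID cookie ("cookie")
    and the Referer header ("referer") naming the page that embedded it.
    """
    histories: dict[str, list[str]] = defaultdict(list)
    for req in requests:
        user_id = req.get("cookie")
        referer = req.get("referer")
        if not user_id or not referer:
            continue  # without both values the visit cannot be tied to a profile
        histories[user_id].append(urlsplit(referer).hostname or referer)
    return dict(histories)
```

Note that the header name really is spelled "Referer" in HTTP, which is why the text keeps that spelling.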

Third parties can obtain users' browsing histories through a combination of cookies and other tracking technologies that allow them to uniquely identify users, and the referer header that tells the third party which first-party site the user is currently visiting. Other sensitive information, such as email addresses, may also be leaked to third parties via the referer header.

Web privacy measurement platforms. The closest comparisons to OpenWPM are other open web privacy measurement platforms, which we now review. We consider a tool to be a platform if it is publicly available and there is some generality to the types of studies that can be performed using it. In some cases, OpenWPM has directly built upon existing platforms, which we note explicitly.

FPDetective is the most similar platform to OpenWPM. FPDetective uses a hybrid PhantomJS and Chromium-based automation infrastructure [2], with both native browser code and a proxy for instrumentation.

In the published study, the platform was used for the detection and analysis of fingerprinters, and much of the included instrumentation was built to support that. The platform allows researchers to conduct additional experiments by replacing a script that is executed with each page visit, which the authors state can be easily extended for non-fingerprinting studies.

OpenWPM differs from FPDetective in several ways: (1) it supports both stateful and stateless measurements, whereas FPDetective only supports stateless measurements; (2) it includes generic instrumentation for both stateless and stateful tracking, enabling a wider range of privacy studies without additional changes to the infrastructure; (3) none of the included instrumentation requires native browser code, making it easier to upgrade to new or different versions of the browser; and (4) OpenWPM uses a high-level command-based architecture, which supports command re-use.

Chameleon Crawler is a Chromium-based crawler that utilizes the Chameleon³ browser extension for detecting browser fingerprinting.

Chameleon Crawler uses similar automation³, but supports a subset of OpenWPM's instrumentation.

FourthParty is a Firefox plug-in for instrumenting the browser and does not handle automation [33]. OpenWPM has incorporated and expanded upon nearly all of FourthParty's instrumentation (Section 3).

WebXray is a PhantomJS-based tool for measuring HTTP traffic [31]. It has been used to study third-party inclusions on the top 1 million sites, but as we show in Section , measurements with a stripped-down browser have the potential to miss a large number of resources.

The tool described in [48] is a Chrome extension that detects tracking and exposes APIs for extending its functionality, such as measurement and blocking.

XRay [27] and AdFisher [9] are tools for running automated personalization detection experiments.
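OpenWPM's high-level command-based architecture, mentioned above in the comparison with FPDetective, can be illustrated with a small sketch in which a site visit is an ordered list of named, reusable commands. Everything here (`VisitPlan`, `run_plan`, `get_page`, `dump_cookies`) is hypothetical and is not OpenWPM's actual API; the commands only update a dictionary instead of driving a browser.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# A command receives the per-visit context and records its effects in it.
Command = Callable[[dict], None]


@dataclass
class VisitPlan:
    """Hypothetical ordered list of named commands to run for one site visit."""

    url: str
    commands: list[tuple[str, Command]] = field(default_factory=list)

    def add(self, name: str, command: Command) -> VisitPlan:
        self.commands.append((name, command))
        return self  # returning self allows chained calls


def run_plan(plan: VisitPlan) -> dict:
    """Execute each command in order against a shared per-visit context."""
    context: dict = {"url": plan.url, "log": []}
    for name, command in plan.commands:
        command(context)
        context["log"].append(name)
    return context


# Hypothetical reusable commands; a real platform would drive the browser here.
def get_page(ctx: dict) -> None:
    ctx["loaded"] = True


def dump_cookies(ctx: dict) -> None:
    ctx["cookies"] = []
```

For example, `run_plan(VisitPlan("https://example.com").add("get", get_page).add("dump_cookies", dump_cookies))["log"]` evaluates to `["get", "dump_cookies"]`. Keeping commands as small named units is what makes re-use possible: a new study composes existing commands rather than rewriting the automation loop.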

