Example: bankruptcy

Datasheets for Datasets

Datasheets for Datasets Timnit Gebru 1 Jamie Morgenstern 2 Briana Vecchione 3 Jennifer Wortman Vaughan 1 Hanna Wallach 1. Hal Daum III 1 4 Kate Crawford 1 5. Abstract We therefore propose the concept of Datasheets for Datasets . The machine learning community has no stan- In the electronics industry, every component is accompanied dardized way to document how and why a dataset by a datasheet describing standard operating characteristics, was created, what information it contains, what test results, and recommended usage. By analogy, we rec- [ ] 9 Jul 2018. tasks it should and should not be used for, and ommend that every dataset be accompanied with a datasheet whether it might raise any ethical or legal con- documenting its motivation, creation, composition, intended cerns. To address this gap, we propose the con- uses, distribution, maintenance, and other information. We cept of Datasheets for Datasets .

machine learning systems exhibit accuracy disparities be-tween subpopulations, and calls for more diverse datasets, inclusive testing, and standards to address these disparities. 3.3. Electrical and Electronic Technologies Like datasets, electronic components are incorporated into a system whose larger goal may be far removed from the

Tags:

  Like, Exhibit

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Datasheets for Datasets

1 Datasheets for Datasets Timnit Gebru 1 Jamie Morgenstern 2 Briana Vecchione 3 Jennifer Wortman Vaughan 1 Hanna Wallach 1. Hal Daum III 1 4 Kate Crawford 1 5. Abstract We therefore propose the concept of Datasheets for Datasets . The machine learning community has no stan- In the electronics industry, every component is accompanied dardized way to document how and why a dataset by a datasheet describing standard operating characteristics, was created, what information it contains, what test results, and recommended usage. By analogy, we rec- [ ] 9 Jul 2018. tasks it should and should not be used for, and ommend that every dataset be accompanied with a datasheet whether it might raise any ethical or legal con- documenting its motivation, creation, composition, intended cerns. To address this gap, we propose the con- uses, distribution, maintenance, and other information. We cept of Datasheets for Datasets .

2 In the electronics anticipate that such Datasheets will increase transparency industry, it is standard to accompany every com- and accountability in the machine learning community. ponent with a datasheet providing standard oper- Section 2 provides context for our proposal. Section 3. ating characteristics, test results, recommended discusses the evolution of safety standards in other indus- usage, and other information. Similarly, we rec- tries, and outlines the concept of Datasheets in electronics. ommend that every dataset be accompanied with a We give examples of questions that should be answered in datasheet documenting its creation, composition, Datasheets for Datasets in Section 4, and discuss challenges intended uses, maintenance, and other properties. and future work in Section 5. The appendix includes a more Datasheets for Datasets will facilitate better com- complete proposal along with prototype Datasheets for two munication between dataset creators and users, well-known Datasets : Labeled Faces in the Wild (Huang and encourage the machine learning community et al.)

3 , 2007) and Pang and Lee's polarity dataset (2004). to prioritize transparency and accountability. 2. Context 1. Introduction A foundational challenge in the use of machine learning is Machine learning is no longer a purely academic disci- the risk of deploying systems in unsuitable environments. A. pline. Domains such as criminal justice (Garvie et al., model's behavior on some benchmark may say very little 2016; Systems, 2017; Andrews et al., 2006), hiring and about its performance in the wild. Of particular concern are employment (Mann & O'Neil, 2016), critical infrastruc- recent examples showing that machine learning systems can ture (O'Connor, 2017; Chui, 2017), and finance (Lin, 2012) amplify existing societal biases. For example, Buolamwini all increasingly depend on machine learning methods. & Gebru (2018) showed that commercial gender classifica- tion APIs have near perfect performance for lighter-skinned By definition, machine learning models are trained using males, while error rates for darker-skinned females can be data; the choice of data fundamentally influences a model's as high as 33%.

4 1 Bolukbasi et al. (2016) showed that word behavior. However, there is no standardized way to docu- embeddings trained on news articles exhibit gender biases, ment how and why a dataset was created, what information finishing the analogy man is to computer programmer as it contains, what tasks it should and shouldn't be used for, woman is to X with homemaker, a stereotypical role for and whether it might raise any ethical or legal concerns. women. Caliskan et al. (2017) showed these embeddings This lack of documentation is especially problematic when also contain racial biases: traditional European-American Datasets are used to train models for high-stakes applications. names are closer to positive words like joy, while African- 1. Microsoft Research, New York, NY 2 Georgia Institute of Tech- American names are closer to words like agony.. nology, Atlanta, GA 3 Cornell University, Ithaca, NY 4 University These biases can have dire consequences that might not be of Maryland, College Park, MD 5 AI Now Institute, New York, NY.

5 Correspondence to: Timnit Gebru easily discovered. Much like a faulty resistor or a capac- itor in a circuit, the effects of a biased machine learning Proceedings of the 5 th Workshop on Fairness, Accountability, and 1. Transparency in Machine Learning, Stockholm, Sweden, PMLR The evaluated APIs also provided the labels of female and 80, 2018. Copyright 2018 by the author(s). male, failing to address the complexities of gender beyond binary. Datasheets for Datasets component, such as a dataset, can propagate throughout a ational, social, and economic opportunities. However, much system making them difficult to track down. For example, like current machine learning technology, automobiles were biases in word embeddings can result in hiring discrimina- introduced with few safety checks or regulations. When cars tion (Bolukbasi et al., 2016). For these and other reasons, first became available in the US, there were no speed limits, the World Economic Forum lists tracking the provenance, stop signs, traffic lights, driver education, or regulations development, and use of training Datasets as a best practice pertaining to seat belts or drunk driving (Canis, 2017).

6 This that all companies should follow in order to prevent discrim- resulted in many deaths and injuries due to collisions, speed- inatory outcomes (World Economic Forum Global Future ing, and reckless driving (Hingson et al., 1988). Reminis- Council on Human Rights 2016 2018, 2018). But while cent of current debates about machine learning, courtrooms provenance has been extensively studied in the database and newspaper editorials argued the possibility that the au- literature (Cheney et al., 2009; Bhardwaj et al., 2014), it has tomobile was inherently evil (Lewis v. Amorous, 1907). received relatively little attention in machine learning. The US and the rest of the world have gradually enacted The risk of unintentional misuse of Datasets can increase driver education, drivers licenses (Department of Transporta- when developers are not domain experts. This concern is tion Federal Highway Administration, 1997), and safety particularly important with the movement toward democ- systems like four-wheel hydraulic brakes, shatter-resistant ratizing AI and toolboxes that provide publicly available windshields, all-steel bodies (McShane, 2018), padded dash- Datasets and off-the-shelf models to be trained by those with boards, and seat belts (Peltzman, 1975).

7 Motorists' slow little-to-no domain knowledge or machine learning exper- adoption of seat belts spurred safety campaigns promoting tise. As these powerful tools become available to a broader their adoption. By analogy, machine learning will likely set of developers, it is increasingly important to enable these to require laws and regulations (especially in high-stakes developers to understand the implications of their work. environments), as well as social campaigns to promote best practices. The automobile industry routinely uses crash-test We believe this problem can be partially mitigated by accom- dummies to develop and test safety systems. This practice panying Datasets with Datasheets that describe their creation, led to problems similar to the biased dataset problems cur- their strengths, and their limitations. While this is not the rently faced by the machine learning community: almost all same as making everyone an expert, it gives an opportunity crash-test dummies were designed with prototypical male for domain experts to communicate what they know about a physiology; only in 2011 did US safety standards require dataset and its limitations to the developers who might use it.

8 Frontal crash tests with female crash-test dummies (Na- The use of Datasheets will be more effective if coupled with tional Highway Traffic Safety Administration, 2006), educational efforts around interpreting and applying ma- following evidence that women sustained more serious chine learning models. Such efforts are happening both injuries than men in similar accidents (Bose et al., 2011). within traditional ivory tower institutions ( , the new ethics in computing course at Harvard) and in new educa- Clinical Trials in Medicine tional organizations. For instance, one of 's missions is to get deep learning into the hands of as many people like data collection and experimentation for machine learn- as possible, from as many diverse backgrounds as possi- ing, clinical trials play an important role in drug develop- ble ( , 2017); their educational program includes ex- ment. When the US justice system stopped viewing clinical plicit training in dataset biases and ethics.

9 Combining better trials as a form of medical malpractice (Dowling, 1975), education and Datasheets will more quickly enable progress standards for clinical trials were put in place, often spurred by both domain experts and machine learning experts. by gross mistreatment, committed in the name of science. For example, the US government ran experiments on citi- zens without their consent, including a study of patients with 3. Safety Standards in Other Industries syphilis who were not told they were sick (Curran, 1973) and To put our proposal into context, we discuss the evolution radiation experiments (Faden et al., 1996; Moreno, 2013). of safety standards for automobiles, drugs, and electronics. The poor, the imprisoned, minority groups, pregnant women, Lessons learned from the historical dangers of new tech- and children comprised a majority of these study groups. nologies, and the safety measures put in place to combat The US now requires drug trials to inform participants that them, can help define a path forward for machine learning.

10 Drugs are experimental and not proven to be effective (and participants must consent). Prior to the start of a clinical The Automobile Industry trial, an Institutional Review Board and the Food and Drug Administration (FDA) review evidence of the drug's rela- Similar to current hopes that machine learning will posi- tive safety (including the drug's chemical composition and tively transform society, the introduction of automobiles results of animal testing) and the trial design (including promised to expand mobility and provide additional recre- Datasheets for Datasets participant demographics) (Food and Drug Administration, international standards, all electronic components, ranging 2018). Machine learning's closest legal analog to these from the cheapest and most ubiquitous resistor to highly safeguards is the EU's General Data Protection Regulation complex integrated circuits like CPUs, are accompanied (GDPR), which aims to ensure that users' personal data are by Datasheets that are prominently displayed on the compo- not used without their explicit consent.


Related search queries