Transcription of Datasheets for Datasets
Datasheets for Datasets

Timnit Gebru 1, Jamie Morgenstern 2, Briana Vecchione 3, Jennifer Wortman Vaughan 1, Hanna Wallach 1, Hal Daumé III 1,4, Kate Crawford 1,5

1 Microsoft Research, New York, NY. 2 Georgia Institute of Technology, Atlanta, GA. 3 Cornell University, Ithaca, NY. 4 University of Maryland, College Park, MD. 5 AI Now Institute, New York, NY. Correspondence to: Timnit Gebru.

Proceedings of the 5th Workshop on Fairness, Accountability, and Transparency in Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

9 Jul 2018

Abstract

The machine learning community has no standardized way to document how and why a dataset was created, what information it contains, what tasks it should and should not be used for, and whether it might raise any ethical or legal concerns. To address this gap, we propose the concept of Datasheets for Datasets. In the electronics industry, it is standard to accompany every component with a datasheet providing standard operating characteristics, test results, recommended usage, and other information. Similarly, we recommend that every dataset be accompanied with a datasheet documenting its creation, composition, intended uses, maintenance, and other properties. Datasheets for Datasets will facilitate better communication between dataset creators and users, and encourage the machine learning community to prioritize transparency and accountability.

1. Introduction

Machine learning is no longer a purely academic discipline. Domains such as criminal justice (Garvie et al., 2016; Systems, 2017; Andrews et al., 2006), hiring and employment (Mann & O'Neil, 2016), critical infrastructure (O'Connor, 2017; Chui, 2017), and finance (Lin, 2012) all increasingly depend on machine learning methods.

By definition, machine learning models are trained using data; the choice of data fundamentally influences a model's behavior. However, there is no standardized way to document how and why a dataset was created, what information it contains, what tasks it should and shouldn't be used for, and whether it might raise any ethical or legal concerns. This lack of documentation is especially problematic when datasets are used to train models for high-stakes applications.

We therefore propose the concept of Datasheets for Datasets. In the electronics industry, every component is accompanied by a datasheet describing standard operating characteristics, test results, and recommended usage. By analogy, we recommend that every dataset be accompanied with a datasheet documenting its motivation, creation, composition, intended uses, distribution, maintenance, and other information. We anticipate that such datasheets will increase transparency and accountability in the machine learning community.

Section 2 provides context for our proposal. Section 3 discusses the evolution of safety standards in other industries, and outlines the concept of datasheets in electronics. We give examples of questions that should be answered in datasheets for datasets in Section 4, and discuss challenges and future work in Section 5. The appendix includes a more complete proposal along with prototype datasheets for two well-known datasets: Labeled Faces in the Wild (Huang et al., 2007) and Pang and Lee's polarity dataset (2004).

2. Context

A foundational challenge in the use of machine learning is the risk of deploying systems in unsuitable environments. A model's behavior on some benchmark may say very little about its performance in the wild. Of particular concern are recent examples showing that machine learning systems can amplify existing societal biases. For example, Buolamwini & Gebru (2018) showed that commercial gender classification APIs have near perfect performance for lighter-skinned males, while error rates for darker-skinned females can be as high as 33% [1]. Bolukbasi et al. (2016) showed that word embeddings trained on news articles exhibit gender biases, finishing the analogy "man is to computer programmer as woman is to X" with "homemaker," a stereotypical role for women. Caliskan et al. (2017) showed these embeddings also contain racial biases: traditional European-American names are closer to positive words like "joy," while African-American names are closer to words like "agony."

[1] The evaluated APIs also provided the labels of female and male, failing to address the complexities of gender beyond binary.

These biases can have dire consequences that might not be easily discovered. Much like a faulty resistor or a capacitor in a circuit, the effects of a biased machine learning component, such as a dataset, can propagate throughout a system making them difficult to track down. For example, biases in word embeddings can result in hiring discrimination (Bolukbasi et al., 2016). For these and other reasons, the World Economic Forum lists tracking the provenance, development, and use of training datasets as a best practice that all companies should follow in order to prevent discriminatory outcomes (World Economic Forum Global Future Council on Human Rights 2016-2018, 2018). But while provenance has been extensively studied in the database literature (Cheney et al., 2009; Bhardwaj et al., 2014), it has received relatively little attention in machine learning.

The risk of unintentional misuse of datasets can increase when developers are not domain experts. This concern is particularly important with the movement toward "democratizing AI" and toolboxes that provide publicly available datasets and off-the-shelf models to be trained by those with little-to-no domain knowledge or machine learning expertise. As these powerful tools become available to a broader [...]

[...]ational, social, and economic opportunities. However, much like current machine learning technology, automobiles were introduced with few safety checks or regulations. When cars first became available in the US, there were no speed limits, stop signs, traffic lights, driver education, or regulations pertaining to seat belts or drunk driving (Canis, 2017). This resulted in many deaths and injuries due to collisions, speeding, and reckless driving (Hingson et al., 1988). Reminiscent of current debates about machine learning, courtrooms and newspaper editorials argued the possibility that the automobile was inherently evil (Lewis v. Amorous, 1907).

The US and the rest of the world have gradually enacted driver education, drivers licenses (Department of Transportation Federal Highway Administration, 1997), and safety systems like four-wheel hydraulic brakes, shatter-resistant windshields, all-steel bodies (McShane, 2018), padded dashboards, and seat belts (Peltzman, 1975). Motorists' slow adoption of seat belts spurred safety campaigns promoting their adoption.