
50 years of Data Science - courses.csail.mit.edu




David Donoho

Sept. 18, 2015, Version 1.00

Abstract

More than 50 years ago, John Tukey called for a reformation of academic statistics. In 'The Future of Data Analysis', he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or 'data analysis'. Ten to twenty years ago, John Chambers, Bill Cleveland and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name "Data Science" for his envisioned field.

A recent and growing phenomenon is the emergence of "Data Science" programs at major universities, including UC Berkeley, NYU, MIT, and most recently the Univ. of Michigan, which on September 8, 2015 announced a $100M "Data Science Initiative" that will hire 35 new faculty. Teaching in these new programs has significant overlap in curricular subject matter with traditional statistics courses; in general, though, the new initiatives steer away from close involvement with academic statistics departments.

This paper reviews some ingredients of the current "Data Science moment", including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics.

The now-contemplated field of Data Science amounts to a superset of the fields of statistics and machine learning which adds some technology for 'scaling up' to 'big data'. This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next fifty years.

Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere 'scaling up', but instead the emergence of scientific studies of data analysis science-wide. In the future, we will be able to predict how a proposal to change data analysis workflows would impact the validity of data analysis across all of science, even predicting the impacts field-by-field.

Drawing on work by Tukey, Cleveland, Chambers and Breiman, I present a vision of data science based on the activities of people who are 'learning from data', and I describe an academic field dedicated to improving that activity in an evidence-based manner.

This new field is a better academic enlargement of statistics and machine learning than today's Data Science Initiatives, while being able to accommodate the same short-term goals.

Based on a presentation at the Tukey Centennial workshop, Princeton NJ, Sept 18 2015.

Contents

1 Today's Data Science Moment
2 Data Science 'versus' Statistics
    The 'Big Data' Meme
    The 'Skills' Meme
    The 'Jobs' Meme
    What here is real?
    A Better Framework
3 The Future of Data Analysis, 1962
4 The 50 years since FoDA
    Exhortations
    Reification
5 Breiman's 'Two Cultures', 2001
6 The Predictive Culture's Secret Sauce
    The Common Task Framework
    Experience with CTF
    The Secret Sauce
    Required Skills
7 Teaching of today's consensus Data Science
8 The Full Scope of Data Science
    The Six Divisions
    Discussion
    Teaching of GDS
    Research in GDS
        Programming Environments: R
        Wrangling: Tidy Data
        Presentation: Knitr
    Discussion
9 Science about Data Science
    Science-Wide Meta Analysis
    Cross-Study Analysis
    Cross-Workflow Analysis
    Summary
10 The Next 50 Years of Data Science
    Open Science takes over
    Science as data
    Scientific Data Analysis, tested Empirically
        DJ Hand (2006)
        Donoho and Jin (2008)
        Zhao, Parmigiani, Huttenhower and Waldron (2014)
    Data Science in 2065
11 Conclusion

Acknowledgements:

Special thanks to Edgar Dobriban, Bradley Efron, and Victoria Stodden for comments on Data Science and on drafts of this manuscript.

Thanks to John Storey, Amit Singer, Esther Kim, and all the other organizers of the Tukey Centennial at Princeton, September 18, 2015.

Belated thanks to my undergraduate statistics teachers: Peter Bloomfield, Henry Braun, Tom Hettmansperger, Larry Mayer, Don McNeil, Geoff Watson, and John Tukey.

Supported in part by NSF DMS-1418362 and …

Acronym   Meaning
ASA       American Statistical Association
CEO       Chief Executive Officer
CTF       Common Task Framework
DARPA     Defense Advanced Research Projects Agency
DSI       Data Science Initiative
EDA       Exploratory Data Analysis
FoDA      The Future of Data Analysis, 1962
GDS       Greater Data Science
HC        Higher Criticism
IBM       IBM Corp.
IMS       Institute of Mathematical Statistics
IT        Information Technology (the field)
JWT       John Wilder Tukey
LDS       Lesser Data Science
NIH       National Institutes of Health
NSF       National Science Foundation
PoMC      The Problem of Multiple Comparisons, 1953
QPE       Quantitative Programming Environment
R         R, a system and language for computing with data
S         S, a system and language for computing with data
SAS       System and language produced by SAS, Inc.
SPSS      System and language produced by SPSS, Inc.
VCR       Verifiable Computational Result

Table 1: Frequent Acronyms

1 Today's Data Science Moment

On Tuesday September 8, 2015, as I was preparing these remarks, the University of Michigan announced a $100 Million "Data Science Initiative" (DSI), ultimately hiring 35 new faculty.

The university's press release contains bold pronouncements:

"Data science has become a fourth approach to scientific discovery, in addition to experimentation, modeling, and computation," said Provost Martha Pollack.

The web site for DSI gives us an idea what Data Science is:

"This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and interdisciplinary applications."

This announcement is not taking place in a vacuum. A number of DSI-like initiatives started recently, including

(A) Campus-wide initiatives at NYU, Columbia, MIT, ...
(B) New Master's Degree programs in Data Science, for example at Berkeley, NYU, Stanford, ...

There are new announcements of such initiatives weekly.[1]

2 Data Science 'versus' Statistics

Many of my audience at the Tukey Centennial where these remarks were presented are applied statisticians, and consider their professional career one long series of exercises in the above "... collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of ... applications." In fact, some presentations at the Tukey Centennial were exemplary narratives of "... collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of ... applications."

To statisticians, the DSI phenomenon can seem puzzling. Statisticians see administrators touting, as new, activities that statisticians have already been pursuing daily, for their entire careers; and which were considered standard already when those statisticians were back in graduate school.

The following points about the U of M DSI will be very telling to such statisticians:

• U of M's DSI is taking place at a campus with a large and highly respected Statistics Department.
• The identified leaders of this initiative are faculty from the Electrical Engineering and Computer Science Department (Al Hero) and the School of Medicine (Brian Athey).
• The inaugural symposium has one speaker from the Statistics department (Susan Murphy), out of more than 20 speakers.

Seemingly, statistics is being marginalized here; the implicit message is that statistics is a part of what goes on in data science but not a very big part. At the same time, many of the concrete descriptions of what the DSI will actually do will seem to statisticians to be bread-and-butter statistics. Statistics is apparently the word that dare not speak its name in connection with such an initiative![2]

Searching the web for more information about the emerging term 'Data Science', we encounter the following definitions from the Data Science Association's "Professional Code of Conduct"[3]:

"Data Scientist" means a professional who uses scientific methods to liberate and create meaning from raw data.

To a statistician, this sounds an awful lot like what applied statisticians do: use methodology to make inferences from data.

[1] For an updated interactive geographic map of degree programs, see …

