
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
(Preliminary White Paper, November 9, 2015)

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng

Google Research

Transcription of TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Abstract

TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery.

This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache license in November 2015 and are publicly available.

1 Introduction

The Google Brain project started in 2011 to explore the use of very large-scale deep neural networks, both for research and for use in Google's products. As part of the early work in this project, we built DistBelief, our first-generation scalable distributed training and inference system [14], and this system has served us well.

We and others at Google have performed a wide variety of research using DistBelief, including work on unsupervised learning [31], language representation [35, 52], models for image classification and object detection [16, 48], video classification [27], speech recognition [56, 21, 20], sequence prediction [47], move selection for Go [34], pedestrian detection [2], reinforcement learning [38], and other areas [17, 5]. (Corresponding authors: Jeffrey Dean and Rajat Monga.) In addition, often in close collaboration with the Google Brain team, more than 50 teams at Google and other Alphabet companies have deployed deep neural networks using DistBelief in a wide variety of products, including Google Search [11], our advertising products, our speech recognition systems [50, 6, 46], Google Photos [43], Google Maps and StreetView [19], Google Translate [18], YouTube, and many others.

Based on our experience with DistBelief and a more complete understanding of the desirable system properties and requirements for training and using neural networks, we have built TensorFlow, our second-generation system for the implementation and deployment of large-scale machine learning models. TensorFlow takes computations described using a dataflow-like model and maps them onto a wide variety of different hardware platforms, ranging from running inference on mobile device platforms such as Android and iOS, to modest-sized training and inference systems using single machines containing one or many GPU cards, to large-scale training systems running on hundreds of specialized machines with thousands of GPUs.

Having a single system that can span such a broad range of platforms significantly simplifies the real-world use of machine learning systems, as we have found that having separate systems for large-scale training and small-scale deployment leads to significant maintenance burdens and leaky abstractions. TensorFlow computations are expressed as stateful dataflow graphs (described in more detail in Section 2), and we have focused on making the system both flexible enough for quickly experimenting with new models for research purposes and sufficiently high performance and robust for production training and deployment of machine learning models.

For scaling neural network training to larger deployments, TensorFlow allows clients to easily express various kinds of parallelism through replication and parallel execution of a core model dataflow graph, with many different computational devices all collaborating to update a set of shared parameters or other state. Modest changes in the description of the computation allow a wide variety of different approaches to parallelism to be achieved and tried with low effort [14, 29, 42]. Some TensorFlow uses allow some flexibility in terms of the consistency of parameter updates, and we can easily express and take advantage of these relaxed synchronization requirements in some of our larger deployments.
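As a rough illustration of the replication idea described above, the following plain-Python sketch (a hypothetical toy, not the TensorFlow API) has several replica threads apply gradient updates to one shared parameter vector, as in data-parallel stochastic gradient descent; the lock marks the synchronization point that relaxed-consistency deployments can loosen.

```python
# Hypothetical sketch of data-parallel training: replicas of the same
# computation update one set of shared parameters. Names (replica,
# shards, params) are illustrative, not from the paper.
import threading

params = [0.0, 0.0]      # shared state updated by all replicas
lock = threading.Lock()  # relaxed consistency would weaken or drop this

def replica(grads, lr=0.1):
    # Each replica computes gradients on its own data shard, then
    # applies them to the shared parameters under the lock.
    with lock:
        for i, g in enumerate(grads):
            params[i] -= lr * g

# Three replicas, each with the gradients from its own shard of data.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
threads = [threading.Thread(target=replica, args=(g,)) for g in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(params)  # roughly [-0.9, -1.2], up to float rounding
```

Because the updates commute, the final parameters do not depend on the order in which replicas run; dropping the lock trades that determinism for throughput, which is the relaxed-synchronization regime mentioned above.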

Compared to DistBelief, TensorFlow's programming model is more flexible, its performance is significantly better, and it supports training and using a broader range of models on a wider variety of heterogeneous hardware platforms. Many of our internal clients of DistBelief have already switched to TensorFlow. These clients rely on TensorFlow for research and production, with tasks as diverse as running inference for computer vision models on mobile phones to large-scale training of deep neural networks with hundreds of billions of parameters on hundreds of billions of example records using many hundreds of machines [11, 47, 48, 18, 53, 41].

Although these applications have concentrated on machine learning, and deep neural networks in particular, we expect that TensorFlow's abstractions will be useful in a variety of other domains, including other kinds of machine learning algorithms, and possibly other kinds of numerical computations. We have open-sourced the TensorFlow API and a reference implementation under the Apache license in November 2015, and it is publicly available. The rest of this paper describes TensorFlow in more detail. Section 2 describes the programming model and basic concepts of the TensorFlow interface, and Section 3 describes both our single-machine and distributed implementations.

Section 4 describes several extensions to the basic programming model, and Section 5 describes several optimizations to the basic implementations. Section 6 describes some of our experiences in using TensorFlow, Section 7 describes several programming idioms we have found helpful when using TensorFlow, and Section 9 describes several auxiliary tools we have built around the core TensorFlow system. Sections 10 and 11 discuss future and related work, respectively, and Section 12 offers concluding thoughts.

2 Programming Model and Basic Concepts

A TensorFlow computation is described by a directed graph, which is composed of a set of nodes.
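The directed-graph model can be sketched in a few lines of plain Python (a hypothetical toy, not the actual TensorFlow API): each node is an operation whose inputs are the outputs of other nodes, and evaluating an output node runs exactly the subgraph it depends on, memoizing shared subcomputations.

```python
# Hypothetical sketch of a dataflow graph: Node, run, and the graph
# below are illustrative names, not TensorFlow's.
class Node:
    """One operation in the graph; inputs are other Nodes."""
    def __init__(self, name, op, *inputs):
        self.name, self.op, self.inputs = name, op, inputs

    def run(self, cache):
        # Evaluate inputs first (the graph is acyclic), caching each
        # node's value so shared subgraphs are computed only once.
        if self.name not in cache:
            args = [n.run(cache) for n in self.inputs]
            cache[self.name] = self.op(*args)
        return cache[self.name]

# Build a small graph computing out = (a + b) * b.
a = Node("a", lambda: 2.0)
b = Node("b", lambda: 3.0)
s = Node("sum", lambda x, y: x + y, a, b)
out = Node("out", lambda x, y: x * y, s, b)

print(out.run({}))  # 15.0
```

Note that node b feeds both the sum and the product, but is evaluated once per run; the cache passed to run plays the role of one execution of the graph.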
