Transcription of TensorFlow: A System for Large-Scale Machine Learning
This paper is included in the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16). November 2–4, 2016, Savannah, GA, USA. ISBN 978-1-931971-33-1. Open access to the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX.

TensorFlow: A System for Large-Scale Machine Learning

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, Google Brain

Abstract

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments.
TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous parameter server designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms.
TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

Introduction

In recent years, machine learning has driven advances in many different fields [3, 5, 24, 25, 29, 31, 42, 47, 50, 52, 57, 67, 68, 72, 76].
We attribute this success to the invention of more sophisticated machine learning models [44, 54], the availability of large datasets for tackling problems in these fields [9, 64], and the development of software platforms that enable the easy use of large amounts of computational resources for training such models on these large datasets [14, 20].

We have developed the TensorFlow system for experimenting with new models, training them on large datasets, and moving them into production. We have based TensorFlow on many years of experience with our first-generation system, DistBelief [20], both simplifying and generalizing it to enable researchers to explore a wider variety of ideas with relative ease.
TensorFlow supports both large-scale training and inference: it efficiently uses hundreds of powerful (GPU-enabled) servers for fast training, and it runs trained models for inference in production on various platforms, ranging from large distributed clusters in a datacenter, down to running locally on mobile devices. At the same time, it is flexible enough to support experimentation and research into new machine learning models and system-level optimizations.

TensorFlow uses a unified dataflow graph to represent both the computation in an algorithm and the state on which the algorithm operates.
We draw inspiration from the high-level programming models of dataflow systems [2, 21, 34] and the low-level efficiency of parameter servers [14, 20, 49]. Unlike traditional dataflow systems, in which graph vertices represent functional computation on immutable data, TensorFlow allows vertices to represent computations that own or update mutable state. Edges carry tensors (multi-dimensional arrays) between nodes, and TensorFlow transparently inserts the appropriate communication between distributed subcomputations. By unifying the computation and state management in a single programming model, TensorFlow allows programmers to experiment with different parallelization schemes that, for example, offload computation onto the servers that hold the shared state to reduce the amount of network traffic.
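
The distinction between functional vertices and vertices that own or update mutable state can be sketched in a few lines of plain Python. This is a toy model written for illustration, not TensorFlow's API: the names `Node`, `Const`, `Mul`, `Variable`, `AssignAdd`, and `run` are all hypothetical, and the values on the edges are plain floats rather than tensors.

```python
# Illustrative only: a toy dataflow graph, not TensorFlow's API.
# Node, Const, Mul, Variable, AssignAdd and run are hypothetical names.

class Node:
    """A vertex; its inputs are the edges that carry values to it."""
    def __init__(self, *inputs):
        self.inputs = inputs

    def compute(self, *values):
        raise NotImplementedError


class Const(Node):
    """Stateless vertex that always emits the same value."""
    def __init__(self, value):
        super().__init__()
        self.value = value

    def compute(self):
        return self.value


class Mul(Node):
    """Stateless vertex: a pure function of its inputs."""
    def compute(self, a, b):
        return a * b


class Variable(Node):
    """Stateful vertex: owns a mutable value that outlives a single run."""
    def __init__(self, initial):
        super().__init__()
        self.value = initial

    def compute(self):
        return self.value


class AssignAdd(Node):
    """Vertex that mutates the state owned by a Variable."""
    def __init__(self, variable, delta):
        super().__init__(delta)
        self.variable = variable

    def compute(self, delta):
        self.variable.value += delta
        return self.variable.value


def run(node, cache=None):
    """Evaluate `node`, computing each upstream vertex at most once per run."""
    cache = {} if cache is None else cache
    if node not in cache:
        args = [run(dep, cache) for dep in node.inputs]
        cache[node] = node.compute(*args)
    return cache[node]


w = Variable(1.0)                    # shared state lives inside the graph
delta = Mul(Const(0.5), Const(4.0))  # functional computation on immutable data
step = AssignAdd(w, delta)           # operation that mutates the state

run(step)
run(step)
final_w = run(w)  # the state persists across separate runs of `step`
```

Running `step` twice and then reading `w` shows the property the paragraph describes: the update is an ordinary vertex in the graph, yet its effect survives from one execution to the next.
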
We have also built various coordination protocols, and achieved encouraging results with synchronous replication, echoing recent results [10, 18] that contradict the commonly held belief that asynchronous replication is required for scalable learning [14, 20, 49].

Over the past year, more than 150 teams at Google have used TensorFlow, and we have released the system as an open-source project. Thanks to our large community of users we have gained experience with many different machine learning applications.
In this paper, we focus on neural network training as a challenging systems problem, and select two representative applications from this space: image classification and language modeling. These applications stress computational throughput and aggregate model size respectively, and we use them both to demonstrate the extensibility of TensorFlow, and to evaluate the efficiency and scalability of our present implementation.

Background & motivation

We begin by describing the limitations of our previous system and outlining the design principles that we used in the development of TensorFlow.
Previous system: DistBelief

TensorFlow is the successor to DistBelief, which is the distributed system for training neural networks that Google has used since 2011 [20]. DistBelief uses the parameter server architecture, and here we criticize its limitations, but other systems based on this architecture have addressed these limitations in other ways [11, 14, 49]; we discuss those systems in a later subsection.

In the parameter server architecture, a job comprises two disjoint sets of processes: stateless worker processes that perform the bulk of the computation when training a model, and stateful parameter server processes that maintain the current version of the model parameters.
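
The split between stateless workers and stateful parameter servers can be illustrated with a minimal single-process sketch. It is not DistBelief's implementation: `ParameterServer`, `pull`, `push`, and `worker_step` are hypothetical names, the model is an assumed one-parameter linear fit, and the "workers" run one after another in a loop rather than as separate processes.

```python
# Illustrative only: a single-process toy, not DistBelief's implementation.
# ParameterServer, pull, push and worker_step are hypothetical names.

class ParameterServer:
    """Stateful process: holds the current version of the model parameter."""
    def __init__(self, w, learning_rate):
        self.w = w
        self.learning_rate = learning_rate

    def pull(self):
        """Return the current parameter value to a worker."""
        return self.w

    def push(self, gradient):
        """Apply a gradient-descent update to the stored parameter."""
        self.w -= self.learning_rate * gradient


def worker_step(server, batch):
    """Stateless worker: pull the parameter, compute a gradient, push it back."""
    w = server.pull()
    # Gradient of the mean squared error for the assumed model y = w * x.
    gradient = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    server.push(gradient)


server = ParameterServer(w=0.0, learning_rate=0.05)

# Two data shards drawn from y = 3x; each stands in for one worker's input.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (1.0, 3.0)]]

for _ in range(50):
    for shard in shards:
        worker_step(server, shard)

learned_w = server.pull()  # approaches 3.0
```

All mutable state sits in `ParameterServer`; `worker_step` keeps nothing between calls, so any worker could be restarted or replaced without losing the model.
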