Efficient Processing of Deep Neural Networks: A Tutorial and Survey



This article provides a comprehensive tutorial and survey coverage of the recent advances toward enabling efficient processing of deep neural networks.

By Vivienne Sze, Senior Member IEEE, Yu-Hsin Chen, Student Member IEEE, Tien-Ju Yang, Student Member IEEE, and Joel S. Emer, Fellow IEEE

ABSTRACT | Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

KEYWORDS | ASIC; computer architecture; convolutional neural networks; dataflow processing; deep learning; deep neural networks; energy-efficient accelerators; low power; machine learning; spatial architectures; VLSI

I. INTRODUCTION

Deep neural networks (DNNs) are currently the foundation for many modern artificial intelligence (AI) applications [1]. Since the breakthrough application of DNNs to speech recognition [2] and image recognition [3], the number of applications that use DNNs has exploded. These DNNs are employed in a myriad of applications from self-driving cars [4], to detecting cancer [5], to playing complex games [6]. In many of these domains, DNNs are now able to exceed human accuracy. The superior performance of DNNs comes from their ability to extract high-level features from raw sensory data by applying statistical learning over a large amount of data to obtain an effective representation of an input space. This is different from earlier approaches that use hand-crafted features or rules designed by experts.

The superior accuracy of DNNs, however, comes at the cost of high computational complexity. While general-purpose compute engines, especially graphics processing units (GPUs), have been the mainstay for much DNN processing, increasingly there is interest in providing more specialized acceleration of the DNN computation. This article aims to provide an overview of DNNs, the various tools for understanding their behavior, and the techniques being explored to efficiently accelerate their computation.

(Manuscript received March 15, 2017; revised August 6, 2017; accepted September 29, 2017. Date of current version November 20, 2017. This work was supported by DARPA YFA, MIT CICS, and gifts from Nvidia and Intel. Corresponding author: Vivienne Sze. V. Sze, Y.-H. Chen, and T.-J. Yang are with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. J. S. Emer is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA, and also with Nvidia Corporation, Westford, MA 01886 USA.)

This paper is organized as follows.

Section II provides background on the context of why DNNs are important, their history, and applications.

Section III gives an overview of the basic components of DNNs and popular DNN models currently in use.

Section IV describes the various resources used for DNN research and development.

Section V describes the various hardware platforms used to process DNNs and the various optimizations used to improve throughput and energy efficiency without impacting application accuracy (i.e., produce bitwise identical results).

Section VI discusses how mixed-signal circuits and new memory technologies can be used for near-data processing to address the expensive data movement that dominates throughput and energy consumption of DNNs.

Section VII describes various joint algorithm and hardware optimizations that can be performed on DNNs to improve both throughput and energy efficiency while trying to minimize impact on accuracy.

Section VIII describes the key metrics that should be considered when comparing various DNN designs.

II. BACKGROUND ON DNNs

In this section, we describe the position of DNNs in the context of AI in general.

Within AI is a large subfield called machine learning, which was defined in 1959 by Arthur Samuel as the field of study that gives computers the ability to learn without being explicitly programmed. That means a single program, once created, will be able to learn how to do some intelligent activities outside the notion of programming. This is in contrast to purpose-built programs whose behavior is defined by hand-crafted heuristics that explicitly and statically define their behavior.

The advantage of an effective machine learning algorithm is clear. Instead of the laborious and hit-or-miss approach of creating a distinct, custom program to solve each individual problem in a domain, the single machine learning algorithm simply needs to learn, via a process called training, to handle each new problem.
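To make the contrast concrete, the following minimal sketch shows how one generic learning routine, specialized only by training data, can stand in for many purpose-built programs. The spam-filtering task, the set-of-words features, the perceptron update rule, and the learning rate are all illustrative assumptions for this sketch, not details from the article.

    # Hand-crafted program: behavior fixed by explicit, static rules.
    def is_spam_handcrafted(text):
        return "free money" in text.lower()

    # Machine learning: one generic algorithm that learns, via training,
    # to handle whatever problem the labeled examples describe.
    def train_perceptron(examples, epochs=20, lr=0.1):
        """examples: list of (set_of_features, label) pairs, label 0 or 1."""
        weights, bias = {}, 0.0
        for _ in range(epochs):
            for features, label in examples:
                # Predict with the current weights, then nudge them
                # toward the correct label when the prediction is wrong.
                score = bias + sum(weights.get(f, 0.0) for f in features)
                pred = 1 if score > 0 else 0
                err = label - pred
                if err:
                    bias += lr * err
                    for f in features:
                        weights[f] = weights.get(f, 0.0) + lr * err
        return weights, bias

    def predict(model, features):
        weights, bias = model
        return 1 if bias + sum(weights.get(f, 0.0) for f in features) > 0 else 0

    # The same train_perceptron routine works unchanged on any other
    # problem posed as labeled feature sets; only the data changes.
    spam_data = [({"free", "money"}, 1), ({"meeting", "notes"}, 0),
                 ({"free", "offer"}, 1), ({"project", "update"}, 0)]
    model = train_perceptron(spam_data)
    print(predict(model, {"free", "prize"}))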

Within the machine learning field, there is an area that is often referred to as brain-inspired computation. Since the brain is currently the best machine we know of for learning and solving problems, it is a natural place to look for a machine learning approach. Therefore, a brain-inspired computation is a program or algorithm that takes some aspects of its basic form or functionality from the way the brain works. This is in contrast to attempts to create a brain; rather, the program aims to emulate some aspects of how we understand the brain to operate.

Although scientists are still exploring the details of how the brain works, it is generally believed that the main computational element of the brain is the neuron.
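Later sections of the article build on this point by modeling computation in terms of neurons. As a reference for what that computational element conventionally looks like in an artificial neural network, the sketch below computes a weighted sum of inputs plus a bias and passes the result through a nonlinear activation function. The specific input values, weights, bias, and the choice of ReLU are illustrative assumptions, not values from the article.

    def neuron(inputs, weights, bias, activation=lambda z: max(0.0, z)):
        """One artificial neuron: weighted sum of inputs, then a nonlinearity.

        The weights play the role of synaptic strengths; the activation
        (ReLU here; sigmoid, 1/(1 + e^-z), is another common choice)
        provides the nonlinear response.
        """
        z = bias + sum(w * x for w, x in zip(weights, inputs))
        return activation(z)

    # Illustrative values: three inputs feeding a single neuron.
    print(neuron(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, -0.5], bias=0.1))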

