
Fine-Grained DRAM: Energy-Efficient DRAM for …


Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems Mike O’Connor∗†‡ Niladrish Chatterjee∗† Donghyuk Lee† John Wilson† Aditya Agrawal† Stephen W. Keckler†‡ William J. Dally†⋄ †NVIDIA ‡The University of Texas at Austin ⋄Stanford University {moconnor, nchatterjee, donghyukl, jowilson, adityaa, skeckler, …


Transcription of Fine-Grained DRAM: Energy-Efficient DRAM for …

Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems

Mike O'Connor, Niladrish Chatterjee, Donghyuk Lee, John Wilson, Aditya Agrawal, Stephen W. Keckler, William J. Dally
NVIDIA, The University of Texas at Austin, Stanford University
{moconnor, nchatterjee, donghyukl, jowilson, adityaa, skeckler, bdally}

ABSTRACT
GPUs and other high-performance throughput processors will require multiple TB/s of bandwidth to DRAM. Satisfying this bandwidth demand within an acceptable energy budget is a challenge in these extreme bandwidth memory systems. We propose a new high-bandwidth DRAM architecture, Fine-Grained DRAM (FGDRAM), which improves bandwidth by 4× and improves the energy efficiency of DRAM by 2× relative to the highest-bandwidth, most energy-efficient contemporary DRAM, High Bandwidth Memory (HBM2).

These benefits are in large measure achieved by partitioning the DRAM die into many independent units, called grains, each of which has a local, adjacent I/O. This approach unlocks the bandwidth of all the banks in the DRAM to be used simultaneously, eliminating shared buses interconnecting various banks. Furthermore, the on-DRAM data movement energy is significantly reduced due to the much shorter wiring distance between the cell array and the local I/O. This FGDRAM architecture readily lends itself to leveraging existing techniques for reducing the effective DRAM row size in an area-efficient manner, reducing wasteful row activate energy in applications with low locality.

In addition, when FGDRAM is paired with a memory controller optimized to exploit the additional concurrency provided by the independent grains, it improves GPU system performance by 19% over an iso-bandwidth and iso-capacity future HBM baseline. Thus, this energy-efficient, high-bandwidth FGDRAM architecture addresses the needs of future extreme-bandwidth memory systems.

CCS CONCEPTS
Hardware → Dynamic memory; Power and energy; Computing methodologies → Graphics processors; Computer systems organization → Parallel architectures

Both authors contributed equally to the paper.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.

Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from …
October 14–18, 2017, Cambridge, MA, USA. © 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-4952-9/17/10.

KEYWORDS
Energy-Efficiency, High Bandwidth, GPU

ACM Reference format:
M. O'Connor, N. Chatterjee, D. Lee, J. Wilson, A. Agrawal, S. W. Keckler, W. J. Dally. 2017. Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems.

In Proceedings of MICRO-50, Cambridge, MA, USA, October 14–18, 2017, 14 pages.

INTRODUCTION
High bandwidth DRAM has been a key enabler of the continuous performance scaling of Graphics Processing Units (GPUs) and other throughput-oriented parallel processors. Successive generations of GPU-specific DRAMs, optimized primarily to maximize bandwidth rather than minimize cost per bit, have increased aggregate system bandwidth; first through high-frequency off-chip signaling with Graphics Double-Data Rate memories (GDDR3/5/5X [18,21,24]) and, most recently, through on-package integration of the processor die and wide, high-bandwidth interfaces to stacks of DRAM (e.g., High Bandwidth Memory (HBM/HBM2) [20,23] and Multi-Channel DRAM (MCDRAM) [15]).

Future GPUs will demand multiple TB/s of DRAM bandwidth, requiring further improvements in the bandwidth of GPU-specific DRAM. In this paper, we show that traditional techniques for extending the bandwidth of DRAMs will add to the system energy and/or to the cost/area of DRAM devices. To meet the bandwidth objectives of the future, DRAM devices must be more energy-efficient than they are today without significantly sacrificing area-efficiency. To architect a DRAM device that meets these objectives, we carry out a detailed design space exploration of high-bandwidth DRAM microarchitectures.

Using constraints imposed by practical DRAM layouts and insights from GPU memory access behaviors to inform the design process, we arrive at a DRAM and memory controller architecture, Fine-Grained DRAM (FGDRAM), suited to future high-bandwidth GPUs.

The most formidable challenge to scaling the bandwidth of GPU DRAMs is the energy of DRAM accesses. Every system is designed to operate within a fixed maximum power envelope. The energy spent on DRAM access eats into the total power budget available for the rest of the system. Traditionally, high-end GPU cards have been limited to approximately 300 W, of which no more than about 20% is budgeted to the DRAM when operating at peak bandwidth.

[Figure 1: GPU Memory Power and Energy. (a) Maximum DRAM access energy (pJ/bit) for a given peak memory system bandwidth (TB/s) within a 60 W DRAM power budget. (b) HBM2 energy consumption (pJ/b), broken down into activation, on-die data movement, and I/O.]

Figure 1a shows the DRAM energy per access that can be tolerated at a given peak DRAM bandwidth while remaining within a 60 W DRAM power budget. We see that the energy improvements of on-die stacked High Bandwidth Memory (HBM2) over off-chip GDDR5 memories have allowed modern GPUs to approach a terabyte per second of memory bandwidth at comparable power to previous GPUs that provided less than half the bandwidth using GDDR5.
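The budget arithmetic behind Figure 1a can be sketched in a few lines of Python. This is an illustrative calculation rather than code from the paper: the function names are hypothetical, 1 TB/s is assumed to mean 10^12 bytes per second, and the defaults are the approximate 300 W card limit and 20% DRAM share quoted in the text.

```python
def dram_power_budget_w(card_power_w=300.0, dram_share_percent=20.0):
    """DRAM power budget in watts (defaults: approximate figures from the text)."""
    return card_power_w * dram_share_percent / 100.0


def max_energy_pj_per_bit(power_budget_w, bandwidth_tb_per_s):
    """Largest access energy (pJ/bit) that fits the budget at peak bandwidth.

    Assumes 1 TB/s = 1e12 bytes/s and 8 bits per byte. Watts divided by
    units of 1e12 bits/s already yields pJ/bit, so the 1e12 factors cancel.
    """
    if bandwidth_tb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return power_budget_w / (bandwidth_tb_per_s * 8.0)


def dram_power_w(energy_pj_per_bit, bandwidth_tb_per_s):
    """DRAM power (W) when streaming at peak bandwidth with this access energy."""
    return energy_pj_per_bit * bandwidth_tb_per_s * 8.0


if __name__ == "__main__":
    budget = dram_power_budget_w()
    for bw in (0.5, 1.0, 2.0, 4.0):
        limit = max_energy_pj_per_bit(budget, bw)
        print(f"{bw:4.1f} TB/s -> at most {limit:.3f} pJ/bit")
```

Under these assumptions, a 60 W budget allows 3.75 pJ/bit at 2 TB/s and 1.875 pJ/bit at 4 TB/s. Conversely, any access energy above 3.75 pJ/bit pushes a 4 TB/s system past 120 W.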

This figure also demonstrates, however, that even with HBM2, systems with more than 2 TB/s of bandwidth won't be possible within this traditional power budget. A future exascale GPU with 4 TB/s of DRAM bandwidth would dissipate upwards of 120 W in DRAM.

The energy to access a bit in HBM2 is approximately … pJ/bit, and, as shown in Figure 1b, it consists largely of data movement energy (the energy to move data from the row buffer to the I/O pins) and activation energy (the energy to precharge a bank and activate a row of cells into the row buffer); the I/O energy accounts for the small remainder. The activation energy is a function of the row size and the row locality of the memory access stream, and it is a significant factor because most GPU workloads access only a small fraction of the 1 KB row activated in HBM2.
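To make the row-size argument concrete, the sketch below computes how many bytes are activated per byte actually used. It is an illustration under stated assumptions, not the paper's model: the function name, the access sizes, and the 256-byte row are hypothetical, and only the 1 KB HBM2 row size comes from the text. Comparing two row sizes this way also assumes that activation energy grows in proportion to row size, which is a simplification.

```python
HBM2_ROW_BYTES = 1024              # 1 KB row, as stated in the text
HYPOTHETICAL_SMALL_ROW_BYTES = 256  # illustrative smaller row, not from the paper


def activation_overfetch(row_bytes, bytes_used):
    """Bytes activated per byte actually read from an open row.

    A value of 1.0 means the whole row was used; larger values mean the
    activation energy is amortized over fewer useful bytes.
    """
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    if not 0 < bytes_used <= row_bytes:
        raise ValueError("bytes_used must be in (0, row_bytes]")
    return row_bytes / bytes_used


if __name__ == "__main__":
    # Hypothetical amounts of data consumed per row activation.
    for used in (32, 64, 128):
        big = activation_overfetch(HBM2_ROW_BYTES, used)
        small = activation_overfetch(HYPOTHETICAL_SMALL_ROW_BYTES, used)
        print(f"{used:4d} B used: {big:.0f}x overfetch at 1 KB, "
              f"{small:.0f}x at 256 B")
```

The ratio is unitless on purpose: it shows how low row locality inflates the activation cost per useful bit without requiring an absolute activation energy, which this section does not provide.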

The data movement energy is determined primarily by the distance that the data moves on both the DRAM die and the base layer die to reach the I/O pins, the capacitance of these wires, and the rate of switching on this datapath. Since most current DRAM devices, including HBM2, send data from banks spread across the die to a common I/O interface, data may travel from the farthest corners of the device to the I/O PHYs on the base layer, leading energy for data movement to dominate the overall energy.

FGDRAM targets both of these components of DRAM energy. In FGDRAM, a DRAM die is a collection of small units called grains, with each grain having a local, dedicated, and narrow data interface. Much like a traditional DRAM channel, each grain serves a DRAM request in its entirety.
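The dependence of data movement energy on distance, wire capacitance, and switching rate can be sketched with a first-order CMOS switching model. This is a generic textbook approximation, not the energy model used in the paper, and every numeric value below (capacitance per millimetre, voltage swing, toggle rate, wire lengths) is a placeholder chosen only for illustration.

```python
def wire_energy_pj_per_bit(length_mm, cap_pf_per_mm, swing_v, toggle_rate):
    """Average switching energy (pJ) per bit sent over one full-swing wire.

    First-order model: each transition dissipates 0.5 * C * V^2, and
    toggle_rate is the probability that a transferred bit causes a
    transition. With C in pF and V in volts, the result is in pJ.
    """
    if length_mm < 0 or cap_pf_per_mm < 0:
        raise ValueError("length and capacitance must be non-negative")
    if not 0.0 <= toggle_rate <= 1.0:
        raise ValueError("toggle_rate must be in [0, 1]")
    capacitance_pf = cap_pf_per_mm * length_mm
    return toggle_rate * 0.5 * capacitance_pf * swing_v ** 2


if __name__ == "__main__":
    # Placeholder parameters, not measurements from the paper.
    cap_pf_per_mm = 0.2
    swing_v = 1.2
    toggle_rate = 0.5
    for label, length_mm in (("cross-die route", 10.0), ("local I/O", 1.0)):
        e = wire_energy_pj_per_bit(length_mm, cap_pf_per_mm, swing_v, toggle_rate)
        print(f"{label:15s} {length_mm:5.1f} mm -> {e:.3f} pJ/bit")
```

In this model the energy is linear in wire length, so shortening the route from the cell array to the I/O by some factor reduces the switching energy on that wire by the same factor.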

