What Every Programmer Should Know About Memory

Ulrich Drepper
Red Hat, Inc.
November 21, 2007

Abstract

As CPU cores become both faster and more numerous, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever more sophisticated memory handling and acceleration techniques, such as CPU caches, but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers.
This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

1 Introduction

In the early days computers were much simpler. The various components of a system, such as the CPU, memory, mass storage, and network interfaces, were developed together and, as a result, were quite balanced in their performance. For example, the memory and network interfaces were not (much) faster than the CPU at providing data.

This situation changed once the basic structure of computers stabilized and hardware developers concentrated on optimizing individual subsystems.
Suddenly the performance of some components of the computer fell significantly behind and bottlenecks developed. This was especially true for mass storage and memory subsystems which, for cost reasons, improved more slowly relative to other components.

The slowness of mass storage has mostly been dealt with using software techniques: operating systems keep the most often used (and most likely to be used) data in main memory, which can be accessed at a rate orders of magnitude faster than the hard disk. Cache storage was added to the storage devices themselves, which requires no changes in the operating system to increase performance.¹ For the purposes of this paper, we will not go into more details of software optimizations for mass storage access.

Unlike storage subsystems, removing the main memory as a bottleneck has proven much more difficult and almost all solutions require changes to the hardware.
Today these changes mainly come in the following forms:

• RAM hardware design (speed and parallelism).
• Memory controller designs.
• CPU caches.
• Direct memory access (DMA) for devices.

¹ Changes are needed, however, to guarantee data integrity when using storage device caches.

Copyright © 2007 Ulrich Drepper. All rights reserved. No redistribution allowed.

For the most part, this document will deal with CPU caches and some effects of memory controller design. In the process of exploring these topics, we will explore DMA and bring it into the larger picture.
However, we will start with an overview of the design for today's commodity hardware. This is a prerequisite to understanding the problems and the limitations of efficiently using memory subsystems. We will also learn about, in some detail, the different types of RAM and illustrate why these differences still exist.

This document is in no way all inclusive and final. It is limited to commodity hardware and further limited to a subset of that hardware. Also, many topics will be discussed in just enough detail for the goals of this paper. For such topics, readers are recommended to find more detailed documentation.

When it comes to operating-system-specific details and solutions, the text exclusively describes Linux.
At no time will it contain any information about other OSes. The author has no interest in discussing the implications for other OSes. If the reader thinks s/he has to use a different OS they have to go to their vendors and demand they write documents similar to this one.

One last comment before the start. The text contains a number of occurrences of the term "usually" and other, similar qualifiers. The technology discussed here exists in many, many variations in the real world and this paper only addresses the most common, mainstream versions. It is rare that absolute statements can be made about this technology, thus the qualifiers.

Document Structure

This document is mostly for software developers.
It does not go into enough technical details of the hardware to be useful for hardware-oriented readers. But before we can go into the practical information for developers a lot of groundwork must be laid.

To that end, the second section describes random-access memory (RAM) in technical detail. This section's content is nice to know but not absolutely critical to be able to understand the later sections. Appropriate back references to the section are added in places where the content is required so that the anxious reader could skip most of this section at first.

The third section goes into a lot of details of CPU cache behavior.
Graphs have been used to keep the text from being as dry as it would otherwise be. This content is essential for an understanding of the rest of the document.

Section 4 describes briefly how virtual memory is implemented. This is also required groundwork for the rest.

Section 5 goes into a lot of detail about Non Uniform Memory Access (NUMA) systems.

Section 6 is the central section of this paper. It brings together all the previous sections' information and gives programmers advice on how to write code which performs well in the various situations. The very impatient reader could start with this section and, if necessary, go back to the earlier sections to freshen up the knowledge of the underlying technology.

Section 7 introduces tools which can help the programmer do a better job.
Even with a complete understanding of the technology it is far from obvious where in a non-trivial software project the problems are. Some tools are necessary.

In section 8 we finally give an outlook of technology which can be expected in the near future or which might just simply be good to have.

Reporting Problems

The author intends to update this document for some time. This includes updates made necessary by advances in technology but also to correct mistakes. Readers willing to report problems are encouraged to send email to the author. They are asked to include exact version information in the report.
The version information can be found on the last page of the document.

Thanks

I would like to thank Johnray Fuller and the crew at LWN (especially Jonathan Corbet) for taking on the daunting task of transforming the author's form of English into something more traditional. Markus Armbruster provided a lot of valuable input on problems and omissions in this text.

About this Document

The title of this paper is an homage to David Goldberg's classic paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic" [12]. This paper is still not widely known, although it should be a prerequisite for anybody daring to touch a keyboard for serious programming.

One word on the PDF: xpdf draws some of the diagrams rather poorly.