Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation

Nicholas Nethercote
National ICT Australia, Melbourne, Australia

Julian Seward
OpenWorks LLP, Cambridge, UK

Abstract

Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance; little attention has been paid to their capabilities. As a result, we believe the potential of DBI has not been fully exploited.

In this paper we describe Valgrind, a DBI framework designed for building heavyweight DBA tools. We focus on its unique support for shadow values, a powerful but previously little-studied and difficult-to-implement DBA technique, which requires a tool to shadow every register and memory value with another value that describes it. This support accounts for several crucial design features that distinguish Valgrind from other DBI frameworks.
Because of these features, lightweight tools built with Valgrind run comparatively slowly, but Valgrind can be used to build more interesting, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.

Categories and Subject Descriptors: D.2.5 [Software Engineering]: Testing and Debugging (debugging aids, monitors); D.3.4 [Programming Languages]: Processors (incremental compilers)

General Terms: Design, Performance, Experimentation

Keywords: Valgrind, Memcheck, dynamic binary instrumentation, dynamic binary analysis, shadow values

1. Introduction

Valgrind is a dynamic binary instrumentation (DBI) framework that occupies a unique part of the DBI framework design space. This paper describes how it works, and how it differs from other frameworks.

1.1 Dynamic Binary Analysis and Instrumentation

Many programmers use program analysis tools, such as error checkers and profilers, to improve the quality of their software. Dynamic binary analysis (DBA) tools are one such class of tools; they analyse programs at run-time at the level of machine code. DBA tools are often implemented using dynamic binary instrumentation (DBI), whereby the analysis code is added to the original code of the client program at run-time.
This is convenient for users, as no preparation (such as recompiling or relinking) is needed. Also, it gives 100% instrumentation coverage of user-mode code, without requiring source code. Several generic DBI frameworks exist, such as Pin [11], DynamoRIO [3], and Valgrind [18, 15]. They provide a base system that can instrument and run code, plus an environment for writing tools that plug into the base system.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PLDI'07, June 11-13, 2007, San Diego, California, USA.
Copyright 2007 ACM 978-1-59593-633-2/07/0006.

The performance of DBI frameworks has been studied closely [1, 2, 9].
Less attention has been paid to their instrumentation capabilities, and the tools built with them. This is a shame, as it is the tools that make DBI frameworks useful, and complex tools are more interesting than simple tools. As a result, we believe the potential of DBI has not been fully exploited.

1.2 Shadow Value Tools and Heavyweight DBA

One interesting group of DBA tools are those that use shadow values. These tools shadow, purely in software, every register and memory value with another value that says something about it. We call these shadow value tools. Consider the following motivating list of shadow value tools; the descriptions are brief but demonstrate that shadow values (a) can be used in a wide variety of ways, and (b) are powerful and interesting.

- Memcheck [25] uses shadow values to track which bit values are undefined (i.e. uninitialised, or derived from undefined values) and can thus detect dangerous uses of undefined values.
  It is used by thousands of C and C++ programmers, and is probably the most widely-used DBA tool in existence (see footnote 1).
- TaintCheck [20] tracks which byte values are tainted (i.e. from an untrusted source, or derived from tainted values) and can thus detect dangerous uses of tainted values. TaintTrace [6] and LIFT [23] are similar tools.
- McCamant and Ernst's secret-tracking tool [13] tracks which bit values are secret (e.g. passwords), and determines how much information about secret inputs is revealed by public outputs.
- Hobbes [4] tracks each value's type (determined from operations performed on the value) and can thus detect subsequent operations inappropriate for a value of that type.
- DynCompB [7] similarly determines abstract types of byte values, for program comprehension and invariant detection purposes.
- Annelid [16] tracks which word values are array pointers, and from this can detect bounds errors.
- Redux [17] creates a dynamic dataflow graph, a visualisation of a program's entire computation; from the graph one can see all the prior operations that contributed to each value's creation.

In these tools each shadow value records a simple approximation of each value's history (e.g. one shadow bit per bit, one shadow byte per byte, or one shadow word per word), which the tool uses in a useful way; in four of the above seven cases, the tool detects operations on values that indicate a likely program defect.

Footnote 1: Purify [8] is a memory-checking tool similar to Memcheck. However, Purify is not a shadow value tool as it does not track definedness of values through registers. As a result, it detects undefined value errors less accurately than Memcheck.

Shadow value tools are a perfect example of what we call heavyweight DBA tools. They involve large amounts of analysis data that is accessed and updated in irregular patterns. They instrument many operations (instructions and system calls) in a variety of ways; for example, loads, adds, shifts, integer and FP operations, and allocations and deallocations are all handled differently. For heavyweight tools, the structure and maintenance of the tool's analysis data is comparably complex to that of the client program's original data.
In other words, a heavyweight tool's execution is as complex as the client program's. In comparison, more lightweight tools such as trace collectors and profilers add a lot of highly uniform analysis code that updates analysis data in much simpler ways (e.g. appending events to a trace, or incrementing counters).

Shadow value tools are powerful, but difficult to implement. Most existing ones have slow-down factors of 10x-100x or even more, which is high but bearable if they are sufficiently useful. Some are faster, but applicable in more limited circumstances, as we will see.

1.3 Contributions

This paper makes the following contributions.

- Characterises shadow value tools. Tools using shadow values are not new, but the similarities they share have received little attention. This introduction has identified these similarities, and Section 2 formalises them by specifying the requirements of shadow value tools in detail.
- Shows how to support shadow values in a DBI framework. Section 3 describes how Valgrind works, emphasising its features that support robust heavyweight tools, such as its code representation, its first-class shadow registers, its events system, and its handling of threaded programs. This section does not delve deeply into well-studied topics, such as code cache management and trace formation, that do not relate to shadow values and instrumentation capabilities. Section 4 then shows how Valgrind supports each of the shadow value requirements from Section 2.
- Shows that DBI frameworks are not all alike. Section 5 evaluates Valgrind's ease-of-tool-writing, robustness, instrumentation capabilities and performance. It involves some detailed comparisons between Valgrind and Pin, and between Memcheck and various other shadow value tools. Section 6 discusses additional related work.
  These two sections, along with some details from earlier parts of the paper, especially the novel identification of two basic code representations (disassemble-and-resynthesise vs. copy-and-annotate) for DBI, show that different DBI frameworks have different strengths and weaknesses. In particular, lightweight tools built with Valgrind run comparatively slowly, but Valgrind can be used to build more interesting, robust, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.

Footnote 2: Two prior publications [18, 15] described earlier versions of Valgrind. However, they discussed shadow values in much less detail, and most of Valgrind's internals have changed since they were published: the old x86-specific JIT compiler has been replaced, its basic structure and start-up sequence have changed, its handling of threads, system calls, signals, and self-modifying code has improved, and function wrapping has been added.

These contributions show that there is great potential for new DBA tools that help programmers improve their programs, and that Valgrind provides a good platform for building these tools.
At the paper's end, Section 7 describes future work and concludes.

2. Shadow Value Requirements

This section describes what a tool must do to support shadow values. We start here because (a) it shows that these requirements are generic and not tied to Valgrind, and (b) knowledge of shadow values is crucial to understanding how Valgrind differs from other DBI frameworks. Not until Sections 3 and 4 will we describe Valgrind and show how it supports these requirements. Then in Sections 5 and 6 we will explain in detail how Valgrind's support for these requirements is unique among DBI frameworks.

There are three characteristics of program execution that are relevant to shadow value tools: (a) programs maintain state, S, a finite set of locations that can hold values (e.g. registers and the user-mode address space), (b) programs execute operations that read and write S, and (c) programs execute operations (allocations and deallocations) that make memory locations active or inactive. We group the nine shadow value requirements accordingly.

Shadow State. A shadow value tool maintains a shadow state, S', which contains a shadow value for every value in S.
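
To make the relationship between S and S' concrete, the following is a minimal Python sketch of a toy machine that keeps a shadow value alongside every register and memory value. It is an illustration under stated assumptions, not Valgrind's or Memcheck's implementation: the class `ShadowMachine`, its method names, and the choice of one "defined" flag per value (rather than one shadow bit per bit) are all hypothetical. It covers the three characteristics listed above: state, operations that read and write state, and allocations and deallocations.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowMachine:
    """Toy machine: state S (regs, mem) plus shadow state S' (one flag per value)."""
    regs: dict = field(default_factory=dict)         # S: register name -> value
    mem: dict = field(default_factory=dict)          # S: address -> value
    shadow_regs: dict = field(default_factory=dict)  # S': register name -> defined?
    shadow_mem: dict = field(default_factory=dict)   # S': address -> defined?
    errors: list = field(default_factory=list)       # reports from the "tool"

    # (c) operations that make memory locations active or inactive
    def alloc(self, addr, size):
        for a in range(addr, addr + size):
            self.mem[a] = 0
            self.shadow_mem[a] = False   # fresh memory is undefined

    def free(self, addr, size):
        for a in range(addr, addr + size):
            self.mem.pop(a, None)
            self.shadow_mem.pop(a, None)

    # (b) operations that read and write S; each one also updates S'
    def load_const(self, reg, value):
        self.regs[reg] = value
        self.shadow_regs[reg] = True     # constants are defined

    def load(self, reg, addr):
        self.regs[reg] = self.mem[addr]
        self.shadow_regs[reg] = self.shadow_mem[addr]

    def store(self, addr, reg):
        self.mem[addr] = self.regs[reg]
        self.shadow_mem[addr] = self.shadow_regs[reg]

    def add(self, dst, a, b):
        self.regs[dst] = self.regs[a] + self.regs[b]
        # The result is defined only if both inputs are defined.
        self.shadow_regs[dst] = self.shadow_regs[a] and self.shadow_regs[b]

    def branch_if_zero(self, reg):
        # A use that matters: report if control flow depends on an undefined value.
        if not self.shadow_regs[reg]:
            self.errors.append(f"branch depends on undefined value in {reg}")
        return self.regs[reg] == 0


m = ShadowMachine()
m.alloc(0x1000, 2)          # two fresh, undefined locations
m.load_const("r1", 5)
m.store(0x1000, "r1")       # 0x1000 becomes defined
m.load("r2", 0x1000)        # r2 is defined
m.load("r3", 0x1001)        # r3 is undefined (never written)
m.add("r4", "r2", "r3")     # undefinedness propagates into r4
m.branch_if_zero("r4")      # the tool records one error here
```

Even in this toy form, every operation on S needs a matching operation on S', which is why the text describes such tools as heavyweight.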