
CUDA by Example: An Introduction to General-Purpose GPU ...

CUDA by Example: An Introduction to General-Purpose GPU Programming
Jason Sanders, Edward Kandrot
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions.

problems in nongraphics domains as general-purpose. Happily, although you need to have some experience working in C or C++ to benefit from this book, you need not have any knowledge of computer graphics. None whatsoever! GPU programming simply offers you an opportunity to build—and to build mightily— on your existing programming skills.
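The contents that follow list "Hello, World!" and "A Kernel Call" as the book's first CUDA C topics. As an illustrative sketch only (not reproduced from the book), a minimal CUDA C program in that spirit looks like this:

```cuda
#include <stdio.h>

// __global__ marks a function as a kernel: code that runs on the
// GPU ("device") but is launched from ordinary CPU ("host") code.
__global__ void kernel(void) {
    // This kernel intentionally does nothing; the point is the launch syntax.
}

int main(void) {
    // Launch the kernel on the device with 1 block of 1 thread.
    // The <<<blocks, threads>>> triple-angle-bracket syntax is the
    // CUDA C execution configuration.
    kernel<<<1, 1>>>();

    // Host code continues as plain C.
    printf("Hello, World!\n");
    return 0;
}
```

A file like this would be compiled with NVIDIA's nvcc compiler (e.g., `nvcc hello.cu`), which splits host code from device code; everything here besides the kernel definition and launch is standard C, which is exactly the "build on your existing programming skills" point above.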



No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher makes no warranty or representation that the techniques described herein are free from any Intellectual Property claims. The reader assumes all risk of any such claims based on his or her use of these techniques.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales, (800) 382-3419. For sales outside the United States, please contact: International Sales.

Library of Congress Cataloging-in-Publication Data
Sanders, Jason.

CUDA by example : an introduction to general-purpose GPU programming / Jason Sanders, Edward Kandrot.
p. cm.
Includes index.
ISBN 978-0-13-138768-3 (pbk. : alk. paper)
1. Application software -- Development. 2. Computer architecture. 3. Parallel programming (Computer science) I. Kandrot, Edward. II. Title.
2010 '75 dc22 2010017618

Copyright © 2011 NVIDIA Corporation. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.

For information regarding permissions, write to:
Pearson Education, Inc.
Rights and Contracts Department
501 Boylston Street, Suite 900
Boston, MA 02116
Fax: (617) 671-3447

ISBN-13: 978-0-13-138768-3
ISBN-10: 0-13-138768-5
Text printed in the United States on recycled paper at Edwards Brothers in Ann Arbor, Michigan.
First printing, July 2010

To our families and friends, who gave us endless support.
To our readers, who will bring us the future.
And to the teachers who taught our readers to read.

Contents

Foreword; Preface; Acknowledgments; About the Authors

1. Why CUDA? Why Now? (chapter objectives; the age of parallel processing; central processing units; the rise of GPU computing; a brief history of GPUs; early GPU computing; what is the CUDA Architecture?; using the CUDA Architecture; applications of CUDA: medical imaging, computational fluid dynamics, environmental science; chapter review)
2. Getting Started (development environment; CUDA-enabled graphics processors; NVIDIA device driver; CUDA development toolkit; standard C compiler)
3. Introduction to CUDA C (a first program; Hello, World!; a kernel call; passing parameters; querying devices; using device properties)
4. Parallel Programming in CUDA C (CUDA parallel programming; summing vectors; a fun example)
5. Thread Cooperation (splitting parallel blocks; GPU vector sums: redux; GPU ripple using threads; shared memory and synchronization; dot product; dot product optimized (incorrectly); shared memory bitmap)
6. Constant Memory and Events (constant memory; ray tracing introduction; ray tracing on the GPU; ray tracing with constant memory; performance with constant memory; measuring performance with events; measuring ray tracer performance)
7. Texture Memory (texture memory overview; simulating heat transfer: simple heating model, computing temperature updates, animating the simulation, using texture memory, using two-dimensional texture memory)
8. Graphics Interoperability (graphics interoperation; GPU ripple with graphics interoperability; the GPUAnimBitmap structure; GPU ripple redux; heat transfer with graphics interop; DirectX interoperability)
9. Atomics (compute capability; the compute capability of NVIDIA GPUs; compiling for a minimum compute capability; atomic operations overview; computing histograms: CPU histogram, GPU histogram)
10. Streams (host memory; CUDA streams; using a single CUDA stream; using multiple CUDA streams; GPU work scheduling; using multiple CUDA streams effectively)
11. CUDA C on Multiple GPUs (host memory; zero-copy dot product; zero-copy performance; using multiple GPUs; pinned memory)
12. The Final Countdown (CUDA tools; CUDA Toolkit; CUFFT; CUBLAS; NVIDIA GPU Computing SDK; NVIDIA Performance Primitives; debugging CUDA C; CUDA Visual Profiler; written resources; Programming Massively Parallel Processors: A Hands-on Approach; CUDA U; NVIDIA forums; code resources; CUDA Data Parallel Primitives Library; CULAtools; language wrappers)
A. Advanced Atomics (dot product revisited; atomic locks; dot product redux: atomic locks; implementing a hash table; hash table overview; a CPU hash table; multithreaded hash table; a GPU hash table; hash table performance)
Index

Foreword

Recent activities of major chip manufacturers such as NVIDIA make it more evident than ever that future designs of microprocessors and large HPC systems will be hybrid/heterogeneous in nature.

These heterogeneous systems will rely on the integration of two major types of components in varying proportions:

- Multi- and many-core CPU technology: the number of cores will continue to escalate because of the desire to pack more and more components on a chip while avoiding the power wall, the instruction-level parallelism wall, and the memory wall.
- Special-purpose hardware and massively parallel accelerators: for example, GPUs from NVIDIA have outpaced standard CPUs in floating-point performance in recent years. Furthermore, they have arguably become as easy, if not easier, to program than multicore CPUs.

The relative balance between these component types in future designs is not clear and will likely vary over time. There seems to be no doubt that future generations of computer systems, ranging from laptops to supercomputers, will consist of a composition of heterogeneous components.

