How GPUs Work - NVIDIA

Computer, HOW THINGS WORK

In the early 1990s, ubiquitous interactive 3D graphics was still the stuff of science fiction. By the end of the decade, nearly every new computer contained a graphics processing unit (GPU) dedicated to providing a high-performance, visually rich, interactive 3D experience.

This dramatic shift was the inevitable consequence of consumer demand for videogames, advances in manufacturing technology, and the exploitation of the inherent parallelism in the feed-forward graphics pipeline. Today, the raw computational power of a GPU dwarfs that of the most powerful CPU, and the gap is steadily widening. Furthermore, GPUs have moved away from the traditional fixed-function 3D graphics pipeline toward a flexible general-purpose computational engine.

Today, GPUs can implement many parallel algorithms directly using graphics hardware. Algorithms that leverage all the underlying computational horsepower often achieve tremendous speedups. Truly, the GPU is the first widely deployed commodity desktop parallel computer.

THE GRAPHICS PIPELINE

The task of any 3D graphics system is to synthesize an image from a description of a scene, 60 times per second for real-time graphics such as videogames. This scene contains the geometric primitives to be viewed as well as descriptions of the lights illuminating the scene, the way that each object reflects light, and the viewer's position and orientation. GPU designers traditionally have expressed this image-synthesis process as a hardware pipeline of specialized stages.

Here, we provide a high-level overview of the classic graphics pipeline; our goal is to highlight those aspects of the real-time rendering calculation that allow graphics application developers to exploit modern GPUs as general-purpose parallel computation engines.

Pipeline input

Most real-time graphics systems assume that everything is made of triangles, and they first carve up any more complex shapes, such as quadrilaterals or curved surface patches, into triangles. The developer uses a computer graphics library (such as OpenGL or Direct3D) to provide each triangle to the graphics pipeline one vertex at a time; the GPU assembles vertices into triangles as needed.

Model transformations

A GPU can specify each logical object in a scene in its own locally defined coordinate system, which is convenient for objects that are naturally defined hierarchically.

This convenience comes at a price: before rendering, the GPU must first transform all objects into a common coordinate system. To ensure that triangles aren't warped or twisted into curved shapes, this transformation is limited to simple affine operations such as rotations, translations, scalings, and the like.

As the "Homogeneous Coordinates" sidebar explains, by representing each vertex in homogeneous coordinates, the graphics system can perform the entire hierarchy of transformations simultaneously with a single matrix-vector multiply. The need for efficient hardware to perform floating-point vector arithmetic for millions of vertices each second has helped drive the GPU parallel-computing revolution.

The output of this stage of the pipeline is a stream of triangles, all expressed in a common 3D coordinate system in which the viewer is located at the origin, and the direction of view is aligned with the z-axis.

Lighting

Once each triangle is in a global coordinate system, the GPU can compute its color based on the lights in the scene.
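The single matrix-vector multiply described under model transformations can be illustrated with a minimal sketch. It assumes plain Python lists for 4x4 matrices; the helper names (`mat_mul`, `mat_vec`, `translation`, `scaling`, `rotation_z`) and the scale-rotate-place hierarchy are hypothetical and chosen only for illustration, not taken from any graphics library.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(dx, dy, dz):
    return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]]

def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Hypothetical hierarchy: scale a part, rotate it, then place it in the scene.
scale = scaling(2.0)
rotate = rotation_z(math.pi / 2)
place = translation(5.0, 0.0, 0.0)

# Compose once; the rightmost matrix is applied to the vertex first.
local_to_world = mat_mul(place, mat_mul(rotate, scale))

vertex = [1.0, 0.0, 0.0, 1.0]  # a point, so w = 1

# One multiply per vertex with the precomposed matrix...
one_multiply = mat_vec(local_to_world, vertex)
# ...gives the same result as applying each transformation in turn.
three_multiplies = mat_vec(place, mat_vec(rotate, mat_vec(scale, vertex)))
```

The composed matrix is built once per object, so the per-vertex cost stays at one matrix-vector multiply however deep the hierarchy is.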

As an example, we describe the calculations for a single-point light source (imagine a very small lightbulb). The GPU handles multiple lights by summing the contributions of each individual light. The traditional graphics pipeline supports the Phong lighting equation (B-T. Phong, "Illumination for Computer-Generated Images," Comm. ACM, June 1975, pp. 311-317), a phenomenological appearance model that approximates the look of plastic. These materials combine a dull diffuse base with a shiny specular highlight.

David Luebke, NVIDIA Research
Greg Humphreys, University of Virginia

The Phong lighting equation gives the output color $C = K_d L_i (N \cdot L) + K_s L_i (R \cdot V)^{s}$. Table 1 defines each term in the equation. The mathematics here isn't as important as the computation's structure; to evaluate this equation efficiently, GPUs must again operate directly on vectors. In this case, we repeatedly evaluate the dot product of two vectors, performing a four-component multiply-and-add operation.

Camera simulation

The graphics pipeline next projects each colored 3D triangle onto the virtual camera's film plane. Like the model transformations, the GPU does this using matrix-vector multiplication, again leveraging efficient vector operations in hardware. This stage's output is a stream of triangles in screen coordinates, ready to be turned into pixels.

Rasterization

Each visible screen-space triangle overlaps some pixels on the display; determining these pixels is called rasterization.
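To make the structure of the Phong lighting equation concrete, here is a small sketch that evaluates it per color channel and sums the contributions of several lights. The meaning assigned to each symbol (K_d and K_s as diffuse and specular colors, L_i as the light color, N as the surface normal, L as the direction to the light, R as the reflected light direction, V as the direction to the viewer, s as a shininess exponent) follows the usual reading of the Phong model and is an assumption here, as is clamping negative dot products to zero.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = dot(v, v) ** 0.5
    return [x / length for x in v]

def reflect(l, n):
    """Mirror the to-light direction l about the unit normal n."""
    d = dot(n, l)
    return [2 * d * nc - lc for nc, lc in zip(n, l)]

def phong(kd, ks, li, n, l, v, s):
    """Evaluate C = Kd*Li*(N.L) + Ks*Li*(R.V)^s for one light, per channel."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    r = reflect(l, n)
    # Assumption: clamp at zero so surfaces facing away receive no light.
    diffuse = max(dot(n, l), 0.0)
    specular = max(dot(r, v), 0.0) ** s
    return [kd_c * li_c * diffuse + ks_c * li_c * specular
            for kd_c, ks_c, li_c in zip(kd, ks, li)]

def shade(kd, ks, n, v, s, lights):
    """Sum the contribution of each (light_color, to_light_direction) pair."""
    total = [0.0, 0.0, 0.0]
    for li, l in lights:
        total = [t + c for t, c in zip(total, phong(kd, ks, li, n, l, v, s))]
    return total

# Hypothetical plastic-like material lit head-on by one white light.
head_on = phong(kd=[0.5, 0.2, 0.1], ks=[0.3, 0.3, 0.3], li=[1.0, 1.0, 1.0],
                n=[0, 0, 1], l=[0, 0, 1], v=[0, 0, 1], s=32)
```

Nearly all of the work is dot products and multiply-adds over short vectors, which is the computational pattern the text says the hardware is built around.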

GPU designers have incorporated many rasterization algorithms over the years, which all exploit one crucial observation: Each pixel can be treated independently from all other pixels. Therefore, the machine can handle all pixels in parallel; indeed, some exotic machines have had a processor for each pixel. This inherent independence has led GPU designers to build increasingly parallel sets of pipelines.

Texturing

The actual color of each pixel can be taken directly from the lighting calculations, but for added realism, images called textures are often draped over the geometry to give the illusion of detail. GPUs store these textures in high-speed memory, which each pixel calculation must access to determine or modify that pixel's color. In practice, the GPU might require multiple texture accesses per pixel to mitigate visual artifacts that can result when textures appear either smaller or larger on screen than their native resolution.
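The per-pixel independence that rasterization exploits can be seen in a short sketch. The article does not say which rasterization algorithm a given GPU uses; the edge-function coverage test below is one well-known approach, swapped in here purely for illustration, and the function names are hypothetical. Counting pixels that lie exactly on an edge as covered is a simplification.

```python
def edge(a, b, p):
    """Signed area term: its sign tells which side of edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covers(tri, p):
    """True if point p is inside (or on the boundary of) the triangle."""
    a, b, c = tri
    w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    # Accept either vertex winding order.
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)

def rasterize(tri, width, height):
    """Return the pixels whose centers the screen-space triangle covers."""
    # Each test reads only the triangle and its own pixel center, so the
    # tests could run in any order, or all at once on parallel hardware.
    return [(x, y) for y in range(height) for x in range(width)
            if covers(tri, (x + 0.5, y + 0.5))]

# Hypothetical triangle on a 4x4 pixel grid.
triangle = [(0, 0), (4, 0), (0, 4)]
covered = rasterize(triangle, 4, 4)
```

Because `covers` has no state shared between pixels, nothing about the result depends on the order in which pixels are visited.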

Because the access pattern to texture memory is typically very regular (nearby pixels tend to access nearby texture image locations), specialized cache designs help hide the latency of memory accesses.

Hidden surfaces

In most scenes, some objects obscure other objects. If each pixel were simply written to display memory, the most recently submitted triangle would appear to be in front. Thus, correct hidden surface removal would require sorting all triangles from back to front for each view, an expensive operation that isn't even always possible for all scenes.

All modern GPUs provide a depth buffer, a region of memory that stores the distance from each pixel to the viewer. Before writing to the display, the GPU compares a pixel's distance to the distance of the pixel that's already present, and it updates the display memory only if the new pixel is closer.

THE GRAPHICS PIPELINE, EVOLVED

GPUs have evolved from a hardwired implementation of the graphics pipeline to a programmable computational substrate that can support it.
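As a brief aside on the depth-buffer test described under hidden surfaces, the following sketch shows the comparison in software. The buffer layout (nested lists, with depth initialized to infinity) and the names `make_buffers` and `write_fragment` are assumptions made for illustration; real hardware performs this test in dedicated logic.

```python
def make_buffers(width, height, background=(0, 0, 0)):
    """Create a color buffer and a depth buffer that starts 'infinitely far'."""
    color = [[background] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth

def write_fragment(color, depth, x, y, z, rgb):
    """Update the pixel only if the new fragment is closer than the stored one."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False

# Three hypothetical triangles cover pixel (1, 1) at different distances.
color, depth = make_buffers(4, 4)
write_fragment(color, depth, 1, 1, z=9.0, rgb=(255, 0, 0))  # far, red
write_fragment(color, depth, 1, 1, z=2.0, rgb=(0, 0, 255))  # near, blue
write_fragment(color, depth, 1, 1, z=5.0, rgb=(0, 255, 0))  # middle, green
```

The nearest fragment wins regardless of submission order, which is why the triangles never need to be sorted from back to front.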

Fixed-function units for transforming vertices and texturing pixels have been subsumed by a unified grid of processors, or shaders, that can perform these tasks and much more. This evolution has taken place over several generations by gradually replacing individual pipeline stages with increasingly programmable units. For example, the NVIDIA GeForce 3, launched in February 2001, introduced programmable vertex shaders. These shaders provide units that the programmer can use for performing matrix-vector multiplication, exponentiation, and square root calculations, as

February 2007

Homogeneous Coordinates

Points in three dimensions are typically represented as a triple (x, y, z). In computer graphics, however, it's frequently useful to add a fourth coordinate, w, to the point representation.

To convert a point to this new representation, we set w = 1. To recover the original point, we apply the transformation (x, y, z, w) → (x/w, y/w, z/w).

Although at first glance this might seem like needless complexity, it has several significant advantages. As a simple example, we can use the otherwise undefined point (x, y, z, 0) to represent the direction vector (x, y, z). With this unified representation for points and vectors in place, we can also perform several useful transformations such as simple matrix-vector multiplies that would otherwise be impossible. For example, the multiplication

$$
\begin{bmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}
$$

can accomplish translation by an amount $\Delta x$, $\Delta y$, $\Delta z$. Furthermore, these matrices can encode useful nonlinear transformations such as perspective projection.

Table 1.
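The sidebar's claims can be checked numerically with a minimal sketch. It uses plain Python lists; the helper names and the particular perspective matrix (which simply copies z into w so that the divide by w scales x and y by 1/z) are simplified, hypothetical choices for illustration, not the projection matrix of any specific graphics API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(dx, dy, dz):
    """The translation matrix from the sidebar."""
    return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]]

def to_cartesian(p):
    """Recover (x/w, y/w, z/w); only defined for points, where w != 0."""
    x, y, z, w = p
    return (x / w, y / w, z / w)

T = translation(10, 20, 30)

point = [1, 2, 3, 1]      # w = 1: a position
direction = [1, 2, 3, 0]  # w = 0: a direction vector

moved_point = mat_vec(T, point)          # the position is translated
moved_direction = mat_vec(T, direction)  # the direction is left unchanged

# Assumed, simplified perspective matrix: the last row copies z into w.
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
projected = to_cartesian(mat_vec(P, [2, 4, 2, 1]))
```

Translation moves the point but not the direction because the fourth column of the matrix is multiplied by w. The perspective case stays a matrix multiply and only becomes nonlinear at the final divide by w.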
