Transcription of NVIDIA CUDA Compute Unified Device Architecture
NVIDIA CUDA Compute Unified Device Architecture
CUDA Programming Guide Version

Chapter 1.
    CUDA: … GPU
Chapter 2.
Chapter 3.
    SIMD
Chapter 4. (API)
    C
    NVCC
    #pragma
    char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4, short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4, int1, uint1, int2, uint2, int3, uint3, int4, uint4, long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4, float1, float2, float3, dim3
    CUDA
    CUDA
    OpenGL
    Direct3D
    OpenGL
    Direct3D
    OpenGL
    Direct3D
Chapter 5.
Chapter 6.
    Mul()
    Muld()
Appendix A.
Appendix B.
Appendix C.
    atomicAdd(), atomicSub(), atomicExch(), atomicMin(), atomicMax(), atomicInc(), atomicDec(), atomicCAS()
    atomicAnd(), atomicOr(), atomicXor()
Appendix D. API
    cudaGetDeviceCount(), cudaSetDevice(), cudaGetDevice(), cudaGetDeviceProperties(), cudaChooseDevice()
    cudaThreadSynchronize(), cudaThreadExit()
    cudaStreamCreate(), cudaStreamQuery(), cudaStreamSynchronize(), cudaStreamDestroy()
    cudaEventCreate(), cudaEventRecord(), cudaEventQuery(), cudaEventSynchronize(), cudaEventDestroy(), cudaEventElapsedTime()
    cudaMalloc(), cudaMallocPitch(), cudaFree(), cudaMallocArray(), cudaFreeArray(), cudaMallocHost(), cudaFreeHost(), cudaMemset(), cudaMemset2D(), cudaMemcpy(), cudaMemcpy2D(), cudaMemcpyToArray(), cudaMemcpy2DToArray(), cudaMemcpyFromArray(), cudaMemcpy2DFromArray(), cudaMemcpyArrayToArray(), cudaMemcpy2DArrayToArray(), cudaMemcpyToSymbol(), cudaMemcpyFromSymbol(), cudaGetSymbolAddress(), cudaGetSymbolSize()
    cudaCreateChannelDesc(), cudaGetChannelDesc(), cudaGetTextureReference(), cudaBindTexture(), cudaBindTextureToArray(), cudaUnbindTexture(), cudaGetTextureAlignmentOffset()
    cudaCreateChannelDesc(), cudaBindTexture(), cudaBindTextureToArray(), cudaUnbindTexture()
    cudaConfigureCall(), cudaLaunch(), cudaSetupArgument()
    OpenGL: cudaGLRegisterBufferObject(), cudaGLMapBufferObject(), cudaGLUnmapBufferObject(), cudaGLUnregisterBufferObject()
    Direct3D: cudaD3D9Begin(), cudaD3D9End(), cudaD3D9RegisterVertexBuffer(), cudaD3D9MapVertexBuffer(), cudaD3D9UnmapVertexBuffer(), cudaD3D9UnregisterVertexBuffer(), cudaD3D9GetDevice()
    cudaGetLastError(), cudaGetErrorString()
Appendix E. API
    cuInit()
    cuDeviceGetCount(), cuDeviceGet(), cuDeviceGetName(), cuDeviceTotalMem(), cuDeviceComputeCapability(), cuDeviceGetAttribute(), cuDeviceGetProperties()
    cuCtxCreate(), cuCtxAttach(), cuCtxDetach(), cuCtxGetDevice(), cuCtxSynchronize()
    cuModuleLoad(), cuModuleLoadData(), cuModuleLoadFatBinary(), cuModuleUnload(), cuModuleGetFunction(), cuModuleGetGlobal(), cuModuleGetTexRef()
    cuStreamCreate(), cuStreamQuery(), cuStreamSynchronize(), cuStreamDestroy()
    cuEventCreate(), cuEventRecord(), cuEventQuery(), cuEventSynchronize(), cuEventDestroy(), cuEventElapsedTime()
    cuFuncSetBlockShape(), cuFuncSetSharedSize(), cuParamSetSize(), cuParamSeti(), cuParamSetf(), cuParamSetv(), cuParamSetTexRef(), cuLaunch(), cuLaunchGrid()
    cuMemGetInfo(), cuMemAlloc(), cuMemAllocPitch(), cuMemFree(), cuMemAllocHost(), cuMemFreeHost(), cuMemGetAddressRange(), cuArrayCreate(), cuArrayGetDescriptor(), cuArrayDestroy(), cuMemset(), cuMemset2D(), cuMemcpyHtoD(), cuMemcpyDtoH(), cuMemcpyDtoD(), cuMemcpyDtoA(), cuMemcpyAtoD(), cuMemcpyAtoH(), cuMemcpyHtoA(), cuMemcpyAtoA(), cuMemcpy2D()
    cuTexRefCreate(), cuTexRefDestroy(), cuTexRefSetArray(), cuTexRefSetAddress(), cuTexRefSetFormat(), cuTexRefSetAddressMode(), cuTexRefSetFilterMode(), cuTexRefSetFlags(), cuTexRefGetAddress(), cuTexRefGetArray(), cuTexRefGetAddressMode(), cuTexRefGetFilterMode(), cuTexRefGetFormat(), cuTexRefGetFlags()
    OpenGL: cuGLInit(), cuGLRegisterBufferObject(), cuGLMapBufferObject(), cuGLUnmapBufferObject(), cuGLUnregisterBufferObject()
    Direct3D: cuD3D9Begin(), cuD3D9End(), cuD3D9RegisterVertexBuffer(), cuD3D9MapVertexBuffer(), cuD3D9UnmapVertexBuffer(), cuD3D9UnregisterVertexBuffer(), cuD3D9GetDevice()
Appendix F.

Figure 1-1
Figure 1-2
Figure 1-3
Figure 1-4
Figure 1-5
Figure 2-1
Figure 2-2
Figure 3-1
Figure 5-1
Figure 5-2
Figure 5-3
Figure 5-4
Figure 5-5
Figure 5-6