CUDA (Compute Unified Device Architecture) was created by NVIDIA Corporation, and its first version was released in mid-2007.
CUDA is a parallel computing platform, one of the first of its kind, that enables general-purpose computing on GPUs (widely known as GPGPU) in an efficient and easy way. CUDA lets users exploit the computing power of the Graphics Processing Unit (GPU) present in the underlying hardware. Before CUDA, general-purpose computations on a GPU had to be expressed through graphics APIs such as DirectX or OpenGL.
CUDA consists of three main components, and one should be aware of each of them before using CUDA:
- NVIDIA GPU and device driver: NVIDIA provides a wide range of GPU cards, with device drivers for the major operating system platforms.
- CUDA Toolkit: a development environment consisting of compilers, debugging tools, and profiling tools for performance measurement.
- CUDA SDK: a basic set of libraries and sample code. In addition, different optimized libraries can be used for specific algorithm implementations.
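A quick way to check that the driver and toolkit components are installed and working is to query the GPUs through the CUDA runtime API. The following is a minimal sketch, assuming a CUDA Toolkit is installed and the file is compiled with `nvcc`. The helper name `printCudaDevices` is hypothetical and not part of any NVIDIA library; `cudaGetDeviceCount`, `cudaGetDeviceProperties`, `cudaDriverGetVersion` and `cudaRuntimeGetVersion` are standard CUDA runtime functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints basic information about every CUDA-capable GPU
// and returns the number of devices found (0 if there are none, or if the
// driver/runtime is unavailable).
int printCudaDevices() {
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // encoded as 1000*major + 10*minor
    cudaDriverGetVersion(&driverVersion);    // 0 means no driver is installed
    std::printf("CUDA driver version: %d, runtime version: %d\n",
                driverVersion, runtimeVersion);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("Device %d: %s, compute capability %d.%d, "
                    "%d multiprocessors, %zu MB global memory\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return count;
}
```

Calling `printCudaDevices()` from `main` in a `.cu` file and building it with `nvcc` should print one line per installed GPU; a result of 0 usually points to a missing or mismatched driver.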
Understanding the CUDA Memory Architecture and the CUDA Execution Model is essential for CUDA programming. Let us discuss them one by one. The CUDA Memory Architecture (or Memory Model) consists of several memories that store the variables of a CUDA program. CUDA lets a program launch threads on a massive scale, and these threads read and write their data through the different memories. Each memory has its own scope and advantages, and a combination of CUDA memories can be used to speed up algorithms. The CUDA memories are:
- Global Memory
- Constant Memory
- Texture Memory
- Shared Memory
[Figure: CUDA Memory Model]
Each memory differs in its advantages, scope, and location. Refer to the diagram below for the scope, lifetime, and other important properties of each CUDA memory.
[Figure: CUDA Memory Architecture]
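To see how the listed memories appear in source code, here is a minimal sketch that uses three of them: `__constant__` for constant memory, `__shared__` for shared memory, and buffers allocated with `cudaMalloc` for global memory. Texture memory is omitted because it needs the separate texture-object API. The kernel `scaledBlockSum`, the host helper `scaledSum`, and the block size of 256 are hypothetical choices for illustration, not code from this series, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // must be a power of two for the reduction below

// Constant memory: written from the host, read-only inside kernels, cached,
// and efficient when all threads read the same address.
__constant__ float c_scale;

// Global memory: `in` and `blockSums` point to device DRAM, visible to all
// threads of all blocks and persisting across kernel launches.
__global__ void scaledBlockSum(const float* in, float* blockSums, int n) {
    // Shared memory: one copy per thread block, visible only to the threads
    // of that block, and alive only while the block runs.
    __shared__ float tile[THREADS_PER_BLOCK];

    int tid = threadIdx.x;                          // local scalars typically live in registers
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[tid] = (i < n) ? in[i] * c_scale : 0.0f;
    __syncthreads();

    // Tree reduction inside the block, performed entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = tile[0];  // one write to global memory per block
}

// Hypothetical host helper: returns scale * (host[0] + ... + host[n-1]).
float scaledSum(const float* host, int n, float scale) {
    if (n <= 0) return 0.0f;
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;

    float *d_in = nullptr, *d_sums = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_sums, blocks * sizeof(float));
    cudaMemcpy(d_in, host, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));  // fill constant memory

    scaledBlockSum<<<blocks, THREADS_PER_BLOCK>>>(d_in, d_sums, n);

    std::vector<float> sums(blocks);
    cudaMemcpy(sums.data(), d_sums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sums);

    float total = 0.0f;
    for (float s : sums) total += s;  // combine the per-block partial sums on the host
    return total;
}
```

For instance, `scaledSum(data, n, 2.0f)` returns twice the sum of the `n` values. Each block writes only a single partial sum to global memory; all intermediate additions happen in on-chip shared memory, which has much lower latency than global memory.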
The CUDA Execution Model consists of three main parts: grids, thread blocks, and individual threads. A thread is the smallest unit of execution, and a task can be assigned to each thread. A thread block is a group of threads, and a grid is the collection of all thread blocks created by one kernel launch; a grid runs on a single GPU device. Threads, thread blocks, and grids are software terms, so let us map them onto NVIDIA GPU hardware, which consists of scalar processors, streaming multiprocessors, and the device itself. Threads are executed by scalar processors, thread blocks are executed on streaming multiprocessors, and one kernel function is launched on the device in the form of a grid. Refer to the following diagram for a graphical view of this mapping.
[Figure: CUDA Execution Model]
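The grid / block / thread hierarchy shows up directly in the launch configuration and in the built-in variables `blockIdx`, `blockDim`, and `threadIdx`. Below is a minimal sketch, assuming a one-dimensional problem: each thread adds one pair of elements, each block of 256 threads is scheduled on one streaming multiprocessor, and the whole grid covers the array. The names `addKernel` and `vectorAdd` and the block size of 256 are hypothetical choices for illustration; error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Each thread computes one element. Its global index combines:
//   blockIdx.x  - which thread block of the grid it belongs to
//   blockDim.x  - how many threads each block contains
//   threadIdx.x - its position inside its block
__global__ void addKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain surplus threads
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: c[i] = a[i] + b[i] for every i in [0, n).
void vectorAdd(const float* a, const float* b, float* c, int n) {
    if (n <= 0) return;
    size_t bytes = n * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // Launch configuration: a 1-D grid of 1-D thread blocks, with enough
    // blocks of 256 threads to cover all n elements.
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    addKernel<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // This cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}
```

For example, with n = 1000 the grid has 4 blocks of 256 threads, i.e. 1024 threads in total; the `i < n` check keeps the last 24 threads idle.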
==>Posted By Yogesh B. Desai