Thursday 28 May 2015

What is CUDA?


CUDA (Compute Unified Device Architecture) was created by NVIDIA Corporation, and its first version was released in mid-2007.
CUDA is a parallel computing platform, one of the first of its kind, that enables general-purpose computing on GPUs (widely known as GPGPU) in an efficient and relatively easy way. CUDA lets users exploit the computing power of the Graphics Processing Unit (GPU) in the underlying hardware. Before CUDA, GPGPU algorithms had to be implemented through graphics Application Programming Interfaces (APIs) such as DirectX or OpenGL, which was a difficult task because of the complexity involved. With CUDA, programs can be written in languages such as C, C++ and Fortran, which is much easier than working through the OpenGL and DirectX models. A number of APIs and libraries are available to users. Nowadays, CUDA is also supported from Python (PyCUDA, Copperhead), Java, F#, the .NET platform and numerical analytics tools (MATLAB, LabVIEW, Mathematica). CUDA programming is becoming more high-level day by day.

To use CUDA, you need to be aware of its three main components:
  1. A CUDA-capable GPU card and the NVIDIA graphics card driver
  2. CUDA Toolkit
  3. CUDA Software Development Kit (SDK)
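A quick way to confirm the first component (a CUDA-capable GPU with a working driver) is to ask the CUDA runtime which devices it can see. The sketch below is an illustration rather than part of the original post: countCudaDevices is a hypothetical helper name, while cudaGetDeviceCount, cudaGetDeviceProperties, cudaDriverGetVersion and cudaRuntimeGetVersion are standard CUDA runtime calls. The deviceQuery sample distributed by NVIDIA performs a more complete version of this check.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable devices the runtime
// can see (0 if none, or if the driver is missing/unusable) and prints a
// one-line summary of each device.
int countCudaDevices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Typical causes: no NVIDIA GPU, or a missing/outdated driver.
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
                    dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    }

    // Versions are encoded as integers, e.g. 1000 * major + 10 * minor.
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    std::printf("Driver version %d, runtime version %d\n", driverVersion, runtimeVersion);

    return count;
}
```

Compile it with nvcc from the CUDA Toolkit and call countCudaDevices() from main; a return value of 0 means no usable CUDA device was found.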

               NVIDIA provides a wide range of GPU cards, with device drivers for all major operating system platforms. The CUDA Toolkit is the development environment: it contains the compiler (nvcc), debugging tools, and profiling tools for performance measurement. The CUDA SDK includes a basic set of libraries and sample code. Different optimized libraries are also available for specific algorithm implementations.
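As one example of the optimized libraries mentioned above, Thrust is a C++ template library bundled with the CUDA Toolkit (since version 4.0) that provides parallel algorithms such as sort and reduce. The sketch below assumes a Toolkit recent enough to include Thrust; gpuSum is a hypothetical helper name. It sums an array on the GPU without a hand-written kernel.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// Hypothetical helper: sums a host array on the GPU using Thrust.
int gpuSum(const std::vector<int>& host)
{
    // Constructing a device_vector copies the data into GPU global memory.
    thrust::device_vector<int> d(host.begin(), host.end());

    // Parallel reduction executed on the device; 0 is the initial value.
    return thrust::reduce(d.begin(), d.end(), 0, thrust::plus<int>());
}
```

Using a library like this is often the fastest route to a correct GPU implementation; hand-written kernels are worth the effort mainly when no library routine fits the algorithm.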

               Understanding the CUDA Memory Architecture and the CUDA Execution Model is essential for CUDA programming, so let us discuss them one by one. The CUDA Memory Architecture (or Memory Model) consists of several kinds of memory in which a CUDA program stores its variables. CUDA lets a program launch threads on a massive scale, and those threads work on data held in these different memories. Each memory has its own scope and advantages, and a well-chosen combination of CUDA memories can speed up an algorithm considerably. The CUDA memories are:
  1. Global Memory
  2. Constant Memory
  3. Texture Memory
  4. Shared Memory
  5. Registers
The following diagram visualizes the CUDA memories.
[Figure: CUDA Memories / CUDA Memory Model]
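The kernel below is a minimal sketch of how several of these memories appear in code; it is an illustration, not the author's program. It assumes a block size of 256 threads (a power of two, which the in-block reduction relies on), and the names c_scale, scaledBlockSum and scaledSum are hypothetical. The input and output arrays live in global memory, c_scale is declared __constant__, tile is __shared__, and scalar locals such as i and v are normally placed in registers by the compiler. Texture memory needs extra setup (texture objects or references) and is not shown.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size; must be a power of two here

// Constant memory: written by the host, read-only and cached for all threads.
__constant__ float c_scale;

// Each block multiplies BLOCK_SIZE elements of `in` (global memory) by c_scale,
// sums them, and writes one partial sum per block to `out` (global memory).
__global__ void scaledBlockSum(const float* in, float* out, int n)
{
    // Shared memory: one copy per thread block, visible only to its threads.
    __shared__ float tile[BLOCK_SIZE];

    // Automatic scalars like i and v normally live in per-thread registers.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] * c_scale : 0.0f;
    tile[threadIdx.x] = v;
    __syncthreads();

    // Tree reduction inside the block using shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];
}

// Hypothetical host wrapper: returns scale * sum(data).
float scaledSum(const std::vector<float>& data, float scale)
{
    int n = static_cast<int>(data.size());
    if (n == 0) return 0.0f;
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));        // global memory
    cudaMalloc(&d_out, blocks * sizeof(float));  // global memory
    cudaMemcpy(d_in, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));  // constant memory

    scaledBlockSum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Finish the (small) reduction of per-block sums on the host.
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```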

Each memory differs in its advantages, scope, lifetime and location. The diagram below summarizes the scope, lifetime and other important properties of each CUDA memory.

[Figure: CUDA Memory Details / CUDA Memory Architecture]
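A small sketch of how scope and lifetime differ in practice, using the hypothetical names d_launchTotal, countThreads and runCount. A __device__ variable sits in global memory and keeps its value for the life of the application, across kernel launches; a __shared__ variable is private to one thread block and is gone once that block finishes. Launching the kernel several times therefore accumulates a total in the __device__ variable, while the shared counter starts from zero in every block.

```cuda
#include <cuda_runtime.h>

// Global-memory variable: one copy for the whole application, visible to
// every thread of every grid; its value survives between kernel launches.
__device__ int d_launchTotal = 0;

__global__ void countThreads()
{
    // Shared-memory variable: one copy per block, valid only while the block runs.
    __shared__ int blockCount;
    if (threadIdx.x == 0) blockCount = 0;
    __syncthreads();

    atomicAdd(&blockCount, 1);  // every thread in this block adds 1
    __syncthreads();

    // One thread per block publishes the block's count to global memory.
    if (threadIdx.x == 0) atomicAdd(&d_launchTotal, blockCount);
}

// Hypothetical host helper: resets the counter, launches the kernel
// `launches` times, and returns the total read back from the device.
int runCount(int launches, int blocks, int threadsPerBlock)
{
    int zero = 0;
    cudaMemcpyToSymbol(d_launchTotal, &zero, sizeof(int));
    for (int k = 0; k < launches; ++k)
        countThreads<<<blocks, threadsPerBlock>>>();

    int total = 0;
    cudaMemcpyFromSymbol(&total, d_launchTotal, sizeof(int));  // waits for the kernels
    return total;
}
```

For example, runCount(3, 4, 64) should return 3 × 4 × 64 = 768, because the __device__ total persists across all three launches.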

               The CUDA Execution Model has three main parts: grids, thread blocks and individual threads. A thread is the smallest unit of execution, and each thread is assigned a piece of the task. A thread block is a group of threads. A grid is the collection of all thread blocks launched for one kernel call, and it runs on a single GPU device. Threads, thread blocks and grids are software terms, so let us map them onto the NVIDIA GPU hardware, which consists of scalar processors, streaming multiprocessors and the device. Threads are executed by scalar processors, thread blocks are executed on streaming multiprocessors, and one kernel function is launched on the device in the form of a grid. The following diagram shows this mapping in more detail.

[Figure: CUDA Execution Model (Thread / Thread Block / Grid mapping)]
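The thread, block and grid hierarchy shows up directly in kernel code. In the sketch below (vecAdd and addOnGpu are hypothetical names, and 256 threads per block is an arbitrary choice), each thread computes one element, the launch configuration <<<blocksPerGrid, threadsPerBlock>>> defines the grid, and each thread locates itself using the built-in variables blockIdx, blockDim and threadIdx.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Kernel: one thread per element. A thread's global index combines its
// block's position in the grid with its own position inside the block.
__global__ void vecAdd(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may contain threads past the end of the data
        c[i] = a[i] + b[i];
}

// Hypothetical host wrapper; assumes a and b have the same length.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b)
{
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;
    size_t bytes = n * sizeof(float);

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                                        // threads in one block
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;  // blocks in the grid
    vecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c, n);    // one grid per launch

    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

For example, 1000 elements with 256 threads per block gives a grid of 4 blocks (1024 threads); the i < n check keeps the 24 extra threads from writing past the end of the arrays.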

==>Posted By Yogesh B. Desai



  1. Harsh, I am glad to hear that from you. Please stay tuned for more technology-related articles, as this blog is in its initial stage. Also, please do share it with your friends.
    Thank you once again.

  2. Very nice and simple article.

    1. Thank you, I am glad to hear it. Stay tuned for more such articles.