If you are looking into accelerating your application with a GPGPU solution, you have three major APIs to choose from: CUDA, OpenCL and DirectCompute. This post gives a high-level overview of the pros and cons of each API.

CUDA pros:
- Kernels are written in a C-like language called C for CUDA.
- Kernel code has full pointer support
- Supports C++ constructs such as templating.
- Fairly simple integration API.
- Better, fully GPU-accelerated libraries are currently available, such as:
  - Thrust (template and algorithm library)
  - CULA (linear algebra library)
  - CUBLAS (Basic Linear Algebra Subprograms)
  - CUFFT (Fast Fourier Transform library)
  - CUSPARSE (linear algebra subroutines for handling sparse matrices)
  - CURAND (generation of high-quality pseudo-random and quasi-random numbers)
- Larger assortment of higher-quality bindings for various languages, both commercial and free.
- Very well documented, plenty of samples available for various platforms.
- CUDA ships with brilliant debugging and visual profiling tools.
- Updates are released more regularly.
- Has been on the market much longer.

CUDA cons:
- Restricted to Nvidia GPUs.
- Will not fall back to the CPU if CUDA-capable hardware is unavailable.
- Harder to get started with, since it requires integrating the nvcc compiler into your build.

OpenCL pros:
- Much wider range of hardware and platform support. Supports AMD, Nvidia and Intel GPUs equally. Can also be used on newer Android phones, iPhones and other devices.
- Can fall back to the CPU if GPU support does not exist. In practice, though, you would want separate code paths for each kind of hardware, as creating thousands of threads on the CPU is generally not a good idea.
- Supports synchronisation over multiple devices.
- Easy to get started with: integrating OpenCL kernels into your code is straightforward.
- An open standard and not vendor locked.
- Kernel language based on C99 specification
- Can share resources with OpenGL.

OpenCL cons:
- Nvidia still hasn’t released public drivers that support OpenCL 1.1. Currently only developer drivers exist.
- Lacks mature libraries
- Debugging and profiling tools are not as advanced as CUDA’s.

DirectCompute pros:
- Uses HLSL syntax.
- API is simple to learn for existing DX programmers.
- Efficient interoperability with D3D graphics resources.
- Easily integrated into an existing DX-based game engine
- Single and familiar API for all hardware vendors and the Windows platform.
- Access to texturing features
- Works with most DX10 and DX11 GPUs.

DirectCompute cons:
- A DX11 API that works only on Windows 7 and Vista.
- Lack of libraries, examples and documentation
- Fewer bindings available
- No CPU fallback
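
The OpenCL point above about CPU fallback (you don't want thousands of threads on the CPU) can be sketched in a few lines of plain C++. This is only an illustration and not part of any of the three APIs: `DeviceKind`, `LaunchPlan` and `planLaunch` are hypothetical names, and the core count is passed in by hand. In a real program it might come from something like `std::thread::hardware_concurrency()`, and the device kind from whatever your compute API reports.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical device categories, for illustration only. A real application
// would obtain this from the compute API it is using.
enum class DeviceKind { Gpu, Cpu };

// How the work is split: how many parallel workers to launch and how many
// elements each worker processes.
struct LaunchPlan {
    std::size_t workers;
    std::size_t itemsPerWorker;
};

// Decide how to split n elements depending on the target hardware.
// cpuCores is an assumption supplied by the caller.
LaunchPlan planLaunch(DeviceKind kind, std::size_t n, std::size_t cpuCores) {
    if (n == 0) {
        return {0, 0};  // nothing to do
    }
    if (kind == DeviceKind::Gpu) {
        // GPU path: one lightweight work item per element is normal.
        return {n, 1};
    }
    // CPU path: never more workers than cores (or than elements),
    // and each worker loops over a contiguous chunk.
    std::size_t cores = std::max<std::size_t>(1, cpuCores);
    std::size_t workers = std::min(cores, n);
    std::size_t perWorker = (n + workers - 1) / workers;  // round up
    return {workers, perWorker};
}
```

Under these assumptions, `planLaunch(DeviceKind::Cpu, 100000, 8)` gives 8 workers handling 12,500 elements each, while the GPU path asks for 100,000 single-element work items. The same kernel logic can then be driven by either plan.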

In the coming weeks I’ll be preparing some more posts on integrating GPGPU calculations into your applications.
If you’re aware of any other differences please leave a comment!