Nvidia CUDA

From Lyceum


You most likely have some idea of what CUDA is all about or you wouldn't be here, but for a general introduction see:



In short, CUDA provides a language (CUDA C), compiler, SDK and run time environment to allow you to write general purpose C code which is executed in a massively parallel manner by taking advantage of the capabilities of NVIDIA GPUs. It is actually quite easy to write code for and requires only a basic or intermediate level proficiency with C programming and does not require any knowledge of OpenGL or other graphics languages whatsoever. If you have an NVIDIA 8 series or newer card, including many mobile chipsets, it most likely supports CUDA and you can start writing parallel code to run on the device more easily than you may realize. It is sitting there quietly, waiting for you to dive in and take advantage of it.
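Before installing anything, it is worth confirming the machine actually has an NVIDIA GPU. A minimal sketch of that check (the SAMPLE variable is canned `lspci` output for illustration; on a real system pipe `lspci` straight into the `grep`):

```shell
# Check for an NVIDIA GPU. SAMPLE stands in for real lspci output here;
# on a real machine use:  lspci | grep -i nvidia
SAMPLE='01:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 8800 GT]'
if echo "$SAMPLE" | grep -qi 'nvidia'; then
    echo "NVIDIA GPU found"
else
    echo "No NVIDIA GPU detected"
fi
```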


Alongside Nvidia's CUDA there is also OpenCL, an open standard. Newer Nvidia GPUs, when using the correct drivers, also support OpenCL.


Nvidia is one of the companies developing OpenCL and you can find more information and drivers at the Nvidia OpenCL page:


AMD also supports OpenCL and describes the implementation of the language at:


A short tutorial in OpenCL can be found here.

(You can use the GPU Caps utility to see if your GPU supports OpenCL.)


Oak Ridge National Lab CUDA tutorials are available at the OLCF site.

The 20 part tutorial series "Supercomputing for the Masses" by Rob Farber (a senior scientist at Pacific Northwest National Laboratory) is available here and also described in the Linux Journal article.

Here is a basic CUDA intro tutorial

A 2011 Linux Journal article by Alejandro Segovia is available here

Of course, there is also the Official Programming Guide as a definitive reference.

I would also recommend CUDA by Example: An Introduction to General Purpose GPU Programming by Jason Sanders and Edward Kandrot as an excellent way to get started. I am reading this now and find it quite helpful.

Additionally, there are of course, the SDK code examples you can download and get started with immediately.

CUDA Toolkit 5.0 Update


Nvidia significantly changed how the SDK, Toolkit and CUDA driver were packaged with the 5.0 release. This release combines them all into a single bundle and simplifies the install process.

You can extract the separate packages if needed with:

./cuda_5.0.35_linux_64_rhel6.x-1.run --extract=/home/you/NVIDIA_CUDA_5-0_Extracted/

The install guide has been updated and the overall install process streamlined. It seems some of the issues with missing symlinks to dev libraries have been resolved, though if you experience issues the 4.0 and 3.2 notes below may still be helpful. (See the CUDA_Samples_Release_Notes.txt bundled with cuda-samples for library and sym link info.)

To compile all the code examples, you will likely need to install various packages. See the install guide and the list of packages in the previous version's section below, and add freeglut-devel and glu-devel.

To keep compiling when some targets cannot be made, use: make -k
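To see what make -k buys you, here is a toy sketch (the Makefile and its targets are fabricated for illustration): one target fails, yet make still builds the targets that do not depend on it.

```shell
# Toy demonstration of make -k: target 'bad' fails, but -k lets make
# continue on to 'also-good' instead of stopping at the first error.
cat > /tmp/demo.mk <<'EOF'
all: good bad also-good
good: ; @echo built good
bad: ; @exit 1
also-good: ; @echo built also-good
EOF
make -k -f /tmp/demo.mk 2>/dev/null || true
```

Without -k, make would stop after 'bad' and never attempt 'also-good'.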

Note: Installing Mesa may overwrite the /usr/lib/libGL.so that was previously installed by the NVIDIA driver, so a reinstallation of the NVIDIA driver might be required after installing these libraries.

Also note the Linux version includes the new Eclipse-based version of Nsight for debugging and profiling, which is installed by default as 'nsight'.

Compiler version

Note: While you can run the installer with '-override compiler' to install the toolkit despite an unsupported compiler, you may run into trouble compiling the code samples. I usually just install the version of gcc the toolkit expects.

For Suse 12.2/12.3, you will also need to install gcc46, gcc46-devel and gcc46-static using a repo from software.opensuse.com. You can then configure with update-alternatives:

update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.6
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.7
update-alternatives --config gcc

In my case, using gcc 4.6 allowed the CUDA code samples to be built, though the linker reported the following for many of them:

x86_64-suse-linux/bin/ld: error in /usr/lib64/gcc/x86_64-suse-linux/4.6.3/crtend.o(.eh_frame); no .eh_frame_hdr table will be created

Regardless, they all appear to have compiled and run successfully.

CUDA Toolkit 4.0 Update

Documentation is now at:


Release notes are now here

Getting Started Guide is now here

Earlier versions, release notes, etc. are at http://developer.nvidia.com/cuda-toolkit-archive

CUDA SDK 3.2 Update

UPDATE: I just did a fresh install of Suse 11.3 64-bit and the new CUDA SDK 3.2 and found the following issues:

  • Unfortunately, the CUDA SDK 3.2 release notes do not include the same mention of the missing symlinks as do the 3.1 release notes. Follow the procedure in the 3.1 notes, which is also described below.
  • While I had hoped the new CUDA SDK 3.2 would be compatible with gcc 4.5, it is not as described in this Nvidia Forum thread. You will still need to install and configure gcc43 and gcc43-g++ as alternatives as described below in the section GCC Version Issues.
  • Several X11 development libraries are required. On Suse 11.3 install these and their dependencies with the following:
zypper install libXi6-devel libXmu-devel xorg-x11-libXmu-devel xorg-x11-libXext-devel
  • For RHEL the packages are:
yum install libXi-devel libXmu-devel libXext-devel freeglut freeglut-devel

Make Errors:

The error:

x86_64-suse-linux/bin/ld: cannot find -lX11

is caused by the missing xorg-x11-libXext-devel package.

Errors like:

iomanip(64): error: expected an expression

are likely from using an incompatible version of gcc (like 4.5). This is best resolved by installing gcc43 as an alternative (below) or as described in the Nvidia post:

A simple workaround to compiling the rest of the SDK would be to remove the folder Interval from the SDK's src folder and put it somewhere in your home directory where you will remember it later.

The complete list of programs that do not compile from the 3.2 SDK with GCC 4.5.1 is:

Errors regarding:


Can be solved by installing libtool

The remaining setup for SDK 3.2 is as given for the previous version.

Install / Compile Issues

NOTE: Please see Nvidia-Settings for information on the Nvidia driver itself, module error codes, install options, etc.

Firstly, follow the very detailed Getting Started Linux guide to get your development environment set up. The documentation is quite good and provides all the basic info on installing the Nvidia driver, the CUDA Toolkit and SDK, and setting up your paths for locating the binaries and libraries. All required files are available on the CUDA downloads page.

Note: Ensure your NVIDIA driver meets or exceeds the version required by the SDK. The version in your distro's restricted driver (non-OSS) repository may not be sufficient. I recommend installing the NVIDIA driver from the CUDA downloads area to prevent any potential trouble.

Missing Symlinks

(Applies to Red Hat, CentOS, OpenSUSE, etc.)

It is important to read the SDK 3.1 release notes - really, stop now and read them. You will find under Section III. (b) Known Issues on CUDA SDK for Linux that there are often a few key libraries which need symlinks created so the linker can find them. In my case this was required for libglut and libGLU. If they are missing you will see errors such as:

/usr/bin/ld: cannot find -lglut
/usr/bin/ld: cannot find -lGLU

In either case, create the required symlinks:

# ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so
# ln -s /usr/lib/libGLU.so.1 /usr/lib/libGLU.so
# ln -s /usr/lib/libX11.so.6 /usr/lib/libX11.so
(Or /usr/lib64/ if running 64-bit.)

If you do not have write access to /usr/lib, create the symlinks in another location and modify the -L path in the Makefile (or add the symlink directory to /etc/ld.so.conf and run ldconfig).
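As a concrete sketch of the no-root fallback (the directory name and the versioned library paths here are assumptions; point the links at whatever versioned libraries your distro actually installed):

```shell
# Create linker-name symlinks in a user-writable directory when /usr/lib
# is off-limits, then point the build at that directory with -L.
LIBDIR="$HOME/cuda-libs"
mkdir -p "$LIBDIR"
# Versioned library paths below are examples; check what your distro provides.
ln -sf /usr/lib64/libglut.so.3 "$LIBDIR/libglut.so"
ln -sf /usr/lib64/libGLU.so.1  "$LIBDIR/libGLU.so"
# Then add -L$LIBDIR to the link flags in the SDK Makefile.
ls -l "$LIBDIR"
```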

If you get a linking error: cannot find -lcuda then create this additional symlink (your library version may vary slightly):

# ln -s /usr/lib/libcuda.so.256.35 /usr/lib/libcuda.so

Compiling Order - Do what the Guide Says

Being an eager beaver, I decided to just compile the deviceQuery program first to make sure everything was working. This was a bad decision as it, like many others, requires shared objects which had not been created yet. As a result, I was getting compile errors such as:

bob:~/NVIDIA_GPU_Computing_SDK/C/src/deviceQuery> make
deviceQuery.cpp:126:11: warning: extra tokens at end of #else directive
deviceQuery.cpp:135:11: warning: extra tokens at end of #else directive
/usr/lib/gcc/i586-suse-linux/4.5/../../../../i586-suse-linux/bin/ld: cannot find -lcutil_i386
collect2: ld returned 1 exit status
make: *** [../../bin/linux/release/deviceQuery] Error 1

I resolved this by compiling in this order:

bob:~/NVIDIA_GPU_Computing_SDK/C/common> make
bob:~/NVIDIA_GPU_Computing_SDK/shared> make
bob:~/NVIDIA_GPU_Computing_SDK/C/src/deviceQuery> make

The real solution though, oddly enough, is to do what the Getting Started Guide says: compile them all by changing to NVIDIA_GPU_Computing_SDK/C in the user's home directory and typing make. The resulting binaries will be installed under the home directory in NVIDIA_GPU_Computing_SDK/C/bin/linux/release

So, just compile them all and avoid such problems:

bob:~/NVIDIA_GPU_Computing_SDK/C> make

GCC Version Issues

(For OpenSUSE 11.3, Fedora 13, Ubuntu 10.04, etc.)

If you are running a distro newer than the latest release of the CUDA Toolkit and SDK, then your versions of gcc and glibc may be newer and not compatible. This is generally resolvable by installing the earlier versions and setting up your system to allow easy selection of which version to compile with. On my OpenSuse 11.3 install I used the following links to implement the solution which follows:





bob:~ # gcc --version
gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]  ### <== Too new, does not allow code examples to compile!

(Use Yast to install gcc43 gcc43-c++ gcc43-info and any required dependencies. My packages are as follows:)

bob:~ # rpm -qa | grep gcc

(Now, set up both versions so you can easily select which one is the default:)

bob:~ # sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.3 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.3
bob:~ # sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.5 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.5
bob:~ # update-alternatives --config gcc

There are 2 alternatives which provide `gcc'.

  Selection    Alternative
*+        1    /usr/bin/gcc-4.3
          2    /usr/bin/gcc-4.5

Press enter to keep the default[*], or type selection number: 1
Using '/usr/bin/gcc-4.3' to provide 'gcc'.
bob:~ # gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]  ## <== You are now set to use CUDA with SDK version 3.1

No X Windows Running?

If you are installing on a server and do not have X running, you will need to ensure the required device files are created. The Getting Started Guide provides the following script to accomplish this. You can save this as nvidia_setup.sh and then invoke it from /etc/rc.local during boot:

/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255

  # Set GPUs to persistent mode so the driver stays loaded
  nvidia-smi -pm 1
else
  exit 1
fi
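To make the controller-counting arithmetic in the script above concrete, here it is run against canned `lspci` output (the two sample lines are fabricated): with one VGA controller and one 3D controller, nodes /dev/nvidia0 and /dev/nvidia1 would be created.

```shell
# The controller-counting logic from the device-node script, applied
# to canned lspci output (sample lines are fabricated for illustration).
NVDEVS='01:00.0 VGA compatible controller: NVIDIA Corporation GF100
02:00.0 3D controller: NVIDIA Corporation GF100'
N3D=$(echo "$NVDEVS" | grep -c "3D controller")
NVGA=$(echo "$NVDEVS" | grep -c "VGA compatible controller")
N=$(expr $N3D + $NVGA - 1)
echo "highest device index: $N"   # nodes /dev/nvidia0 .. /dev/nvidia$N
```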


Persistence Mode

If running the GPU on a server without X, the CUDA driver is loaded when needed and then unloaded. This adds a delay of 1-2 seconds, which can be very annoying when testing code. Enable persistence mode to eliminate this delay:

Enable persistence on all devices:

/usr/bin/nvidia-smi -pm 1

You may wish to add the above to /etc/rc.local to enable on boot.

Verify with:

/usr/bin/nvidia-smi -q | grep -i Persistence 

nvidia-smi provides a wealth of information about your CUDA devices.

See Nvidia-Settings for related information.

MultiGPU Systems

If you have multiple GPUs, you can select which ones are available to the runtime environment by setting the CUDA_VISIBLE_DEVICES environment variable:

export CUDA_VISIBLE_DEVICES=0,1

This will mask GPU 2. It does not change what nvidia-smi -L shows, but does seem to direct which GPUs the runtime environment uses.

Multiple Version of GCC

Sometimes conflicts arise when you need one version of gcc for the CUDA toolkit, while other tool chains and programming needs require a different version. Often your distro's package manager will provide several versions of gcc. If installed, switching between them is made easier by using the update-alternatives tool.

In this example we configure gcc 4.3 and 4.5 so as to be able to easily switch between them as needed:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.3 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.3
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.5 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.5
update-alternatives --config gcc


Some OS X OpenCL documentation is here

CUDA vs OpenCL


GPU Caps Viewer

This utility polls GPUs and returns the device capabilities.

This utility can run under Wine on Linux as well, at least as of version 1.7:



Now what?

So what else can you do with CUDA? What applications exist out there to sink your GPU's teeth into? Well, the list is growing all the time, but I've started a page where I'll be adding some as I discover them, and you can find it at CUDA Applications

Check out CUDA Data Parallel Primitives Library - CUDPP

Explore the CUDA Performance Profiler




If you wish to view OpenGL simulations remotely you will quickly discover that OpenGL is actually rendered on the client. Although the computations may be done on the remote system, all rendering commands are sent to the client to be displayed locally, essentially killing performance. VirtualGL provides a solution to this by rendering results on the server, storing them in a pixel buffer and transferring that to the client. It does so in a clever way, by attaching a loadable module to the binary which intercepts the OpenGL calls and redirects them locally.

See the official VirtualGL site for complete details. There is also some useful information in this Sun documentation and this Nvidia forum thread.

(Newer versions do not require libjpeg-turbo. There are several steps required to configure X, so see the package documentation, and VirtualGL-2.3.2/doc/unixconfig.txt in particular, prior to trying the commands below.)

To start a VGL forwarded SSH connection from the client:

/opt/VirtualGL/bin/vglconnect -force user@server

Then, start the OpenGL app on the server with vglrun:

user@server:~> cd NVIDIA_GPU_Computing_SDK/C/bin/linux/release/
user@server:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release> /opt/VirtualGL/bin/vglrun ./oceanFFT

Also, if the server has another (non-compute) graphics card you might explore using NX Server and connecting to the remote desktop via NX Client. NX provides much faster connectivity than VNC, and I have used it for other apps which use OpenGL such as VisIt, Matlab, etc., though I've not actually tested it with CUDA and OpenGL. The Nvidia forums seem to suggest this as an alternative. Of course, if you only have one GPU on the compute node, or no desktop environment on it, then this option is out.



You can enable overclocking via the nvidia-settings utility by simply adding an option to your xorg.conf Nvidia Device section:

   Option "Coolbits" "1"

Restart X and the nvidia-settings GUI will now have overclocking options.
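For context, a complete Device section with the option added might look like the following (the Identifier and BoardName values here are illustrative; keep whatever your existing xorg.conf already uses):

```
Section "Device"
    Identifier  "Device0"
    Driver      "nvidia"
    BoardName   "GeForce GTX 460"
    Option      "Coolbits" "1"
EndSection
```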





Provides basic overclocking capabilities, though newer cards may not be supported. (Unfortunately, the prospects of porting other overclocking tools such as EVGA's Precision utility are not showing much promise at this time. It does not run under Wine, and a Linux implementation is not planned per this post.)

The qt version seems to be broken in ./configure, but the gtk version compiles fine. The command-line and GTK binaries are:

nvclock0.8b4> src/gtk/nvclock_gtk

nvclock0.8b4> src/nvclock

## This will overclock the GPU to 300 and Memory to 400 – Change accordingly!
nvclock -b coolbits -n 300.000 -m 400.000

Language Bindings

Looking for more ways to leverage CUDA's capabilities from other languages? Try these resources:

Kappa Framework:


CUDA Library for R:


CUDA Library for IDL:





Programming Massively Parallel Processors: A Hands-on Approach

ISBN-13: 978-0123814722

CUDA by Example: An Introduction to General-Purpose GPU Programming

ISBN-13: 978-0131387683

Scientific Computing with Multicore and Accelerators

ISBN-13: 978-1-4398-2536-5