How do I select which GPU to run a job on?

CudaNvidia

Cuda Problem Overview


In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?

As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several instances of the nbody simulation, but they all ran on one GPU 0; GPU 1 was completely idle (monitored using watch -n 1 nvidia-dmi). Checking CUDA_VISIBLE_DEVICES using

echo $CUDA_VISIBLE_DEVICES

I found this was not set. I tried setting it using

CUDA_VISIBLE_DEVICES=1

then running nbody again but it also went to GPU 0.

I looked at the related question, how to choose designated GPU to run CUDA program?, but deviceQuery command is not in the CUDA 8.0 bin directory. In addition to $CUDA_VISIBLE_DEVICES$, I saw other posts refer to the environment variable $CUDA_DEVICES but these were not set and I did not find information on how to use it.

While not directly related to my question, using nbody -device=1 I was able to get the application to run on GPU 1 but using nbody -numdevices=2 did not run on both GPU 0 and 1.

I am testing this on a system running using the bash shell, on CentOS 6.8, with CUDA 8.0, 2 GTX 1080 GPUs, and NVIDIA driver 367.44.

I know when writing using CUDA you can manage and control which CUDA resources to use but how would I manage this from the command line when running a compiled CUDA executable?

Cuda Solutions


Solution 1 - Cuda

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly.

To specify CUDA device 1 for example, you would set the CUDA_VISIBLE_DEVICES using

export CUDA_VISIBLE_DEVICES=1

or

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation.

If you want to specify more than one device, use

export CUDA_VISIBLE_DEVICES=0,1

or

CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable

Solution 2 - Cuda

In case of someone else is doing it in Python and it is not working, try to set it before do the imports of pycuda and tensorflow.

I.e.:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
...
import pycuda.autoinit
import tensorflow as tf
...

As saw here.

Solution 3 - Cuda

Set the following two environment variables:

NVIDIA_VISIBLE_DEVICES=$gpu_id
CUDA_VISIBLE_DEVICES=0

where gpu_id is the ID of your selected GPU, as seen in the host system's nvidia-smi (a 0-based integer) that will be made available to the guest system (e.g. to the Docker container environment).

You can verify that a different card is selected for each value of gpu_id by inspecting Bus-Id parameter in nvidia-smi run in a terminal in the guest system).

More info

This method based on NVIDIA_VISIBLE_DEVICES exposes only a single card to the system (with local ID zero), hence we also hard-code the other variable, CUDA_VISIBLE_DEVICES to 0 (mainly to prevent it from defaulting to an empty string that would indicate no GPU).

Note that the environmental variable should be set before the guest system is started (so no chances of doing it in your Jupyter Notebook's terminal), for instance using docker run -e NVIDIA_VISIBLE_DEVICES=0 or env in Kubernetes or Openshift.

If you want GPU load-balancing, make gpu_id random at each guest system start.

If setting this with python, make sure you are using strings for all environment variables, including numerical ones.

You can verify that a different card is selected for each value of gpu_id by inspecting nvidia-smi's Bus-Id parameter (in a terminal run in the guest system).

The accepted solution based on CUDA_VISIBLE_DEVICES alone does not hide other cards (different from the pinned one), and thus causes access errors if you try to use them in your GPU-enabled python packages. With this solution, other cards are not visible to the guest system, but other users still can access them and share their computing power on an equal basis, just like with CPU's (verified).

This is also preferable to solutions using Kubernetes / Openshift controlers (resources.limits.nvidia.com/gpu), that would impose a lock on the allocated card, removing it from the pool of available resources (so the number of containers with GPU access could not exceed the number of physical cards).

This has been tested under CUDA 8.0, 9.0, 10.1, and 11.2 in docker containers running Ubuntu 18.04 or 20.04 and orchestrated by Openshift 3.11.

Solution 4 - Cuda

You can also set the GPU in the command line so that you don't need to hard-code the device into your script (which may fail on systems without multiple GPUs). Say you want to run your script on GPU number 5, you can type the following on the command line and it will run your script just this once on GPU#5:

CUDA_VISIBLE_DEVICES=5, python test_script.py

Solution 5 - Cuda

For a random gpu you can do this:

export CUDA_VISIBLE_DEVICES=$((( RANDOM % 8 )))

Solution 6 - Cuda

Choose GPU with lowest utilization

After making xml2json available in your path you can select the N GPU(s) that have the lowest utilization:

export CUDA_VISIBLE_DEVICES=$(nvidia-smi -x -q | xml2json | jq '.' | python -c 'import json;import sys;print(",".join([str(gpu[0]) for gpu in sorted([(int(gpu["minor_number"]), float(gpu["utilization"]["gpu_util"].split(" ")[0])) for gpu in json.load(sys.stdin)["nvidia_smi_log"]["gpu"]], key=lambda x: x[1])[:2]]))')

Just replace the [:2] by [:1] if you need a single GPU or any number according to you maximum number of available GPUs.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSteven C. HowellView Question on Stackoverflow
Solution 1 - CudaSteven C. HowellView Answer on Stackoverflow
Solution 2 - CudaLucasView Answer on Stackoverflow
Solution 3 - CudamirekphdView Answer on Stackoverflow
Solution 4 - CudaEduardo BarreraView Answer on Stackoverflow
Solution 5 - CudaCharlie ParkerView Answer on Stackoverflow
Solution 6 - CudaJanView Answer on Stackoverflow