Commit d93fe58

Fix cuda installs (#652)
* Correct cuda_samples_path for secure nfs-homedirs

  The ``rocky`` user's homedir has moved from ``/home/rocky`` to ``/var/lib/rocky``. This updates ``cuda_samples_path`` to use that.

* Hardcode cuda_version_short and remove lookup

  We already hardcode the version of cuda which we install. This ensures that we will also use the requested version when multiple cuda versions are installed.

  Moves ``cuda_version_short`` to be next to ``cuda_package_version`` so it's harder to miss updating one when the other is changed.

* Install cmake and cuda-toolkit
1 parent 6bf1dbb commit d93fe58

3 files changed: 5 additions & 13 deletions
ansible/roles/cuda/README.md

Lines changed: 1 addition & 1 deletion
@@ -10,6 +10,6 @@ Requires OFED to be installed to provide required kernel-* packages.
 
 - `cuda_repo_url`: Optional. URL of `.repo` file. Default is upstream for appropriate OS/architecture.
 - `cuda_nvidia_driver_stream`: Optional. Version of `nvidia-driver` stream to enable. This controls whether the open or proprietary drivers are installed and the major version. Changing this once the drivers are installed does not change the version.
-- `cuda_packages`: Optional. Default: `['cuda', 'nvidia-gds']`.
+- `cuda_packages`: Optional. Default: `['cuda', 'nvidia-gds', 'cmake', 'cuda-toolkit-12-8']`.
 - `cuda_package_version`: Optional. Default `latest` which will install the latest packages if not installed but won't upgrade already-installed packages. Use `'none'` to skip installing CUDA.
 - `cuda_persistenced_state`: Optional. State of systemd `nvidia-persistenced` service. Values as [ansible.builtin.systemd:state](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/systemd_module.html#parameter-state). Default `started`.

ansible/roles/cuda/defaults/main.yml

Lines changed: 4 additions & 3 deletions
@@ -1,13 +1,14 @@
 cuda_repo_url: "https://developer.download.nvidia.com/compute/cuda/repos/rhel{{ ansible_distribution_major_version }}/{{ ansible_architecture }}/cuda-rhel{{ ansible_distribution_major_version }}.repo"
 cuda_nvidia_driver_stream: '570-open'
 cuda_package_version: '12.8.1-1'
+cuda_version_short: '12.8'
 cuda_packages:
   - "cuda{{ ('-' + cuda_package_version) if cuda_package_version != 'latest' else '' }}"
   - nvidia-gds
-# _cuda_version_tuple: # discovered from installed package e.g. ('12', '1', '0')
-cuda_version_short: "{{ _cuda_version_tuple[0] }}.{{ _cuda_version_tuple[1] }}"
+  - cmake
+  - cuda-toolkit-12-8
 cuda_samples_release_url: "https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v{{ cuda_version_short }}.tar.gz"
-cuda_samples_path: "/home/{{ ansible_user }}/cuda_samples"
+cuda_samples_path: "/var/lib/{{ ansible_user }}/cuda_samples"
 cuda_samples_programs:
   - deviceQuery
   - bandwidthTest
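
The commit keeps `cuda_version_short` and the toolkit package name hardcoded next to `cuda_package_version`. As a rough sketch only, and not something this commit does, both values could instead be derived from the pinned package version with Jinja2. The variable name `cuda_toolkit_package` is hypothetical, and the expressions assume `cuda_package_version` is a full pinned version such as `'12.8.1-1'`. They would not give a usable result for `'latest'`, which is one reason to prefer the hardcoded values.

```yaml
# Sketch only: assumes cuda_package_version is pinned, e.g. '12.8.1-1' (not 'latest').
cuda_package_version: '12.8.1-1'
# Keep the first two dot-separated fields: '12.8.1-1' -> '12.8'
cuda_version_short: "{{ cuda_package_version.split('.')[:2] | join('.') }}"
# Hypothetical variable, not defined by the role: '12.8' -> 'cuda-toolkit-12-8'
cuda_toolkit_package: "cuda-toolkit-{{ cuda_version_short | replace('.', '-') }}"
```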

ansible/roles/cuda/tasks/samples.yml

Lines changed: 0 additions & 9 deletions
@@ -1,12 +1,3 @@
-- name: Read CUDA version file
-  slurp:
-    src: /usr/local/cuda/version.json
-  register: _cuda_samples_version
-
-- name: Set fact for discovered CUDA version
-  set_fact:
-    _cuda_version_tuple: "{{ (_cuda_samples_version.content | b64decode | from_json).cuda.version | split('.') }}" # e.g. '12.1.0'
-
 - name: Ensure cuda_samples_path exists
   file:
     state: directory
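
With the lookup removed, nothing in the role compares the hardcoded `cuda_version_short` against what is actually installed. A minimal sketch of such a check, assuming it is wanted at all, could reuse the removed `slurp` of `/usr/local/cuda/version.json` and assert on the result. The task names and the `_installed_cuda_version` variable are illustrative and not part of the role. The check reads the same path the removed tasks used, so on a host with several CUDA versions it reports whichever one that path resolves to.

```yaml
# Sketch only: these tasks are not part of the commit.
- name: Read CUDA version file
  ansible.builtin.slurp:
    src: /usr/local/cuda/version.json
  register: _cuda_samples_version

- name: Check installed CUDA matches cuda_version_short
  ansible.builtin.assert:
    that:
      # e.g. '12.8.1' must start with '12.8.'
      - _installed_cuda_version.startswith(cuda_version_short ~ '.')
    fail_msg: "Installed CUDA {{ _installed_cuda_version }} does not match cuda_version_short {{ cuda_version_short }}"
  vars:
    # Same decoding as the removed set_fact task, without the split
    _installed_cuda_version: "{{ (_cuda_samples_version.content | b64decode | from_json).cuda.version }}"
```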
