Add support for CUDA #253

Merged
merged 13 commits into main on May 24, 2023

Conversation

@sjpb (Collaborator) commented Mar 8, 2023

  • Adds a cuda role to install the NVIDIA CUDA toolkit onto hosts; see ansible/roles/cuda/README. This is not enabled by default, and it requires OFED to be installed, which currently must be done outside the appliance.
  • Adds a new adhoc playbook, ansible/adhoc/cudatests.yml, which uses the cuda-samples utilities deviceQuery and bandwidthTest to test GPU functionality.
  • Updates the stackhpc.openhpc role to:
    • Support defining a gres parameter in openhpc_slurm_partitions, to allow scheduling onto GPUs (see v0.19.0).
    • Allow defining a custom OpenHPC package repo URL (see v0.19.0).
    • Allow multiple empty partitions (see v0.20.0).
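To illustrate the gres support above, a minimal sketch of a GPU partition definition follows. Only openhpc_slurm_partitions and the gres key are taken from this PR; the partition name, GPU type, and count are illustrative assumptions, and the exact gres entry schema should be checked against the stackhpc.openhpc role docs for v0.19.0+.

```yaml
# Illustrative sketch only - not a verbatim excerpt from this PR.
openhpc_slurm_partitions:
  - name: gpu                    # hypothetical partition name
    # Assumed shape: one GRES entry describing the GPUs on this
    # partition's nodes, in Slurm's gpu:<type>:<count> form.
    gres:
      - conf: "gpu:A100:2"       # hypothetical: 2x A100 per node
```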

Note that CUDA is not included in the default fat image. However, if a compute host is added to the cuda group, then the openstack.openhpc (fat image) or openstack.compute (compute image) builds will include the CUDA install.
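The cuda group membership described above might be expressed in the inventory as follows; the host name here is hypothetical, and only the cuda group name comes from this PR.

```ini
# Hypothetical inventory fragment: placing a compute host in the
# cuda group so that image builds include the CUDA toolkit.
[cuda]
gpu-compute-0
```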

@sjpb sjpb marked this pull request as ready for review May 12, 2023 15:58
@sjpb sjpb requested a review from a team as a code owner May 12, 2023 15:58
@m-bull (Collaborator) left a comment

LGTM

@sjpb sjpb merged commit 9f4ef8e into main May 24, 2023
@sjpb sjpb deleted the cuda branch May 24, 2023 10:17