Commit f70d3f4

Add docs for vGPU VM drivers, licencing, and tests
1 parent 7778a58 commit f70d3f4

1 file changed

+130
-0
lines changed

source/gpus_in_openstack.rst

Lines changed: 130 additions & 0 deletions
@@ -458,6 +458,76 @@ Booting the VM:
   $ openstack server add security group nvidia-dls-1 nvidia-dls



Manual VM driver and licence configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vGPU client VMs need NVIDIA guest drivers installed to run GPU workloads.
The host drivers should already be installed on the hypervisor.

Compatible client drivers are hosted by GCP `here
<https://cloud.google.com/compute/docs/gpus/grid-drivers-table>`__.

Find the correct version (when in doubt, use the same version as the host) and
download it to the VM. The exact dependencies depend on the base image you are
using, but at a minimum you will need GCC installed.


Ubuntu Jammy example:

.. code-block:: bash

   sudo apt update
   sudo apt install -y make gcc wget
   wget https://storage.googleapis.com/nvidia-drivers-us-public/GRID/vGPU17.1/NVIDIA-Linux-x86_64-550.54.15-grid.run
   sudo sh NVIDIA-Linux-x86_64-550.54.15-grid.run


Check that the ``nvidia-smi`` client is available:

.. code-block:: bash

   nvidia-smi


Generate a client configuration token on the licence server and copy the token
file to the client VM.


On the client, create an NVIDIA GRID config file from the template:

.. code-block:: bash

   sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf

Edit it to set ``FeatureType=1``, leaving the other settings at their defaults.

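
The ``FeatureType`` edit can also be scripted. The following is a minimal
sketch rather than part of the NVIDIA tooling: ``set_feature_type`` is a
hypothetical helper, and it assumes GNU ``sed`` and that the setting, if
present, appears as an uncommented ``FeatureType=`` line.

.. code-block:: bash

   # Hypothetical helper (not part of the NVIDIA tooling).
   # Usage: set_feature_type <config-file> <value>
   set_feature_type() {
       local conf="$1"
       local value="$2"
       if grep -q '^FeatureType=' "$conf"; then
           # Rewrite the existing uncommented setting in place (GNU sed).
           sed -i "s/^FeatureType=.*/FeatureType=${value}/" "$conf"
       else
           # No uncommented setting found: append one.
           printf 'FeatureType=%s\n' "$value" >> "$conf"
       fi
   }

Run from a root shell, ``set_feature_type /etc/nvidia/gridd.conf 1`` would apply
the setting described above while leaving every other line untouched.
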

Copy the client configuration token into the ``/etc/nvidia/ClientConfigToken``
directory.

Ensure the token file has the correct permissions (read, write and execute for
the owner; read-only for group and others):

.. code-block:: bash

   sudo chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_<datetime>.tok

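
The copy and permission steps can be combined. This is a minimal sketch,
assuming GNU coreutils; ``install_token`` is a hypothetical helper name, and the
paths passed to it are whatever applies on your system.

.. code-block:: bash

   # Hypothetical helper: copy a token file into a directory with mode 744.
   # Usage: install_token <token-file> <target-dir>
   install_token() {
       local token="$1"
       local dest_dir="$2"
       # Create the target directory if it does not exist yet.
       mkdir -p "$dest_dir"
       # install(1) copies the file and sets its mode in one step.
       install -m 744 "$token" "$dest_dir/$(basename "$token")"
   }

From a root shell this could be called with the downloaded token file as the
first argument and ``/etc/nvidia/ClientConfigToken`` as the second.
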

Restart the ``nvidia-gridd`` service:

.. code-block:: bash

   sudo systemctl restart nvidia-gridd

Check that the token has been recognised:

.. code-block:: bash

   nvidia-smi -q | grep 'License Status'

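
For use in scripts, the check can be turned into an exit status. This is a
minimal sketch: ``is_licensed`` is a hypothetical helper, and it assumes the
status line has the form ``License Status : Licensed (...)``, which may differ
between driver versions.

.. code-block:: bash

   # Hypothetical helper: read `nvidia-smi -q` output on stdin and succeed only
   # if the "License Status" line reports "Licensed".
   # The match is case-sensitive, so "Unlicensed" does not count.
   is_licensed() {
       grep 'License Status' | grep -q ':[[:space:]]*Licensed'
   }

For example, ``nvidia-smi -q | is_licensed && echo "VM is licensed"`` prints the
message only when the status line matches.
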

If it has not, an error should appear in the journal:

.. code-block:: bash

   sudo journalctl -xeu nvidia-gridd

A successfully licensed VM can be snapshotted to create an image in Glance that
includes the drivers and licensing token. Alternatively, an image can be
created using Diskimage Builder.


Disk image builder recipe to automatically license VGPU on boot
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


@@ -536,6 +606,66 @@ when copying the contents as it can contain invisible characters. It is best to
into your openstack-config repository and vault encrypt it. The ``file`` lookup plugin can be used to decrypt
the file (as shown in the example above).



Testing vGPU VMs
^^^^^^^^^^^^^^^^

vGPU VMs can be validated using the following test workload. The test should
succeed if the VM is correctly licensed and the drivers are correctly installed
on both the host and the client VM.

Install ``cuda-toolkit`` using the instructions `here
<https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`__.


Ubuntu Jammy example:

.. code-block:: bash

   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
   sudo dpkg -i cuda-keyring_1.1-1_all.deb
   sudo apt update -y
   sudo apt install -y cuda-toolkit make

The VM may require a reboot at this point.


Clone the ``cuda-samples`` repo:

.. code-block:: bash

   git clone https://github.com/NVIDIA/cuda-samples.git

Build and run a test workload:

.. code-block:: bash

   cd cuda-samples/Samples/6_Performance/transpose
   make
   ./transpose


Example output:

.. code-block::

   Transpose Starting...

   GPU Device 0: "Ampere" with compute capability 8.0

   > Device 0: "GRID A100D-1-10C MIG 1g.10gb"
   > SM Capability 8.0 detected:
   > [GRID A100D-1-10C MIG 1g.10gb] has 14 MP(s) x 64 (Cores/MP) = 896 (Cores)
   > Compute performance scaling factor = 1.00

   Matrix size: 1024x1024 (64x64 tiles), tile size: 16x16, block size: 16x16

   transpose simple copy , Throughput = 159.1779 GB/s, Time = 0.04908 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose shared memory copy, Throughput = 152.1922 GB/s, Time = 0.05133 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose naive , Throughput = 117.2670 GB/s, Time = 0.06662 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose coalesced , Throughput = 135.0813 GB/s, Time = 0.05784 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose optimized , Throughput = 145.4326 GB/s, Time = 0.05372 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose coarse-grained , Throughput = 145.2941 GB/s, Time = 0.05377 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose fine-grained , Throughput = 150.5703 GB/s, Time = 0.05189 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   transpose diagonal , Throughput = 117.6831 GB/s, Time = 0.06639 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
   Test passed

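
To use the workload as an automated check, its output can be reduced to an exit
status. This is a minimal sketch, assuming a successful run ends with a
``Test passed`` line as in the example output; ``transpose_passed`` is a
hypothetical helper name.

.. code-block:: bash

   # Hypothetical helper: read the sample's output on stdin and succeed only
   # if some line starts with "Test passed".
   transpose_passed() {
       grep -q '^Test passed'
   }

For example, ``./transpose | transpose_passed || echo "vGPU test failed" >&2``
reports a failure when the line is missing.
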

Changing VGPU device types
^^^^^^^^^^^^^^^^^^^^^^^^^^

