
Commit 542c91f

Merge remote-tracking branch 'upstream/main' into device-properties
2 parents 56ca8ae + d6afedf commit 542c91f

File tree: 10 files changed, +48 −35 lines

.github/workflows/build-docs.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -38,7 +38,7 @@ jobs:
     # The build stage could fail but we want the CI to keep moving.
     if: ${{ github.repository_owner == 'nvidia' && !cancelled() }}
     # WAR: Building the doc currently requires a GPU (NVIDIA/cuda-python#326,327)
-    runs-on: linux-amd64-gpu-t4-latest-1-testing
+    runs-on: linux-amd64-gpu-t4-latest-1
     #runs-on: ubuntu-latest
     defaults:
       run:
```

.github/workflows/test-wheel.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -28,7 +28,7 @@ jobs:
     if: ${{ github.repository_owner == 'nvidia' && !cancelled() }}
     runs-on: ${{ (inputs.runner == 'default' && inputs.host-platform == 'linux-64' && 'linux-amd64-gpu-v100-latest-1') ||
                  (inputs.runner == 'default' && inputs.host-platform == 'linux-aarch64' && 'linux-arm64-gpu-a100-latest-1') ||
-                 (inputs.runner == 'H100' && 'linux-amd64-gpu-h100-latest-1-testing') }}
+                 (inputs.runner == 'H100' && 'linux-amd64-gpu-h100-latest-1') }}
     # Our self-hosted runners require a container
     # TODO: use a different (nvidia?) container
     container:
```
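Aside on how this `runs-on` expression works: GitHub Actions has no ternary operator, so the workflow chains `(condition && 'label') || …` clauses — each `&&` yields its label only when the condition holds, and `||` picks the first truthy branch. A Python analogue of the same selection logic (the function name is illustrative, not part of the workflow):

```python
def pick_runner(runner, host_platform):
    # Mirrors the short-circuit trick in the workflow's runs-on expression:
    # each "and" clause yields its runner label only when its guards hold,
    # and "or" selects the first truthy result. No match yields a falsy value.
    return (
        (runner == "default" and host_platform == "linux-64" and "linux-amd64-gpu-v100-latest-1")
        or (runner == "default" and host_platform == "linux-aarch64" and "linux-arm64-gpu-a100-latest-1")
        or (runner == "H100" and "linux-amd64-gpu-h100-latest-1")
    )

print(pick_runner("H100", "linux-64"))  # → linux-amd64-gpu-h100-latest-1
```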

cuda_bindings/docs/build_docs.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -23,7 +23,7 @@ if [[ -z "${SPHINX_CUDA_BINDINGS_VER}" ]]; then
 fi
 
 # build the docs (in parallel)
-SPHINXOPTS="-j 4" make html
+SPHINXOPTS="-j 4 -d build/.doctrees" make html
 
 # for debugging/developing (conf.py), please comment out the above line and
 # use the line below instead, as we must build in serial to avoid getting
```
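For context (not part of the diff): Sphinx's `-d` option relocates the cached doctree pickles, here into `build/.doctrees`, keeping them out of the rendered HTML tree. A hypothetical standalone invocation equivalent to what the Makefile ends up running (paths are illustrative):

```shell
# Build HTML docs with 4 parallel workers; cache parsed doctrees
# under build/.doctrees instead of the default location.
sphinx-build -b html -j 4 -d build/.doctrees source build/html
```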

cuda_bindings/docs/source/conf.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -32,7 +32,7 @@
 # ones.
 extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon", "myst_nb", "enum_tools.autoenum"]
 
-jupyter_execute_notebooks = "force"
+nb_execution_mode = "off"
 numfig = True
 
 # Add any paths that contain templates here, relative to this directory.
```
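For context (not part of the diff): myst-nb deprecated the top-level `jupyter_execute_notebooks` option in favor of `nb_execution_mode`, and `"off"` disables notebook execution entirely at build time — consistent with this commit also dropping the jupytext front matter from overview.md. The modern conf.py fragment looks like:

```python
# myst-nb execution setting: "off" skips executing notebooks during the
# docs build. Other accepted values include "auto", "force", and "cache".
nb_execution_mode = "off"
```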

cuda_bindings/docs/source/overview.md

Lines changed: 12 additions & 21 deletions

````diff
@@ -1,12 +1,3 @@
----
-jupytext:
-  text_representation:
-    format_name: myst
-kernelspec:
-  display_name: Python 3
-  name: python3
----
-
 # Overview
 
 <p style="font-size: 14px; color: grey; text-align: right;">by <a
@@ -48,7 +39,7 @@ API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html) and
 Python package. In this example, you copy data from the host to device. You need
 [NumPy](https://numpy.org/doc/stable/contents.html) to store data on the host.
 
-```{code-cell} python
+```python
 from cuda.bindings import driver, nvrtc
 import numpy as np
 ```
@@ -58,7 +49,7 @@ example is provided.
 In a future release, this may automatically raise exceptions using a Python
 object model.
 
-```{code-cell} python
+```python
 def _cudaGetErrorEnum(error):
     if isinstance(error, driver.CUresult):
         err, name = driver.cuGetErrorName(error)
@@ -86,7 +77,7 @@ Python that requires some understanding of CUDA C++. For more information, see
 [An Even Easier Introduction to
 CUDA](https://developer.nvidia.com/blog/even-easier-introduction-cuda/).
 
-```{code-cell} python
+```python
 saxpy = """\
 extern "C" __global__
 void saxpy(float a, float *x, float *y, float *out, size_t n)
@@ -108,7 +99,7 @@ In the following code example, the Driver API is initialized so that the NVIDIA
 and GPU are accessible. Next, the GPU is queried for their compute capability. Finally,
 the program is compiled to target our local compute capability architecture with FMAD enabled.
 
-```{code-cell} python
+```python
 # Initialize CUDA Driver API
 checkCudaErrors(driver.cuInit(0))
 
@@ -138,7 +129,7 @@ context. CUDA contexts are analogous to host processes for the device. In the
 following code example, a handle for compute device 0 is passed to
 `cuCtxCreate` to designate that GPU for context creation.
 
-```{code-cell} python
+```python
 # Create context
 context = checkCudaErrors(driver.cuCtxCreate(0, cuDevice))
 ```
@@ -148,7 +139,7 @@ module. A module is analogous to dynamically loaded libraries for the device.
 After loading into the module, extract a specific kernel with
 `cuModuleGetFunction`. It is not uncommon for multiple kernels to reside in PTX.
 
-```{code-cell} python
+```python
 # Load PTX as module data and retrieve function
 ptx = np.char.array(ptx)
 # Note: Incompatible --gpu-architecture would be detected here
@@ -161,7 +152,7 @@ application performance, you can input data on the device to eliminate data
 transfers. For completeness, this example shows how you would transfer data to
 and from the device.
 
-```{code-cell} python
+```python
 NUM_THREADS = 512   # Threads per block
 NUM_BLOCKS = 32768  # Blocks per grid
 
@@ -184,7 +175,7 @@ Python doesn’t have a natural concept of pointers, yet `cuMemcpyHtoDAsync` expects
 `void*`. Therefore, `XX.ctypes.data` retrieves the pointer value associated with
 XX.
 
-```{code-cell} python
+```python
 dXclass = checkCudaErrors(driver.cuMemAlloc(bufferSize))
 dYclass = checkCudaErrors(driver.cuMemAlloc(bufferSize))
 dOutclass = checkCudaErrors(driver.cuMemAlloc(bufferSize))
@@ -209,7 +200,7 @@ Like `cuMemcpyHtoDAsync`, `cuLaunchKernel` expects `void**` in the argument list
 the earlier code example, it creates `void**` by grabbing the `void*` value of each
 individual argument and placing them into its own contiguous memory.
 
-```{code-cell} python
+```python
 # The following code example is not intuitive
 # Subject to change in a future release
 dX = np.array([int(dXclass)], dtype=np.uint64)
@@ -222,7 +213,7 @@ args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)
 
 Now the kernel can be launched:
 
-```{code-cell} python
+```python
 checkCudaErrors(driver.cuLaunchKernel(
     kernel,
     NUM_BLOCKS,  # grid x dim
@@ -251,7 +242,7 @@ stream are serialized. After the call to transfer data back to the host is
 executed, `cuStreamSynchronize` is used to halt CPU execution until all operations
 in the designated stream are finished.
 
-```{code-cell} python
+```python
 # Assert values are same after running kernel
 hZ = a * hX + hY
 if not np.allclose(hOut, hZ):
@@ -261,7 +252,7 @@ if not np.allclose(hOut, hZ):
 Perform verification of the data to ensure correctness and finish the code with
 memory clean up.
 
-```{code-cell} python
+```python
 checkCudaErrors(driver.cuStreamDestroy(stream))
 checkCudaErrors(driver.cuMemFree(dXclass))
 checkCudaErrors(driver.cuMemFree(dYclass))
```
````
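A note for reading these hunks: every `cuda.bindings.driver` call in the overview returns a tuple whose first element is a status enum, which the `checkCudaErrors` helper unpacks before returning any payload. A simplified, driver-free sketch of that pattern (the `fake_*` functions are hypothetical stand-ins for real binding calls):

```python
def checkCudaErrors(result):
    # cuda.bindings calls return (status, *values); nonzero status is an error.
    if result[0]:
        raise RuntimeError(f"CUDA error: code {int(result[0])}")
    if len(result) == 1:
        return None
    return result[1] if len(result) == 2 else result[1:]

# Hypothetical binding-style calls returning (status, *values):
def fake_cu_init():
    return (0,)          # success, no payload

def fake_cu_device_get(ordinal):
    return (0, ordinal)  # success, returns a device handle

checkCudaErrors(fake_cu_init())
device = checkCudaErrors(fake_cu_device_get(0))
print(device)  # → 0
```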

cuda_core/cuda/core/experimental/_memoryview.pyx

Lines changed: 7 additions & 6 deletions

```diff
@@ -48,20 +48,20 @@ cdef class StridedMemoryView:
     ----------
     ptr : int
         Pointer to the tensor buffer (as a Python `int`).
-    shape: tuple
+    shape : tuple
         Shape of the tensor.
-    strides: tuple
+    strides : tuple
         Strides of the tensor (in **counts**, not bytes).
     dtype: numpy.dtype
         Data type of the tensor.
-    device_id: int
+    device_id : int
         The device ID for where the tensor is located. It is -1 for CPU tensors
         (meaning those only accessible from the host).
-    is_device_accessible: bool
+    is_device_accessible : bool
         Whether the tensor data can be accessed on the GPU.
     readonly: bool
         Whether the tensor data can be modified in place.
-    exporting_obj: Any
+    exporting_obj : Any
         A reference to the original tensor object that is being viewed.
 
     Parameters
@@ -334,7 +334,8 @@ cdef StridedMemoryView view_as_cai(obj, stream_ptr, view=None):
 
 
 def args_viewable_as_strided_memory(tuple arg_indices):
-    """Decorator to create proxy objects to :obj:`StridedMemoryView` for the
+    """
+    Decorator to create proxy objects to :obj:`StridedMemoryView` for the
     specified positional arguments.
 
     This allows array/tensor attributes to be accessed inside the function
```
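Side note on these docstring fixes: the numpydoc format wants a space on both sides of the colon in `name : type` lines so that Sphinx renders proper attribute/parameter tables (two entries, `dtype:` and `readonly:`, are left unchanged by this hunk). A tiny illustrative example with a hypothetical function:

```python
def scale(x, factor=2.0):
    """Scale a value.

    Parameters
    ----------
    x : float
        The value to scale.
    factor : float, optional
        Multiplier applied to ``x`` (default 2.0).
    """
    return x * factor

print(scale(3.0))  # → 6.0
```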

cuda_core/docs/build_docs.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -19,7 +19,7 @@ if [[ -z "${SPHINX_CUDA_CORE_VER}" ]]; then
 fi
 
 # build the docs (in parallel)
-SPHINXOPTS="-j 4" make html
+SPHINXOPTS="-j 4 -d build/.doctrees" make html
 
 # for debugging/developing (conf.py), please comment out the above line and
 # use the line below instead, as we must build in serial to avoid getting
```

cuda_core/docs/source/conf.py

Lines changed: 22 additions & 1 deletion

```diff
@@ -10,8 +10,11 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 import os
+import sys
+from unittest.mock import MagicMock
+
+from cuda.core.experimental._system import System
 
-# import sys
 # sys.path.insert(0, os.path.abspath('.'))
 
@@ -102,6 +105,24 @@
 napoleon_numpy_docstring = True
 
 
+# Mock the System class and its methods
+class MockSystem:
+    def __init__(self, *args, **kwargs):
+        pass
+
+    driver_version = MagicMock()
+    driver_version.__doc__ = System.driver_version.__doc__
+    num_devices = MagicMock()
+    num_devices.__doc__ = System.num_devices.__doc__
+    devices = MagicMock()
+    devices.__doc__ = System.devices.__doc__
+
+
+sys.modules["cuda.core.experimental._system.System"] = MagicMock(System=MockSystem)
+
+# Add 'cuda.core.experimental.system' to autodoc_mock_imports
+autodoc_mock_imports = ["cuda.core.experimental.system"]
+
 section_titles = ["Returns"]
 
 
```
cuda_python/docs/build_docs.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -23,7 +23,7 @@ if [[ -z "${SPHINX_CUDA_PYTHON_VER}" ]]; then
 fi
 
 # build the docs (in parallel)
-SPHINXOPTS="-j 4" make html
+SPHINXOPTS="-j 4 -d build/.doctrees" make html
 
 # for debugging/developing (conf.py), please comment out the above line and
 # use the line below instead, as we must build in serial to avoid getting
```

cuda_python/docs/source/release/11.8.6-notes.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -4,7 +4,7 @@ Released on January 24, 2025.
 
 ## Included components
 
-- [`cuda.bindings` 11.8.6](https://nvidia.github.io/cuda-python/cuda-bindings/11.8.6/release/11.8.6-notes.html)
+- [`cuda.bindings` 11.8.6](https://nvidia.github.io/cuda-python/cuda-bindings/12.8.0/release/11.8.6-notes.html)
 
 
 ## Highlights
```

0 commit comments
