Commit 1d137ad

Merge branch 'main' of https://github.com/NVIDIA/cuda-python into add_system
2 parents: 19a649e + fd71ced

File tree

6 files changed (+147, -49 lines)


README.md
Lines changed: 5 additions & 5 deletions

@@ -1,4 +1,4 @@
-# CUDA-Python
+# cuda-python
 
 CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It consists of multiple components:
 
@@ -7,21 +7,21 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c
 * [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): Pythonic exposure of CUB cooperative algorithms
 * [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): Pythonic exposure of Thrust parallel algorithms
 
-For access to NVIDIA Math Libraries, please refer to [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest).
+For access to NVIDIA CPU & GPU Math Libraries, please refer to [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest).
 
 CUDA Python is currently undergoing an overhaul to improve existing and bring up new components. All of the previously available functionalities from the cuda-python package will continue to be available, please refer to the [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest) documentation for installation guide and further detail.
 
-## CUDA-Python as a metapackage
+## cuda-python as a metapackage
 
-CUDA-Python is structured to become a metapackage that contains a collection of subpackages. Each subpackage is versioned independently, allowing installation of each component as needed.
+`cuda-python` is being re-structured to become a metapackage that contains a collection of subpackages. Each subpackage is versioned independently, allowing installation of each component as needed.
 
 ### Subpackage: `cuda.core`
 
 The `cuda.core` package offers idiomatic, pythonic access to CUDA Runtime and other functionalities.
 
 The goals are to
 
-1. Provide **idiomatic (pythonic)** access to CUDA Driver/Runtime
+1. Provide **idiomatic ("pythonic")** access to CUDA Driver, Runtime, and JIT compiler toolchain
 2. Focus on **developer productivity** by ensuring end-to-end CUDA development can be performed quickly and entirely in Python
 3. **Avoid homegrown** Python abstractions for CUDA for new Python GPU libraries starting from scratch
 4. **Ease** developer **burden of maintaining** and catching up with latest CUDA features

cuda_bindings/README.md
Lines changed: 29 additions & 3 deletions

@@ -1,13 +1,15 @@
 # `cuda.bindings`: Low-level CUDA interfaces
 
-CUDA Python is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Check out the [Overview](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.
+`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Check out the [Overview](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.
+
+`cuda.bindings` is a subpackage of `cuda-python`.
 
 ## Installing
 
 CUDA Python can be installed from:
 
-* PyPI
-* Conda (nvidia channel)
+* PyPI
+* Conda (conda-forge/nvidia channels)
 * Source builds
 
 Differences between these options are described in the [Installation](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) documentation. Each package guarantees minor version compatibility.
@@ -31,6 +33,30 @@ Source builds work for multiple Python versions, however pre-built PyPI and Cond
 * Python 3.9 to 3.12
 
+## Developing
+
+We use `pre-commit` to manage various tools to help development and ensure consistency.
+```shell
+pip install pre-commit
+```
+
+### Code linting
+
+Run this command before checking in your code changes
+```shell
+pre-commit run -a --show-diff-on-failure
+```
+to ensure the code formatting is in line with the requirements (as listed in [`pyproject.toml`](./pyproject.toml)).
+
+### Code signing
+
+This repository implements a security check to prevent the CI system from running untrusted code. A part of the
+security check consists of checking if the git commits are signed. See
+[here](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/faqs/#why-did-i-receive-a-comment-that-my-pull-request-requires-additional-validation)
+and
+[here](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
+for more details, including how to sign your commits.
+
 ## Testing
 
 Latest dependencies can be found in [requirements.txt](https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/requirements.txt).
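The `pre-commit` workflow described above is driven by a `.pre-commit-config.yaml` at the repository root. A minimal hypothetical example of what such a config looks like (the repo URL, hook IDs, and pinned version below are illustrative assumptions, not taken from this commit):

```yaml
# Hypothetical .pre-commit-config.yaml; the hooks actually used by this
# repository may differ (consult its pyproject.toml and repo config).
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.0
    hooks:
      - id: ruff          # lint
      - id: ruff-format   # autoformat
```

With such a file in place, `pre-commit run -a` executes every listed hook against the whole tree.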

cuda_bindings/tests/test_nvjitlink.py
Lines changed: 76 additions & 36 deletions

@@ -4,13 +4,22 @@
 
 import pytest
 
-from cuda.bindings import nvjitlink
+from cuda.bindings import nvjitlink, nvrtc
 
-ptx_kernel = """
-.version 8.5
-.target sm_90
+# Establish a handful of compatible architectures and PTX versions to test with
+ARCHITECTURES = ["sm_60", "sm_75", "sm_80", "sm_90"]
+PTX_VERSIONS = ["5.0", "6.4", "7.0", "8.5"]
+
+
+def ptx_header(version, arch):
+    return f"""
+.version {version}
+.target {arch}
 .address_size 64
+"""
+
 
+ptx_kernel = """
 .visible .entry _Z6kernelPi(
 .param .u64 _Z6kernelPi_param_0
 )
@@ -28,18 +37,40 @@
 """
 
 minimal_ptx_kernel = """
-.version 8.5
-.target sm_90
-.address_size 64
-
 .func _MinimalKernel()
 {
     ret;
 }
 """
 
-ptx_kernel_bytes = ptx_kernel.encode("utf-8")
-minimal_ptx_kernel_bytes = minimal_ptx_kernel.encode("utf-8")
+ptx_kernel_bytes = [
+    (ptx_header(version, arch) + ptx_kernel).encode("utf-8") for version, arch in zip(PTX_VERSIONS, ARCHITECTURES)
+]
+minimal_ptx_kernel_bytes = [
+    (ptx_header(version, arch) + minimal_ptx_kernel).encode("utf-8")
+    for version, arch in zip(PTX_VERSIONS, ARCHITECTURES)
+]
+
+
+# create a valid LTOIR input for testing
+@pytest.fixture
+def get_dummy_ltoir():
+    def CHECK_NVRTC(err):
+        if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
+            raise RuntimeError(f"Nvrtc Error: {err}")
+
+    empty_cplusplus_kernel = "__global__ void A() {}"
+    err, program_handle = nvrtc.nvrtcCreateProgram(empty_cplusplus_kernel.encode(), b"", 0, [], [])
+    CHECK_NVRTC(err)
+    nvrtc.nvrtcCompileProgram(program_handle, 1, [b"-dlto"])
+    err, size = nvrtc.nvrtcGetLTOIRSize(program_handle)
+    CHECK_NVRTC(err)
+    empty_kernel_ltoir = b" " * size
+    (err,) = nvrtc.nvrtcGetLTOIR(program_handle, empty_kernel_ltoir)
+    CHECK_NVRTC(err)
+    (err,) = nvrtc.nvrtcDestroyProgram(program_handle)
+    CHECK_NVRTC(err)
+    return empty_kernel_ltoir
 
 
 def test_unrecognized_option_error():
@@ -52,39 +83,41 @@ def test_invalid_arch_error():
     nvjitlink.create(1, ["-arch=sm_XX"])
 
 
-def test_create_and_destroy():
-    handle = nvjitlink.create(1, ["-arch=sm_53"])
+@pytest.mark.parametrize("option", ARCHITECTURES)
+def test_create_and_destroy(option):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
     assert handle != 0
     nvjitlink.destroy(handle)
 
 
-def test_complete_empty():
-    handle = nvjitlink.create(1, ["-arch=sm_90"])
+@pytest.mark.parametrize("option", ARCHITECTURES)
+def test_complete_empty(option):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
     nvjitlink.complete(handle)
     nvjitlink.destroy(handle)
 
 
-def test_add_data():
-    handle = nvjitlink.create(1, ["-arch=sm_90"])
-    nvjitlink.add_data(handle, nvjitlink.InputType.ANY, ptx_kernel_bytes, len(ptx_kernel_bytes), "test_data")
-    nvjitlink.add_data(
-        handle, nvjitlink.InputType.ANY, minimal_ptx_kernel_bytes, len(minimal_ptx_kernel_bytes), "minimal_test_data"
-    )
+@pytest.mark.parametrize("option, ptx_bytes", zip(ARCHITECTURES, ptx_kernel_bytes))
+def test_add_data(option, ptx_bytes):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
+    nvjitlink.add_data(handle, nvjitlink.InputType.ANY, ptx_bytes, len(ptx_bytes), "test_data")
     nvjitlink.complete(handle)
     nvjitlink.destroy(handle)
 
 
-def test_add_file(tmp_path):
-    handle = nvjitlink.create(1, ["-arch=sm_90"])
+@pytest.mark.parametrize("option, ptx_bytes", zip(ARCHITECTURES, ptx_kernel_bytes))
+def test_add_file(option, ptx_bytes, tmp_path):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
     file_path = tmp_path / "test_file.cubin"
-    file_path.write_bytes(ptx_kernel_bytes)
+    file_path.write_bytes(ptx_bytes)
     nvjitlink.add_file(handle, nvjitlink.InputType.ANY, str(file_path))
     nvjitlink.complete(handle)
     nvjitlink.destroy(handle)
 
 
-def test_get_error_log():
-    handle = nvjitlink.create(1, ["-arch=sm_90"])
+@pytest.mark.parametrize("option", ARCHITECTURES)
+def test_get_error_log(option):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
     nvjitlink.complete(handle)
     log_size = nvjitlink.get_error_log_size(handle)
     log = bytearray(log_size)
@@ -93,9 +126,10 @@ def test_get_error_log():
     nvjitlink.destroy(handle)
 
 
-def test_get_info_log():
-    handle = nvjitlink.create(1, ["-arch=sm_90"])
-    nvjitlink.add_data(handle, nvjitlink.InputType.ANY, ptx_kernel_bytes, len(ptx_kernel_bytes), "test_data")
+@pytest.mark.parametrize("option, ptx_bytes", zip(ARCHITECTURES, ptx_kernel_bytes))
+def test_get_info_log(option, ptx_bytes):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
+    nvjitlink.add_data(handle, nvjitlink.InputType.ANY, ptx_bytes, len(ptx_bytes), "test_data")
     nvjitlink.complete(handle)
     log_size = nvjitlink.get_info_log_size(handle)
     log = bytearray(log_size)
@@ -104,9 +138,10 @@ def test_get_info_log():
     nvjitlink.destroy(handle)
 
 
-def test_get_linked_cubin():
-    handle = nvjitlink.create(1, ["-arch=sm_90"])
-    nvjitlink.add_data(handle, nvjitlink.InputType.ANY, ptx_kernel_bytes, len(ptx_kernel_bytes), "test_data")
+@pytest.mark.parametrize("option, ptx_bytes", zip(ARCHITECTURES, ptx_kernel_bytes))
+def test_get_linked_cubin(option, ptx_bytes):
+    handle = nvjitlink.create(1, [f"-arch={option}"])
+    nvjitlink.add_data(handle, nvjitlink.InputType.ANY, ptx_bytes, len(ptx_bytes), "test_data")
    nvjitlink.complete(handle)
     cubin_size = nvjitlink.get_linked_cubin_size(handle)
     cubin = bytearray(cubin_size)
@@ -115,11 +150,16 @@ def test_get_linked_cubin():
     nvjitlink.destroy(handle)
 
 
-def test_get_linked_ptx():
-    # TODO improve this test to call get_linked_ptx without this error
-    handle = nvjitlink.create(2, ["-arch=sm_90", "-lto"])
-    with pytest.raises(nvjitlink.nvJitLinkError, match="ERROR_NVVM_COMPILE"):
-        nvjitlink.complete(handle)
+@pytest.mark.parametrize("option", ARCHITECTURES)
+def test_get_linked_ptx(option, get_dummy_ltoir):
+    handle = nvjitlink.create(3, [f"-arch={option}", "-lto", "-ptx"])
+    nvjitlink.add_data(handle, nvjitlink.InputType.LTOIR, get_dummy_ltoir, len(get_dummy_ltoir), "test_data")
+    nvjitlink.complete(handle)
+    ptx_size = nvjitlink.get_linked_ptx_size(handle)
+    ptx = bytearray(ptx_size)
+    nvjitlink.get_linked_ptx(handle, ptx)
+    assert len(ptx) == ptx_size
+    nvjitlink.destroy(handle)
 
 
 def test_package_version():
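The parametrization above pairs each PTX ISA version with one compatible target architecture via `zip`, rather than building a full cross product. A GPU-free sketch of how those PTX sources are assembled, mirroring the helpers in the diff (pure Python, no CUDA required):

```python
# Pure-Python sketch of the PTX source assembly used by the tests above.
ARCHITECTURES = ["sm_60", "sm_75", "sm_80", "sm_90"]
PTX_VERSIONS = ["5.0", "6.4", "7.0", "8.5"]


def ptx_header(version, arch):
    # Every PTX module starts with a .version/.target/.address_size preamble.
    return f"""
.version {version}
.target {arch}
.address_size 64
"""


minimal_ptx_kernel = """
.func _MinimalKernel()
{
    ret;
}
"""

# zip() pairs version i with architecture i, so there are 4 variants,
# not the 16 a full cross product would produce.
minimal_ptx_kernel_bytes = [
    (ptx_header(v, a) + minimal_ptx_kernel).encode("utf-8")
    for v, a in zip(PTX_VERSIONS, ARCHITECTURES)
]

print(len(minimal_ptx_kernel_bytes))  # 4
```

Keeping the preamble out of the kernel bodies is what lets one body be reused across every version/architecture pair.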

cuda_core/README.md
Lines changed: 35 additions & 1 deletion

@@ -1,9 +1,43 @@
 # `cuda.core`: (experimental) pythonic CUDA module
 
-Currently under active development. To build from source, just do:
+Currently under active development; see [the documentation](https://nvidia.github.io/cuda-python/cuda-core/latest/) for more details.
+
+## Installing
+
+To build from source, just do:
 ```shell
 $ git clone https://github.com/NVIDIA/cuda-python
 $ cd cuda-python/cuda_core # move to the directory where this README is located
 $ pip install .
 ```
 For now `cuda-python` is a required dependency.
+
+## Developing
+
+We use `pre-commit` to manage various tools to help development and ensure consistency.
+```shell
+pip install pre-commit
+```
+
+### Code linting
+
+Run this command before checking in your code changes
+```shell
+pre-commit run -a --show-diff-on-failure
+```
+to ensure the code formatting is in line with the requirements (as listed in [`pyproject.toml`](./pyproject.toml)).
+
+### Code signing
+
+This repository implements a security check to prevent the CI system from running untrusted code. A part of the
+security check consists of checking if the git commits are signed. See
+[here](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/faqs/#why-did-i-receive-a-comment-that-my-pull-request-requires-additional-validation)
+and
+[here](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
+for more details, including how to sign your commits.
+
+## Testing
+
+To run these tests:
+* `python -m pytest tests/` against local builds
+* `pytest tests/` against installed packages

cuda_core/cuda/core/experimental/_stream.py
Lines changed: 1 addition & 3 deletions

@@ -211,9 +211,7 @@ def wait(self, event_or_stream: Union[Event, Stream]):
             try:
                 stream = Stream._init(event_or_stream)
             except Exception as e:
-                raise ValueError(
-                    "only an Event, Stream, or object supporting __cuda_stream__ can be waited"
-                ) from e
+                raise ValueError("only an Event, Stream, or object supporting __cuda_stream__ can be waited") from e
             else:
                 stream = event_or_stream
         event = handle_return(cuda.cuEventCreate(cuda.CUevent_flags.CU_EVENT_DISABLE_TIMING))
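The `wait` implementation above duck-types its argument: anything exposing the `__cuda_stream__` protocol is accepted, and everything else trips the `ValueError`. A GPU-free sketch of that dispatch pattern (the class and function shapes here are simplified stand-ins, not the actual `cuda.core` API):

```python
# Simplified, GPU-free sketch of the duck-typed dispatch in Stream.wait.
class Stream:
    def __init__(self, handle):
        self.handle = handle

    @classmethod
    def _init(cls, obj):
        # Accept any object implementing the __cuda_stream__ protocol.
        return cls(obj.__cuda_stream__())


class ForeignStream:
    """A third-party object advertising its stream via the protocol."""

    def __cuda_stream__(self):
        return 42


def wait(event_or_stream):
    if not isinstance(event_or_stream, Stream):
        try:
            stream = Stream._init(event_or_stream)
        except Exception as e:
            raise ValueError("only an Event, Stream, or object supporting __cuda_stream__ can be waited") from e
    else:
        stream = event_or_stream
    return stream.handle


print(wait(ForeignStream()))  # 42
print(wait(Stream(7)))  # 7
```

The protocol-based `try`/`except` keeps `wait` open to third-party stream wrappers without hard-coding their types.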

cuda_python/docs/source/index.rst
Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ multiple components:
 - `cuda.cooperative`_: Pythonic exposure of CUB cooperative algorithms
 - `cuda.parallel`_: Pythonic exposure of Thrust parallel algorithms
 
-For access to NVIDIA Math Libraries, please refer to `nvmath-python`_.
+For access to NVIDIA CPU & GPU Math Libraries, please refer to `nvmath-python`_.
 
 .. _nvmath-python: https://docs.nvidia.com/cuda/nvmath-python/latest
