
Commit 2279bda

Merge branch 'main' into path_finder_review1

2 parents bdfc6a7 + 19df0d9

File tree

10 files changed (+199, -63 lines)


CONTRIBUTING.md

Lines changed: 26 additions & 0 deletions
@@ -12,3 +12,29 @@ Thank you for your interest in contributing to CUDA Python! Based on the type of
 - Please refer to each component's guideline:
   - [`cuda.core`](https://nvidia.github.io/cuda-python/cuda-core/latest/contribute.html)
   - [`cuda.bindings`](https://nvidia.github.io/cuda-python/cuda-bindings/latest/contribute.html)
+
+## Pre-commit
+This project uses [pre-commit.ci](https://pre-commit.ci/) with GitHub Actions. All pull requests are automatically checked for pre-commit compliance, and any pre-commit failures block merging until resolved.
+
+To run pre-commit checks locally and catch issues before pushing your changes, follow these steps:
+
+* Install pre-commit: `pip install pre-commit`
+* Check all files at any time by running: `pre-commit run --all-files`
+
+This command runs all configured hooks (such as linters and formatters) across the repository, letting you review and address issues before committing.
+
+**Optional: Enable automatic checks on every commit**
+If you want pre-commit hooks to run automatically on each commit, install the git hook with:
+
+`pre-commit install`
+
+This sets up a git pre-commit hook so that all configured checks run before each commit is accepted. If any hook fails, the commit is blocked until the issues are resolved.
+
+**Note on workflow flexibility**
+Some contributors prefer to commit intermediate or work-in-progress changes that may not pass all pre-commit checks, and only clean up their commits before pushing (for example, by squashing and running `pre-commit run --all-files` manually at the end). If this fits your workflow, you may skip `pre-commit install` and rely on manual checks instead. This avoids disruption during iterative development while still ensuring code quality before changes are shared or merged.
+
+Choose the setup that best fits your workflow and development style.
+
+## Code signing
+
+This repository implements a security check to prevent the CI system from running untrusted code. Part of this check verifies that git commits are signed. Please ensure that your commits are signed, [following GitHub's instructions](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification).
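For contributors who prefer the manual workflow described in the note above, a small helper can make the final `pre-commit run --all-files` pass harder to forget. The snippet below is only an illustrative sketch, not part of this commit or the repository; it assumes `pre-commit` is already installed and simply forwards its exit code.

```python
# check_hooks.py -- hypothetical helper: run all configured pre-commit hooks
# across the repository and exit with pre-commit's own status code.
import subprocess
import sys

result = subprocess.run(["pre-commit", "run", "--all-files"])
sys.exit(result.returncode)
```

Running this by hand before opening a pull request is equivalent to invoking `pre-commit run --all-files` directly; the wrapper just gives you one place to add further project-specific checks later.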

cuda_bindings/DESCRIPTION.rst

Lines changed: 9 additions & 2 deletions
@@ -1,7 +1,14 @@
+.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
+
 ****************************************
-cuda.bindings: Low-level CUDA interfaces
+cuda-bindings: Low-level CUDA interfaces
 ****************************************
 
-`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and 1:1 access to the CUDA host APIs from Python. Checkout the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.
+`cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/>`_ is a standard set of low-level interfaces, providing full coverage of and 1:1 access to the CUDA host APIs from Python. Check out the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.
+
+* `Repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings>`_
+* `Documentation <https://nvidia.github.io/cuda-python/cuda-bindings/>`_
+* `Examples <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings/examples>`_
+* `Issue tracker <https://github.com/NVIDIA/cuda-python/issues/>`_
 
 For installation instructions, please refer to the `Installation <https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html>`_ page.
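To make the "1:1 access to the CUDA host APIs" claim above concrete, here is a minimal sketch of calling the driver API through `cuda.bindings`. It is not part of this commit; it assumes a recent `cuda-bindings` release where the driver bindings are exposed as `cuda.bindings.driver` and each call returns an `(error, result...)` tuple.

```python
# Minimal driver-API call through the low-level bindings (illustrative sketch).
from cuda.bindings import driver

(err,) = driver.cuInit(0)                   # initialize the driver API
assert err == driver.CUresult.CUDA_SUCCESS

err, version = driver.cuDriverGetVersion()  # e.g. 12040 for CUDA 12.4
assert err == driver.CUresult.CUDA_SUCCESS
print(f"Driver API version: {version // 1000}.{(version % 1000) // 10}")
```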

cuda_bindings/README.md

Lines changed: 1 addition & 23 deletions
@@ -2,35 +2,13 @@
 
 `cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Check out the [Overview page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.
 
-`cuda.bindings` is a subpackage of `cuda-python`.
-
 ## Installing
 
 Please refer to the [Installation page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) for instructions and required/optional dependencies.
 
 ## Developing
 
-We use `pre-commit` to manage various tools to help development and ensure consistency.
-```shell
-pip install pre-commit
-```
-
-### Code linting
-
-Run this command before checking in the code changes
-```shell
-pre-commit run -a --show-diff-on-failure
-```
-to ensure the code formatting is in line of the requirements (as listed in [`pyproject.toml`](./pyproject.toml)).
-
-### Code signing
-
-This repository implements a security check to prevent the CI system from running untrusted code. A part of the
-security check consists of checking if the git commits are signed. See
-[here](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/faqs/#why-did-i-receive-a-comment-that-my-pull-request-requires-additional-validation)
-and
-[here](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
-for more details, including how to sign your commits.
+This subpackage adheres to the development practices described in the parent metapackage's [CONTRIBUTING.md](https://github.com/NVIDIA/cuda-python/blob/main/CONTRIBUTING.md).
 
 ## Testing
 

cuda_core/DESCRIPTION.rst

Lines changed: 4 additions & 7 deletions
@@ -1,11 +1,10 @@
+.. SPDX-License-Identifier: Apache-2.0
+
 *******************************************************
 cuda-core: Pythonic access to CUDA core functionalities
 *******************************************************
 
-`cuda.core <https://nvidia.github.io/cuda-python/cuda-core/>`_ bridges Python's productivity
-with CUDA's performance through intuitive and pythonic APIs.
-The mission is to provide users full access to all of the core CUDA features in Python,
-such as runtime control, compiler and linker.
+`cuda.core <https://nvidia.github.io/cuda-python/cuda-core/>`_ bridges Python's productivity with CUDA's performance through intuitive and pythonic APIs. The mission is to provide users full access to all of the core CUDA features in Python, such as runtime control, compiler and linker.
 
 * `Repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_core>`_
 * `Documentation <https://nvidia.github.io/cuda-python/cuda-core/>`_
@@ -22,6 +21,4 @@ Installation
 
    pip install cuda-core[cu12]
 
-Please refer to the `installation instructions
-<https://nvidia.github.io/cuda-python/cuda-core/latest/install.html>`_ for different
-ways of installing `cuda.core`, including building from source.
+Please refer to the `installation instructions <https://nvidia.github.io/cuda-python/cuda-core/latest/install.html>`_ for different ways of installing `cuda.core`, including building from source.
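As a taste of the "runtime control, compiler and linker" access mentioned above, the sketch below compiles a trivial kernel with `cuda.core`. It is not part of this commit; it only reuses the `Device`/`Program`/`ProgramOptions` calls exercised by the `pytorch_example.py` added later in this commit, and assumes an NVIDIA GPU plus the `pip install cuda-core[cu12]` shown above.

```python
# Compile-only sketch of the cuda.core JIT path (illustrative, not repository code).
from cuda.core.experimental import Device, Program, ProgramOptions

code = r"""
extern "C" __global__ void noop() {}
"""

dev = Device()
dev.set_current()

# Target the current GPU's architecture, e.g. "sm_90".
arch = "".join(str(i) for i in dev.compute_capability)
prog = Program(code, code_type="c++", options=ProgramOptions(std="c++11", arch=f"sm_{arch}"))
mod = prog.compile("cubin")
print("Compiled kernel:", mod.get_kernel("noop"))
```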

cuda_core/README.md

Lines changed: 2 additions & 28 deletions
@@ -4,37 +4,11 @@ Currently under active development; see [the documentation](https://nvidia.githu
 
 ## Installing
 
-To build from source, just do:
-```shell
-$ git clone https://github.com/NVIDIA/cuda-python
-$ cd cuda-python/cuda_core # move to the directory where this README locates
-$ pip install .
-```
-For now `cuda-python` is a required dependency.
+Please refer to the [Installation page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) for instructions and required/optional dependencies.
 
 ## Developing
 
-We use `pre-commit` to manage various tools to help development and ensure consistency.
-```shell
-pip install pre-commit
-```
-
-### Code linting
-
-Run this command before checking in the code changes
-```shell
-pre-commit run -a --show-diff-on-failure
-```
-to ensure the code formatting is in line of the requirements (as listed in [`pyproject.toml`](./pyproject.toml)).
-
-### Code signing
-
-This repository implements a security check to prevent the CI system from running untrusted code. A part of the
-security check consists of checking if the git commits are signed. See
-[here](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/faqs/#why-did-i-receive-a-comment-that-my-pull-request-requires-additional-validation)
-and
-[here](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
-for more details, including how to sign your commits.
+This subpackage adheres to the development practices described in the parent metapackage's [CONTRIBUTING.md](https://github.com/NVIDIA/cuda-python/blob/main/CONTRIBUTING.md).
 
 ## Testing
 
cuda_core/examples/pytorch_example.py

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+## Usage: pip install "cuda-core[cu12]"
+##        python pytorch_example.py
+import sys
+
+import torch
+
+from cuda.core.experimental import Device, LaunchConfig, Program, ProgramOptions, launch
+
+# SAXPY kernel - passing a as a pointer to avoid any type issues
+code = """
+template<typename T>
+__global__ void saxpy_kernel(const T* a, const T* x, const T* y, T* out, size_t N) {
+    const unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
+    if (tid < N) {
+        // Dereference a to get the scalar value
+        out[tid] = (*a) * x[tid] + y[tid];
+    }
+}
+"""
+
+dev = Device()
+dev.set_current()
+
+# Get PyTorch's current stream
+pt_stream = torch.cuda.current_stream()
+print(f"PyTorch stream: {pt_stream}")
+
+
+# Create a wrapper class that implements __cuda_stream__
+class PyTorchStreamWrapper:
+    def __init__(self, pt_stream):
+        self.pt_stream = pt_stream
+
+    def __cuda_stream__(self):
+        stream_id = self.pt_stream.cuda_stream
+        return (0, stream_id)  # Return format required by CUDA Python
+
+
+s = PyTorchStreamWrapper(pt_stream)
+
+# prepare program
+arch = "".join(f"{i}" for i in dev.compute_capability)
+program_options = ProgramOptions(std="c++11", arch=f"sm_{arch}")
+prog = Program(code, code_type="c++", options=program_options)
+mod = prog.compile(
+    "cubin",
+    logs=sys.stdout,
+    name_expressions=("saxpy_kernel<float>", "saxpy_kernel<double>"),
+)
+
+# Run in single precision
+ker = mod.get_kernel("saxpy_kernel<float>")
+dtype = torch.float32
+
+# prepare input/output
+size = 64
+# Use a single element tensor for 'a'
+a = torch.tensor([10.0], dtype=dtype, device="cuda")
+x = torch.rand(size, dtype=dtype, device="cuda")
+y = torch.rand(size, dtype=dtype, device="cuda")
+out = torch.empty_like(x)
+
+# prepare launch
+block = 32
+grid = int((size + block - 1) // block)
+config = LaunchConfig(grid=grid, block=block)
+ker_args = (a.data_ptr(), x.data_ptr(), y.data_ptr(), out.data_ptr(), size)
+
+# launch kernel on our stream
+launch(s, config, ker, *ker_args)
+
+# check result
+assert torch.allclose(out, a.item() * x + y)
+print("Single precision test passed!")
+
+# let's repeat again with double precision
+ker = mod.get_kernel("saxpy_kernel<double>")
+dtype = torch.float64
+
+# prepare input
+size = 128
+# Use a single element tensor for 'a'
+a = torch.tensor([42.0], dtype=dtype, device="cuda")
+x = torch.rand(size, dtype=dtype, device="cuda")
+y = torch.rand(size, dtype=dtype, device="cuda")
+
+# prepare output
+out = torch.empty_like(x)
+
+# prepare launch
+block = 64
+grid = int((size + block - 1) // block)
+config = LaunchConfig(grid=grid, block=block)
+ker_args = (a.data_ptr(), x.data_ptr(), y.data_ptr(), out.data_ptr(), size)
+
+# launch kernel on PyTorch's stream
+launch(s, config, ker, *ker_args)
+
+# check result
+assert torch.allclose(out, a * x + y)
+print("Double precision test passed!")
+print("All tests passed successfully!")

cuda_core/examples/saxpy.py

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
                size_t N) {
     const unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
     for (size_t i=tid; i<N; i+=gridDim.x*blockDim.x) {
-        out[tid] = a * x[tid] + y[tid];
+        out[i] = a * x[i] + y[i];
     }
 }
 """

cuda_core/tests/example_tests/utils.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ def run_example(samples_path, filename, env=None):
         exec(script, env if env else {})  # nosec B102
     except ImportError as e:
         # for samples requiring any of optional dependencies
-        for m in ("cupy",):
+        for m in ("cupy", "torch"):
             if f"No module named '{m}'" in str(e):
                 pytest.skip(f"{m} not installed, skipping related tests")
                 break

cuda_python/DESCRIPTION.rst

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
+
+**************************************************************
+cuda-python: Metapackage collection of CUDA Python subpackages
+**************************************************************
+
+CUDA Python is the home for accessing NVIDIA's CUDA platform from Python. It consists of multiple components:
+
+* `cuda.core <https://nvidia.github.io/cuda-python/cuda-core/latest>`_: Pythonic access to CUDA Runtime and other core functionalities
+* `cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest>`_: Low-level Python bindings to CUDA C APIs
+* `cuda.cooperative <https://nvidia.github.io/cccl/cuda_cooperative/>`_: A Python package providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
+* `cuda.parallel <https://nvidia.github.io/cccl/cuda_parallel/>`_: A Python package for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, and `transform`, that are callable on the *host*
+* `numba.cuda <https://nvidia.github.io/numba-cuda/>`_: Numba's target for CUDA GPU programming, which directly compiles a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model
+
+For access to NVIDIA CPU & GPU Math Libraries, please refer to `nvmath-python <https://docs.nvidia.com/cuda/nvmath-python/latest>`_.
+
+CUDA Python is currently undergoing an overhaul to improve existing components and bring up new ones. All of the previously available functionality from the `cuda-python` package will continue to be available; please refer to the `cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest>`_ documentation for the installation guide and further details.
+
+cuda-python as a metapackage
+============================
+
+`cuda-python` is now a metapackage that contains a collection of subpackages. Each subpackage is versioned independently, allowing installation of each component as needed.
+
+Subpackage: cuda.core
+---------------------
+
+The `cuda.core` package offers idiomatic, pythonic access to CUDA Runtime and other functionalities.
+
+The goals are to:
+
+1. Provide **idiomatic ("pythonic")** access to CUDA Driver, Runtime, and JIT compiler toolchain
+2. Focus on **developer productivity** by ensuring end-to-end CUDA development can be performed quickly and entirely in Python
+3. **Avoid homegrown** Python abstractions for CUDA for new Python GPU libraries starting from scratch
+4. **Ease** the developer **burden of maintaining** and catching up with the latest CUDA features
+5. **Flatten the learning curve** for current and future generations of CUDA developers
+
+Subpackage: cuda.bindings
+-------------------------
+
+The `cuda.bindings` package is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python.
+
+The available interfaces are:
+
+* CUDA Driver
+* CUDA Runtime
+* NVRTC
+* nvJitLink
+* NVVM
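A small sketch (not part of this commit) of the "versioned independently" point above: each subpackage ships as its own distribution, so its version can be queried on its own through standard packaging metadata.

```python
# Print the installed version of each distribution named in this description.
from importlib.metadata import PackageNotFoundError, version

for dist in ("cuda-python", "cuda-core", "cuda-bindings"):
    try:
        print(f"{dist}: {version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```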

cuda_python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "cuda-python"
 description = "CUDA Python: Performance meets Productivity"
-readme = {file = "README.md", content-type = "text/markdown"}
+readme = {file = "DESCRIPTION.rst", content-type = "text/x-rst"}
 authors = [{name = "NVIDIA Corporation", email = "[email protected]"},]
 license = "LicenseRef-NVIDIA-SOFTWARE-LICENSE"
 classifiers = [
