Various documentation improvements #547

Merged: 9 commits, Apr 7, 2025
14 changes: 5 additions & 9 deletions README.md
@@ -4,12 +4,13 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c

* [cuda.core](https://nvidia.github.io/cuda-python/cuda-core/latest): Pythonic access to CUDA Runtime and other core functionalities
* [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest): Low-level Python bindings to CUDA C APIs
* [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): A Python package for easy access to highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc.
* [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): A Python package providing CUB's reusable block-wide and warp-wide primitives for use within Numba CUDA kernels
* [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): A Python package providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
* [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): A Python package for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc, that are callable on the *host*
* [numba.cuda](https://nvidia.github.io/numba-cuda/): Numba's target for CUDA GPU programming by directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model.

For access to NVIDIA CPU & GPU Math Libraries, please refer to [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest).

CUDA Python is currently undergoing an overhaul to improve existing and bring up new components. All of the previously available functionalities from the cuda-python package will continue to be available, please refer to the [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest) documentation for installation guide and further detail.
CUDA Python is currently undergoing an overhaul to improve existing and bring up new components. All of the previously available functionalities from the `cuda-python` package will continue to be available, please refer to the [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest) documentation for installation guide and further detail.

## cuda-python as a metapackage

@@ -37,9 +38,4 @@ The list of available interfaces are:
* CUDA Runtime
* NVRTC
* nvJitLink

## Supported Python Versions

All `cuda-python` subpackages follows CPython [End-Of-Life](https://devguide.python.org/versions/) schedule for supported Python version guarantee.

Before dropping support there will be an issue raised as a notice.
* NVVM
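
The corrected descriptions above hinge on the device/host split: `cuda.cooperative` targets code running inside Numba CUDA kernels, while `cuda.parallel` is called from the host. As a minimal illustration of the newly listed `numba.cuda` entry (a sketch with arbitrary names and sizes, not taken from this PR), a restricted subset of Python is compiled directly into a CUDA kernel:

```python
import numpy as np
from numba import cuda

@cuda.jit
def axpy(out, a, x, y):
    i = cuda.grid(1)          # global thread index
    if i < out.size:          # guard threads past the end of the array
        out[i] = a * x[i] + y[i]

x = np.arange(1 << 20, dtype=np.float32)
y = np.ones_like(x)
out = np.zeros_like(x)

threads = 256
blocks = (out.size + threads - 1) // threads
axpy[blocks, threads](out, 2.0, x, y)   # NumPy arrays are transferred implicitly
```
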
27 changes: 4 additions & 23 deletions cuda_bindings/DESCRIPTION.rst
@@ -1,26 +1,7 @@
*******************************************************
****************************************
cuda.bindings: Low-level CUDA interfaces
*******************************************************
****************************************

`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Checkout the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.
`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and 1:1 access to the CUDA host APIs from Python. Checkout the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.

Installation
============

`cuda.bindings` can be installed from:

* PyPI
* Conda (conda-forge/nvidia channels)
* Source builds

Differences between these options are described in `Installation <https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html>`_ documentation. Each package guarantees minor version compatibility.

Runtime Dependencies
====================

`cuda.bindings` is supported on all the same platforms as CUDA. Specific dependencies are as follows:

* Driver: Linux (450.80.02 or later) Windows (456.38 or later)
* CUDA Toolkit 12.x

Only the NVRTC and nvJitLink redistributable components are required from the CUDA Toolkit, which can be obtained via PyPI, Conda, or local installers (as described in the CUDA Toolkit `Windows <https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html>`_ and `Linux <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ Installation Guides).
For the installation instruction, please refer to the `Installation <https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html>`_ page.
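
The "full coverage of and 1:1 access to the CUDA host APIs" wording above can be made concrete with a short sketch against the driver bindings (assumes a working NVIDIA driver and a CUDA-capable GPU; error handling trimmed to assertions):

```python
from cuda.bindings import driver

# Each binding mirrors its C counterpart and returns the CUresult error code
# first, followed by any output values.
err, = driver.cuInit(0)
assert err == driver.CUresult.CUDA_SUCCESS

err, count = driver.cuDeviceGetCount()
err, dev = driver.cuDeviceGet(0)
err, name = driver.cuDeviceGetName(128, dev)
print(count, name.decode().rstrip("\x00"))
```
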
19 changes: 2 additions & 17 deletions cuda_bindings/README.md
@@ -1,27 +1,12 @@
# `cuda.bindings`: Low-level CUDA interfaces

`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Checkout the [Overview](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.
`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Checkout the [Overview page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.

`cuda.bindings` is a subpackage of `cuda-python`.

## Installing

`cuda.bindings` can be installed from:

* PyPI
* Conda (conda-forge/nvidia channels)
* Source builds

Differences between these options are described in [Installation](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) documentation. Each package guarantees minor version compatibility.

## Runtime Dependencies

`cuda.bindings` is supported on all the same platforms as CUDA. Specific dependencies are as follows:

* Driver: Linux (450.80.02 or later) Windows (456.38 or later)
* CUDA Toolkit 12.x

Only the NVRTC and nvJitLink redistributable components are required from the CUDA Toolkit, which can be obtained via PyPI, Conda, or local installers (as described in the CUDA Toolkit [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) and [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) Installation Guides).
Please refer to the [Installation page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) for instructions and required/optional dependencies.

## Developing

2 changes: 1 addition & 1 deletion cuda_bindings/docs/build_docs.sh
@@ -17,7 +17,7 @@ fi
# version selector or directory structure.
if [[ -z "${SPHINX_CUDA_BINDINGS_VER}" ]]; then
export SPHINX_CUDA_BINDINGS_VER=$(python -c "from importlib.metadata import version; \
ver = '.'.join(str(version('cuda-python')).split('.')[:3]); \
ver = '.'.join(str(version('cuda-bindings')).split('.')[:3]); \
print(ver)" \
| awk -F'+' '{print $1}')
fi
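
The one-line change above makes the docs query the version of the `cuda-bindings` distribution rather than `cuda-python`. Expanded into plain Python, the embedded snippet computes roughly the following (a sketch; the example version string is made up):

```python
from importlib.metadata import version

def docs_version(dist: str = "cuda-bindings") -> str:
    ver = version(dist)                 # e.g. "12.8.0" (illustrative value)
    ver = ".".join(ver.split(".")[:3])  # keep at most major.minor.patch
    return ver.split("+")[0]            # drop any local "+..." suffix, like the awk step

print(docs_version())
```
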
21 changes: 19 additions & 2 deletions cuda_bindings/docs/source/conf.py
@@ -18,7 +18,7 @@
# -- Project information -----------------------------------------------------

project = "cuda.bindings"
copyright = "2021-2024, NVIDIA"
copyright = "2021-2025, NVIDIA"
author = "NVIDIA"

# The full version, including alpha/beta/rc tags
@@ -30,7 +30,14 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon", "myst_nb", "enum_tools.autoenum"]
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"myst_nb",
"enum_tools.autoenum",
"sphinx_copybutton",
]

nb_execution_mode = "off"
numfig = True
@@ -85,6 +92,16 @@
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# skip cmdline prompts
copybutton_exclude = ".linenos, .gp"

intersphinx_mapping = {
"python": ("https://docs.python.org/3/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"nvvm": ("https://docs.nvidia.com/cuda/libnvvm-api/", None),
"nvjitlink": ("https://docs.nvidia.com/cuda/nvjitlink/", None),
Collaborator:

Any reason to add nvrtc or the CUDA runtime / driver APIs here?

Member Author (@leofang, Apr 7, 2025):

Currently the two codegens set different expectations regarding API doc generation. The codegen used by driver/runtime/nvrtc regenerates the entire C API reference in the docs (with signatures adjusted to match Python), whereas the codegen used by nvvm/nvjitlink generates basic docs with a "see also" link to the corresponding C API. The links added here point to sites with proper objects.inv files, which allows cross-linking, e.g.
https://nvidia.github.io/cuda-python/pr-preview/pr-547/cuda-bindings/latest/module/nvjitlink.html#cuda.bindings.nvjitlink.create
The see also link works.

Member Author (@leofang):

(FWIW regarding objects.inv, it's a binary that is generated by Sphinx and can be introspected by the intersphinx plugin.)
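
A minimal stdlib-only sketch of that introspection (assumes network access; the URL is the nvJitLink inventory added to `intersphinx_mapping` above):

```python
import zlib
from urllib.request import urlopen

url = "https://docs.nvidia.com/cuda/nvjitlink/objects.inv"
raw = urlopen(url).read()

# objects.inv: four plain-text header lines, then a zlib-compressed list of
# "name domain:role priority uri dispname" records that intersphinx links against.
parts = raw.split(b"\n", 4)
print(b"\n".join(parts[:4]).decode())   # inventory version, project, version
for entry in zlib.decompress(parts[4]).decode().splitlines()[:5]:
    print(entry)                        # a few cross-linkable objects
```
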

}

suppress_warnings = [
# for warnings about multiple possible targets, see NVIDIA/cuda-python#152
"ref.python",
10 changes: 8 additions & 2 deletions cuda_bindings/docs/source/install.md
@@ -4,11 +4,13 @@

`cuda.bindings` supports the same platforms as CUDA. Runtime dependencies are:

* Linux (x86-64, arm64) and Windows (x86-64)
* Python 3.9 - 3.13
* Driver: Linux (450.80.02 or later) Windows (456.38 or later)
* CUDA Toolkit 12.x
* Optionally, NVRTC, nvJitLink, and NVVM from CUDA Toolkit 12.x

```{note}
Only the NVRTC and nvJitLink redistributable components are required from the CUDA Toolkit, which can be obtained via PyPI, Conda, or local installers (as described in the CUDA Toolkit [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) and [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) Installation Guides).
The optional CUDA Toolkit components can be installed via PyPI, Conda, OS-specific package managers, or local installers (as described in the CUDA Toolkit [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) and [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) Installation Guides).
```

Starting from v12.8.0, `cuda-python` becomes a meta package which currently depends only on `cuda-bindings`; in the future more sub-packages will be added to `cuda-python`. In the instructions below, we still use `cuda-python` as example to serve existing users, but everything is applicable to `cuda-bindings` as well.
@@ -44,13 +46,17 @@ $ conda install -c conda-forge cuda-python
### Requirements

* CUDA Toolkit headers[^1]
* CUDA Runtime static library[^2]

[^1]: User projects that `cimport` CUDA symbols in Cython must also use CUDA Toolkit (CTK) types as provided by the `cuda.bindings` major.minor version. This results in CTK headers becoming a transitive dependency of downstream projects through CUDA Python.

[^2]: The CUDA Runtime static library (`libcudart_static.a` on Linux, `cudart_static.lib` on Windows) is part of the CUDA Toolkit. If using conda packages, it is contained in the `cuda-cudart-static` package.

Source builds require that the provided CUDA headers are of the same major.minor version as the `cuda.bindings` you're trying to build. Despite this requirement, note that the minor version compatibility is still maintained. Use the `CUDA_HOME` (or `CUDA_PATH`) environment variable to specify the location of your headers. For example, if your headers are located in `/usr/local/cuda/include`, then you should set `CUDA_HOME` with:

```console
$ export CUDA_HOME=/usr/local/cuda
$ export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH
```

Collaborator (on lines 58 to 59):

Note for the future: we really shouldn't need to set these when things are installed in standard locations.

Member Author (@leofang):

Yes, at some point we should revisit the build-time search behavior. Currently we limit to full explicitness at build time (no implicit auto-discovery behind users' back), while the ongoing path finder project (#451) focuses on run-time use cases. Once the path finder is mature we can consider using it at build time too (cc @rwgk for vis).

FWIW though, right now it is not as bad as it seems. If CUDA is installed via Linux system pkg mgr or conda, we need at most $CUDA_HOME defined. The system or conda compiler knows where the static library is. So this is required really only for CUDA installed to non-default locations.

BTW, Python projects can be built against CUDA wheels, as long as they don't contain device code that needs to be compiled by nvcc. I've enabled this for cuquantum-python/nvmath-python, and it's quite handy actually. It's only a matter of time that we also propagate this capability to cuda-bindings. The only downside is that if the C libraries are not yet public, building against public C wheels does not work for obvious reasons and we need a fallback (i.e. the current behavior).
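
A rough sketch of the "full explicitness" behavior described here (a hypothetical helper for illustration, not the actual build code in this repository):

```python
import os

def find_cuda_home() -> str:
    # Honor CUDA_HOME / CUDA_PATH only; no implicit auto-discovery.
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if not cuda_home:
        raise RuntimeError(
            "Set CUDA_HOME (or CUDA_PATH) to your CUDA Toolkit location, "
            "e.g. /usr/local/cuda"
        )
    include_dir = os.path.join(cuda_home, "include")
    if not os.path.isdir(include_dir):
        raise RuntimeError(f"CUDA headers not found under {include_dir}")
    return cuda_home
```
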


See [Environment Variables](environment_variables.md) for a description of other build-time environment variables.
2 changes: 2 additions & 0 deletions cuda_bindings/docs/source/module/nvjitlink.rst
@@ -1,3 +1,5 @@
.. default-role:: cpp:any

nvjitlink
=========

2 changes: 2 additions & 0 deletions cuda_bindings/docs/source/module/nvvm.rst
@@ -1,3 +1,5 @@
.. default-role:: cpp:any

nvvm
====

2 changes: 1 addition & 1 deletion cuda_bindings/docs/source/overview.md
@@ -205,7 +205,7 @@ argument on either host or device. Since we already prepared each of our argumen
construction of our final contiguous array is done by retrieving the `XX.ctypes.data`
of each kernel argument.

```{code-cell} python
```python
args = [a, dX, dY, dOut, n]
args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)
```
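
For context, this pointer array is what ultimately gets handed to the launch call. A hedged sketch of that step (`kernel`, `stream`, `NUM_BLOCKS`, and `NUM_THREADS` are assumed to come from earlier overview steps and are not defined here):

```python
# driver is cuda.bindings.driver; args.ctypes.data is the address of the
# contiguous array of per-argument pointers built above.
err, = driver.cuLaunchKernel(
    kernel,
    NUM_BLOCKS, 1, 1,    # grid dimensions
    NUM_THREADS, 1, 1,   # block dimensions
    0,                   # dynamic shared memory in bytes
    stream,
    args.ctypes.data,    # kernelParams
    0,                   # extra (unused)
)
```
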
6 changes: 6 additions & 0 deletions cuda_bindings/docs/source/release/11.8.7-notes.rst
@@ -9,3 +9,9 @@ Highlights

* The ``cuda.bindings.nvvm`` Python module was added, wrapping the
`libNVVM C API <https://docs.nvidia.com/cuda/libnvvm-api/>`_.


Bug fixes
---------

* Fix segfault when converting char* NULL to bytes
16 changes: 15 additions & 1 deletion cuda_bindings/docs/source/release/12.X.Y-notes.rst
@@ -11,5 +11,19 @@
`libNVVM C API <https://docs.nvidia.com/cuda/libnvvm-api/>`_.
* Source build error checking added for missing required headers
* Statically link CUDA Runtime instead of reimplementing it
* Fix performance hint warnings raised by Cython 3
* Move stream callback wrappers to the Python layer
* Return code construction is made faster

Bug fixes
---------

* Fix segfault when converting char* NULL to bytes


Miscellaneous
-------------

* Benchmark suite is updated
* Improvements in the introductory code samples
* Fix performance hint warnings raised by Cython 3
* Improvements in the Overview page
6 changes: 3 additions & 3 deletions cuda_core/docs/source/install.md
@@ -12,7 +12,7 @@ dependencies are as follows:

[^1]: Including `cuda-python`.

`cuda.core` supports Python 3.9 - 3.12, on Linux (x86-64, arm64) and Windows (x86-64).
`cuda.core` supports Python 3.9 - 3.13, on Linux (x86-64, arm64) and Windows (x86-64).

## Installing from PyPI

Expand All @@ -22,8 +22,8 @@ $ pip install cuda-core[cu12]
```
and likewise use `[cu11]` for CUDA 11.

Note that using `cuda.core` with NVRTC or nvJitLink installed from PyPI via `pip install` is currently
not supported. This will be fixed in a future release.
Note that using `cuda.core` with NVRTC or nvJitLink installed from PyPI via `pip install` requires
`cuda.bindings` 12.8.0+ or 11.8.6+.

## Installing from Conda (conda-forge)

1 change: 1 addition & 0 deletions cuda_python/docs/source/conf.py
@@ -95,4 +95,5 @@
.. _cuda.bindings: {CUDA_PYTHON_DOMAIN}/cuda-bindings/latest
.. _cuda.cooperative: https://nvidia.github.io/cccl/cuda_cooperative/
.. _cuda.parallel: https://nvidia.github.io/cccl/cuda_parallel/
.. _numba.cuda: https://nvidia.github.io/numba-cuda/
"""
6 changes: 4 additions & 2 deletions cuda_python/docs/source/index.rst
@@ -6,8 +6,9 @@ multiple components:

- `cuda.core`_: Pythonic access to CUDA runtime and other core functionalities
- `cuda.bindings`_: Low-level Python bindings to CUDA C APIs
- `cuda.cooperative`_: A Python package for easy access to highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc.
- `cuda.parallel`_: A Python package providing CUB's reusable block-wide and warp-wide primitives for use within Numba CUDA kernels
- `cuda.cooperative`_: A Python package providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
- `cuda.parallel`_: A Python package for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc, that are callable on the *host*
- `numba.cuda`_: Numba's target for CUDA GPU programming by directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model.

For access to NVIDIA CPU & GPU Math Libraries, please refer to `nvmath-python`_.

@@ -30,5 +31,6 @@ be available, please refer to the `cuda.bindings`_ documentation for installatio
cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest>
cuda.cooperative <https://nvidia.github.io/cccl/cuda_cooperative>
cuda.parallel <https://nvidia.github.io/cccl/cuda_parallel>
numba.cuda <https://nvidia.github.io/numba-cuda/>
conduct.md
contribute.md