Various documentation improvements #547

Merged: 9 commits, Apr 7, 2025
14 changes: 5 additions & 9 deletions README.md
@@ -4,12 +4,13 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c

* [cuda.core](https://nvidia.github.io/cuda-python/cuda-core/latest): Pythonic access to CUDA Runtime and other core functionalities
* [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest): Low-level Python bindings to CUDA C APIs
* [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): A Python package for easy access to highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc.
* [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): A Python package providing CUB's reusable block-wide and warp-wide primitives for use within Numba CUDA kernels
* [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): A Python package providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
* [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): A Python package for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc, that are callable on the *host*
* [numba.cuda](https://nvidia.github.io/numba-cuda/): Numba's target for CUDA GPU programming by directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model.

For access to NVIDIA CPU & GPU Math Libraries, please refer to [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest).

CUDA Python is currently undergoing an overhaul to improve existing and bring up new components. All of the previously available functionalities from the cuda-python package will continue to be available, please refer to the [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest) documentation for installation guide and further detail.
CUDA Python is currently undergoing an overhaul to improve existing and bring up new components. All of the previously available functionalities from the `cuda-python` package will continue to be available, please refer to the [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest) documentation for installation guide and further detail.

## cuda-python as a metapackage

@@ -37,9 +38,4 @@ The list of available interfaces are:
* CUDA Runtime
* NVRTC
* nvJitLink

## Supported Python Versions

All `cuda-python` subpackages follows CPython [End-Of-Life](https://devguide.python.org/versions/) schedule for supported Python version guarantee.

Before dropping support there will be an issue raised as a notice.
* NVVM
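
The corrected descriptions above hinge on the device/host split: `cuda.cooperative` targets code running inside Numba CUDA kernels, while `cuda.parallel` is called from the host. As a minimal illustration of the newly listed `numba.cuda` entry (a sketch with arbitrary names and sizes, not taken from this PR), a restricted subset of Python is compiled directly into a CUDA kernel:

```python
import numpy as np
from numba import cuda

@cuda.jit
def axpy(out, a, x, y):
    i = cuda.grid(1)          # global thread index
    if i < out.size:          # guard threads past the end of the array
        out[i] = a * x[i] + y[i]

x = np.arange(1 << 20, dtype=np.float32)
y = np.ones_like(x)
out = np.zeros_like(x)

threads = 256
blocks = (out.size + threads - 1) // threads
axpy[blocks, threads](out, 2.0, x, y)   # NumPy arrays are transferred implicitly
```
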
27 changes: 4 additions & 23 deletions cuda_bindings/DESCRIPTION.rst
@@ -1,26 +1,7 @@
*******************************************************
****************************************
cuda.bindings: Low-level CUDA interfaces
*******************************************************
****************************************

`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Checkout the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.
`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and 1:1 access to the CUDA host APIs from Python. Checkout the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.

Installation
============

`cuda.bindings` can be installed from:

* PyPI
* Conda (conda-forge/nvidia channels)
* Source builds

Differences between these options are described in `Installation <https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html>`_ documentation. Each package guarantees minor version compatibility.

Runtime Dependencies
====================

`cuda.bindings` is supported on all the same platforms as CUDA. Specific dependencies are as follows:

* Driver: Linux (450.80.02 or later) Windows (456.38 or later)
* CUDA Toolkit 12.x

Only the NVRTC and nvJitLink redistributable components are required from the CUDA Toolkit, which can be obtained via PyPI, Conda, or local installers (as described in the CUDA Toolkit `Windows <https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html>`_ and `Linux <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ Installation Guides).
For the installation instruction, please refer to the `Installation <https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html>`_ page.
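
The "full coverage of and 1:1 access to the CUDA host APIs" wording above can be made concrete with a short sketch against the driver bindings (assumes a working NVIDIA driver and a CUDA-capable GPU; error handling trimmed to assertions):

```python
from cuda.bindings import driver

# Each binding mirrors its C counterpart and returns the CUresult error code
# first, followed by any output values.
err, = driver.cuInit(0)
assert err == driver.CUresult.CUDA_SUCCESS

err, count = driver.cuDeviceGetCount()
err, dev = driver.cuDeviceGet(0)
err, name = driver.cuDeviceGetName(128, dev)
print(count, name.decode().rstrip("\x00"))
```
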
19 changes: 2 additions & 17 deletions cuda_bindings/README.md
@@ -1,27 +1,12 @@
# `cuda.bindings`: Low-level CUDA interfaces

`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Checkout the [Overview](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.
`cuda.bindings` is a standard set of low-level interfaces, providing full coverage of and access to the CUDA host APIs from Python. Checkout the [Overview page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html) for the workflow and performance results.

`cuda.bindings` is a subpackage of `cuda-python`.

## Installing

`cuda.bindings` can be installed from:

* PyPI
* Conda (conda-forge/nvidia channels)
* Source builds

Differences between these options are described in [Installation](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) documentation. Each package guarantees minor version compatibility.

## Runtime Dependencies

`cuda.bindings` is supported on all the same platforms as CUDA. Specific dependencies are as follows:

* Driver: Linux (450.80.02 or later) Windows (456.38 or later)
* CUDA Toolkit 12.x

Only the NVRTC and nvJitLink redistributable components are required from the CUDA Toolkit, which can be obtained via PyPI, Conda, or local installers (as described in the CUDA Toolkit [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) and [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) Installation Guides).
Please refer to the [Installation page](https://nvidia.github.io/cuda-python/cuda-bindings/latest/install.html) for instructions and required/optional dependencies.

## Developing

2 changes: 1 addition & 1 deletion cuda_bindings/docs/build_docs.sh
@@ -17,7 +17,7 @@ fi
# version selector or directory structure.
if [[ -z "${SPHINX_CUDA_BINDINGS_VER}" ]]; then
export SPHINX_CUDA_BINDINGS_VER=$(python -c "from importlib.metadata import version; \
ver = '.'.join(str(version('cuda-python')).split('.')[:3]); \
ver = '.'.join(str(version('cuda-bindings')).split('.')[:3]); \
print(ver)" \
| awk -F'+' '{print $1}')
fi
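
The one-line change above makes the docs query the version of the `cuda-bindings` distribution rather than `cuda-python`. Expanded into plain Python, the embedded snippet computes roughly the following (a sketch; the example version string is made up):

```python
from importlib.metadata import version

def docs_version(dist: str = "cuda-bindings") -> str:
    ver = version(dist)                 # e.g. "12.8.0" (illustrative value)
    ver = ".".join(ver.split(".")[:3])  # keep at most major.minor.patch
    return ver.split("+")[0]            # drop any local "+..." suffix, like the awk step

print(docs_version())
```
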
21 changes: 19 additions & 2 deletions cuda_bindings/docs/source/conf.py
@@ -18,7 +18,7 @@
# -- Project information -----------------------------------------------------

project = "cuda.bindings"
copyright = "2021-2024, NVIDIA"
copyright = "2021-2025, NVIDIA"
author = "NVIDIA"

# The full version, including alpha/beta/rc tags
@@ -30,7 +30,14 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon", "myst_nb", "enum_tools.autoenum"]
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"myst_nb",
"enum_tools.autoenum",
"sphinx_copybutton",
]

nb_execution_mode = "off"
numfig = True
@@ -85,6 +92,16 @@
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# skip cmdline prompts
copybutton_exclude = ".linenos, .gp"

intersphinx_mapping = {
"python": ("https://docs.python.org/3/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"nvvm": ("https://docs.nvidia.com/cuda/libnvvm-api/", None),
"nvjitlink": ("https://docs.nvidia.com/cuda/nvjitlink/", None),
Collaborator:

Any reason to add nvrtc or the CUDA runtime / driver APIs here?

Member Author (@leofang, Apr 7, 2025):

Currently the two codegens set different expectations regarding API doc generation. The codegen used by driver/runtime/nvrtc regenerates the entire C API reference in the docs (with signatures adjusted to match Python), whereas the codegen used by nvvm/nvjitlink generates basic docs with a "see also" link to the corresponding C API. The links added here point to sites with proper objects.inv files, which allows cross-linking, e.g.
https://nvidia.github.io/cuda-python/pr-preview/pr-547/cuda-bindings/latest/module/nvjitlink.html#cuda.bindings.nvjitlink.create
The see also link works.

Member Author (@leofang):

(FWIW regarding objects.inv, it's a binary that is generated by Sphinx and can be introspected by the intersphinx plugin.)
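
A minimal stdlib-only sketch of that introspection (assumes network access; the URL is the nvJitLink inventory added to `intersphinx_mapping` above):

```python
import zlib
from urllib.request import urlopen

url = "https://docs.nvidia.com/cuda/nvjitlink/objects.inv"
raw = urlopen(url).read()

# objects.inv: four plain-text header lines, then a zlib-compressed list of
# "name domain:role priority uri dispname" records that intersphinx links against.
parts = raw.split(b"\n", 4)
print(b"\n".join(parts[:4]).decode())   # inventory version, project, version
for entry in zlib.decompress(parts[4]).decode().splitlines()[:5]:
    print(entry)                        # a few cross-linkable objects
```
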

}

suppress_warnings = [
# for warnings about multiple possible targets, see NVIDIA/cuda-python#152
"ref.python",
10 changes: 8 additions & 2 deletions cuda_bindings/docs/source/install.md
@@ -4,11 +4,13 @@

`cuda.bindings` supports the same platforms as CUDA. Runtime dependencies are:

* Linux (x86-64, arm64) and Windows (x86-64)
* Python 3.9 - 3.13
* Driver: Linux (450.80.02 or later) Windows (456.38 or later)
* CUDA Toolkit 12.x
* Optionally, NVRTC, nvJitLink, and NVVM from CUDA Toolkit 12.x

```{note}
Only the NVRTC and nvJitLink redistributable components are required from the CUDA Toolkit, which can be obtained via PyPI, Conda, or local installers (as described in the CUDA Toolkit [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) and [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) Installation Guides).
The optional CUDA Toolkit components can be installed via PyPI, Conda, OS-specific package managers, or local installers (as described in the CUDA Toolkit [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) and [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) Installation Guides).
```

Starting from v12.8.0, `cuda-python` becomes a meta package which currently depends only on `cuda-bindings`; in the future more sub-packages will be added to `cuda-python`. In the instructions below, we still use `cuda-python` as example to serve existing users, but everything is applicable to `cuda-bindings` as well.
@@ -44,13 +46,17 @@ $ conda install -c conda-forge cuda-python
### Requirements

* CUDA Toolkit headers[^1]
* CUDA Runtime static library[^2]

[^1]: User projects that `cimport` CUDA symbols in Cython must also use CUDA Toolkit (CTK) types as provided by the `cuda.bindings` major.minor version. This results in CTK headers becoming a transitive dependency of downstream projects through CUDA Python.

[^2]: The CUDA Runtime static library (`libcudart_static.a` on Linux, `cudart_static.lib` on Windows) is part of the CUDA Toolkit. If using conda packages, it is contained in the `cuda-cudart-static` package.

Source builds require that the provided CUDA headers are of the same major.minor version as the `cuda.bindings` you're trying to build. Despite this requirement, note that the minor version compatibility is still maintained. Use the `CUDA_HOME` (or `CUDA_PATH`) environment variable to specify the location of your headers. For example, if your headers are located in `/usr/local/cuda/include`, then you should set `CUDA_HOME` with:

```console
$ export CUDA_HOME=/usr/local/cuda
$ export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH
```

Collaborator (on lines 58 to 59):

Note for the future: we really shouldn't need to set these when things are installed in standard locations.

Member Author (@leofang):

Yes, at some point we should revisit the build-time search behavior. Currently we limit to full explicitness at build time (no implicit auto-discovery behind users' back), while the ongoing path finder project (#451) focuses on run-time use cases. Once the path finder is mature we can consider using it at build time too (cc @rwgk for vis).

FWIW though, right now it is not as bad as it seems. If CUDA is installed via Linux system pkg mgr or conda, we need at most $CUDA_HOME defined. The system or conda compiler knows where the static library is. So this is required really only for CUDA installed to non-default locations.

BTW, Python projects can be built against CUDA wheels, as long as they don't contain device code that needs to be compiled by nvcc. I've enabled this for cuquantum-python/nvmath-python, and it's quite handy actually. It's only a matter of time that we also propagate this capability to cuda-bindings. The only downside is that if the C libraries are not yet public, building against public C wheels does not work for obvious reasons and we need a fallback (i.e. the current behavior).
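
A rough sketch of the "full explicitness" behavior described here (a hypothetical helper for illustration, not the actual build code in this repository):

```python
import os

def find_cuda_home() -> str:
    # Honor CUDA_HOME / CUDA_PATH only; no implicit auto-discovery.
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if not cuda_home:
        raise RuntimeError(
            "Set CUDA_HOME (or CUDA_PATH) to your CUDA Toolkit location, "
            "e.g. /usr/local/cuda"
        )
    include_dir = os.path.join(cuda_home, "include")
    if not os.path.isdir(include_dir):
        raise RuntimeError(f"CUDA headers not found under {include_dir}")
    return cuda_home
```
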


See [Environment Variables](environment_variables.md) for a description of other build-time environment variables.
2 changes: 2 additions & 0 deletions cuda_bindings/docs/source/module/nvjitlink.rst
@@ -1,3 +1,5 @@
.. default-role:: cpp:any

nvjitlink
=========

2 changes: 2 additions & 0 deletions cuda_bindings/docs/source/module/nvvm.rst
@@ -1,3 +1,5 @@
.. default-role:: cpp:any

nvvm
====

2 changes: 1 addition & 1 deletion cuda_bindings/docs/source/overview.md
@@ -205,7 +205,7 @@ argument on either host or device. Since we already prepared each of our argumen
construction of our final contiguous array is done by retrieving the `XX.ctypes.data`
of each kernel argument.

```{code-cell} python
```python
args = [a, dX, dY, dOut, n]
args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)
```
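
For context, this pointer array is what ultimately gets handed to the launch call. A hedged sketch of that step (`kernel`, `stream`, `NUM_BLOCKS`, and `NUM_THREADS` are assumed to come from earlier overview steps and are not defined here):

```python
# driver is cuda.bindings.driver; args.ctypes.data is the address of the
# contiguous array of per-argument pointers built above.
err, = driver.cuLaunchKernel(
    kernel,
    NUM_BLOCKS, 1, 1,    # grid dimensions
    NUM_THREADS, 1, 1,   # block dimensions
    0,                   # dynamic shared memory in bytes
    stream,
    args.ctypes.data,    # kernelParams
    0,                   # extra (unused)
)
```
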
6 changes: 6 additions & 0 deletions cuda_bindings/docs/source/release/11.8.7-notes.rst
@@ -9,3 +9,9 @@ Highlights

* The ``cuda.bindings.nvvm`` Python module was added, wrapping the
`libNVVM C API <https://docs.nvidia.com/cuda/libnvvm-api/>`_.


Bug fixes
---------

* Fix segfault when converting char* NULL to bytes
16 changes: 15 additions & 1 deletion cuda_bindings/docs/source/release/12.X.Y-notes.rst
@@ -11,5 +11,19 @@
`libNVVM C API <https://docs.nvidia.com/cuda/libnvvm-api/>`_.
* Source build error checking added for missing required headers
* Statically link CUDA Runtime instead of reimplementing it
* Fix performance hint warnings raised by Cython 3
* Move stream callback wrappers to the Python layer
* Return code construction is made faster

Bug fixes
---------

* Fix segfault when converting char* NULL to bytes


Miscellaneous
-------------

* Benchmark suite is updated
* Improvements in the introductory code samples
* Fix performance hint warnings raised by Cython 3
* Improvements in the Overview page
6 changes: 3 additions & 3 deletions cuda_core/docs/source/install.md
@@ -12,7 +12,7 @@ dependencies are as follows:

[^1]: Including `cuda-python`.

`cuda.core` supports Python 3.9 - 3.12, on Linux (x86-64, arm64) and Windows (x86-64).
`cuda.core` supports Python 3.9 - 3.13, on Linux (x86-64, arm64) and Windows (x86-64).

## Installing from PyPI

Expand All @@ -22,8 +22,8 @@ $ pip install cuda-core[cu12]
```
and likewise use `[cu11]` for CUDA 11.

Note that using `cuda.core` with NVRTC or nvJitLink installed from PyPI via `pip install` is currently
not supported. This will be fixed in a future release.
Note that using `cuda.core` with NVRTC or nvJitLink installed from PyPI via `pip install` requires
`cuda.bindings` 12.8.0+ or 11.8.6+.

## Installing from Conda (conda-forge)

1 change: 1 addition & 0 deletions cuda_python/docs/source/conf.py
@@ -95,4 +95,5 @@
.. _cuda.bindings: {CUDA_PYTHON_DOMAIN}/cuda-bindings/latest
.. _cuda.cooperative: https://nvidia.github.io/cccl/cuda_cooperative/
.. _cuda.parallel: https://nvidia.github.io/cccl/cuda_parallel/
.. _numba.cuda: https://nvidia.github.io/numba-cuda/
"""
6 changes: 4 additions & 2 deletions cuda_python/docs/source/index.rst
@@ -6,8 +6,9 @@ multiple components:

- `cuda.core`_: Pythonic access to CUDA runtime and other core functionalities
- `cuda.bindings`_: Low-level Python bindings to CUDA C APIs
- `cuda.cooperative`_: A Python package for easy access to highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc.
- `cuda.parallel`_: A Python package providing CUB's reusable block-wide and warp-wide primitives for use within Numba CUDA kernels
- `cuda.cooperative`_: A Python package providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
- `cuda.parallel`_: A Python package for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc, that are callable on the *host*
- `numba.cuda`_: Numba's target for CUDA GPU programming by directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model.

For access to NVIDIA CPU & GPU Math Libraries, please refer to `nvmath-python`_.

@@ -30,5 +31,6 @@ be available, please refer to the `cuda.bindings`_ documentation for installatio
cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest>
cuda.cooperative <https://nvidia.github.io/cccl/cuda_cooperative>
cuda.parallel <https://nvidia.github.io/cccl/cuda_parallel>
numba.cuda <https://nvidia.github.io/numba-cuda/>
conduct.md
contribute.md