
cuda.core v0.1.1 final doc touch #301


Merged
merged 11 commits into from
Dec 20, 2024
2 changes: 2 additions & 0 deletions cuda_core/docs/source/install.md
@@ -12,6 +12,8 @@ dependencies are as follows:

[^1]: Including `cuda-python`.

`cuda.core` supports Python 3.9 - 3.12, on Linux (x86-64, arm64) and Windows (x86-64).

## Installing from PyPI

`cuda.core` works with `cuda.bindings` (part of `cuda-python`) 11 or 12. For example with CUDA 12:
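
The support statement added to `install.md` in this hunk can be checked programmatically. Below is a minimal sketch using only the standard library; the helper name `is_supported` and the mapping from the doc's "x86-64"/"arm64" wording to `platform.machine()` strings are assumptions for illustration, not part of `cuda.core`.

```python
import platform
import sys

# Bounds copied from the sentence added to install.md (Python 3.9 - 3.12).
SUPPORTED_PYTHON = ((3, 9), (3, 12))

# Assumed mapping of the documented architectures to platform.machine() values:
# Linux reports "x86_64" / "aarch64", Windows reports "AMD64".
SUPPORTED_PLATFORMS = {
    "Linux": {"x86_64", "aarch64"},
    "Windows": {"AMD64"},
}


def is_supported(py_version, system, machine):
    """Return True if the (hypothetical) support matrix above covers this setup."""
    lowest, highest = SUPPORTED_PYTHON
    python_ok = lowest <= tuple(py_version[:2]) <= highest
    platform_ok = machine in SUPPORTED_PLATFORMS.get(system, set())
    return python_ok and platform_ok


if __name__ == "__main__":
    # Check the interpreter and machine this script is running on.
    print(is_supported(sys.version_info, platform.system(), platform.machine()))
```

Keeping the matrix as plain data makes it easy to update when the supported range changes, for example once Python 3.13 wheels are available.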
2 changes: 1 addition & 1 deletion cuda_core/docs/source/release/0.1.0-notes.md
@@ -1,4 +1,4 @@
# `cuda.core` Release notes
# `cuda.core` v0.1.0 Release notes

Released on Nov 8, 2024

25 changes: 15 additions & 10 deletions cuda_core/docs/source/release/0.1.1-notes.md
@@ -1,25 +1,28 @@
# `cuda.core` Release notes
# `cuda.core` v0.1.1 Release notes

Released on Dec XX, 2024
Released on Dec 20, 2024

## Highlights

- Add `StridedMemoryView` and `@args_viewable_as_strided_memory` that provide a concrete
implementation of DLPack & CUDA Array Interface support.
- Add `Linker` that can link one or multiple `ObjectCode` instances generated by `Program`s. Under
the hood, it uses either the nvJitLink or cuLink APIs depending on the CUDA version detected
in the current environment.
- Add a `cuda.core.experimental.system` module for querying system- or process-wide information.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools
- Add `Linker` that can link one or multiple `ObjectCode` instances generated by `Program`. Under
the hood, it uses either the nvJitLink or driver (`cuLink*`) APIs depending on the CUDA version
detected in the current environment.
- Support `pip install cuda-core`. Please see the Installation Guide for further details.

## New features

- Add a `cuda.core.experimental.system` module for querying system- or process-wide information.
- Add `LaunchConfig.cluster` to support thread block clusters on Hopper GPUs.

## Enhancements

- Ensure "ltoir" is a valid code type to `ObjectCode`.
- Improve test coverage.
- The internal handle held by `ObjectCode` is now lazily initialized upon first touch.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.
- Ensure `"ltoir"` is a valid code type to `ObjectCode`.
- Document the `__cuda_stream__` protocol.
- Improve test coverage & documentation cross-references.
- Enforce code formatting.

## Bug fixes
@@ -35,4 +38,6 @@ Released on Dec XX, 2024
not supported. This will be fixed in a future release.
- Some `LinkerOptions` are only available when using a modern version of CUDA. When using CUDA <12,
the backend is the cuLink API, which supports only a subset of the options that nvJitLink does.
Further, some options aren't available on CUDA versions <12.6
Further, some options aren't available on CUDA versions <12.6.
- Using `cuda.core` with Python 3.13 currently requires building `cuda-python` from source
prior to `pip install`. This extra step will be fixed soon.
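
The `LinkerOptions` limitation above amounts to a three-way split on the CUDA version. The sketch below only restates that note as code; `describe_linker_backend` and its return values are hypothetical and do not reflect how `cuda.core` actually detects its backend at runtime.

```python
def describe_linker_backend(cuda_version):
    """Summarize the linking backend implied by the release notes.

    `cuda_version` is a (major, minor) tuple. This is an illustrative helper,
    not a cuda.core API.
    """
    major, minor = cuda_version
    if major < 12:
        # Per the notes: CUDA <12 falls back to the driver cuLink API,
        # which supports only a subset of the linker options.
        return {"backend": "cuLink", "options": "subset"}
    if (major, minor) < (12, 6):
        # Per the notes: some options are unavailable before CUDA 12.6.
        return {"backend": "nvJitLink", "options": "most"}
    return {"backend": "nvJitLink", "options": "all"}


# Example: inspect a few versions.
for version in [(11, 8), (12, 4), (12, 6)]:
    print(version, describe_linker_backend(version))
```

A check like this could be used to warn early when an option is requested on a CUDA version that the notes list as unsupported.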