Add Python API, semantics and implementation details for DLPack #106
Merged
Commits (12)
- `30276e6` Update data interchange section with Python API and semantics (rgommers)
- `79bb689` temporary commit to enable MyST feature (rgommers)
- `4e2c49d` Add DLPack synchronization semantics; add from_dlpack/__dlpack__ to API (rgommers)
- `0b701c4` Update stream numbering for `__dlpack__` (rgommers)
- `a4549af` Add __dlpack__ device and update description of stream=None (rgommers)
- `a719b18` Add more device-specific notes for CUDA/ROCm stream handling (rgommers)
- `693b15a` Fix issue where producer/consumer were reversed (rgommers)
- `897ca2e` Improve the description of the stream keyword for `__dlpack__` (rgommers)
- `d3b9a79` Update __dlpack_device__ to use IntEnum for device type (rgommers)
- `75261cc` Add -1 as a sentinel value for DLPack stream handling (rgommers)
- `5cde9aa` Add supported DLPack version range. (rgommers)
- `603ad2e` Add details on strides null and size 0 arrays. (rgommers)
Conversations
**rgommers:** Note that this is different from how `stream` is specified in https://numba.readthedocs.io/en/latest/cuda/cuda_array_interface.html#python-interface-specification. I actually don't understand that spec: it says for `None` that no synchronization is needed, it uses `1`/`2` for the legacy/per-thread default streams, and other integers for non-default streams. Which seems odd: what if the stream number of a non-default stream in use is `2`, for example? Using:

- `None`: legacy default stream
- `0`: per-thread default stream
- `1, 2, ...`: non-default stream numbers

seems to make more sense. @leofang am I missing something there?
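To make the two conventions easier to compare side by side, here is a small illustrative sketch; neither mapping is a real API, the names are made up for this comparison, and the entries only restate the semantics described above.

```python
# Illustration only: neither mapping is an actual API.
# cai_v3 = Numba's __cuda_array_interface__ v3 convention;
# proposed = the alternative numbering floated in this comment.
cai_v3_stream_meaning = {
    None: "no synchronization needed",
    1: "legacy default stream",
    2: "per-thread default stream",
    # 3, 4, ...: non-default streams; but what if a live
    # non-default stream happens to have the number 2?
}
proposed_stream_meaning = {
    None: "legacy default stream",
    0: "per-thread default stream",
    # 1, 2, ...: non-default stream numbers
}
```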
**leofang:** @rgommers `None` is actually confusing across libraries. For example, in Numba `None` means there is no Numba default stream (to be distinguished from "CUDA's (whichever) default stream"), whereas in CuPy `None` simply refers to CUDA's default stream, which in turn is `1` (the legacy default stream), though we're in the process of adopting `2` (the per-thread default stream). `0` is not acceptable either, for the same reason: it's semantically unclear depending on how the libraries containing CUDA code are compiled and the runtime behavior defined in the Python hooks.

Note that in CUDA you don't get to choose the stream numbers: CUDA macro-defines `1` for `cudaStreamLegacy` and `2` for `cudaStreamPerThread`, which CAI v3 followed. Any user/non-default stream created via `cudaStreamCreate()` is guaranteed to start on or after `3`. (In fact, the CUDA driver reserves a stream pool internally, so the actual start number is well after `3`.) I hope this makes CAI v3 clearer to you.

I'll try to catch up on the rest of the discussions here, as well as in the DLPack repo, after Monday (tomorrow)... 😅
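A short sketch of the CAI v3 numbering just described; `describe_stream` is a hypothetical helper written for this thread, not part of any library, and it only restates the values above.

```python
# Hypothetical helper; restates the CAI v3 convention described above.
CUDA_STREAM_LEGACY = 1      # cudaStreamLegacy is macro-defined to 1 by CUDA
CUDA_STREAM_PER_THREAD = 2  # cudaStreamPerThread is macro-defined to 2

def describe_stream(stream):
    """Classify a `stream` value as used by __cuda_array_interface__ v3."""
    if stream is None:
        return "no synchronization needed"
    if stream == CUDA_STREAM_LEGACY:
        return "legacy default stream"
    if stream == CUDA_STREAM_PER_THREAD:
        return "per-thread default stream"
    # cudaStreamCreate() handles start at 3 or later, so user streams
    # never collide with the two macro-defined values.
    return "user-created (non-default) stream"
```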
**leofang:** btw, fun fact: in HIP, syncing over stream `1` or `2` would lead to a segfault, as HIP does not support them: cupy/cupy#4458 (comment).
**rgommers:** Thanks for the explanation and links @leofang. I updated it to match `__cuda_array_interface__`.
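For concreteness, a minimal sketch of what a producer's `__dlpack__` could look like under the CAI-v3-style numbering adopted here, including the `-1` "do not synchronize" sentinel added in commit `75261cc`. This is not the spec's reference implementation; `_wait_on_consumer_stream` and `_to_dlpack_capsule` are hypothetical producer internals.

```python
# Minimal sketch, assuming the CAI-v3-style numbering adopted in this PR:
# 1 = legacy default stream, 2 = per-thread default stream, > 2 = user
# stream, and -1 = consumer requests no synchronization (commit 75261cc).
class Array:
    def __dlpack__(self, *, stream=None):
        if stream == 0:
            # 0 is semantically ambiguous, as discussed above.
            raise ValueError("stream=0 is ambiguous and not allowed")
        if stream is not None and stream != -1:
            # Make the consumer's stream wait on the producer's pending
            # work before the memory is handed over.
            self._wait_on_consumer_stream(stream)  # hypothetical internal
        return self._to_dlpack_capsule()  # hypothetical internal
```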
**leofang:** Thanks, @rgommers. Apologies, I realized my first sentence wasn't complete; it should have been