
Implement Kernel.num_arguments, and Kernel.arguments_info #612


Merged
merged 12 commits into NVIDIA:main from add-num-arguments on May 14, 2025

Conversation

oleksandr-pavlyk
Contributor

Description

closes #568

This PR implements the Kernel.num_arguments and Kernel.arguments_info properties:

  • Kernel.num_arguments returns the number of arguments of the kernel instance
  • Kernel.arguments_info returns a list of (offset, size) tuples describing the layout of the struct containing all kernel arguments.
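As a rough illustration of what the (offset, size) pairs describe (hypothetical kernel and values, assuming a typical 64-bit platform), the layout mirrors how a packed C struct of the kernel parameters would be laid out, which ctypes can reproduce:

```python
import ctypes

# Hypothetical kernel: __global__ void bar(double *p, int n, double x)
# The driver lays the arguments out like a C struct; ctypes applies the
# same natural-alignment rules on a typical 64-bit platform.
class BarArgs(ctypes.Structure):
    _fields_ = [
        ("p", ctypes.c_void_p),   # 8-byte pointer at offset 0
        ("n", ctypes.c_int),      # 4-byte int at offset 8
        ("x", ctypes.c_double),   # 8-byte double at offset 16 (4 bytes padding)
    ]

# Build the same kind of (offset, size) pairs that arguments_info describes
layout = [(getattr(BarArgs, name).offset, getattr(BarArgs, name).size)
          for name, _ in BarArgs._fields_]
print(layout)  # [(0, 8), (8, 4), (16, 8)]
```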

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented May 6, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


@leofang leofang left a comment


Thanks, Sasha! Looks great except for the docstrings, see my comment below.

We also need to add a feature entry to cuda_core/docs/source/release/0.3.0-notes.rst.

@leofang leofang added P1 Medium priority - Should do feature New feature or request cuda.core Everything related to the cuda.core module labels May 6, 2025
@leofang leofang added this to the cuda.core beta 4 milestone May 6, 2025
…erties

Also parametrize the test of arguments_info to check with all-int
and all-short arguments.
@oleksandr-pavlyk oleksandr-pavlyk marked this pull request as ready for review May 8, 2025 13:58

copy-pr-bot bot commented May 8, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@oleksandr-pavlyk

/ok to test


rwgk previously approved these changes May 8, 2025

@rwgk rwgk left a comment


LGTM although I'd use a dataclass for better ergonomics.
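A minimal sketch of what the suggested dataclass could look like (hypothetical names, not necessarily the code this PR ultimately merged):

```python
from dataclasses import dataclass

# Hypothetical sketch of the suggestion: a frozen dataclass instead of a
# plain (offset, size) tuple, so fields are accessed by name rather than index.
@dataclass(frozen=True)
class ParamInfo:
    offset: int
    size: int

info = ParamInfo(offset=0, size=8)
print(info.offset, info.size)  # 0 8
```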

@oleksandr-pavlyk

I fixed pre-commit, and added a line to "0.3.0-notes.rst".

@oleksandr-pavlyk oleksandr-pavlyk requested review from rwgk and leofang May 8, 2025 20:07
rwgk previously approved these changes May 8, 2025

@rwgk rwgk left a comment


Two minor suggestions, please see here for background:

https://chatgpt.com/share/681d1a68-97d0-8008-bad5-40ae287f7d55

(When I wrote the first prompt I had "module scope" in mind, but I think class scope is better.)

Used modern namedtuple instance constructor.
@oleksandr-pavlyk

This perplexes me:

(dev-cuda-python) opavlyk@ee09c48-lcedt:~/repos/cuda-python/cuda_core$ ipython
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:21:13) [GCC 13.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.2.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `object?` to see the help on `object`, `object??` to view its source

In [1]: import cuda.core.experimental as cc

In [2]: o1 = cc.Program("__global__ void bar(double *p, int n, double x) { *p = n * x; }", code_type="c++").compile("cubin", name_expressions=("bar",))

In [3]: k1 = o1.get_kernel("bar")

In [4]: k1.arguments_info
Out[4]: []

In [5]: k1.num_arguments
Out[5]: 0

@oleksandr-pavlyk oleksandr-pavlyk requested a review from rwgk May 8, 2025 22:25
rwgk previously approved these changes May 8, 2025

@rwgk rwgk left a comment


This looks perfect to me. Caveat: I cannot explain the ipython observation.

@leofang

leofang commented May 9, 2025

This perplexes me

Let me look into it tmr

@oleksandr-pavlyk oleksandr-pavlyk requested a review from leofang May 13, 2025 14:01
@leofang

leofang commented May 13, 2025

nit: as much as I love credit attribution (thanks), let's not do @someone in the commit messages. GitHub still handles them badly (after all these years) and I end up receiving a ton of notifications every time someone forks/rebases/pushes 😞

@leofang

leofang commented May 13, 2025

/ok to test 2ca9704

leofang previously approved these changes May 13, 2025
This required moving the ParamInfo definition from class scope
to module scope, since referencing Kernel.ParamInfo from
annotations of methods of the Kernel class results in an error
because the Kernel class does not yet exist.
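A minimal sketch of the pattern this commit describes (simplified, hypothetical names): defining ParamInfo at module scope lets method annotations reference it directly, whereas an annotation like Kernel.ParamInfo would fail because the name Kernel is not bound until the class statement finishes executing.

```python
from typing import NamedTuple

# Module-scope definition avoids the forward-reference problem: while the
# body of Kernel executes, "Kernel" is not bound yet, so an annotation
# spelled Kernel.ParamInfo would raise NameError at class-definition time.
class ParamInfo(NamedTuple):
    offset: int
    size: int

class Kernel:
    def arguments_info(self) -> list[ParamInfo]:  # resolves fine at module scope
        # illustrative values only
        return [ParamInfo(0, 8), ParamInfo(8, 4)]

print(Kernel().arguments_info()[0].offset)  # 0
```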
rwgk previously approved these changes May 13, 2025
@leofang

leofang commented May 14, 2025

/ok to test 55f6d31

leofang previously approved these changes May 14, 2025
@leofang leofang enabled auto-merge (squash) May 14, 2025 01:25
@oleksandr-pavlyk

The test that num_arguments raises if CUDA has not been initialized failed:

_________________________ test_num_args_error_handling _________________________

deinit_cuda = None, cuda12_prerequisite_check = True

    @skipif_testing_with_compute_sanitizer
    def test_num_args_error_handling(deinit_cuda, cuda12_prerequisite_check):
        if not cuda12_prerequisite_check:
            pytest.skip("Test requires CUDA 12")
        src = "__global__ void foo(int a) { }"
        prog = Program(src, code_type="c++")
        mod = prog.compile(
            "cubin",
            name_expressions=("foo",),
        )
        krn = mod.get_kernel("foo")
>       with pytest.raises(CUDAError):
E       Failed: DID NOT RAISE <class 'cuda.core.experimental._utils.cuda_utils.CUDAError'>

tests/test_module.py:246: Failed

Did I misunderstand the intent of deinit_cuda fixture?

@leofang

leofang commented May 14, 2025

I feel deinit_cuda might be a confusing name. The purpose of it is to clean up the per-thread device instances + ensure no CUDA context is set to current after the test finishes. Did you mean to do this before the test starts?

@oleksandr-pavlyk

oleksandr-pavlyk commented May 14, 2025

I meant it to ensure that CUDA is not initialized at the start of this test. Kind of like the opposite of the cuda_init fixture.

@leofang

leofang commented May 14, 2025

Yes, feel free to implement any fixture you need. No need to refactor the existing fixtures if you don't want to sweat in this PR -- but then we should create an issue to track it (at the very least, we should rename deinit_cuda to something like deinit_device_at_teardown).

To make sure we understand "initialized" in the same way -- You don't mean driver.cuInit(0), do you? Note that it is a once-per-process call, effectively as if std::call_once() is used (I dunno how it's implemented exactly in the driver), so any subsequent calls are cheap no-ops, and there's no way to undo cuInit(0).

Use in test_module.py::test_num_args_error_handling
Add comments
@oleksandr-pavlyk oleksandr-pavlyk dismissed stale reviews from leofang and rwgk via 912ae11 May 14, 2025 12:46
@oleksandr-pavlyk

/ok to test

@oleksandr-pavlyk

/ok to test

1. Changed the fixture to provide a function that empties the stack of
   contexts. The function has a hidden max_iters bound; if it is exceeded,
   a RuntimeError is raised.

2. Modified the _device_unset_current utility function to return a boolean:
   True is returned if a context was popped, False if the stack was
   already empty.
@oleksandr-pavlyk

/ok to test

@oleksandr-pavlyk oleksandr-pavlyk requested review from leofang and rwgk May 14, 2025 21:02

@rwgk rwgk left a comment


Just a couple cosmetic suggestions. Optional.

raise NotImplementedError("New backend is required")
arg_pos = 0
param_info_data = []
while True:


Maybe (sorry I overlooked this before):

        for arg_pos in itertools.count():

Then you don't need arg_pos = 0 above and arg_pos = arg_pos + 1 below.
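A small self-contained demonstration of the suggested pattern, with a stand-in for the per-argument driver query:

```python
import itertools

# itertools.count() replaces the manual "arg_pos = 0 ... arg_pos += 1"
# bookkeeping in an open-ended loop.
sizes = {0: 8, 1: 4, 2: 8}  # stand-in for per-argument query results

param_info_data = []
for arg_pos in itertools.count():
    if arg_pos not in sizes:  # stand-in for the "no more arguments" signal
        break
    param_info_data.append((arg_pos, sizes[arg_pos]))

print(param_info_data)  # [(0, 8), (1, 4), (2, 8)]
```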

Comment on lines +64 to +68
if _device_unset_current():
# context was popped, continue until stack is empty
continue
# no active context, we are ready
break


Maybe shorter (replace 5 lines with 2):

            if not _device_unset_current():
                break

@leofang leofang merged commit 71762af into NVIDIA:main May 14, 2025
75 checks passed

@oleksandr-pavlyk oleksandr-pavlyk deleted the add-num-arguments branch May 15, 2025 01:06
Labels
cuda.core (Everything related to the cuda.core module), feature (New feature or request), P1 (Medium priority - Should do)

Successfully merging this pull request may close these issues.

Expose cuda.core.Kernel.attributes.num_arguments