
[SYCL][NVPTX] Split max_work_group_size into 3 NVVM annotations #14420


Merged: 3 commits merged into intel:sycl from sycl-split-maxntid-annotation on Jul 4, 2024

Conversation

frasercrmck (Contributor):

NVVM IR supports separate maxntidx, maxntidy, and maxntidz annotations, which the backend prints individually as three dimensions. This preserves programmer intent better than prematurely flattening the dimensions together.
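
For illustration, a minimal sketch of the split form in NVVM IR; the kernel name and sizes are hypothetical, and the flattened pre-patch form is only sketched in a comment:

```llvm
; Sketch of the split annotations for a hypothetical kernel with a maximum
; work-group size of (8, 16, 32). Previously the dimensions were flattened
; into a single annotation carrying their product (4096); here each
; dimension is kept separate, so the NVPTX backend can print all three,
; e.g. ".maxntid 32, 16, 8" in the emitted PTX.
define void @kernel() {
  ret void
}

!nvvm.annotations = !{!0, !1, !2}
!0 = !{ptr @kernel, !"maxntidx", i32 32}
!1 = !{ptr @kernel, !"maxntidy", i32 16}
!2 = !{ptr @kernel, !"maxntidz", i32 8}
```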

Note that the semantics are in fact identical: the CUDA implementation internally multiplies all dimensions together and only guarantees that the total is never exceeded, not that each individual dimension stays within its bound. Thus 64,1,1 is equivalent to 4,4,4.
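
To make that concrete, a sketch with two hypothetical kernels whose annotations differ per dimension but share the same product, and therefore the same effective limit:

```llvm
; The CUDA implementation only enforces the product of the dimensions (here
; 64 threads per block in total), so these two annotation sets are equivalent.
define void @kernel_a() {
  ret void
}
define void @kernel_b() {
  ret void
}

!nvvm.annotations = !{!0, !1, !2, !3, !4, !5}
!0 = !{ptr @kernel_a, !"maxntidx", i32 64}
!1 = !{ptr @kernel_a, !"maxntidy", i32 1}
!2 = !{ptr @kernel_a, !"maxntidz", i32 1}
!3 = !{ptr @kernel_b, !"maxntidx", i32 4}
!4 = !{ptr @kernel_b, !"maxntidy", i32 4}
!5 = !{ptr @kernel_b, !"maxntidz", i32 4}
```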

We try to preserve a logical mapping of dimensions by flipping indices between SYCL's (z,y,x) and NVVM's (x,y,z) ordering, in CUDA terminology, even though, as mentioned above, the distinction is largely irrelevant.
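
A sketch of that mapping, using a hypothetical SYCL maximum work-group size of (2, 4, 8) in Dim0, Dim1, Dim2 order; the indices are flipped so that SYCL's last (fastest-varying) dimension lands on CUDA's x:

```llvm
define void @kernel() {
  ret void
}

!nvvm.annotations = !{!0, !1, !2}
!0 = !{ptr @kernel, !"maxntidx", i32 8} ; SYCL Dim2 -> CUDA x
!1 = !{ptr @kernel, !"maxntidy", i32 4} ; SYCL Dim1 -> CUDA y
!2 = !{ptr @kernel, !"maxntidz", i32 2} ; SYCL Dim0 -> CUDA z
```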

This patch also simplifies the attribute's getter functions, since all dimensions are mandatory; the getters appear to have been copied from the reqd_work_group_size attribute, where some dimensions are optional.

We could probably improve the code further by making the operands unsigned rather than Expr, and by renaming them from X,Y,Z to Dim{0,1,2} as per the SYCL spec. This has been left for future work, however, as a non-trivial amount of code expects to treat the max_work_group_size and reqd_work_group_size attributes identically through templates and shared helper methods.

frasercrmck requested a review from a team as a code owner on July 3, 2024, 13:21
steffenlarsen (Contributor) left a comment:
Great improvement! 🚀

smanna12 (Contributor) left a comment:
LGTM. Thank you

jchlanda (Contributor) left a comment:
🏅

frasercrmck (Contributor, Author):

@intel/llvm-gatekeepers this is good to merge, thank you!

martygrant merged commit ef62cad into intel:sycl on Jul 4, 2024
14 checks passed
frasercrmck deleted the sycl-split-maxntid-annotation branch on July 4, 2024, 15:31