-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][NVPTX] Split max_work_group_size into 3 NVVM annotations #14420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][NVPTX] Split max_work_group_size into 3 NVVM annotations #14420
Conversation
NVVM IR supports separated maxntidx, maxntidy, and maxntidz annotations. The backend will print them individually as three dimensions. This better preserves programmer intent than prematurely flattening them together. Note that the semantics are in fact identical; the CUDA implementation internally multiplies all dimensions together and only guarantees that the total is never exceeded, but not that any individual dimension is not exceeded. Thus 64,1,1 is identical to 4,4,4. We try and preserve a logical mapping of dimensions by index flipping between SYCL (z,y,x) and NVVM (x,y,z) in CUDA terminology despite, as mentioned above, it being largely irrelevant. Also this patch simplifies the attribute's getter functions as all dimensions are mandatory, and the getters seemed copied from the reqd_work_group_size attribute where some are optional. We could probably improve the code further by making the operands "unsigned" and not "Expr", and renaming them from X,Y,Z to Dim{0,1,2} as per the SYCL spec. This has been left for future work, however, as there's a non-trivial amount of code that expects to be able to treat the max_work_group_size and reqd_work_group_size attributes identically through templates and identical helper methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🏅
@intel/llvm-gatekeepers this is good to merge, thank you! |
NVVM IR supports separated maxntidx, maxntidy, and maxntidz annotations. The backend will print them individually as three dimensions. This better preserves programmer intent than prematurely flattening them together.
Note that the semantics are in fact identical; the CUDA implementation internally multiplies all dimensions together and only guarantees that the total is never exceeded, but not that any individual dimension is not exceeded. Thus 64,1,1 is identical to 4,4,4.
We try and preserve a logical mapping of dimensions by index flipping between SYCL (z,y,x) and NVVM (x,y,z) in CUDA terminology despite, as mentioned above, it being largely irrelevant.
Also this patch simplifies the attribute's getter functions as all dimensions are mandatory, and the getters seemed copied from the reqd_work_group_size attribute where some are optional.
We could probably improve the code further by making the operands "unsigned" and not "Expr", and renaming them from X,Y,Z to Dim{0,1,2} as per the SYCL spec. This has been left for future work, however, as there's a non-trivial amount of code that expects to be able to treat the max_work_group_size and reqd_work_group_size attributes identically through templates and identical helper methods.