[SYCL][NVPTX] Split max_work_group_size into 3 NVVM annotations (#14420)
NVVM IR supports separated maxntidx, maxntidy, and maxntidz annotations.
The backend will print them individually as three dimensions. This
better preserves programmer intent than prematurely flattening them
together.
Note that the semantics are in fact identical: the CUDA implementation
internally multiplies all dimensions together and only guarantees that
the total is never exceeded, not that each individual dimension stays
within its own bound. Thus 64,1,1 is identical to 4,4,4.
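A minimal sketch of the semantics described above, assuming only the
product of the three maxntid values acts as a limit. The helper names
totalThreads and fitsLimit are hypothetical and are not part of CUDA,
NVVM, or this patch:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: only the product of maxntid{x,y,z} is enforced,
// not each dimension individually.
constexpr std::uint64_t totalThreads(const std::array<std::uint32_t, 3> &dims) {
  return std::uint64_t{dims[0]} * dims[1] * dims[2];
}

// True if a block shape fits under the limit according to this model.
constexpr bool fitsLimit(const std::array<std::uint32_t, 3> &block,
                         const std::array<std::uint32_t, 3> &maxntid) {
  return totalThreads(block) <= totalThreads(maxntid);
}

// 64,1,1 and 4,4,4 describe the same limit of 64 threads.
static_assert(totalThreads({64, 1, 1}) == totalThreads({4, 4, 4}));
// An 8x8x1 block fits under a 4,4,4 limit even though 8 > 4 in x and y,
// because only the total (64) is compared.
static_assert(fitsLimit({8, 8, 1}, {4, 4, 4}));
```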
We try to preserve a logical mapping of dimensions by flipping indices
between SYCL (z,y,x) and NVVM (x,y,z), in CUDA terminology, even though,
as mentioned above, the mapping is largely irrelevant.
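A rough sketch of the index flip, assuming the SYCL convention that the
last dimension (Dim2) varies fastest and therefore corresponds to CUDA's
x. The MaxNTid struct and the toNVVM/describe helpers are hypothetical,
and the printed string is illustrative rather than actual NVVM IR syntax:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical holder for the three separate NVVM annotations.
struct MaxNTid {
  std::uint32_t x, y, z;
};

// SYCL (Dim0, Dim1, Dim2) maps onto NVVM (z, y, x): reverse the order.
constexpr MaxNTid toNVVM(const std::array<std::uint32_t, 3> &syclDims) {
  return MaxNTid{syclDims[2], syclDims[1], syclDims[0]};
}

// Render each dimension as its own name/value pair (illustrative only).
std::string describe(const MaxNTid &m) {
  return "maxntidx=" + std::to_string(m.x) +
         " maxntidy=" + std::to_string(m.y) +
         " maxntidz=" + std::to_string(m.z);
}
```

For example, a SYCL max_work_group_size of 1,2,64 would be emitted as
maxntidx=64, maxntidy=2, maxntidz=1 under this mapping.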
This patch also simplifies the attribute's getter functions, since all
dimensions are mandatory; the getters appear to have been copied from the
reqd_work_group_size attribute, where some dimensions are optional.
We could probably improve the code further by making the operands
"unsigned" rather than "Expr", and renaming them from X,Y,Z to Dim{0,1,2}
as per the SYCL spec. This has been left for future work, however, as
there is a non-trivial amount of code that expects to treat the
max_work_group_size and reqd_work_group_size attributes identically
through templates and identical helper methods.