[SYCL][Fusion] Enable fusion of kernels with different ND-ranges #8209
All kernels with the same (or unspecified) local size and offset can be fused. To make this work, some builtins that query index space information must be remapped, and the resulting ND-range of the fused kernel must be calculated.

The ND-range of the fused kernel will have:

1. The same number of dimensions as the input ND-range with the highest number of dimensions;
2. The shared local size (or an unspecified local size);
3. The shared offset;
4. The **greatest** input global size, according to the following ordering:
   i. Number of work items (enforces correctness);
   ii. Number of occurrences (fewer remappings needed);
   iii. Lexical order of the dimensions (introduces determinism).

Builtins obtaining the local/global size/id, work-group id, number of work-groups or offset are remapped by introducing alwaysinline functions that can be reused throughout the fusion pass. More information can be found in the Builtins.cpp file, where the remapping logic is implemented.

Signed-off-by: Victor Perez <[email protected]>
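To make the ordering and the remapping idea concrete, here is a minimal, host-only C++ sketch. It is not the code from Builtins.cpp: the names `GlobalSize`, `pickFusedGlobalSize` and `remapGlobalId` are hypothetical, and the sketch assumes that global sizes are padded to three dimensions with trailing 1s, that "lexical order" means lexicographic comparison of those padded sizes, and that a kernel with fewer work items is mapped onto the fused range through a row-major linear id.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a kernel's global size.
// Assumption: unused dimensions are padded with 1.
using GlobalSize = std::array<std::size_t, 3>;

std::size_t numWorkItems(const GlobalSize &G) { return G[0] * G[1] * G[2]; }

// Picks the "greatest" input global size using the three criteria from the
// description: work-item count, then number of occurrences among the inputs,
// then lexicographic order of the dimensions.
GlobalSize pickFusedGlobalSize(const std::vector<GlobalSize> &Inputs) {
  assert(!Inputs.empty() && "need at least one kernel to fuse");
  auto Key = [&](const GlobalSize &G) {
    auto Occurrences = std::count(Inputs.begin(), Inputs.end(), G);
    return std::make_tuple(numWorkItems(G), Occurrences, G);
  };
  return *std::max_element(
      Inputs.begin(), Inputs.end(),
      [&](const GlobalSize &A, const GlobalSize &B) { return Key(A) < Key(B); });
}

// Assumed remapping scheme (illustration only): a work item of the fused
// kernel is identified by its row-major linear id and delinearized into the
// original kernel's index space. Returns false when the work item lies
// outside the original range, i.e. it would skip that kernel's body.
bool remapGlobalId(std::size_t FusedLinearId, const GlobalSize &Original,
                   GlobalSize &Id) {
  if (FusedLinearId >= numWorkItems(Original))
    return false;
  Id[2] = FusedLinearId % Original[2];
  Id[1] = (FusedLinearId / Original[2]) % Original[1];
  Id[0] = FusedLinearId / (Original[2] * Original[1]);
  return true;
}
```

Under these assumptions, fusing kernels with global sizes {16,1,1}, {4,4,1} and {4,4,1} would select {4,4,1}: all three have 16 work items, but {4,4,1} occurs twice, so only one kernel needs remapped builtins.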
Force-pushed from 5e2927d to eff2d2f.
Reviewers guide: |
Check that kernel fusion works when fusing kernels with different ND-ranges (different global sizes and dimensions). Implementation: intel/llvm#8209 Signed-off-by: Victor Perez <[email protected]>
/verify with intel/llvm-test-suite#1575
Detailed comments inline.
Did you test with a shared library build? I suspect that the linker script must also be updated, now that the interface to the JIT compiler library has been extended and changed.
Implementation looks good now, just a few comments about CMake setup.
I have no rights to merge this commit. Can I get it merged, please? @intel/llvm-gatekeepers
Pre-commit is failing to build it. Looks like you need an |
@steffenlarsen Thanks for pointing that out. I've added the header. Failures are now due to missing test updates (PR in same state).
/verify with intel/llvm-test-suite#1575
It appears there are two unrelated tests failing. Fusion should not affect those at all.
Failure in ESIMD/imulh_umulh.cpp addressed in intel/llvm-test-suite#1593.
Check that kernel fusion works when fusing kernels with different ND-ranges (different global sizes and dimensions). Implementation: intel/llvm#8209 --------- Signed-off-by: Victor Perez <[email protected]> Co-authored-by: Lukas Sommer <[email protected]>