[ESIMD] Fix atomic_update() implementation for N=16 and N=32 on Gen12 #12722
atomic_update() for USM and ACC with N=16 and N=32 was lowered to SVM/DWORD atomic
intrinsics even though the HW instructions on Gen12 support only
N up to 8 for USM and up to 16 for ACC.
The GPU compiler has a legalization pass that splits longer vectors into smaller ones that are available in HW.
That legalization works incorrectly for USM: it splits longer vectors
on the assumption that the instruction is available for N=16 with USM,
which is not correct.
This patch splits the N=16 and N=32 cases of
atomic_update(usm, ...) into N=8 vectors until the GPU legalization
for USM atomic_update is fixed.
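
The splitting idea can be sketched in plain host C++. This is only an illustration, not the ESIMD implementation from the patch: `native_atomic_add`, `split_atomic_add`, and `kMaxUsmAtomicLanes` are hypothetical names, `std::atomic<int>` stands in for USM memory, and the 8-lane limit is taken from the description above.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for illustration: the widest USM atomic available natively
// is 8 lanes, as stated in the description above.
constexpr std::size_t kMaxUsmAtomicLanes = 8;

// Hypothetical stand-in for one native atomic message: atomically adds
// src[i] to mem[offsets[i]] for each lane and returns the old values.
template <std::size_t N>
std::array<int, N> native_atomic_add(std::vector<std::atomic<int>> &mem,
                                     const std::array<std::size_t, N> &offsets,
                                     const std::array<int, N> &src) {
  static_assert(N <= kMaxUsmAtomicLanes, "wider than the native atomic");
  std::array<int, N> old{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(offsets[i] < mem.size());
    old[i] = mem[offsets[i]].fetch_add(src[i]);
  }
  return old;
}

// Splits an N-lane atomic add (e.g. N=16 or N=32) into consecutive
// 8-lane chunks and stitches the returned old values back together.
template <std::size_t N>
std::array<int, N> split_atomic_add(std::vector<std::atomic<int>> &mem,
                                    const std::array<std::size_t, N> &offsets,
                                    const std::array<int, N> &src) {
  static_assert(N % kMaxUsmAtomicLanes == 0, "N must be a multiple of 8");
  std::array<int, N> old{};
  for (std::size_t base = 0; base < N; base += kMaxUsmAtomicLanes) {
    std::array<std::size_t, kMaxUsmAtomicLanes> chunk_offsets{};
    std::array<int, kMaxUsmAtomicLanes> chunk_src{};
    for (std::size_t i = 0; i < kMaxUsmAtomicLanes; ++i) {
      chunk_offsets[i] = offsets[base + i];
      chunk_src[i] = src[base + i];
    }
    const auto chunk_old =
        native_atomic_add<kMaxUsmAtomicLanes>(mem, chunk_offsets, chunk_src);
    for (std::size_t i = 0; i < kMaxUsmAtomicLanes; ++i)
      old[base + i] = chunk_old[i];
  }
  return old;
}
```

In this sketch, `split_atomic_add<32>(mem, offsets, src)` issues four 8-lane operations and `split_atomic_add<16>` issues two. Each lane is still updated atomically, but the full-width update is no longer a single operation.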