[ESIMD] Fix perf regression caused by assumed align in block_load(usm) #11850

v-klochkov · 2023-11-09T23:58:28Z

Element-size address alignment is valid from a correctness point of view, but implicitly assuming 1-byte and 2-byte alignment causes a performance regression for block_load(const int8_t *, ...) and block_load(const int16_t *, ...), because the GPU back end has to generate a slower GATHER instead of the more efficient BLOCK-LOAD. Without this fix, some apps slow down by up to 44%: they call block_load() relying on the alignment assumptions that were in effect before block_load(usm, ..., compile_time_props) was implemented.

The reasoning for raising the expected/assumed alignment from element size to 4 bytes for byte and word vectors is as follows:
The point of calling block_load() (as opposed to gather()) is to get an
efficient block load, so the assumed alignment is the one that allows a
block load to be generated. This is a bit trickier for the user, but it
is how the block_load/store API has always worked: block loads have had
restrictions that need to be honored.
To be on the safe side, the user can always pass the guaranteed alignment.
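
As an illustration only, the rule described above can be sketched in plain C++. The helper names `assumed_block_load_alignment` and `satisfies_assumed_alignment` are hypothetical and are not part of the ESIMD API or of this patch. The sketch assumes that elements wider than 4 bytes keep their element-size alignment, which is how the description reads but is not quoted from the implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT the ESIMD implementation: the alignment that
// block_load(usm) is described as assuming when the caller passes none.
template <typename T>
constexpr std::size_t assumed_block_load_alignment() {
  // Byte and word elements are raised to 4 bytes so that a block load can
  // be generated; wider elements keep element-size alignment (assumption).
  return sizeof(T) < 4 ? 4 : sizeof(T);
}

// Hypothetical debug-time check: does this pointer meet the assumed
// alignment for its element type?
template <typename T>
bool satisfies_assumed_alignment(const T *ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) %
             assumed_block_load_alignment<T>() == 0;
}

static_assert(assumed_block_load_alignment<std::int8_t>() == 4,
              "byte vectors are assumed 4-byte aligned");
static_assert(assumed_block_load_alignment<std::int16_t>() == 4,
              "word vectors are assumed 4-byte aligned");
```

A caller whose int8_t or int16_t pointer cannot meet this assumption would pass its guaranteed (smaller) alignment explicitly, accepting the slower path in exchange for correctness.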

Signed-off-by: Klochkov, Vyacheslav N <[email protected]>

turinevgeny

Makes sense.

v-klochkov requested a review from a team as a code owner November 9, 2023 23:58

Add a test case for the changes

30d95c0

Signed-off-by: Klochkov, Vyacheslav N <[email protected]>

v-klochkov temporarily deployed to WindowsCILock November 10, 2023 00:44 — with GitHub Actions Inactive

turinevgeny approved these changes Nov 10, 2023

v-klochkov temporarily deployed to WindowsCILock November 10, 2023 02:29 — with GitHub Actions Inactive

v-klochkov merged commit c6362a0 into intel:sycl Nov 10, 2023

v-klochkov deleted the esimd_fix_block_load_usm_perf_alignment branch November 10, 2023 03:56
