[ESIMD] Fix perf regression caused by assumed align in block_load(usm) (#11850)
Assuming element-size address alignment is valid from a correctness
point of view, but implicitly assuming 1-byte or 2-byte alignment causes
a performance regression for block_load(const int8_t *, ...) and
block_load(const int16_t *, ...) because the GPU BE has to generate a
slower GATHER instead of the more efficient BLOCK-LOAD. Without this
fix, block_load() causes a slowdown of up to 44% in some apps that
relied on the alignment assumptions in effect before
block_load(usm, ..., compile_time_props) was implemented.
The reasoning for raising the expected/assumed alignment from
element-size to 4 bytes for byte- and word-vectors is as follows:
The point of calling block_load() (as opposed to gather()) is to get an
efficient block-load, so the assumed alignment is the one that allows a
block-load to be generated. This is a bit trickier for the user, but it
is how the block_load/store API has always worked: block-load had
restrictions that needed to be honored.
To be on the safe side, the user can always pass the guaranteed
alignment explicitly.
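
The rule described above can be sketched in plain C++, outside of ESIMD.
This is only an illustration: the helpers assumed_block_load_alignment(),
is_aligned_to() and can_use_assumed_alignment() are hypothetical names,
not part of the ESIMD API. The sketch assumes that types of 4 bytes and
wider keep element-size alignment, as the message describes only the
byte- and word-vector change.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an ESIMD function): the alignment that is
// assumed when the caller does not pass one. Byte and word elements are
// raised to 4 bytes; wider elements keep their element size (assumption).
template <typename T>
constexpr std::size_t assumed_block_load_alignment() {
  return sizeof(T) < 4 ? 4 : sizeof(T);
}

// Hypothetical helper: run-time check that a pointer honors an alignment.
inline bool is_aligned_to(const void *ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical helper: true if relying on the assumed alignment is safe
// for this pointer; otherwise the caller should pass the alignment that
// is actually guaranteed.
template <typename T>
inline bool can_use_assumed_alignment(const T *ptr) {
  return is_aligned_to(ptr, assumed_block_load_alignment<T>());
}

static_assert(assumed_block_load_alignment<std::int8_t>() == 4,
              "byte vectors are assumed to be 4-byte aligned");
static_assert(assumed_block_load_alignment<std::int16_t>() == 4,
              "word vectors are assumed to be 4-byte aligned");
```

A caller holding a const int8_t * that may point at an odd offset could
check can_use_assumed_alignment(ptr) first and, if it returns false, pass
the guaranteed alignment to block_load() instead of relying on the
default.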
---------
Signed-off-by: Klochkov, Vyacheslav N <[email protected]>