-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][ESIMD] Implement accessor-based gather/scatter and scalar I/O #2700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// Load a scalar value from an accessor. | ||
template <typename T, typename AccessorTy, CacheHint L1H = CacheHint::None, | ||
CacheHint L3H = CacheHint::None> | ||
ESIMD_INLINE ESIMD_NODEBUG T load(AccessorTy acc, uint32_t offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically redundant to SIMD1 gather, which is not commonly used in CM for performance reason. Do we really need to expose a dedicated API for this? The naming is too general and could be a source of confusion, perhaps change to scalar_load to make a clear distinction with block_load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically redundant to SIMD1 gather
This is a convenient shortcut to calling generic gather/scatter. So I don't think it is redundant. scalar_load
sounds good.
is not commonly used in CM for performance reason. Do we really need to expose a dedicated API for this?
there are cases when scalar access is needed by algorithm (e.g. haar), so yes, I think we need this API for scalar access via accessors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Please update name to scalar_load/scalar_store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
// surface index-based gather/scatter: | ||
// num blocks, scale, surface index, global offset, elem offsets | ||
{"surf_read", {"gather.scaled2", {t(3), c16(0), aSI(1), a(2), a(3)}}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not map to gather.scaled? It may need general predicate support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I'm actually following advice from the BE team (@aus-intel) who recommended to use this one for now. When predication is needed, we can add this w/o breaking user code.
607cb77
to
198e799
Compare
…access. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
198e799
to
4883ae0
Compare
Doing compatibility testing. Please do not merge. |
This PR implements accessor-based gather/scatter and scalar memory access.
Functional tests: intel/llvm-test-suite#43
Signed-off-by: Konstantin S Bobrovsky [email protected]