-
Notifications
You must be signed in to change notification settings - Fork 787
[ESIMD] Add semi-dynamic SLM allocation - esimd::experimental::slm_allocator. #7759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…locator. This patch adds new class - slm_allocator - and its lowering. This is RAII-style class used to implement "semi-dynamic" SLM allocation. SLM is allocated in the constructor and released in the destructor, that's why it is "dynamic", as opposed to fully static allocation style of 'slm_init'. Actual offset of SLM chunk allocated by the call is calculated at compile time, that's why it is "semi-". To calculate SLM usage by a kernel, compiler finds a path in a callgraph with the largest amount of SLM "locked" by slm_allocator objects live along the paths. slm_init call also participates in calculating SLM budget. It can be modelled as slm_allocator object declared at the very beginning of a kernel and live till its the very end. Since a call graph is used, function pointers and recursion is not supported. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Unit tests are in flight |
@@ -1755,6 +1755,36 @@ __ESIMD_API simd<T, N> atomic_update(T *p, simd<unsigned, N> offset, | |||
p, offset, src1, src0, mask); | |||
} | |||
|
|||
/// RAII-style class used to implement "semi-dynamic" SLM allocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petercaday, please review this part. E2E test - intel/llvm-test-suite#1449.
@@ -19,7 +19,9 @@ | |||
// deducible when BE is ready | |||
|
|||
#include "llvm/SYCLLowerIR/LowerInvokeSimd.h" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rolandschulz, please review this file
@kbobrovs - please take a look at BUILD+LIT fail: esimd/genx_func_attr.cpp |
Yes, looking into this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at few ESIMD fails, e.g.
slm_gather_scatter_rgba.cpp test prints:
LLVM ERROR: slm_init call met in non-kernel function _ZNK6KernelIiLj8ELN4sycl3_V13ext5intel5esimd17rgba_channel_maskE1EEclENS1_7nd_itemILi1EEE
/verify with intel/llvm-test-suite#1449 |
@kbobrovs - I just noticed that esimd/genx_func_attr.cpp test fails. Is that the test that you mentioned as the one failing with slm_init call in SYCL EXTERNAL FUNCTION ? |
@asudarsa - would you please review this patch to let Konst fix the issues in sycl-post-link if there are any concerns from your side. |
I have started looking at this. Sorry for delay. I had a quick question about how we handle overflows. Thanks |
Overflows are detected at runtime by the JITter, as we don't know available SLM size at compile-time. |
Yes, I'm looking into this |
Hi @kbobrovs and @v-klochkov All the sycl-post-link related changes are NFC changes and look fine. I will take a quick look at the full patch before approval. Hope that's ok. Please let me know if I missed something. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from DPCPP Tools perspective. Thanks
/// as a class' template argument. | ||
/// | ||
/// Since a call graph is used, function pointers and recursion is not | ||
/// supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these cases detected and flagged as compile-time errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no. Recursion is not a problem, because it is not supported by SYCL. But presence of function pointers should be detected, agree. Can diagnostics be added as a follow-up commit? Tagging @v-klochkov.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But presence of function pointers should be detected, agree.
@petercaday, I added diagnostics in 8d20150
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @kbobrovs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for implementing this, @kbobrovs!
@v-klochkov, could you fix pre-commit failures, please? |
I'll start looking at this LIT test fail after returning from vacation (Jan 2023) |
Folks, I'm looking into this failure. Hope can fix before the NY |
* maximum SLM offset calculation for each scope * maximum SLM usage per kernel This also fixed a bug in calculating maximum SLM usage per kernel. - Update the unit test to use new API and to require correct SLM size for each kernel. Multiple slm_init's and slm_init's in non-inlined SYCL_EXTERNAL functions did not work anyway. Signed-off-by: kbobrovs <[email protected]>
Signed-off-by: kbobrovs <[email protected]>
Signed-off-by: kbobrovs <[email protected]>
e814d80
to
8d20150
Compare
Signed-off-by: kbobrovs <[email protected]>
@bader, would you mind merging, please? |
Post-commit failed on MacOS due to missing include, the fix is in #7880 |
This patch adds new class - slm_allocator - and its lowering. This is RAII-style class used to implement "semi-dynamic" SLM allocation. SLM is allocated in the constructor and released in the destructor, that's why it is "dynamic", as opposed to fully static allocation style of 'slm_init'. Actual offset of SLM chunk allocated by the call is calculated at compile time, that's why it is "semi-". To calculate SLM usage by a kernel, compiler finds a path in a callgraph with the largest amount of SLM "locked" by slm_allocator objects live along the paths. slm_init call also participates in calculating SLM budget. It can be modelled as slm_allocator object declared at the very beginning of a kernel and live till its the very end.
Since a call graph is used, function pointers and recursion is not supported.
Complementary E2E test: intel/llvm-test-suite#1449.
Signed-off-by: Konstantin S Bobrovsky [email protected]