[libspirv][ptx-nvidiacl] Change __clc__group_scratch size to 32 x i128 #18431

wenju-he · 2025-05-13T02:06:05Z

Change the size of __clc__group_scratch to 32 x i128, to align with the comment in the file that specifies 32 storage locations and 128 bits per warp.
Switch the file to opaque pointers.
Add separate global variables for the different sizes, to address the file comment about reducing storage for small data types.

frasercrmck

Do we even still need all the @__clc__get_group_scratch_<type> overloads? With opaque pointers they're all equivalent.

frasercrmck · 2025-05-19T10:43:07Z

libclc/libspirv/lib/ptx-nvidiacl/group/collectives_helpers.ll


 define i8 addrspace(3)* @__clc__get_group_scratch_bool() nounwind alwaysinline {
 entry:
-  %ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0


We could/should probably rewrite this to use opaque pointers while we're here. If we do, almost all of this can go away. You could just return ptr addrspace(3) @__clc__group_scratch for every overload, I think?

We could/should probably rewrite this to use opaque pointers while we're here.

done

If we do, almost all of this can go away. You could just return ptr addrspace(3) @__clc__group_scratch for every overload, I think?

I added more global variables for different sizes. That should resolve the comment at the top of the file: "Reducing storage for small data types or increasing it for user-defined types will likely require an additional pass to track group algorithm usage".

Thanks. I take it the scratch memory isn't meant to be shared between the different types? If it were, we couldn't have separate globals in this way.
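The single-helper idea raised in this thread could look roughly like the sketch below. It is an illustration only: the linkage, alignment, initializer, and the unsuffixed helper name @__clc__get_group_scratch are assumptions, not taken from the PR.

```llvm
; Sketch only: one shared scratch buffer, 32 slots of 128 bits each.
; Linkage, alignment and initializer are assumed, not copied from the PR.
@__clc__group_scratch = internal addrspace(3) global [32 x i128] poison, align 16

; Hypothetical unified helper. With opaque pointers there is a single pointer
; type per address space, so no getelementptr or bitcast is needed.
define ptr addrspace(3) @__clc__get_group_scratch() nounwind alwaysinline {
entry:
  ret ptr addrspace(3) @__clc__group_scratch
}
```

With this shape, every per-type overload would return the same value, which is why the overloads become interchangeable once the buffer is shared.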

frasercrmck · 2025-05-19T10:46:39Z

libclc/libspirv/lib/ptx-nvidiacl/group/collectives_helpers.ll

  %cast = bitcast i64 addrspace(3)* %ptr to i8 addrspace(3)*
  ret i8 addrspace(3)* %cast
 }

 define i8 addrspace(3)* @__clc__get_group_scratch_char() nounwind alwaysinline {
 entry:
-  %ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
+  %ptr = getelementptr inbounds [32 x i128], [32 x i128] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
  %cast = bitcast i64 addrspace(3)* %ptr to i8 addrspace(3)*


If we don't switch to opaque pointers, this needs fixing. Having bitcast i64 is incorrect as %ptr would be i128 addrspace(3)*?
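For reference, a typed-pointer version with the corrected bitcast might look like the sketch below. It assumes an LLVM release that still accepts typed pointers, and the global's linkage, alignment, and initializer are assumptions for illustration.

```llvm
; Typed-pointer sketch; the global's linkage and initializer are assumed.
@__clc__group_scratch = internal addrspace(3) global [32 x i128] undef, align 16

define i8 addrspace(3)* @__clc__get_group_scratch_char() nounwind alwaysinline {
entry:
  ; Indexing [32 x i128] with (0, 0) yields a pointer to the first i128 element.
  %ptr = getelementptr inbounds [32 x i128], [32 x i128] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
  ; The bitcast source type must therefore be i128 addrspace(3)*, not i64 addrspace(3)*.
  %cast = bitcast i128 addrspace(3)* %ptr to i8 addrspace(3)*
  ret i8 addrspace(3)* %cast
}
```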

frasercrmck · 2025-05-21T17:07:40Z

libclc/libspirv/lib/ptx-nvidiacl/group/collectives_helpers.ll

-  %ptr = getelementptr inbounds [128 x i64], [128 x i64] addrspace(3)* @__clc__group_scratch, i64 0, i64 0
-  %cast = bitcast i64 addrspace(3)* %ptr to i8 addrspace(3)*
-  ret i8 addrspace(3)* %cast
+  %0 = getelementptr inbounds [32 x i1], ptr addrspace(3) @__clc__group_scratch_i1, i64 0, i64 0


This is equivalent to simply ret ptr addrspace(3) @__clc__group_scratch_i1. I don't think we need these getelementptrs at all.

That's why I'm not sure we even really need the _<TYPE> functions anymore. We could probably just have one unified scratch helper.

This is equivalent to simply ret ptr addrspace(3) @__clc__group_scratch_i1. I don't think we need these getelementptrs at all.

done, thanks

That's why I'm not sure we even really need the _<TYPE> functions anymore. We could probably just have one unified scratch helper.

Using multiple global variables with different sizes solves the overestimation of local memory size mentioned in the comment that this PR deletes. For instance, if a test uses only the char type, there is no need for a local variable of size 32 x i128.
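A minimal sketch of the per-size approach described here, with the getelementptrs removed as requested. Only @__clc__group_scratch_i1 and its [32 x i1] type appear in the diff in this thread; the @__clc__group_scratch_i8 global is a hypothetical analogue, and the linkage and initializers are assumptions.

```llvm
; Per-size scratch globals. The i8 global name is hypothetical, chosen by
; analogy with @__clc__group_scratch_i1; linkage and initializers are assumed.
@__clc__group_scratch_i1 = internal addrspace(3) global [32 x i1] poison
@__clc__group_scratch_i8 = internal addrspace(3) global [32 x i8] poison

; Each helper returns its own global directly, with no getelementptr.
define ptr addrspace(3) @__clc__get_group_scratch_bool() nounwind alwaysinline {
entry:
  ret ptr addrspace(3) @__clc__group_scratch_i1
}

define ptr addrspace(3) @__clc__get_group_scratch_char() nounwind alwaysinline {
entry:
  ret ptr addrspace(3) @__clc__group_scratch_i8
}
```

Under a typical data layout, [32 x i8] occupies 32 bytes while [32 x i128] occupies 512 bytes, so a kernel that only reaches the char helper references a much smaller local variable.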

frasercrmck

Yeah, I suppose the tradeoff is that if an application uses multiple types then we'll end up using more memory than before. I don't have a good idea which to prioritise so I'm okay with this as you've proposed it.

We could probably combine these two files in a subsequent PR.

libclc/libspirv/lib/amdgcn-amdhsa/group/collectives_helpers.ll

wenju-he · 2025-05-26T00:44:58Z

Yeah, I suppose the tradeoff is that if an application uses multiple types then we'll end up using more memory than before.

I think that is a general issue unrelated to this PR: any kernel that uses multiple local variables in multiple non-kernel functions has the same problem, which leads to an overestimate of the total local memory used by the kernel. However, I think the backend should be able to optimize this case and accurately report the total size of used local memory.

wenju-he · 2025-05-26T09:49:07Z

@intel/llvm-gatekeepers please merge, thanks. The Jenkins/Precommit failure is probably an infrastructure issue.

[libspirv][ptx-nvidiacl] Change __clc__group_scratch size to 32 x i128

0f5efc3

To align with the comment in the file that specifies 32 storage locations and 128 bits per warp.

wenju-he requested a review from a team as a code owner May 13, 2025 02:06

wenju-he requested a review from ldrumm May 13, 2025 02:06

frasercrmck reviewed May 19, 2025

opaque pointer, add global variables for all sizes

cf9ec4a

undef -> poison

8898ee5

wenju-he requested a review from frasercrmck May 20, 2025 03:11

frasercrmck reviewed May 21, 2025

remove gep

8c02604

wenju-he requested a review from frasercrmck May 22, 2025 01:05

frasercrmck approved these changes May 22, 2025

i32 addrspace(3)* -> ptr addrspace(3)

0693f5e

frasercrmck approved these changes May 26, 2025

uditagarwal97 merged commit 89f6a39 into intel:sycl May 26, 2025
23 of 24 checks passed

wenju-he deleted the ptx-nvidiacl-collectives_helpers.ll branch May 27, 2025 00:13
