[SYCL] Extend global offset intrinsic removal #11909
Conversation
Extend intel#11674 by modifying the GlobalOffset optimization pass to always replace uses of loads from the llvm.nvvm.implicit.offset and llvm.amdgcn.implicit.offset intrinsics with constant zeros in the original, non-offset kernel; hence, perform the optimization even when -enable-global-offset=true (the default). Recursively duplicate functions that contain calls to the implicit offset intrinsic, and let the implicit-offset kernel entry point call only the original functions (i.e. do not call the functions with added offset arguments). Remove the zero allocations from the original kernel entry points.
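For illustration, here is a minimal C++ sketch of the load-to-zero rewrite described above, written against generic LLVM APIs. It is not the actual pass code, and the helper name `replaceImplicitOffsetLoadsWithZero` is made up:

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper (not the actual pass code): fold every load reached
// through the given implicit-offset intrinsic declaration to constant zero.
static void replaceImplicitOffsetLoadsWithZero(Function &OffsetIntrinsic) {
  SmallVector<CallInst *, 4> Calls;
  SmallVector<GetElementPtrInst *, 8> GEPs;
  SmallVector<LoadInst *, 8> Loads;

  for (User *U : OffsetIntrinsic.users()) {
    auto *Call = dyn_cast<CallInst>(U);
    if (!Call)
      continue;
    Calls.push_back(Call);
    // The intrinsic returns a pointer to the 3-element offset array; collect
    // every load reached from it, directly or through GEPs.
    SmallVector<User *, 8> Worklist(Call->user_begin(), Call->user_end());
    while (!Worklist.empty()) {
      User *V = Worklist.pop_back_val();
      if (auto *GEP = dyn_cast<GetElementPtrInst>(V)) {
        Worklist.append(GEP->user_begin(), GEP->user_end());
        GEPs.push_back(GEP);
      } else if (auto *LI = dyn_cast<LoadInst>(V)) {
        Loads.push_back(LI);
      }
    }
  }

  // Rewrite the loads first, then erase users before their definitions
  // (GEPs in reverse discovery order, so nested GEPs go before their bases).
  for (LoadInst *LI : Loads) {
    LI->replaceAllUsesWith(Constant::getNullValue(LI->getType()));
    LI->eraseFromParent();
  }
  for (GetElementPtrInst *GEP : reverse(GEPs))
    GEP->eraseFromParent();
  for (CallInst *Call : Calls)
    Call->eraseFromParent();
}
```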
Use std::get to retrieve an element from a tuple that was previously a pair. No longer mention the deleted allocas of zeros.
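As a purely illustrative example of that API change (the tuple contents here are invented, not the patch's actual element types):

```cpp
#include <string>
#include <tuple>

int main() {
  // What was previously a std::pair accessed via .first/.second becomes a
  // std::tuple accessed via std::get<Index>.
  std::tuple<int, std::string> Entry{3, "kernel_with_offset"};
  int NumArgs = std::get<0>(Entry);       // was Entry.first
  std::string Name = std::get<1>(Entry);  // was Entry.second
  return NumArgs == 3 && !Name.empty() ? 0 : 1;
}
```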
Fix an incomplete removal of calls to the implicit offset intrinsics; the global-offset-multiple-calls-from-one-function.ll test highlighted this flaw in the IR. Recurse only into non-cloned functions when adding offset parameters, and obtain the call instructions to be replaced through a global VMap.
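A minimal sketch of the VMap idea, assuming LLVM's standard cloning utilities; the helper below is hypothetical and not the patch's code:

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/ValueMapper.h"

using namespace llvm;

// Illustrative only: CloneFunction fills VMap with original-value -> clone
// entries, so a call instruction recorded against the original function can
// be looked up in the clone instead of being re-discovered by a traversal.
static CallInst *cloneAndFindCall(Function *Original, CallInst *OrigCall,
                                  ValueToValueMapTy &VMap) {
  Function *Clone = CloneFunction(Original, VMap);
  (void)Clone; // CloneFunction inserts the clone into the module itself
  Value *Mapped = VMap.lookup(OrigCall);
  return cast_or_null<CallInst>(Mapped);
}
```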
Check for functions with and without offset parameters in the NVPTX and AMDGPU global-offset test cases. Take into account that functions that contain calls to the global-offset intrinsic, or that transitively call one of these functions, now have two versions: one with and one without the offset argument. Check that the correct ones are called. Also add a new test case for mixed functions, i.e. functions containing direct calls to the intrinsic alongside functions that transitively call the intrinsic, to check that the recursion inside the global-offset pass handles these correctly.
Copy the global offset correctly to addrspace(3) inside the kernel, and do not perform this operation if the cloned function is not a kernel entry point. Adapt the AMDGPU test cases to account for the copying of the global offset.
Before the global-offset optimization, non-offset and offset kernels called the same function with an added offset argument. The non-offset versions allocated an array of 3 zeros in addrspace(5) and passed it as an argument to the function calls containing calls to the intrinsic. Since allocating and filling the array is not possible in addrspace(4), the offset version adds an extra alloca in addrspace(5) and performs an addrspace cast to addrspace(4). This is no longer required, since the non-offset version no longer performs an alloca of zeros. However, the optimization is rendered almost useless because the global-offset feature is deprecated. Adapt the comment to reflect this.
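A rough sketch of the alloca-plus-addrspacecast pattern described above, with the AMDGPU address-space numbers (5 = private, 4 = constant) hard-coded and the helper name invented for illustration:

```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Illustrative only: materialize a private (addrspace(5)) copy of the
// 3 x i32 offset array in the kernel entry block and cast it to the
// constant address space (4), mirroring the pattern described above.
static Value *emitPrivateOffsetCopy(Function &Kernel) {
  BasicBlock &Entry = Kernel.getEntryBlock();
  IRBuilder<> Builder(&Entry, Entry.getFirstInsertionPt());

  Type *OffsetTy = ArrayType::get(Builder.getInt32Ty(), 3);
  AllocaInst *Private =
      Builder.CreateAlloca(OffsetTy, /*AddrSpace=*/5, nullptr, "offset.priv");
  return Builder.CreateAddrSpaceCast(
      Private, PointerType::get(Kernel.getContext(), /*AddressSpace=*/4),
      "offset.const");
}
```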
continue;

// Kernel entry points need additional processing and changed metadata.
if (EntryPointMetadata.count(Caller) != 0)
Zero is false.
I don't know if it's a good idea to change it. There are two reasons:
- We're dealing here with natural numbers (counts), not bools
- I didn't introduce the change originally, so it would add more diff
EntryPointMetadata is a map and therefore .count() is often used as contains, i.e. the implicit conversion to bool should be OK in that context. But I agree with (2)
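For reference, a small self-contained example of that equivalence using std::map (the names are invented; LLVM's map-like containers behave the same way for this purpose):

```cpp
#include <cassert>
#include <map>

int main() {
  std::map<int, const char *> EntryPoints{{1, "kernel_a"}};
  // For a map, count() is either 0 or 1, so its implicit conversion to bool
  // is equivalent to contains() (C++20).
  assert(static_cast<bool>(EntryPoints.count(1)) == EntryPoints.contains(1));
  assert(static_cast<bool>(EntryPoints.count(2)) == EntryPoints.contains(2));
  return 0;
}
```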