Skip to content

[SYCL] Extend global offset intrinsic removal #11909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 8, 2023

Conversation

MartinWehking
Copy link
Contributor

@MartinWehking MartinWehking commented Nov 16, 2023

Extend #11674 by modifying the globaloffset optimization pass to always
replace uses of Loads from the llvm.nvvm.implicit.offset and
llvm.amdgcn.implicit.offset intrinsics with constant zeros in the
original non-offset kernel.
Hence, perform the optimization even when
-enable-global-offset=true (default).
Duplicate recursively functions containing calls to the implicit offset
intrinsic and let the implicit offset kernel entry point only call
the original functions (i.e. do not call the functions with added offset
arguments).
Remove zero allocations for the original kernel entry points.

Martin added 7 commits November 14, 2023 17:43
Extend intel#11674 by modifying the globaloffset optimization pass to always
replace uses of Loads from the llvm.nvvm.implicit.offset and
llvm.amdgcn.implicit.offset intrinsics with constant zeros in the
original non-offset kernel.

Hence, perform the optimization even when
-enable-global-offset=true (default).

Duplicate recursively functions containing calls to the implicit offset
intrinsic and let the implicit offset kernel entry point only call
the original functions (i.e. do not call the functions with added offset
arguments).

Remove zero allocations for the original kernel entry points.
Use std::get to retrieve an element from a tuple that used to be pair
previously.

Do not mention deleted allocas of zeros anymore.
Fix an incomplete removal of calls to the implicit offset intrinsics.
The global-offset-multiple-calls-from-one-function.ll test highlighted
this flaw inside the IR.

Recurse only non cloned functions for adding offset parameters
and obtain call intructions that are to be replaced through a
global VMap.
Check for functions with and without offset parameters inside the
NVPTX and AMDGPU global-offset test cases.
Take into account that functions that contain calls to the global-offset
intrinsic or transitively call one of these functions have two different
versions now - one with and one without offset argument.
Check that the correct ones are called.
Add also a new test case for mixed functions, i.e. functions containing
direct calls to the intrinsic and functions that eventually transitively
call the intrinsic.
Do this to check if the recursion inside the global offset pass handles
these correctly.
Copy the global offset correctly to addrspace(3) inside the kernel.
Do not perform this operation if the cloned function is not a kernel
entry point.
Adapt the AMDGPU test cases to take the copying of the global offset
into account
Before the global-offset optimization, non-offset
and offset kernels called the same function with
an added offset argument.
The non-offset versions allocated an array with
3 zeros in addrspace(5) and passed it as an
argument to the function calls containing
calls to the intrinsic.
Since the allocation and filling is not possible
in addrspace(4), the offset version adds an extra
alloca in addrspace(5) and does an addrspace cast
to addrspace(4).
This is not required anymore since the non-offset
version does not perform an alloca of zeros
anymore. However, the optimization is rendered
almost useless as the global offset feature is
deprecated.
Adapt the comment to reflect this.
@MartinWehking MartinWehking changed the title Global offset [SYCL] Extend global offset intrinsic removal Dec 4, 2023
@MartinWehking MartinWehking marked this pull request as ready for review December 4, 2023 13:38
@MartinWehking MartinWehking requested a review from a team as a code owner December 4, 2023 13:38
continue;

// Kernel entry points need additional processing and change Metdadata.
if (EntryPointMetadata.count(Caller) != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's a good idea to change it. There are two reasons:

  1. We're dealing here with natural numbers (counts), not bools
  2. I didn't introduce the change originally, so it would add more diff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EntryPointMetadata is a map and therefore .count() is often used as contains, i.e. implicit conversion to bool should be ok in that context.

But I agree with (2)

@againull againull merged commit 03be036 into intel:sycl Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants