Skip to content

[Clang][Driver] Enable internalization by default for AMDGPU #138365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 3, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9284,6 +9284,12 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back(Args.MakeArgString(
"--device-linker=" + TC->getTripleString() + "=" + Arg));

// Enable internalization for AMDGPU.
if (TC->getTriple().isAMDGPU())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Braces

CmdArgs.push_back(
Args.MakeArgString("--device-linker=" + TC->getTripleString() +
"=-plugin-opt=-amdgpu-internalize-symbols"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget what this hacked around, was hoping we didn't need it anymore. Maybe @arsenm remembers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a hack TBH, though we are the only target explicitly using it. It significantly affects our performance. The remaining uses of this pass are in (Thin)LTO, which has broader impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more like "always" than "by default".

The implementation details are kind of a hack. We should be able to just run the ordinary internalize pass without the special AMDGPU filter.

static bool mustPreserveGV(const GlobalValue &GV) {

The most legitimate part of this is the isEntryFunctionCC, which we could just make the internalize pass directly do. The sanitizer name hacks are also obviously hacks.

The attempt to drop constant users looks like a bad side effect a predicate should not have, I don't know what that's doing there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example of what kind of functions aren't being internalized? Last time I had this issue it's because the ROCm Device Libs were compiling with -fvisibility=protected for some reason.


// Forward the LTO mode relying on the Driver's parsing.
if (C.getDriver().getOffloadLTOMode() == LTOK_Full)
CmdArgs.push_back(Args.MakeArgString(
Expand Down
6 changes: 6 additions & 0 deletions clang/test/Driver/hip-options.hip
Original file line number Diff line number Diff line change
Expand Up @@ -259,3 +259,9 @@
// RUN: --offload-arch=gfx1100 --offload-new-driver --offload-jobs=0x4 %s 2>&1 | \
// RUN: FileCheck -check-prefix=INVJOBS %s
// INVJOBS: clang: error: invalid integral value '0x4' in '--offload-jobs=0x4'

// Check that internalization is enabled for offloading on AMDGPU.
// RUN: %clang -### -Werror --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
// RUN: --offload-arch=gfx1100 --offload-new-driver %s 2>&1 | \
// RUN: FileCheck -check-prefix=INTERNALIZATION %s
// INTERNALIZATION: --device-linker=amdgcn-amd-amdhsa=-plugin-opt=-amdgpu-internalize-symbols
7 changes: 7 additions & 0 deletions clang/test/Driver/openmp-offload-gpu.c
Original file line number Diff line number Diff line change
Expand Up @@ -393,3 +393,10 @@
// RUN: --offload-arch=sm_52 -foffload-lto=thin -nogpulib -nogpuinc %s 2>&1 \
// RUN: | FileCheck --check-prefix=THINLTO-SM52 %s
// THINLTO-SM52: --device-compiler=nvptx64-nvidia-cuda=-flto=thin

// Check that internalization is enabled for OpenMP offloading on AMDGPU.
//
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
// RUN: --offload-arch=gfx906 -nogpulib -nogpuinc %s 2>&1 \
// RUN: | FileCheck --check-prefix=INTERNALIZATION-GFX906 %s
// INTERNALIZATION-GFX906: --device-linker=amdgcn-amd-amdhsa=-plugin-opt=-amdgpu-internalize-symbols