[Offload] Always consider `flto` on for AMDGPU #129118

jhuber6 · 2025-02-27T21:07:44Z

Summary:
Previously we turned this off, but that led to a regression in some of
the option handling. I would argue that handling LTO by default was
incorrect behavior, but for AMDGPU people were used to this default, so
we pass it by default. -fno-lto overrides.

llvmbot · 2025-02-27T21:08:18Z

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
Previously we turned this off, but that led to a regression in some of
the option handling. I would argue that handling LTO by default was
incorrect behavior, but for AMDGPU people were used to this default, so
we pass it by default. -fno-lto overrides.

Full diff: https://github.com/llvm/llvm-project/pull/129118.diff

2 Files Affected:

(modified) clang/test/Driver/linker-wrapper.c (+7-7)
(modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+4)

diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 7586b87743bf5..79c0df10c8358 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -39,7 +39,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LINK
 
-// AMDGPU-LINK: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMDGPU-LINK: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.amdgpu.bc,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx1030 \
@@ -48,7 +48,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --device-compiler=--save-temps \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LTO-TEMPS
 
-// AMDGPU-LTO-TEMPS: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}-save-temps
+// AMDGPU-LTO-TEMPS: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -flto {{.*}}-save-temps
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
@@ -148,7 +148,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --clang-backend \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND
 
-// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -Wl,--no-undefined {{.*}}.o
+// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -flto -Wl,--no-undefined {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
@@ -171,8 +171,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t-on.o %t-off.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=AMD-TARGET-ID
 
-// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -Wl,--no-undefined {{.*}}.o {{.*}}.o
-// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t-lib.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=generic
@@ -187,8 +187,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t1.o %t2.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=ARCH-ALL
 
-// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -Wl,--no-undefined {{.*}}.o {{.*}}.o
-// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// ARCH-ALL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -flto -Wl,--no-undefined {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 7db8f3e27d704..adc15caf9ef39 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -495,6 +495,10 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
     Triple.isAMDGPU() ? CmdArgs.push_back(Args.MakeArgString("-mcpu=" + Arch))
                       : CmdArgs.push_back(Args.MakeArgString("-march=" + Arch));
 
+  // AMDGPU is always in LTO mode currently.
+  if (Triple.isAMDGPU())
+    CmdArgs.push_back("-flto");
+
   // Forward all of the `--offload-opt` and similar options to the device.
   for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
     CmdArgs.append(
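
The shape of the change is easy to lose in the diff context, so here is a minimal standalone sketch of it, not the actual wrapper code. It assumes a hypothetical helper `buildDeviceLinkArgs` and a simplified `isAMDGPUTriple` prefix check standing in for `Triple.isAMDGPU()`; only the `-mcpu`/`-march` choice and the unconditional `-flto` for AMDGPU are taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Triple::isAMDGPU(): a plain prefix check.
// The real query is richer; this is an assumption made for illustration.
bool isAMDGPUTriple(const std::string &triple) {
  return triple.rfind("amdgcn", 0) == 0;
}

// Builds the leading arguments of a device link job in the spirit of the
// patch. The function name and the reduced argument list are illustrative.
std::vector<std::string> buildDeviceLinkArgs(const std::string &triple,
                                             const std::string &arch) {
  assert(!triple.empty() && !arch.empty() && "triple and arch are required");

  std::vector<std::string> args = {"clang", "--target=" + triple};

  // AMDGPU selects the processor with -mcpu=, other targets with -march=.
  const bool isAMDGPU = isAMDGPUTriple(triple);
  args.push_back((isAMDGPU ? "-mcpu=" : "-march=") + arch);

  // The behavior added by the patch: AMDGPU link jobs always carry -flto.
  if (isAMDGPU)
    args.push_back("-flto");

  return args;
}
```

Calling `buildDeviceLinkArgs("amdgcn-amd-amdhsa", "gfx908")` yields the same `--target=... -mcpu=gfx908 -flto` ordering that the updated `AMDGPU-LINK` check line expects, while a non-AMDGPU triple gets `-march=` and no `-flto`.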

shiltian · 2025-02-28T04:58:31Z

We do have some framework teams that are still using non-LTO (or non-gpu-rdc) builds.

jhuber6 · 2025-02-28T13:01:01Z

> We do have some framework teams that are still using non-LTO (or non-gpu-rdc) builds.

This doesn't force that; adding -flto just informs the compiler to forward certain flags to the linker.

shiltian · 2025-02-28T14:38:53Z

but it turns on LTO by default right?

jhuber6 · 2025-02-28T14:39:53Z

> but it turns on LTO by default right?

No, that decision was made by clang. This just informs the linker that LTO may be performed.

jplehr · 2025-03-07T10:53:43Z

> but it turns on LTO by default right?

This is very much how I read that patch too.
Maybe the flag names and comments are a bit misleading here.

jhuber6 · 2025-03-07T12:24:20Z

> > but it turns on LTO by default right?

> This is very much how I read that patch too. Maybe the flag names and comments are a bit misleading here.

I think people are just confusing what -flto means when put on a link job.

jplehr · 2025-03-07T12:48:54Z

> I think people are just confusing what -flto means when put on a link job.

Including me. :)

Apologies for my ignorance, what does this do and why do we want it?
It allows for some (LTO-related?) flags to be forwarded/handled correctly?

jhuber6 · 2025-03-07T12:50:35Z

> > I think people are just confusing what -flto means when put on a link job.

> Including me. :)

> Apologies for my ignorance, what does this do and why do we want it? It allows for some (LTO-related?) flags to be forwarded/handled correctly?

The clang driver is very conservative with forwarding flags to the linker because it doesn't always know what the linker supports. -flto is a promise that your linker supports LTO and that it should accept some flags. It could honestly be improved if we know the linker is ld.lld (like in the AMDGPU case), but that's just how it works right now.
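
As a rough illustration of how a default `-flto` can coexist with the description's note that `-fno-lto` overrides, here is a toy model of last-flag-wins resolution. It is a sketch under assumptions, not the driver's code: the helper name `ltoRequested` is hypothetical, and it assumes a user-supplied flag ends up later on the command line than the default one.

```cpp
#include <string>
#include <vector>

// Toy model: scan the arguments in order and let the last LTO-related flag
// decide. `ltoRequested` is a hypothetical helper, not a clang API.
bool ltoRequested(const std::vector<std::string> &args,
                  bool defaultValue = false) {
  bool enabled = defaultValue;
  for (const std::string &arg : args) {
    if (arg == "-flto" || arg.rfind("-flto=", 0) == 0)
      enabled = true;   // -flto or -flto=<mode> turns LTO handling on
    else if (arg == "-fno-lto")
      enabled = false;  // a later -fno-lto turns it back off
  }
  return enabled;
}
```

Under this model, a command line that starts with the default `-flto` and later receives `-fno-lto` resolves to "off", which matches the override behavior the description promises.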

jplehr

With the explanation, that change seems reasonable. Thank you.

jhuber6 requested review from jdoerfert, jplehr, shiltian and yxsamliu February 27, 2025 21:07

llvmbot added the clang and clang:driver labels Feb 27, 2025

jplehr approved these changes Mar 7, 2025

jhuber6 merged commit 90e4215 into llvm:main Mar 7, 2025
14 checks passed

jhuber6 deleted the LTOFix branch March 7, 2025 12:54

omarahmed1111 mentioned this pull request Apr 18, 2025

[SYCL][clang-linker-wrapper] Replace -lto-emit-asm option with -S for cuda pipeline intel/llvm#18000

Merged
