Skip to content

[LLVM] Add 'ExpandVariadicsPass' to LTO default pipeline #100479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jul 24, 2024

Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX and AMDGPU targets.

This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a phase ordering test to show the inlining effect

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jul 25, 2024

Can we get a phase ordering test to show the inlining effect

Good point, done.

@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Joseph Huber (jhuber6)

Changes

Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX, AMDGPU, and WASM targets.

This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.


Full diff: https://github.com/llvm/llvm-project/pull/100479.diff

3 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+4)
  • (modified) llvm/test/Other/new-pm-lto-defaults.ll (+1)
  • (added) llvm/test/Transforms/PhaseOrdering/varargs.ll (+48)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 6f36bdad780ae3..757b20dcd6693a 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -48,6 +48,7 @@
 #include "llvm/Transforms/IPO/DeadArgumentElimination.h"
 #include "llvm/Transforms/IPO/ElimAvailExtern.h"
 #include "llvm/Transforms/IPO/EmbedBitcodePass.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
 #include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
 #include "llvm/Transforms/IPO/FunctionAttrs.h"
 #include "llvm/Transforms/IPO/GlobalDCE.h"
@@ -1874,6 +1875,9 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MPM.addPass(createModuleToFunctionPassAdaptor(std::move(PeepholeFPM),
                                                 PTO.EagerlyInvalidateAnalyses));
 
+  // Lower variadic functions for supported targets prior to inlining.
+  MPM.addPass(ExpandVariadicsPass(ExpandVariadicsMode::Optimize));
+
   // Note: historically, the PruneEH pass was run first to deduce nounwind and
   // generally clean up exception handling overhead. It isn't clear this is
   // valuable as the inliner doesn't currently care whether it is inlining an
diff --git a/llvm/test/Other/new-pm-lto-defaults.ll b/llvm/test/Other/new-pm-lto-defaults.ll
index d451d2897f673c..1b345780476b71 100644
--- a/llvm/test/Other/new-pm-lto-defaults.ll
+++ b/llvm/test/Other/new-pm-lto-defaults.ll
@@ -29,6 +29,7 @@
 
 ; CHECK-EP: Running pass: NoOpModulePass
 ; CHECK-O: Running pass: CrossDSOCFIPass
+; CHECK-O-NEXT: Running pass: ExpandVariadicsPass
 ; CHECK-O-NEXT: Running pass: OpenMPOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-O-NEXT: Running pass: InferFunctionAttrsPass
diff --git a/llvm/test/Transforms/PhaseOrdering/varargs.ll b/llvm/test/Transforms/PhaseOrdering/varargs.ll
new file mode 100644
index 00000000000000..dd6a41fcc96012
--- /dev/null
+++ b/llvm/test/Transforms/PhaseOrdering/varargs.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=amdgcn-- -S -passes='lto<O2>' < %s | FileCheck %s
+target triple = "amdgcn-amd-amdhsa"
+
+; We use the ExpandVariadics pass to lower variadic functions so they can be
+; inlined.
+
+define i32 @foo() {
+; CHECK-LABEL: define i32 @foo(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret i32 6
+;
+entry:
+  %call = tail call i32 (i32, ...) @vararg(i32 poison, i32 noundef 1, i32 noundef 2, i32 noundef 3)
+  ret i32 %call
+}
+
+define internal i32 @vararg(i32 %first, ...) {
+entry:
+  %vlist = alloca ptr, align 8, addrspace(5)
+  %vlist.ascast = addrspacecast ptr addrspace(5) %vlist to ptr
+  call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %vlist)
+  call void @llvm.va_start.p0(ptr %vlist.ascast)
+  %vlist.promoted = load ptr, ptr addrspace(5) %vlist, align 8
+  %argp.next = getelementptr inbounds i8, ptr %vlist.promoted, i64 4
+  store ptr %argp.next, ptr addrspace(5) %vlist, align 8
+  %0 = load i32, ptr %vlist.promoted, align 4
+  %argp.next.1 = getelementptr inbounds i8, ptr %vlist.promoted, i64 8
+  store ptr %argp.next.1, ptr addrspace(5) %vlist, align 8
+  %1 = load i32, ptr %argp.next, align 4
+  %add.1 = add nsw i32 %1, %0
+  %argp.next.2 = getelementptr inbounds i8, ptr %vlist.promoted, i64 12
+  store ptr %argp.next.2, ptr addrspace(5) %vlist, align 8
+  %2 = load i32, ptr %argp.next.1, align 4
+  %add.2 = add nsw i32 %2, %add.1
+  call void @llvm.va_end.p0(ptr %vlist.ascast)
+  call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %vlist)
+  ret i32 %add.2
+}
+
+declare void @llvm.lifetime.start.p5(i64 immarg, ptr addrspace(5) nocapture)
+
+declare void @llvm.va_start.p0(ptr)
+
+declare void @llvm.lifetime.end.p5(i64 immarg, ptr addrspace(5) nocapture)
+
+declare void @llvm.va_end.p0(ptr)

@jhuber6 jhuber6 force-pushed the ExpandVariadicsLTO branch from 2a6590e to 56af884 Compare July 25, 2024 11:58
Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX, AMDGPU, and WASM targets.

This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.
@jhuber6 jhuber6 force-pushed the ExpandVariadicsLTO branch from 56af884 to 802badf Compare July 25, 2024 12:08
@@ -1874,6 +1875,9 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(PeepholeFPM),
PTO.EagerlyInvalidateAnalyses));

// Lower variadic functions for supported targets prior to inlining.
MPM.addPass(ExpandVariadicsPass(ExpandVariadicsMode::Optimize));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimise version can run however often we like, on any subsets of the IR module. All good.

@JonChesterfield
Copy link
Collaborator

The original idea was to run the optimise per-tu instead of on the whole module at lto time, but until there's a non-gpu target using it in production the distinction doesn't matter too much. It's presumably at lto time here in order to combine application code with libc which seems legitimate

@jhuber6 jhuber6 merged commit 8758091 into llvm:main Jul 25, 2024
4 of 7 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX and AMDGPU targets.

This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250576
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants