-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LLVM] Add 'ExpandVariadicsPass' to LTO default pipeline #100479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a phase ordering test to show the inlining effect
182181e
to
2a6590e
Compare
Good point, done. |
@llvm/pr-subscribers-llvm-transforms Author: Joseph Huber (jhuber6) ChangesSummary: This pass is currently only run late in the target's backend. However, Full diff: https://github.com/llvm/llvm-project/pull/100479.diff 3 Files Affected:
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 6f36bdad780ae3..757b20dcd6693a 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -48,6 +48,7 @@
#include "llvm/Transforms/IPO/DeadArgumentElimination.h"
#include "llvm/Transforms/IPO/ElimAvailExtern.h"
#include "llvm/Transforms/IPO/EmbedBitcodePass.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/GlobalDCE.h"
@@ -1874,6 +1875,9 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(PeepholeFPM),
PTO.EagerlyInvalidateAnalyses));
+ // Lower variadic functions for supported targets prior to inlining.
+ MPM.addPass(ExpandVariadicsPass(ExpandVariadicsMode::Optimize));
+
// Note: historically, the PruneEH pass was run first to deduce nounwind and
// generally clean up exception handling overhead. It isn't clear this is
// valuable as the inliner doesn't currently care whether it is inlining an
diff --git a/llvm/test/Other/new-pm-lto-defaults.ll b/llvm/test/Other/new-pm-lto-defaults.ll
index d451d2897f673c..1b345780476b71 100644
--- a/llvm/test/Other/new-pm-lto-defaults.ll
+++ b/llvm/test/Other/new-pm-lto-defaults.ll
@@ -29,6 +29,7 @@
; CHECK-EP: Running pass: NoOpModulePass
; CHECK-O: Running pass: CrossDSOCFIPass
+; CHECK-O-NEXT: Running pass: ExpandVariadicsPass
; CHECK-O-NEXT: Running pass: OpenMPOptPass
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: InferFunctionAttrsPass
diff --git a/llvm/test/Transforms/PhaseOrdering/varargs.ll b/llvm/test/Transforms/PhaseOrdering/varargs.ll
new file mode 100644
index 00000000000000..dd6a41fcc96012
--- /dev/null
+++ b/llvm/test/Transforms/PhaseOrdering/varargs.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=amdgcn-- -S -passes='lto<O2>' < %s | FileCheck %s
+target triple = "amdgcn-amd-amdhsa"
+
+; We use the ExpandVariadics pass to lower variadic functions so they can be
+; inlined.
+
+define i32 @foo() {
+; CHECK-LABEL: define i32 @foo(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 6
+;
+entry:
+ %call = tail call i32 (i32, ...) @vararg(i32 poison, i32 noundef 1, i32 noundef 2, i32 noundef 3)
+ ret i32 %call
+}
+
+define internal i32 @vararg(i32 %first, ...) {
+entry:
+ %vlist = alloca ptr, align 8, addrspace(5)
+ %vlist.ascast = addrspacecast ptr addrspace(5) %vlist to ptr
+ call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %vlist)
+ call void @llvm.va_start.p0(ptr %vlist.ascast)
+ %vlist.promoted = load ptr, ptr addrspace(5) %vlist, align 8
+ %argp.next = getelementptr inbounds i8, ptr %vlist.promoted, i64 4
+ store ptr %argp.next, ptr addrspace(5) %vlist, align 8
+ %0 = load i32, ptr %vlist.promoted, align 4
+ %argp.next.1 = getelementptr inbounds i8, ptr %vlist.promoted, i64 8
+ store ptr %argp.next.1, ptr addrspace(5) %vlist, align 8
+ %1 = load i32, ptr %argp.next, align 4
+ %add.1 = add nsw i32 %1, %0
+ %argp.next.2 = getelementptr inbounds i8, ptr %vlist.promoted, i64 12
+ store ptr %argp.next.2, ptr addrspace(5) %vlist, align 8
+ %2 = load i32, ptr %argp.next.1, align 4
+ %add.2 = add nsw i32 %2, %add.1
+ call void @llvm.va_end.p0(ptr %vlist.ascast)
+ call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %vlist)
+ ret i32 %add.2
+}
+
+declare void @llvm.lifetime.start.p5(i64 immarg, ptr addrspace(5) nocapture)
+
+declare void @llvm.va_start.p0(ptr)
+
+declare void @llvm.lifetime.end.p5(i64 immarg, ptr addrspace(5) nocapture)
+
+declare void @llvm.va_end.p0(ptr)
|
2a6590e
to
56af884
Compare
Summary: This pass expands variadic functions into non-variadic function calls according to the target ABI. Currently, this is used as the lowering for the NVPTX, AMDGPU, and WASM targets. This pass is currently only run late in the target's backend. However, during LTO we want to run it before the inliner pass so that the expanded functions can be inlined using standard heuristics. This pass is a no-op for unsupported targets, so this won't apply to any code that isn't already using it.
56af884
to
802badf
Compare
@@ -1874,6 +1875,9 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level, | |||
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(PeepholeFPM), | |||
PTO.EagerlyInvalidateAnalyses)); | |||
|
|||
// Lower variadic functions for supported targets prior to inlining. | |||
MPM.addPass(ExpandVariadicsPass(ExpandVariadicsMode::Optimize)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimise version can run however often we like, on any subsets of the IR module. All good.
The original idea was to run the optimise per-tu instead of on the whole module at lto time, but until there's a non-gpu target using it in production the distinction doesn't matter too much. It's presumably at lto time here in order to combine application code with libc which seems legitimate |
Summary: This pass expands variadic functions into non-variadic function calls according to the target ABI. Currently, this is used as the lowering for the NVPTX and AMDGPU targets. This pass is currently only run late in the target's backend. However, during LTO we want to run it before the inliner pass so that the expanded functions can be inlined using standard heuristics. This pass is a no-op for unsupported targets, so this won't apply to any code that isn't already using it. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250576
Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX and AMDGPU targets.
This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.