-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[transforms] Inline simple variadic functions #81058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[transforms] Inline simple variadic functions #81058
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-x86 Author: Jon Chesterfield (JonChesterfield) ChangesThis is a MVP style commit for eliminating variadic functions as an IR pass. It replaces the ... with a va_list value. The implementation strategy is to divide the transform into orthogonal parts that can be tested independently. In particular this seeks to decouple target specific complexity from llvm modelling complexity. The original motivation was to provide variadic function support to amdgpu and nvptx. That's tested and working in the downstream branch from which this patch was extracted. This commit introduces the target specific part of the transform, implemented for 32 and 64 bit x86. Test coverage of more interesting types from clang requires either pull/80002 or complicated filecheck patterns. It rewrites very simple variadic calls into one to an equivalent function taking a va_list. Subsequent patches are needed to:
This patch is known to be incomplete. It is posted in this form to check the strategy and provide a mostly-correct baseline to extend. Initial discussion is at https://discourse.llvm.org/t/rfc-desugar-variadics-codegen-for-new-targets-optimisation-for-existing/72854 Patch is 176.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81058.diff 16 Files Affected:
diff --git a/clang/test/CodeGen/expand-variadic-call.c b/clang/test/CodeGen/expand-variadic-call.c
new file mode 100644
index 00000000000000..fa2b984bec08a5
--- /dev/null
+++ b/clang/test/CodeGen/expand-variadic-call.c
@@ -0,0 +1,273 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64-v4 -std=c23 -O1 -ffreestanding -emit-llvm -o - %s | FileCheck %s
+
+// This test sanity checks calling a variadic function with the expansion transform disabled.
+// The IR test cases {arch}/expand-variadic-call-*.ll correspond to IR generated from this test case.
+
+typedef __builtin_va_list va_list;
+#define va_copy(dest, src) __builtin_va_copy(dest, src)
+#define va_start(ap, ...) __builtin_va_start(ap, 0)
+#define va_end(ap) __builtin_va_end(ap)
+#define va_arg(ap, type) __builtin_va_arg(ap, type)
+
+// 32 bit x86 alignment uses getTypeStackAlign for special cases
+// Whitebox testing.
+// Needs a type >= 16 which is either a simd or a struct containing a simd
+// darwinvectorabi should force 4 bytes
+// linux vectors with align 16/32/64 return that alignment
+
+
+void wrapped(va_list);
+
+// CHECK-LABEL: @codegen_for_copy(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[CP:%.*]] = alloca [1 x %struct.__va_list_tag], align 16
+// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[CP]]) #[[ATTR7:[0-9]+]]
+// CHECK-NEXT: call void @llvm.va_copy(ptr nonnull [[CP]], ptr [[X:%.*]])
+// CHECK-NEXT: call void @wrapped(ptr noundef nonnull [[CP]]) #[[ATTR8:[0-9]+]]
+// CHECK-NEXT: call void @llvm.va_end(ptr [[CP]])
+// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 24, ptr nonnull [[CP]]) #[[ATTR7]]
+// CHECK-NEXT: ret void
+//
+void codegen_for_copy(va_list x)
+{
+ va_list cp;
+ va_copy(cp, x);
+ wrapped(cp);
+ va_end(cp);
+}
+
+
+// CHECK-LABEL: @vararg(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[VA:%.*]] = alloca [1 x %struct.__va_list_tag], align 16
+// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[VA]]) #[[ATTR7]]
+// CHECK-NEXT: call void @llvm.va_start(ptr nonnull [[VA]])
+// CHECK-NEXT: call void @wrapped(ptr noundef nonnull [[VA]]) #[[ATTR8]]
+// CHECK-NEXT: call void @llvm.va_end(ptr [[VA]])
+// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 24, ptr nonnull [[VA]]) #[[ATTR7]]
+// CHECK-NEXT: ret void
+//
+ void vararg(...)
+{
+ va_list va;
+ __builtin_va_start(va, 0);
+ wrapped(va);
+ va_end(va);
+}
+
+// vectors with alignment 16/32/64 are natively aligned on linux x86
+// v32f32 would be a m1024 type, larger than x64 defines at time of writing
+typedef int i32;
+typedef float v4f32 __attribute__((__vector_size__(16), __aligned__(16)));
+typedef float v8f32 __attribute__((__vector_size__(32), __aligned__(32)));
+typedef float v16f32 __attribute__((__vector_size__(64), __aligned__(64)));
+typedef float v32f32 __attribute__((__vector_size__(128), __aligned__(128)));
+
+
+// Pass a single value to wrapped() via vararg(...)
+// CHECK-LABEL: @single_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]]) #[[ATTR9:[0-9]+]]
+// CHECK-NEXT: ret void
+//
+void single_i32(i32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_double(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(double noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_double(double x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v4f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<4 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v4f32(v4f32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v8f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<8 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v8f32(v8f32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v16f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<16 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v16f32(v16f32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v32f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT: [[X:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT: store <32 x float> [[X]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: tail call void (...) @vararg(ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v32f32(v32f32 x)
+{
+ vararg(x);
+}
+
+
+
+// CHECK-LABEL: @i32_double(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], double noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_double(i32 x, double y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @double_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(double noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void double_i32(double x, i32 y)
+{
+ vararg(x, y);
+}
+
+
+// A struct used by libc variadic tests
+
+typedef struct {
+ char c;
+ short s;
+ int i;
+ long l;
+ float f;
+ double d;
+} libcS;
+
+// CHECK-LABEL: @i32_libcS(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], ptr noundef nonnull byval([[STRUCT_LIBCS:%.*]]) align 8 [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_libcS(i32 x, libcS y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @libcS_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(ptr noundef nonnull byval([[STRUCT_LIBCS:%.*]]) align 8 [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void libcS_i32(libcS x, i32 y)
+{
+ vararg(x, y);
+}
+
+
+// CHECK-LABEL: @i32_v4f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], <4 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v4f32(i32 x, v4f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v4f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<4 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v4f32_i32(v4f32 x, i32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v8f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], <8 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v8f32(i32 x, v8f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v8f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<8 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v8f32_i32(v8f32 x, i32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v16f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], <16 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v16f32(i32 x, v16f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v16f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<16 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v16f32_i32(v16f32 x, i32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v32f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT: [[Y:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <32 x float> [[Y]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v32f32(i32 x, v32f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v32f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT: [[X:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <32 x float> [[X]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: tail call void (...) @vararg(ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v32f32_i32(v32f32 x, i32 y)
+{
+ vararg(x, y);
+}
diff --git a/clang/test/CodeGen/variadic-wrapper-removal.c b/clang/test/CodeGen/variadic-wrapper-removal.c
new file mode 100644
index 00000000000000..da41dde16f3d73
--- /dev/null
+++ b/clang/test/CodeGen/variadic-wrapper-removal.c
@@ -0,0 +1,86 @@
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+// RUN: %clang_cc1 -triple=x86_64-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+// neither arm arch is implemented yet, leaving it here as a reminder
+// armv6 is a ptr as far as the struct is concerned, but possibly also a [1 x i32] passed by value
+// that seems insistent, maybe leave 32 bit arm alone for now
+// aarch64 is a struct of five things passed using byval memcpy
+
+// R-N: %clang_cc1 -triple=armv6-none--eabi -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+// R-N: %clang_cc1 -triple=aarch64-none-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+
+
+// expand-variadics rewrites calls to variadic functions into calls to
+// equivalent functions that take a va_list argument. A property of the
+// implementation is that said "equivalent function" may be a pre-existing one.
+// This is equivalent to inlining a sufficiently simple variadic wrapper.
+
+#include <stdarg.h>
+
+typedef int FILE; // close enough for this test
+
+// fprintf is sometimes implemented as a call to vfprintf. That fits the
+// pattern the transform pass recognises - given an implementation of fprintf
+// in the IR module, calls to it can be rewritten into calls into vfprintf.
+
+// CHECK-LABEL: define{{.*}} i32 @fprintf(
+// CHECK-LABEL: define{{.*}} i32 @call_fprintf(
+// CHECK-NOT: @fprintf
+// CHECK: @vfprintf
+int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap);
+int fprintf(FILE *restrict f, const char *restrict fmt, ...)
+{
+ int ret;
+ va_list ap;
+ va_start(ap, fmt);
+ ret = vfprintf(f, fmt, ap);
+ va_end(ap);
+ return ret;
+}
+int call_fprintf(FILE *f)
+{
+ int x = 42;
+ double y = 3.14;
+ return fprintf(f, "int %d dbl %g\n", x, y);
+}
+
+// Void return type is also OK
+
+// CHECK-LABEL: define{{.*}} void @no_result(
+// CHECK-LABEL: define{{.*}} void @call_no_result(
+// CHECK-NOT: @no_result
+// CHECK: @vno_result
+void vno_result(const char * fmt, va_list);
+void no_result(const char * fmt, ...)
+{
+ va_list ap;
+ va_start(ap, fmt);
+ vno_result(fmt, ap);
+ va_end(ap);
+}
+void call_no_result(FILE *f)
+{
+ int x = 101;
+ no_result("", x);
+}
+
+// The vaend in the forwarding implementation is optional where it's a no-op
+
+// CHECK-LABEL: define{{.*}} i32 @no_vaend(
+// CHECK-LABEL: define{{.*}} i32 @call_no_vaend(
+// CHECK-NOT: @no_vaend
+// CHECK: @vno_vaend
+int vno_vaend(int x, va_list);
+int no_vaend(int x, ...)
+{
+ va_list ap;
+ va_start(ap, x);
+ return vno_vaend(x, ap);
+}
+int call_no_vaend(int x)
+{
+ return no_vaend(x, 10, 2.5f);
+}
diff --git a/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp b/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp
new file mode 100644
index 00000000000000..cf436ead77a2cb
--- /dev/null
+++ b/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp
@@ -0,0 +1,117 @@
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Linux
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X64SystemV
+
+// RUN: %clang_cc1 -triple i386-apple-darwin -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Darwin
+
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X64SystemV
+
+// RUN: %clang_cc1 -triple i686-windows-msvc -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Windows
+
+// 64 bit windows va_arg passes most types indirectly but the call instruction uses the types by value
+// ___: %clang_cc1 -triple x86_64-pc-windows-msvc -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK
+
+// Checks for consistency between clang and expand-variadics
+// 1. Use clang to lower va_arg
+// 2. Use expand-variadics to lower the rest of the variadic operations
+// 3. Use opt -O1 to simplify the result for simpler filecheck patterns
+// The simplification will fail when the two are not consistent, modulo bugs elsewhere.
+
+#include <stdarg.h>
+
+// This test can be simplified when expand-variadics is extended to apply to more patterns.
+// The first_valist and second_valist functions can then be inlined, either in the test or
+// by enabling optimisaton passes in the clang invocation.
+// The explicit instcombine pass canonicalises the variadic function IR.
+
+// More complicated tests need instcombine of ptrmask to land first.
+
+template <typename X, typename Y>
+static X first_valist(va_list va) {
+ return va_arg(va, X);
+}
+
+template <typename X, typename Y>
+static X first(...) {
+ va_list va;
+ __builtin_va_start(va, 0);
+ return first_valist<X,Y>(va);
+}
+
+template <typename X, typename Y>
+static Y second_valist(va_list va) {
+ va_arg(va, X);
+ Y r = va_arg(va, Y);
+ return r;
+}
+
+
+template <typename X, typename Y>
+static Y second(...) {
+ va_list va;
+ __builtin_va_start(va, 0);
+ return second_valist<X,Y>(va);
+}
+
+extern "C"
+{
+// CHECK-LABEL: define{{.*}} i32 @first_i32_i32(i32{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %x
+int first_i32_i32(int x, int y)
+{
+ return first<int,int>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @second_i32_i32(i32{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %y
+int second_i32_i32(int x, int y)
+{
+ return second<int,int>(x, y);
+}
+}
+
+// Permutations of an int and a double
+extern "C"
+{
+// CHECK-LABEL: define{{.*}} i32 @first_i32_f64(i32{{.*}} %x, double{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %x
+int first_i32_f64(int x, double y)
+{
+ return first<int,double>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} double @second_i32_f64(i32{{.*}} %x, double{{.*}} %y)
+// CHECK: entry:
+
+// X86Linux: ret double %y
+// X64SystemV: ret double %y
+// X86Darwin: ret double %y
+// X86Windows: [[TMP0:%.*]] = alloca <{ [4 x i8], double }>, align 4
+// X86Windows: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i32 4
+// X86Windows: store double %y, ptr [[TMP1]], align 4
+// X86Windows: [[TMP2:%.*]] = load double, ptr [[TMP0]], align 4
+// X86Windows: ret double [[TMP2]]
+double second_i32_f64(int x, double y)
+{
+ return second<int,double>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} double @first_f64_i32(double{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret double %x
+double first_f64_i32(double x, int y)
+{
+ return first<double,int>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @second_f64_i32(double{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %y
+int second_f64_i32(double x, int y)
+{
+ return second<double,int>(x, y);
+}
+}
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index bbfb8a0dbe26a4..fe3208df7a23b2 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -600,6 +600,10 @@ namespace llvm {
/// Lowers KCFI operand bundles for indirect calls.
FunctionPass *createKCFIPass();
+
+ // Inline variadic functions and expand variadic intrinsics.
+ ModulePass *createExpandVariadicsPass();
+
} // End llvm namespace
#endif
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 3db639a6872407..6487d0a5e26d1b 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -106,6 +106,7 @@ void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
void initializeExpandMemCmpLegacyPassPass(PassRegistry &);
void initializeExpandPostRAPass(PassRegistry&);
void initializeExpandReductionsPass(PassRegistry&);
+void initializeExpandVariadicsPass(PassRegistry &);
void initializeExpandVectorPredicationPass(PassRegistry &);
void initializeExternalAAWrapperPassPass(PassRegistry&);
void initializeFEntryInserterPass(PassRegistry&);
diff --git a/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h b/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h
new file mode 100644
index 00000000000000..e7ffe343b940e9
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h
@@ -0,0 +1,17 @@
+#ifndef LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
+#define LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Module;
+
+class ExpandVariadicsPass : public PassInfoMixin<ExpandVariadicsPass> {
+public:
+ PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index c934ec42f6eb15..624fffd233ce56 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -131,6 +131,7 @@
#include "llvm/Transforms/IPO/DeadArgumentElimination.h"
#include "llvm/Transforms/IPO/ElimAvailExtern.h"
#include "llvm/Transforms/IPO/EmbedBitcodePass.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionImport.h"
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 44511800ccff8d..4ea9493208315a 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -59,6 +59,7 @@ MODULE_PASS("dot-callgraph", CallGraphDOTPrinterPass())
MODULE_PASS("dxil-upgrade", DXILUpgradePass())
MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
MODULE_PASS("extract-blocks", BlockExtractorPass({}, false))
+MODULE_PASS("expand-variadics", ExpandVariadicsPass())
MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
MODULE_PASS("function-import", FunctionImportPass())
MODULE_PASS("globalopt", GlobalOptPass())
diff --git a/llvm/lib/Transforms/IPO/CMakeLists.txt b/llvm/lib/Transforms/IPO/CMakeLists.txt
index 034f1587ae8df4..b8bd0be91d2232 100644
--- a/llvm/lib/Transforms/IPO/CMakeLists.txt
+++ b/llvm/lib/Transforms/IPO/CMakeLists.txt
@@ -12,6 +12,7 @@ add_llvm_component_library(LLVMipo
DeadArgumentElimination.cpp
ElimAvailExtern.cpp
EmbedBitcodePass.cpp
+ ExpandVariadics.cpp
ExtractGV.cpp
ForceFunctionAttrs.cpp
FunctionAttrs.cpp
diff --git a/llvm/lib/Transforms/IPO/ExpandVariadics.cpp b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
new file mode 100644
index 00000000000000..4b4c635cef5092
--- /dev/null
+++ b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
@@ -0,0 +1,723 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// S...
[truncated]
|
@llvm/pr-subscribers-llvm-transforms Author: Jon Chesterfield (JonChesterfield) ChangesThis is a MVP style commit for eliminating variadic functions as an IR pass. It replaces the ... with a va_list value. The implementation strategy is to divide the transform into orthogonal parts that can be tested independently. In particular this seeks to decouple target specific complexity from llvm modelling complexity. The original motivation was to provide variadic function support to amdgpu and nvptx. That's tested and working in the downstream branch from which this patch was extracted. This commit introduces the target specific part of the transform, implemented for 32 and 64 bit x86. Test coverage of more interesting types from clang requires either pull/80002 or complicated filecheck patterns. It rewrites very simple variadic calls into one to an equivalent function taking a va_list. Subsequent patches are needed to:
This patch is known to be incomplete. It is posted in this form to check the strategy and provide a mostly-correct baseline to extend. Initial discussion is at https://discourse.llvm.org/t/rfc-desugar-variadics-codegen-for-new-targets-optimisation-for-existing/72854 Patch is 176.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81058.diff 16 Files Affected:
diff --git a/clang/test/CodeGen/expand-variadic-call.c b/clang/test/CodeGen/expand-variadic-call.c
new file mode 100644
index 00000000000000..fa2b984bec08a5
--- /dev/null
+++ b/clang/test/CodeGen/expand-variadic-call.c
@@ -0,0 +1,273 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64-v4 -std=c23 -O1 -ffreestanding -emit-llvm -o - %s | FileCheck %s
+
+// This test sanity checks calling a variadic function with the expansion transform disabled.
+// The IR test cases {arch}/expand-variadic-call-*.ll correspond to IR generated from this test case.
+
+typedef __builtin_va_list va_list;
+#define va_copy(dest, src) __builtin_va_copy(dest, src)
+#define va_start(ap, ...) __builtin_va_start(ap, 0)
+#define va_end(ap) __builtin_va_end(ap)
+#define va_arg(ap, type) __builtin_va_arg(ap, type)
+
+// 32 bit x86 alignment uses getTypeStackAlign for special cases
+// Whitebox testing.
+// Needs a type >= 16 which is either a simd or a struct containing a simd
+// darwinvectorabi should force 4 bytes
+// linux vectors with align 16/32/64 return that alignment
+
+
+void wrapped(va_list);
+
+// CHECK-LABEL: @codegen_for_copy(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[CP:%.*]] = alloca [1 x %struct.__va_list_tag], align 16
+// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[CP]]) #[[ATTR7:[0-9]+]]
+// CHECK-NEXT: call void @llvm.va_copy(ptr nonnull [[CP]], ptr [[X:%.*]])
+// CHECK-NEXT: call void @wrapped(ptr noundef nonnull [[CP]]) #[[ATTR8:[0-9]+]]
+// CHECK-NEXT: call void @llvm.va_end(ptr [[CP]])
+// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 24, ptr nonnull [[CP]]) #[[ATTR7]]
+// CHECK-NEXT: ret void
+//
+void codegen_for_copy(va_list x)
+{
+ va_list cp;
+ va_copy(cp, x);
+ wrapped(cp);
+ va_end(cp);
+}
+
+
+// CHECK-LABEL: @vararg(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[VA:%.*]] = alloca [1 x %struct.__va_list_tag], align 16
+// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[VA]]) #[[ATTR7]]
+// CHECK-NEXT: call void @llvm.va_start(ptr nonnull [[VA]])
+// CHECK-NEXT: call void @wrapped(ptr noundef nonnull [[VA]]) #[[ATTR8]]
+// CHECK-NEXT: call void @llvm.va_end(ptr [[VA]])
+// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 24, ptr nonnull [[VA]]) #[[ATTR7]]
+// CHECK-NEXT: ret void
+//
+ void vararg(...)
+{
+ va_list va;
+ __builtin_va_start(va, 0);
+ wrapped(va);
+ va_end(va);
+}
+
+// vectors with alignment 16/32/64 are natively aligned on linux x86
+// v32f32 would be a m1024 type, larger than x64 defines at time of writing
+typedef int i32;
+typedef float v4f32 __attribute__((__vector_size__(16), __aligned__(16)));
+typedef float v8f32 __attribute__((__vector_size__(32), __aligned__(32)));
+typedef float v16f32 __attribute__((__vector_size__(64), __aligned__(64)));
+typedef float v32f32 __attribute__((__vector_size__(128), __aligned__(128)));
+
+
+// Pass a single value to wrapped() via vararg(...)
+// CHECK-LABEL: @single_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]]) #[[ATTR9:[0-9]+]]
+// CHECK-NEXT: ret void
+//
+void single_i32(i32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_double(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(double noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_double(double x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v4f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<4 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v4f32(v4f32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v8f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<8 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v8f32(v8f32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v16f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<16 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v16f32(v16f32 x)
+{
+ vararg(x);
+}
+
+// CHECK-LABEL: @single_v32f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT: [[X:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT: store <32 x float> [[X]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: tail call void (...) @vararg(ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void single_v32f32(v32f32 x)
+{
+ vararg(x);
+}
+
+
+
+// CHECK-LABEL: @i32_double(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], double noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_double(i32 x, double y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @double_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(double noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void double_i32(double x, i32 y)
+{
+ vararg(x, y);
+}
+
+
+// A struct used by libc variadic tests
+
+typedef struct {
+ char c;
+ short s;
+ int i;
+ long l;
+ float f;
+ double d;
+} libcS;
+
+// CHECK-LABEL: @i32_libcS(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], ptr noundef nonnull byval([[STRUCT_LIBCS:%.*]]) align 8 [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_libcS(i32 x, libcS y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @libcS_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(ptr noundef nonnull byval([[STRUCT_LIBCS:%.*]]) align 8 [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void libcS_i32(libcS x, i32 y)
+{
+ vararg(x, y);
+}
+
+
+// CHECK-LABEL: @i32_v4f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], <4 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v4f32(i32 x, v4f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v4f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<4 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v4f32_i32(v4f32 x, i32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v8f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], <8 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v8f32(i32 x, v8f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v8f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<8 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v8f32_i32(v8f32 x, i32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v16f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], <16 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v16f32(i32 x, v16f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v16f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void (...) @vararg(<16 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v16f32_i32(v16f32 x, i32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v32f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT: [[Y:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <32 x float> [[Y]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: tail call void (...) @vararg(i32 noundef [[X:%.*]], ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void i32_v32f32(i32 x, v32f32 y)
+{
+ vararg(x, y);
+}
+
+// CHECK-LABEL: @v32f32_i32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT: [[X:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: store <32 x float> [[X]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT: tail call void (...) @vararg(ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT: ret void
+//
+void v32f32_i32(v32f32 x, i32 y)
+{
+ vararg(x, y);
+}
diff --git a/clang/test/CodeGen/variadic-wrapper-removal.c b/clang/test/CodeGen/variadic-wrapper-removal.c
new file mode 100644
index 00000000000000..da41dde16f3d73
--- /dev/null
+++ b/clang/test/CodeGen/variadic-wrapper-removal.c
@@ -0,0 +1,86 @@
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+// RUN: %clang_cc1 -triple=x86_64-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+// neither arm arch is implemented yet, leaving it here as a reminder
+// armv6 is a ptr as far as the struct is concerned, but possibly also a [1 x i32] passed by value
+// that seems insistent, maybe leave 32 bit arm alone for now
+// aarch64 is a struct of five things passed using byval memcpy
+
+// R-N: %clang_cc1 -triple=armv6-none--eabi -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+// R-N: %clang_cc1 -triple=aarch64-none-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+
+
+// expand-variadics rewrites calls to variadic functions into calls to
+// equivalent functions that take a va_list argument. A property of the
+// implementation is that said "equivalent function" may be a pre-existing one.
+// This is equivalent to inlining a sufficiently simple variadic wrapper.
+
+#include <stdarg.h>
+
+typedef int FILE; // close enough for this test
+
+// fprintf is sometimes implemented as a call to vfprintf. That fits the
+// pattern the transform pass recognises - given an implementation of fprintf
+// in the IR module, calls to it can be rewritten into calls into vfprintf.
+
+// CHECK-LABEL: define{{.*}} i32 @fprintf(
+// CHECK-LABEL: define{{.*}} i32 @call_fprintf(
+// CHECK-NOT: @fprintf
+// CHECK: @vfprintf
+int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap);
+int fprintf(FILE *restrict f, const char *restrict fmt, ...)
+{
+ int ret;
+ va_list ap;
+ va_start(ap, fmt);
+ ret = vfprintf(f, fmt, ap);
+ va_end(ap);
+ return ret;
+}
+int call_fprintf(FILE *f)
+{
+ int x = 42;
+ double y = 3.14;
+ return fprintf(f, "int %d dbl %g\n", x, y);
+}
+
+// Void return type is also OK
+
+// CHECK-LABEL: define{{.*}} void @no_result(
+// CHECK-LABEL: define{{.*}} void @call_no_result(
+// CHECK-NOT: @no_result
+// CHECK: @vno_result
+void vno_result(const char * fmt, va_list);
+void no_result(const char * fmt, ...)
+{
+ va_list ap;
+ va_start(ap, fmt);
+ vno_result(fmt, ap);
+ va_end(ap);
+}
+void call_no_result(FILE *f)
+{
+ int x = 101;
+ no_result("", x);
+}
+
+// The vaend in the forwarding implementation is optional where it's a no-op
+
+// CHECK-LABEL: define{{.*}} i32 @no_vaend(
+// CHECK-LABEL: define{{.*}} i32 @call_no_vaend(
+// CHECK-NOT: @no_vaend
+// CHECK: @vno_vaend
+int vno_vaend(int x, va_list);
+int no_vaend(int x, ...)
+{
+ va_list ap;
+ va_start(ap, x);
+ return vno_vaend(x, ap);
+}
+int call_no_vaend(int x)
+{
+ return no_vaend(x, 10, 2.5f);
+}
diff --git a/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp b/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp
new file mode 100644
index 00000000000000..cf436ead77a2cb
--- /dev/null
+++ b/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp
@@ -0,0 +1,117 @@
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Linux
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X64SystemV
+
+// RUN: %clang_cc1 -triple i386-apple-darwin -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Darwin
+
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X64SystemV
+
+// RUN: %clang_cc1 -triple i686-windows-msvc -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Windows
+
+// 64 bit windows va_arg passes most types indirectly but the call instruction uses the types by value
+// ___: %clang_cc1 -triple x86_64-pc-windows-msvc -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK
+
+// Checks for consistency between clang and expand-variadics
+// 1. Use clang to lower va_arg
+// 2. Use expand-variadics to lower the rest of the variadic operations
+// 3. Use opt -O1 to simplify the result for simpler filecheck patterns
+// The simplification will fail when the two are not consistent, modulo bugs elsewhere.
+
+#include <stdarg.h>
+
+// This test can be simplified when expand-variadics is extended to apply to more patterns.
+// The first_valist and second_valist functions can then be inlined, either in the test or
+// by enabling optimisaton passes in the clang invocation.
+// The explicit instcombine pass canonicalises the variadic function IR.
+
+// More complicated tests need instcombine of ptrmask to land first.
+
+template <typename X, typename Y>
+static X first_valist(va_list va) {
+ return va_arg(va, X);
+}
+
+template <typename X, typename Y>
+static X first(...) {
+ va_list va;
+ __builtin_va_start(va, 0);
+ return first_valist<X,Y>(va);
+}
+
+template <typename X, typename Y>
+static Y second_valist(va_list va) {
+ va_arg(va, X);
+ Y r = va_arg(va, Y);
+ return r;
+}
+
+
+template <typename X, typename Y>
+static Y second(...) {
+ va_list va;
+ __builtin_va_start(va, 0);
+ return second_valist<X,Y>(va);
+}
+
+extern "C"
+{
+// CHECK-LABEL: define{{.*}} i32 @first_i32_i32(i32{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %x
+int first_i32_i32(int x, int y)
+{
+ return first<int,int>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @second_i32_i32(i32{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %y
+int second_i32_i32(int x, int y)
+{
+ return second<int,int>(x, y);
+}
+}
+
+// Permutations of an int and a double
+extern "C"
+{
+// CHECK-LABEL: define{{.*}} i32 @first_i32_f64(i32{{.*}} %x, double{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %x
+int first_i32_f64(int x, double y)
+{
+ return first<int,double>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} double @second_i32_f64(i32{{.*}} %x, double{{.*}} %y)
+// CHECK: entry:
+
+// X86Linux: ret double %y
+// X64SystemV: ret double %y
+// X86Darwin: ret double %y
+// X86Windows: [[TMP0:%.*]] = alloca <{ [4 x i8], double }>, align 4
+// X86Windows: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i32 4
+// X86Windows: store double %y, ptr [[TMP1]], align 4
+// X86Windows: [[TMP2:%.*]] = load double, ptr [[TMP0]], align 4
+// X86Windows: ret double [[TMP2]]
+double second_i32_f64(int x, double y)
+{
+ return second<int,double>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} double @first_f64_i32(double{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret double %x
+double first_f64_i32(double x, int y)
+{
+ return first<double,int>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @second_f64_i32(double{{.*}} %x, i32{{.*}} %y)
+// CHECK: entry:
+// CHECK: ret i32 %y
+int second_f64_i32(double x, int y)
+{
+ return second<double,int>(x, y);
+}
+}
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index bbfb8a0dbe26a4..fe3208df7a23b2 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -600,6 +600,10 @@ namespace llvm {
/// Lowers KCFI operand bundles for indirect calls.
FunctionPass *createKCFIPass();
+
+ // Inline variadic functions and expand variadic intrinsics.
+ ModulePass *createExpandVariadicsPass();
+
} // End llvm namespace
#endif
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 3db639a6872407..6487d0a5e26d1b 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -106,6 +106,7 @@ void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
void initializeExpandMemCmpLegacyPassPass(PassRegistry &);
void initializeExpandPostRAPass(PassRegistry&);
void initializeExpandReductionsPass(PassRegistry&);
+void initializeExpandVariadicsPass(PassRegistry &);
void initializeExpandVectorPredicationPass(PassRegistry &);
void initializeExternalAAWrapperPassPass(PassRegistry&);
void initializeFEntryInserterPass(PassRegistry&);
diff --git a/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h b/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h
new file mode 100644
index 00000000000000..e7ffe343b940e9
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h
@@ -0,0 +1,17 @@
+#ifndef LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
+#define LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Module;
+
+class ExpandVariadicsPass : public PassInfoMixin<ExpandVariadicsPass> {
+public:
+ PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index c934ec42f6eb15..624fffd233ce56 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -131,6 +131,7 @@
#include "llvm/Transforms/IPO/DeadArgumentElimination.h"
#include "llvm/Transforms/IPO/ElimAvailExtern.h"
#include "llvm/Transforms/IPO/EmbedBitcodePass.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionImport.h"
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 44511800ccff8d..4ea9493208315a 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -59,6 +59,7 @@ MODULE_PASS("dot-callgraph", CallGraphDOTPrinterPass())
MODULE_PASS("dxil-upgrade", DXILUpgradePass())
MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
MODULE_PASS("extract-blocks", BlockExtractorPass({}, false))
+MODULE_PASS("expand-variadics", ExpandVariadicsPass())
MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
MODULE_PASS("function-import", FunctionImportPass())
MODULE_PASS("globalopt", GlobalOptPass())
diff --git a/llvm/lib/Transforms/IPO/CMakeLists.txt b/llvm/lib/Transforms/IPO/CMakeLists.txt
index 034f1587ae8df4..b8bd0be91d2232 100644
--- a/llvm/lib/Transforms/IPO/CMakeLists.txt
+++ b/llvm/lib/Transforms/IPO/CMakeLists.txt
@@ -12,6 +12,7 @@ add_llvm_component_library(LLVMipo
DeadArgumentElimination.cpp
ElimAvailExtern.cpp
EmbedBitcodePass.cpp
+ ExpandVariadics.cpp
ExtractGV.cpp
ForceFunctionAttrs.cpp
FunctionAttrs.cpp
diff --git a/llvm/lib/Transforms/IPO/ExpandVariadics.cpp b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
new file mode 100644
index 00000000000000..4b4c635cef5092
--- /dev/null
+++ b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
@@ -0,0 +1,723 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// S...
[truncated]
|
7285c68
to
c77e660
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. A few quick nits, overall looks like lots of straightforward testing with a beefy new pass.
virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, | ||
AllocaInst *va_list, Value * /*buffer*/) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we assert a value is non-null, it's best to make it a reference in the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expressing the class invariants through the vtable seems harder than it should be. This function is only called if valistOnStack returns true, changing the base to a builtin_unreachable.
}; | ||
|
||
std::unique_ptr<Interface> create(Module &M) { | ||
Triple Trip = Triple(M.getTargetTriple()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. kind of a non-standard name. I prefer
Triple Trip = Triple(M.getTargetTriple()); | |
llvm::Triple Triple(M.getTargetTriple()); |
uint32_t minimum_slot_align() override { return 4; } | ||
uint32_t maximum_slot_align() override { return 0; } | ||
}; | ||
return std::make_unique<X86Windows>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be special types? The usage seems like it could just be a config struct to the base class with some pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were a struct with two named fields previously, could go with something like
struct slotAlignTy {
uint32_t min;
uint32_t max;
};
slotAlignTy slot_align() override {return {0, 4}};
Or put state in the base class, that probably optimises better than virtual functions that return constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the state in the base class approach. Const fields at fixed offset, initialisation specified where it was before. Would be clearer with named parameters but the min, max sequence convention is probably good enough.
static char ID; | ||
std::unique_ptr<VariadicABIInfo::Interface> ABI; | ||
|
||
// todo: drop the boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// todo: drop the boolean | |
// TODO: drop the boolean |
} | ||
}; | ||
|
||
Function *ExpandVariadics::isFunctionInlinable(Module &M, Function *F) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a lot of these arguments should be references since we're not checking for null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a fair amount of pointer compares. Target of call instruction to function and similar. Could go with references everywhere, but the &reference == pointer
construct reads poorly to me.
54b09d5
to
b9e21af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments
AllocaInst * /*va_list*/, Value * /*buffer*/) { | ||
// Function needs to be implemented if valist is on the stack | ||
assert(!valistOnStack()); | ||
__builtin_unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. If this function is never to be called why not make it pure virtual? __builtin_unreachable()
is an optimization hint while llvm_unreachable
is a runtime assertion that the code should not be executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, llvm_unreachable
is usually used inside LLVM, so definitely better to use llvm_unreachable
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone with more comments, but maybe to make this clear enough I need to separate the field alignment from whether va_list is a void* or something that needs more work. The function is called when it is overridden and it's not great to provide overrides that are themselves not called.
if (!isa<ReturnInst>(BB.getTerminator())) { | ||
return nullptr; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit. braces around single line block
return nullptr; | ||
} | ||
|
||
// Walk the block in order checking for specific instructions, some of them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: style. LLVM uses PascalCase
for variables in all locations below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Advice sought on how to PascalCase variables with va_list in the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a well-understood thing then we usually ignore it, like argc
. But otherwise probably VAList
.
// with it, such that target specific va_arg instructions will correctly | ||
// iterate over it. Primarily this means getting the alignment right. | ||
|
||
class { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I feel about a big anonymous class in the middle of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm expecting some resistance to large functions. The cyclometric complexity is very low - they read linearly from top to bottom - but nevertheless people might prefer blocks of code with a single use to be moved further from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like it, I'd rather see the class moved outside the function. I understand that it's only used by one function but IMO a class def in the middle of a function breaks the "flow" and makes it harder to read it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think I've encountered any anonymous classes in LLVM. I've only ever seen anonymous structs that are POD without any members. You call this Frame
, so just move it out and call it FrameTy
or something.
enum { N = 4 }; | ||
SmallVector<Type *, N> fieldTypes; | ||
SmallVector<std::pair<Value *, bool>, N> maybeValueIsByval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually it's easiest to leave off the template argument and let the type default to something sane unless there's an explicit. I think by default you get 64 bytes, so that'll likely be 8 or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vectors are always the same length as each other, does that affect your recommendation to use a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the size is only relevant for the small size optimizations until we do malloc. I suppose if you know ahead of time that it's less likely to need malloc with 4 on both it could be argued, but if they're different sizes and it's not a performance issue it's usually cleaner.
if (Frame.empty()) { | ||
// Not passing anything, hopefully va_arg won't try to dereference it | ||
// Might be a target specific thing whether one can pass nullptr instead | ||
// of undef i32 | ||
Frame.append(Type::getInt32Ty(Ctx), nullptr, false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Frame.empty()) { | |
// Not passing anything, hopefully va_arg won't try to dereference it | |
// Might be a target specific thing whether one can pass nullptr instead | |
// of undef i32 | |
Frame.append(Type::getInt32Ty(Ctx), nullptr, false); | |
} | |
// Not passing anything, hopefully va_arg won't try to dereference it | |
// Might be a target specific thing whether one can pass nullptr instead | |
// of undef i32 | |
if (Frame.empty()) | |
Frame.append(Type::getInt32Ty(Ctx), nullptr, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not keen - the comment is true within the scope of the if and false outside of the scope. If deleting the braces is important I can rephrase the comment (probably by writing 'if' in the comment as well as in the code)
@@ -0,0 +1,17 @@ | |||
#ifndef LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM copyright header as well as (brief) documentation of the pass
AllocaInst * /*va_list*/, Value * /*buffer*/) { | ||
// Function needs to be implemented if valist is on the stack | ||
assert(!valistOnStack()); | ||
__builtin_unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, llvm_unreachable
is usually used inside LLVM, so definitely better to use llvm_unreachable
here.
bool VACopyIsMemcpy() { return true; } | ||
}; | ||
|
||
struct X64SystemV final : public Interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this part needs to be in the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hope not. It doesn't duplicate information that the backend already knows and IR passes reaching into the backend is a mess. Likewise reaching into the front end to try to deduplicate against clang, which has the complement to this information, will have a severe adverse effect on legibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it doesn't look like a good idea to handle multiple targets in one single file. Here it just includes x86 family. There are those many targets supporting variadic functions out there. FWIW, I don't think the information needed for this pass becomes "messy" in backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It scales sublinearly in number of targets and x86 is about the worst case. Scattering the information across all the backends does nothing to decrease the amount of information needed.
I don't think we should try to put an abstraction over this until the switch has become unwieldy. Partly because that day may not come, partly because the details are likely to change (further) under review, and partly because the last time I tried to export information from the backend the infra didn't work with the new pass manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I fixed your target information issue, you just need to add the TargetMachine to the pass constructor: f7dcabe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for patching the lds pass, commit is a useful reference on the pass manager boilerplate.
Do we have a targetmachine instance available in the middle of opt? I thought that was only created once we hit codegen, in which case depending on one would prevent this from being run as an optimisation in the opt O1 or O2 pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TargetMachine
in PassBuilder seems optional, but I'm wondering if we'd even want to run this pass if there is no TM. I don't especially like seeing target-specific logic in an optimization, but I'm not contributing enough to optimizations either to assert whether it's a bad thing or not
|
||
// All targets currently implemented use a ptr for the valist parameter | ||
Type *vaListParameterType(LLVMContext &Ctx) { | ||
return PointerType::getUnqual(Ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this to use the alloca addrspace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear - va_list as a type is declared in clang as a void* (or maybe a char * for amdgpu) at present. None of them have addrspace annotations. The IR works out nicer if we can change that to an addrspace(5) void*. Semi-related, amdgpu needs a patch to a pointer rounding function that uses ptrmask to tolerate addrspace. Missing from this patch as amdgpu isn't in the MVP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can change that, it doesn't have to be in the MVP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to change those to use the right address space
bool VACopyIsMemcpy() { return true; } | ||
}; | ||
|
||
struct X64SystemV final : public Interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I fixed your target information issue, you just need to add the TargetMachine to the pass constructor: f7dcabe
const bool IsLinuxABI = Triple.isOSLinux() || Triple.isOSCygMing(); | ||
|
||
switch (Triple.getArch()) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra empty line
if (GlobalValue::isInterposableLinkage(F->getLinkage())) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler as F->isDefinitionExact?
Don't we want to do this unconditionally for AMDGPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few conditional statements needed when the later patch lands that permits changing the function interfaces from codegen. This is indeed one of them. I'll check isDefinitionExact.
} | ||
} | ||
|
||
// Branch funnels look like variadic functions but arent: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Branch funnels look like variadic functions but arent: | |
// Branch funnels look like variadic functions but aren't: |
NewCB->setCallingConv(CB->getCallingConv()); | ||
NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg}); | ||
|
||
if (!CB->use_empty()) // dead branch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just replaceAllUsesWith, it shouldn't die on empty uses
NewCB->setAttributes(PAL); | ||
NewCB->takeName(CB); | ||
NewCB->setCallingConv(CB->getCallingConv()); | ||
NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this to be able to preserve all metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure - dead argument elimination preserves exactly these two
attributes #0 = { nounwind "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } | ||
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) } | ||
attributes #2 = { mustprogress nocallback nofree nosync nounwind willreturn } | ||
attributes #3 = { "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } | ||
attributes #4 = { nounwind "min-legal-vector-width"="128" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } | ||
attributes #5 = { nounwind } | ||
attributes #6 = { nobuiltin nounwind "no-builtins" } | ||
attributes #7 = { nobuiltin "no-builtins" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to drop all of the attributes
attributes #6 = { nobuiltin nounwind "no-builtins" } | ||
attributes #7 = { nobuiltin "no-builtins" } | ||
|
||
!llvm.module.flags = !{!0, !1, !2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At most you need whatever is required to show the debug info preservation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug info is something I haven't really thought about. That's probably something that should have its own test case.
These are aspiring to show that the call frame is correct for the given architecture - variables are at the right offset and so forth - other things like metadata / attribute propagation are target independent and should have their own tests.
I'll look at removing the attributes from these, would give better signal to noise.
ret void | ||
} | ||
|
||
define dso_local void @v32f32_i32(ptr nocapture noundef readonly byval(<32 x float>) align 128 %0, i32 noundef %y) local_unnamed_addr #0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need all the local_unnamed_addr / dso_local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, gone.
@@ -0,0 +1,117 @@ | |||
// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Linux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do this in one opt run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. There's something weird/broken with opt here. Failure reproduces with other passes. Issue #81128
void v32f32_i32(v32f32 x, i32 y) | ||
{ | ||
vararg(x, y); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also get the half and bfloat cases
Patch run through |
b9e21af
to
122a305
Compare
122a305
to
956f4f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are mostly about style, I haven't done a deep dive into the logic of the pass yet
if (!Equivalent) | ||
return Changed; | ||
|
||
for (User *U : llvm::make_early_inc_range(F->users())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you actually need llvm::
for those?
case VariadicABIInfo::ValistCc::value: { | ||
// If it's being passed by value, need a load | ||
// TODO: Check it's loading the right thing | ||
auto *load = dyn_cast<LoadInst>(&*Iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CamelCase
if (!Call) | ||
return nullptr; | ||
|
||
if (auto *end = dyn_cast<VAEndInst>(&*Iter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CamelCase
// Not totally convinced this is necessary but dead store elimination | ||
// decides to discard the stores to the alloca and pass uninitialised | ||
// memory along instead when the function is marked tailcall | ||
if (TCK == CallInst::TCK_Tail) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can drop {}
NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg}); | ||
|
||
if (!CB->use_empty()) // dead branch? | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can drop {}
too
if (CB->isMustTailCall()) { | ||
return false; | ||
} | ||
|
||
if (!CB->isCallee(&U) || CB->getFunctionType() != F->getFunctionType()) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both can drop {}
if (Triple.isOSDarwin()) { | ||
return std::make_unique<VariadicABIInfo::X64SystemV>(); | ||
} | ||
|
||
if (IsLinuxABI) { | ||
return std::make_unique<VariadicABIInfo::X64SystemV>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both can drop {}
// aarch64 uses byval | ||
enum class ValistCc { value, pointer, /*byval*/ }; | ||
|
||
struct Interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interface
is a very generic name, can you make it a bit more specific and add docs?
bool VACopyIsMemcpy() { return true; } | ||
}; | ||
|
||
struct X64SystemV final : public Interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TargetMachine
in PassBuilder seems optional, but I'm wondering if we'd even want to run this pass if there is no TM. I don't especially like seeing target-specific logic in an optimization, but I'm not contributing enough to optimizations either to assert whether it's a bad thing or not
|
||
if (IsLinuxABI) { | ||
struct X86Linux final : public Interface { | ||
X86Linux() : Interface(4, 0) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand, why can't this just be a std::make_unique<Interface>(4, 0)
? Why the need for a derived type that does not add any feature/override?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the whole "sufficiently simple function" thing. It seems fragile. You should be able to just take a arbitrary internal varargs function, rewrite its signature to take a va_list argument, rewrite calls to va_start to make a copy of that va_list, and rewrite the callers to construct that va_list. If that function turns out to be inlinable, great; if not, you haven't really lost anything.
(Rewriting the signature of a function is complicated in its own way because you need to allocate a new Function, then transplant the original function's body into it. But it's not uncharted territory: we should be able to refactor code out of llvm/lib/Transforms/IPO/ArgumentPromotion.cpp .)
Do we have a testing plan for this? Messing up calling convention stuff tends to lead to extremely subtle bugs.
// varargs can't be tail called anyway | ||
// Not totally convinced this is necessary but dead store elimination | ||
// decides to discard the stores to the alloca and pass uninitialised | ||
// memory along instead when the function is marked tailcall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As defined in LangRef, "tail" actually just means "doesn't reference any memory allocated in the current call's stack". It's a prerequisite for tail-calling a function, but doesn't indicate the function is actually tail-callable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to recheck dead store elimination. If the tail call flag isn't removed, DSE turns the IR into undef, which seems wrong to me - but maybe it's fine and I need to reword this comment
High level question: |
Why have the x86 tests been placed in test\CodeGen\X86 instead of something like test\Transforms\ExpandVariadics\X86 ? |
Yes, you can and I do. That's patch 2 of the series, numbered 1. in the list in the commit message (for this is 0)
And this is why it's a separate patch. The rewrite-call-instruction is the target dependent bit, in this patch for an initial target. The rewrite-function is (almost) target agnostic and involves a surprisingly large amount of book keeping. We have duplicated code in ArgumentPromotion, dead argument removal, the function cloning in attributor and probably elsewhere.
This would be why I want to do it all in IR, staying away from the backend lowering. It's also why it's structured as orthogonal transforms with independent testing. Fun story, the whole reason this exists for x64 is that I had such a rough time debugging it natively on amdgpu that it was easier to generalise to x64. |
This patch will rewrite calls to a variadic function into calls to a function taking a va_list. Later patches expand that to cover the rest of the cases. More complicated functions, other targets etc. Note that calls to variadic function pointers cannot generally be rewritten without permitting ABI changes, which is what I plan to do for nvptx and amdgpu. |
Not sure if this means isFunctionInlinable will go away in the followup patch, or if you plan to rewrite functions in a way that satisfies isFunctionInlinable. I think the end state should be that all functions go down the same codepath, not conditionally do something different based on whether they're "simple". I guess I don't have a strong preference for how you get there, though. |
The logic I've got at present (which include the ABI rewriting) is bool usefulToSplit =
splitFunctions() && (!F->isDeclaration() || rewriteABI());
// F may already be a single basic block calling a known function
// that takes a va_list, in which case it doens't need to be split.
Function *Equivalent = isFunctionInlinable(M, F);
if (usefulToSplit && !Equivalent) {
Equivalent = DeriveInlinableVariadicFunctionPair(M, *F);
assert(Equivalent);
assert(isFunctionInlinable(M, *F)); // branch doesn't do this presently but it could do
changed = true;
functionToInliningTarget[F] = Equivalent;
} I'm not especially attached to the specific control flow. The two transforms - inlining/call-rewrite and splitting an entry block off the variadic function function so that it can be inlined - are genuinely orthogonal and that is a really good property to preserve. It took some thought to get to that structure. If we ignore that design and run functions through the block splitting unnecessarily, we win a combinatorial increase in required testing, a decrease in emitted code quality (spurious extra functions), an inability to pattern match on fprintf->vfprintf style code that happens to be in the application already. We would get to delete the isFunctionInlinable predicate. The independent transform pipeline pattern is more important than the no special case branching heuristic. If it helps, view it as two complementary transforms where the one is skipped when it would be a no-op. Related - if there's an objection to landing this as an inactive pass (only exercised by test code) we can put it into an optimisation pipeline immediately, it'll still remove some real world variadic calls even if the later patches don't make it. |
If we do unnecessary splitting, then run the inliner, we should end up with exactly the same thing you'd get by special-casing isFunctionInlinable functions. So that's... very slightly slower compile-time, in exchange for dropping the delicate isFunctionInlinable check. That seems worthwhile to me. There's really a lot of stuff you can easily get wrong in a check like that... for example, it looks like you forgot to check the type of the load, or whether the load is volatile. Not sure what you're referring to with the "combinatorial increase in required testing".
I'm fine with off-by-default as long as we have a roadmap for enabling it at some point. |
Ah OK, so split every variadic definition and let the inliner sort it out afterwards. Yes, I'm good with that. Tests either get messier or add a call to the inliner. Will update the PR correspondingly, solid simplification, thanks! Discard the combinatorial testing comment - I misunderstood the structure you had in mind. Unconditional splitting means the transforms need to land in the other order. First splitter/outliner, second the callinstr rewrite. With the function pattern recogniser deleted the test cases for this won't fire. After the varargs splitter/outliner lands the tests for this will PR make sense again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, mostly just formatting and naming that hasn't been updated.
I agree overall that we should just put this in some canonical form and rely on other LLVM passes to take care of things like inlining. Eager to have this functionality in, so hopefully we can keep this moving.
if (Triple.isOSDarwin()) { | ||
return std::make_unique<VariadicABIInfo::X64SystemV>(); | ||
} | ||
|
||
if (IsLinuxABI) { | ||
return std::make_unique<VariadicABIInfo::X64SystemV>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Triple.isOSDarwin()) { | |
return std::make_unique<VariadicABIInfo::X64SystemV>(); | |
} | |
if (IsLinuxABI) { | |
return std::make_unique<VariadicABIInfo::X64SystemV>(); | |
} | |
if (Triple.isOSDarwin()) | |
return std::make_unique<VariadicABIInfo::X64SystemV>(); | |
if (IsLinuxABI) | |
return std::make_unique<VariadicABIInfo::X64SystemV>(); |
Nit, no braces around single blocks.
// with it, such that target specific va_arg instructions will correctly | ||
// iterate over it. Primarily this means getting the alignment right. | ||
|
||
class { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think I've encountered any anonymous classes in LLVM. I've only ever seen anonymous structs that are POD without any members. You call this Frame
, so just move it out and call it FrameTy
or something.
|
||
// Initialise a va_list pointing to that struct and pass it as the last | ||
// argument | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be in its own block? I'd assume if the names VoidPtr
conflict then it would be the same thing. If you don't want the VoidBuffer escaping the scope I'd just pass it directly to the uses in both cases since it's short enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Title should be rephrased; this doesn't have anything to do with inlining
Feedback is merged into #89007 which represents this transform, with the pattern match Eli recommended against removed (so some test cases now need to run the inliner), the anonymous class is moved, the multiple patch sequence idea is discarded in favour of one patch that does the target specific and the target agnostic mangling bits. |
Marked as "draft" to approximate the state of "available for reference but not a candidate for merging" |
Majority is landed as #93362 |
This is a MVP style commit for eliminating variadic functions as an IR pass. It replaces the ... with a va_list value.
The implementation strategy is to divide the transform into orthogonal parts that can be tested independently. In particular this seeks to decouple target specific complexity from llvm modelling complexity. The original motivation was to provide variadic function support to amdgpu and nvptx. That's tested and working in the downstream branch from which this patch was extracted.
This commit introduces the target specific part of the transform, implemented for 32 and 64 bit x86. Test coverage of more interesting types from clang requires either pull/80002 or complicated filecheck patterns. It rewrites very simple variadic calls into one to an equivalent function taking a va_list.
Subsequent patches are needed to:
This patch is known to be incomplete. It is posted in this form to check the strategy and provide a mostly-correct baseline to extend. Initial discussion is at https://discourse.llvm.org/t/rfc-desugar-variadics-codegen-for-new-targets-optimisation-for-existing/72854
The end point is calls to known variadic functions are replaced with calls to va_list equivalents and thus exposed to inlining etc. The clang test checks that the variadics are eliminated entirely by this pass and existing transforms.
On nvptx, this lowering is compatible with the documentation for ptx.