
[transforms] Inline simple variadic functions #81058


Conversation

JonChesterfield
Collaborator

@JonChesterfield JonChesterfield commented Feb 7, 2024

This is an MVP-style commit for eliminating variadic functions as an IR pass. It replaces the ... with a va_list value.

The implementation strategy is to divide the transform into orthogonal parts that can be tested independently. In particular, this seeks to decouple target-specific complexity from LLVM modelling complexity. The original motivation was to provide variadic function support for amdgpu and nvptx. That's tested and working in the downstream branch from which this patch was extracted.

This commit introduces the target-specific part of the transform, implemented for 32- and 64-bit x86. It rewrites very simple variadic calls into calls to an equivalent function taking a va_list. Test coverage of more interesting types from clang requires either pull/80002 or complicated FileCheck patterns.

Subsequent patches are needed to:

  1. rewrite variadic functions that don't happen to already match this pattern
  2. add more targets - aarch64, amdgpu, nvptx are planned at present
  3. call the pass from codegen to expand all variadic functions for GPU targets
  4. close the holes in test coverage (and implementation)
  5. add this transform to one of the optimisation pipelines, maybe O1
  6. maybe expand va_arg instructions (for the GPUs, or maybe from alternative front ends)

This patch is known to be incomplete. It is posted in this form to check the strategy and provide a mostly-correct baseline to extend. Initial discussion is at https://discourse.llvm.org/t/rfc-desugar-variadics-codegen-for-new-targets-optimisation-for-existing/72854

The end result is that calls to known variadic functions are replaced with calls to va_list equivalents and thus exposed to inlining etc. The clang test checks that the variadic functions are eliminated entirely by this pass plus existing transforms.

On nvptx, this lowering is compatible with the PTX documentation.
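For orientation, a minimal C++-level sketch of the pattern involved, using invented names (log_impl, log_wrapper, caller) rather than anything from the patch; the pass itself works on LLVM IR, and the fprintf/vfprintf case in variadic-wrapper-removal.c below is the concrete analogue:

#include <cstdarg>
#include <cstdio>

// The va_list equivalent of the wrapper below.
int log_impl(const char *Fmt, va_list Ap) { return std::vfprintf(stderr, Fmt, Ap); }

// A "sufficiently simple" variadic forwarder: va_start, forward, va_end.
int log_wrapper(const char *Fmt, ...) {
  va_list Ap;
  va_start(Ap, Fmt);
  int R = log_impl(Fmt, Ap);
  va_end(Ap);
  return R;
}

int caller(int X) {
  // Before the pass: an opaque call through the variadic wrapper.
  // After the pass (conceptually): the call site packs X into a
  // va_list-compatible frame and calls log_impl directly, leaving log_wrapper
  // dead so that ordinary inlining/DCE can finish the job.
  return log_wrapper("%d\n", X);
}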

@llvmbot
Member

llvmbot commented Feb 7, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Jon Chesterfield (JonChesterfield)

Changes

This is an MVP-style commit for eliminating variadic functions as an IR pass. It replaces the ... with a va_list value.

The implementation strategy is to divide the transform into orthogonal parts that can be tested independently. In particular, this seeks to decouple target-specific complexity from LLVM modelling complexity. The original motivation was to provide variadic function support for amdgpu and nvptx. That's tested and working in the downstream branch from which this patch was extracted.

This commit introduces the target-specific part of the transform, implemented for 32- and 64-bit x86. It rewrites very simple variadic calls into calls to an equivalent function taking a va_list. Test coverage of more interesting types from clang requires either pull/80002 or complicated FileCheck patterns.

Subsequent patches are needed to:

  1. rewrite variadic functions that don't happen to already match this pattern
  2. add more targets - aarch64, amdgpu, nvptx are planned at present
  3. call the pass from codegen to expand all variadic functions for GPU targets
  4. close the holes in test coverage (and implementation)
  5. add this transform to one of the optimisation pipelines, maybe O1
  6. maybe expand va_arg instructions (for the GPUs, or maybe from alternative front ends)

This patch is known to be incomplete. It is posted in this form to check the strategy and provide a mostly-correct baseline to extend. Initial discussion is at https://discourse.llvm.org/t/rfc-desugar-variadics-codegen-for-new-targets-optimisation-for-existing/72854


Patch is 176.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81058.diff

16 Files Affected:

  • (added) clang/test/CodeGen/expand-variadic-call.c (+273)
  • (added) clang/test/CodeGen/variadic-wrapper-removal.c (+86)
  • (added) clang/test/CodeGenCXX/inline-then-fold-variadics.cpp (+117)
  • (modified) llvm/include/llvm/CodeGen/Passes.h (+4)
  • (modified) llvm/include/llvm/InitializePasses.h (+1)
  • (added) llvm/include/llvm/Transforms/IPO/ExpandVariadics.h (+17)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/IPO/CMakeLists.txt (+1)
  • (added) llvm/lib/Transforms/IPO/ExpandVariadics.cpp (+723)
  • (added) llvm/test/CodeGen/X86/expand-variadic-call-i386-darwin.ll (+385)
  • (added) llvm/test/CodeGen/X86/expand-variadic-call-i386-linux.ll (+385)
  • (added) llvm/test/CodeGen/X86/expand-variadic-call-i686-msvc.ll (+402)
  • (added) llvm/test/CodeGen/X86/expand-variadic-call-x64-darwin.ll (+589)
  • (added) llvm/test/CodeGen/X86/expand-variadic-call-x64-linux.ll (+589)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Transforms/IPO/BUILD.gn (+1)
diff --git a/clang/test/CodeGen/expand-variadic-call.c b/clang/test/CodeGen/expand-variadic-call.c
new file mode 100644
index 00000000000000..fa2b984bec08a5
--- /dev/null
+++ b/clang/test/CodeGen/expand-variadic-call.c
@@ -0,0 +1,273 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64-v4 -std=c23 -O1 -ffreestanding -emit-llvm -o - %s | FileCheck %s
+
+// This test sanity checks calling a variadic function with the expansion transform disabled.
+// The IR test cases {arch}/expand-variadic-call-*.ll correspond to IR generated from this test case.
+
+typedef __builtin_va_list va_list;
+#define va_copy(dest, src) __builtin_va_copy(dest, src)
+#define va_start(ap, ...) __builtin_va_start(ap, 0)
+#define va_end(ap) __builtin_va_end(ap)
+#define va_arg(ap, type) __builtin_va_arg(ap, type)
+
+// 32 bit x86 alignment uses getTypeStackAlign for special cases
+// Whitebox testing.
+// Needs a type >= 16 which is either a simd or a struct containing a simd
+// darwinvectorabi should force 4 bytes
+// linux vectors with align 16/32/64 return that alignment
+
+
+void wrapped(va_list);
+
+// CHECK-LABEL: @codegen_for_copy(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CP:%.*]] = alloca [1 x %struct.__va_list_tag], align 16
+// CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[CP]]) #[[ATTR7:[0-9]+]]
+// CHECK-NEXT:    call void @llvm.va_copy(ptr nonnull [[CP]], ptr [[X:%.*]])
+// CHECK-NEXT:    call void @wrapped(ptr noundef nonnull [[CP]]) #[[ATTR8:[0-9]+]]
+// CHECK-NEXT:    call void @llvm.va_end(ptr [[CP]])
+// CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 24, ptr nonnull [[CP]]) #[[ATTR7]]
+// CHECK-NEXT:    ret void
+//
+void codegen_for_copy(va_list x)
+{
+  va_list cp;
+  va_copy(cp, x);
+  wrapped(cp);
+  va_end(cp);
+}
+
+
+// CHECK-LABEL: @vararg(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[VA:%.*]] = alloca [1 x %struct.__va_list_tag], align 16
+// CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[VA]]) #[[ATTR7]]
+// CHECK-NEXT:    call void @llvm.va_start(ptr nonnull [[VA]])
+// CHECK-NEXT:    call void @wrapped(ptr noundef nonnull [[VA]]) #[[ATTR8]]
+// CHECK-NEXT:    call void @llvm.va_end(ptr [[VA]])
+// CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 24, ptr nonnull [[VA]]) #[[ATTR7]]
+// CHECK-NEXT:    ret void
+//
+ void vararg(...)
+{
+  va_list va;
+  __builtin_va_start(va, 0);
+  wrapped(va);
+  va_end(va);
+}
+
+// vectors with alignment 16/32/64 are natively aligned on linux x86
+// v32f32 would be a m1024 type, larger than x64 defines at time of writing
+typedef int i32;
+typedef float v4f32 __attribute__((__vector_size__(16), __aligned__(16)));
+typedef float v8f32 __attribute__((__vector_size__(32), __aligned__(32)));
+typedef float v16f32 __attribute__((__vector_size__(64), __aligned__(64)));
+typedef float v32f32 __attribute__((__vector_size__(128), __aligned__(128)));
+
+
+// Pass a single value to wrapped() via vararg(...)
+// CHECK-LABEL: @single_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]]) #[[ATTR9:[0-9]+]]
+// CHECK-NEXT:    ret void
+//
+void single_i32(i32 x)
+{
+  vararg(x);
+}
+
+// CHECK-LABEL: @single_double(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(double noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void single_double(double x)
+{
+  vararg(x);
+}
+
+// CHECK-LABEL: @single_v4f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(<4 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void single_v4f32(v4f32 x)
+{
+  vararg(x);
+}
+
+// CHECK-LABEL: @single_v8f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(<8 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void single_v8f32(v8f32 x)
+{
+  vararg(x);
+}
+
+// CHECK-LABEL: @single_v16f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(<16 x float> noundef [[X:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void single_v16f32(v16f32 x)
+{
+  vararg(x);
+}
+
+// CHECK-LABEL: @single_v32f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT:    [[X:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT:    store <32 x float> [[X]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT:    tail call void (...) @vararg(ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void single_v32f32(v32f32 x)
+{
+  vararg(x);
+}
+
+
+
+// CHECK-LABEL: @i32_double(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]], double noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void i32_double(i32 x, double y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @double_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(double noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void double_i32(double x, i32 y)
+{
+  vararg(x, y);
+}
+
+
+// A struct used by libc variadic tests
+
+typedef struct {
+  char c;
+  short s;
+  int i;
+  long l;
+  float f;
+  double d;
+}  libcS;
+
+// CHECK-LABEL: @i32_libcS(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]], ptr noundef nonnull byval([[STRUCT_LIBCS:%.*]]) align 8 [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void i32_libcS(i32 x, libcS y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @libcS_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(ptr noundef nonnull byval([[STRUCT_LIBCS:%.*]]) align 8 [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void libcS_i32(libcS x, i32 y)
+{
+  vararg(x, y);
+}
+
+
+// CHECK-LABEL: @i32_v4f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]], <4 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void i32_v4f32(i32 x, v4f32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @v4f32_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(<4 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void v4f32_i32(v4f32 x, i32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v8f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]], <8 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void i32_v8f32(i32 x, v8f32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @v8f32_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(<8 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void v8f32_i32(v8f32 x, i32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v16f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]], <16 x float> noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void i32_v16f32(i32 x, v16f32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @v16f32_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void (...) @vararg(<16 x float> noundef [[X:%.*]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void v16f32_i32(v16f32 x, i32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @i32_v32f32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT:    [[Y:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store <32 x float> [[Y]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT:    tail call void (...) @vararg(i32 noundef [[X:%.*]], ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void i32_v32f32(i32 x, v32f32 y)
+{
+  vararg(x, y);
+}
+
+// CHECK-LABEL: @v32f32_i32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[INDIRECT_ARG_TEMP:%.*]] = alloca <32 x float>, align 128
+// CHECK-NEXT:    [[X:%.*]] = load <32 x float>, ptr [[TMP0:%.*]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT:    store <32 x float> [[X]], ptr [[INDIRECT_ARG_TEMP]], align 128, !tbaa [[TBAA2]]
+// CHECK-NEXT:    tail call void (...) @vararg(ptr noundef nonnull byval(<32 x float>) align 128 [[INDIRECT_ARG_TEMP]], i32 noundef [[Y:%.*]]) #[[ATTR9]]
+// CHECK-NEXT:    ret void
+//
+void v32f32_i32(v32f32 x, i32 y)
+{
+  vararg(x, y);
+}
diff --git a/clang/test/CodeGen/variadic-wrapper-removal.c b/clang/test/CodeGen/variadic-wrapper-removal.c
new file mode 100644
index 00000000000000..da41dde16f3d73
--- /dev/null
+++ b/clang/test/CodeGen/variadic-wrapper-removal.c
@@ -0,0 +1,86 @@
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+// RUN: %clang_cc1 -triple=x86_64-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+// neither arm arch is implemented yet, leaving it here as a reminder
+// armv6 is a ptr as far as the struct is concerned, but possibly also a [1 x i32] passed by value
+// that seems insistent, maybe leave 32 bit arm alone for now
+// aarch64 is a struct of five things passed using byval memcpy
+
+// R-N: %clang_cc1 -triple=armv6-none--eabi -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+// R-N: %clang_cc1 -triple=aarch64-none-linux-gnu -O1 -emit-llvm -o - %s | opt --passes=expand-variadics -S | FileCheck %s
+
+
+
+// expand-variadics rewrites calls to variadic functions into calls to
+// equivalent functions that take a va_list argument. A property of the
+// implementation is that said "equivalent function" may be a pre-existing one.
+// This is equivalent to inlining a sufficiently simple variadic wrapper.
+
+#include <stdarg.h>
+
+typedef int FILE; // close enough for this test
+
+// fprintf is sometimes implemented as a call to vfprintf. That fits the
+// pattern the transform pass recognises - given an implementation of fprintf
+// in the IR module, calls to it can be rewritten into calls into vfprintf.
+
+// CHECK-LABEL: define{{.*}} i32 @fprintf(
+// CHECK-LABEL: define{{.*}} i32 @call_fprintf(
+// CHECK-NOT:   @fprintf
+// CHECK:       @vfprintf
+int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap);
+int fprintf(FILE *restrict f, const char *restrict fmt, ...)
+{
+  int ret;
+  va_list ap;
+  va_start(ap, fmt);
+  ret = vfprintf(f, fmt, ap);
+  va_end(ap);
+  return ret;
+}
+int call_fprintf(FILE *f)
+{
+  int x = 42;
+  double y = 3.14;
+  return fprintf(f, "int %d dbl %g\n", x, y);
+}
+
+// Void return type is also OK
+
+// CHECK-LABEL: define{{.*}} void @no_result(
+// CHECK-LABEL: define{{.*}} void @call_no_result(
+// CHECK-NOT:   @no_result
+// CHECK:       @vno_result
+void vno_result(const char * fmt, va_list);
+void no_result(const char * fmt, ...)
+{
+  va_list ap;
+  va_start(ap, fmt);
+  vno_result(fmt, ap);
+  va_end(ap);
+}
+void call_no_result(FILE *f)
+{
+  int x = 101;
+  no_result("", x);
+}
+
+// The vaend in the forwarding implementation is optional where it's a no-op
+
+// CHECK-LABEL: define{{.*}} i32 @no_vaend(
+// CHECK-LABEL: define{{.*}} i32 @call_no_vaend(
+// CHECK-NOT:   @no_vaend
+// CHECK:       @vno_vaend
+int vno_vaend(int x, va_list);
+int no_vaend(int x, ...)
+{
+  va_list ap;
+  va_start(ap, x);
+  return vno_vaend(x, ap);
+}
+int call_no_vaend(int x)
+{
+  return no_vaend(x, 10, 2.5f);
+}
diff --git a/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp b/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp
new file mode 100644
index 00000000000000..cf436ead77a2cb
--- /dev/null
+++ b/clang/test/CodeGenCXX/inline-then-fold-variadics.cpp
@@ -0,0 +1,117 @@
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Linux
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X64SystemV
+
+// RUN: %clang_cc1 -triple i386-apple-darwin -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Darwin
+
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X64SystemV
+
+// RUN: %clang_cc1 -triple i686-windows-msvc -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Windows
+
+// 64 bit windows va_arg passes most types indirectly but the call instruction uses the types by value
+// ___: %clang_cc1 -triple x86_64-pc-windows-msvc -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK
+
+// Checks for consistency between clang and expand-variadics
+// 1. Use clang to lower va_arg
+// 2. Use expand-variadics to lower the rest of the variadic operations
+// 3. Use opt -O1 to simplify the result for simpler filecheck patterns
+// The simplification will fail when the two are not consistent, modulo bugs elsewhere.
+
+#include <stdarg.h>
+
+// This test can be simplified when expand-variadics is extended to apply to more patterns.
+// The first_valist and second_valist functions can then be inlined, either in the test or
+// by enabling optimisation passes in the clang invocation.
+// The explicit instcombine pass canonicalises the variadic function IR.
+
+// More complicated tests need instcombine of ptrmask to land first.
+
+template <typename X, typename Y>
+static X first_valist(va_list va) {
+  return va_arg(va, X);
+}
+
+template <typename X, typename Y>
+static X first(...) {
+  va_list va;
+  __builtin_va_start(va, 0);
+  return first_valist<X,Y>(va);
+}
+
+template <typename X, typename Y>
+static Y second_valist(va_list va) {
+  va_arg(va, X);
+  Y r = va_arg(va, Y);
+  return r;
+}
+
+
+template <typename X, typename Y>
+static Y second(...) {
+  va_list va;
+  __builtin_va_start(va, 0);
+  return second_valist<X,Y>(va);
+}
+
+extern "C"
+{
+// CHECK-LABEL: define{{.*}} i32 @first_i32_i32(i32{{.*}} %x, i32{{.*}} %y)
+// CHECK:       entry:
+// CHECK:       ret i32 %x
+int first_i32_i32(int x, int y)
+{
+  return first<int,int>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @second_i32_i32(i32{{.*}} %x, i32{{.*}} %y)
+// CHECK:       entry:
+// CHECK:       ret i32 %y
+int second_i32_i32(int x, int y)
+{
+  return second<int,int>(x, y);
+}
+}
+
+// Permutations of an int and a double
+extern "C"
+{
+// CHECK-LABEL: define{{.*}} i32 @first_i32_f64(i32{{.*}} %x, double{{.*}} %y)
+// CHECK:       entry:
+// CHECK:       ret i32 %x
+int first_i32_f64(int x, double y)
+{
+  return first<int,double>(x, y);
+}
+  
+// CHECK-LABEL: define{{.*}} double @second_i32_f64(i32{{.*}} %x, double{{.*}} %y)
+// CHECK:       entry:
+
+// X86Linux:    ret double %y
+// X64SystemV:  ret double %y
+// X86Darwin:   ret double %y
+// X86Windows:  [[TMP0:%.*]] = alloca <{ [4 x i8], double }>, align 4
+// X86Windows:  [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i32 4
+// X86Windows:  store double %y, ptr [[TMP1]], align 4
+// X86Windows:  [[TMP2:%.*]] = load double, ptr [[TMP0]], align 4
+// X86Windows:  ret double [[TMP2]]
+double second_i32_f64(int x, double y)
+{
+  return second<int,double>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} double @first_f64_i32(double{{.*}} %x, i32{{.*}} %y)
+// CHECK:       entry:
+// CHECK:       ret double %x
+double first_f64_i32(double x, int y)
+{
+  return first<double,int>(x, y);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @second_f64_i32(double{{.*}} %x, i32{{.*}} %y)
+// CHECK:       entry:
+// CHECK:       ret i32 %y
+int second_f64_i32(double x, int y)
+{
+  return second<double,int>(x, y);
+}   
+}
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index bbfb8a0dbe26a4..fe3208df7a23b2 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -600,6 +600,10 @@ namespace llvm {
 
   /// Lowers KCFI operand bundles for indirect calls.
   FunctionPass *createKCFIPass();
+
+  // Inline variadic functions and expand variadic intrinsics.
+  ModulePass *createExpandVariadicsPass();
+
 } // End llvm namespace
 
 #endif
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 3db639a6872407..6487d0a5e26d1b 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -106,6 +106,7 @@ void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
 void initializeExpandMemCmpLegacyPassPass(PassRegistry &);
 void initializeExpandPostRAPass(PassRegistry&);
 void initializeExpandReductionsPass(PassRegistry&);
+void initializeExpandVariadicsPass(PassRegistry &);
 void initializeExpandVectorPredicationPass(PassRegistry &);
 void initializeExternalAAWrapperPassPass(PassRegistry&);
 void initializeFEntryInserterPass(PassRegistry&);
diff --git a/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h b/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h
new file mode 100644
index 00000000000000..e7ffe343b940e9
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/ExpandVariadics.h
@@ -0,0 +1,17 @@
+#ifndef LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
+#define LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Module;
+
+class ExpandVariadicsPass : public PassInfoMixin<ExpandVariadicsPass> {
+public:
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index c934ec42f6eb15..624fffd233ce56 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -131,6 +131,7 @@
 #include "llvm/Transforms/IPO/DeadArgumentElimination.h"
 #include "llvm/Transforms/IPO/ElimAvailExtern.h"
 #include "llvm/Transforms/IPO/EmbedBitcodePass.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
 #include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
 #include "llvm/Transforms/IPO/FunctionAttrs.h"
 #include "llvm/Transforms/IPO/FunctionImport.h"
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 44511800ccff8d..4ea9493208315a 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -59,6 +59,7 @@ MODULE_PASS("dot-callgraph", CallGraphDOTPrinterPass())
 MODULE_PASS("dxil-upgrade", DXILUpgradePass())
 MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
 MODULE_PASS("extract-blocks", BlockExtractorPass({}, false))
+MODULE_PASS("expand-variadics", ExpandVariadicsPass())
 MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
 MODULE_PASS("function-import", FunctionImportPass())
 MODULE_PASS("globalopt", GlobalOptPass())
diff --git a/llvm/lib/Transforms/IPO/CMakeLists.txt b/llvm/lib/Transforms/IPO/CMakeLists.txt
index 034f1587ae8df4..b8bd0be91d2232 100644
--- a/llvm/lib/Transforms/IPO/CMakeLists.txt
+++ b/llvm/lib/Transforms/IPO/CMakeLists.txt
@@ -12,6 +12,7 @@ add_llvm_component_library(LLVMipo
   DeadArgumentElimination.cpp
   ElimAvailExtern.cpp
   EmbedBitcodePass.cpp
+  ExpandVariadics.cpp
   ExtractGV.cpp
   ForceFunctionAttrs.cpp
   FunctionAttrs.cpp
diff --git a/llvm/lib/Transforms/IPO/ExpandVariadics.cpp b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
new file mode 100644
index 00000000000000..4b4c635cef5092
--- /dev/null
+++ b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
@@ -0,0 +1,723 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// S...
[truncated]

@llvmbot
Member

llvmbot commented Feb 7, 2024

@llvm/pr-subscribers-llvm-transforms


@JonChesterfield JonChesterfield force-pushed the jc_varargs_inliner_component branch from 7285c68 to c77e660 on February 8, 2024 00:26
Contributor

@jhuber6 jhuber6 left a comment

Thanks for working on this. A few quick nits; overall it looks like lots of straightforward testing with a beefy new pass.

Comment on lines 99 to 100
virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
AllocaInst *va_list, Value * /*buffer*/) {
Contributor

If we assert a value is non-null, it's best to make it a reference in the interface.

Collaborator Author

Expressing the class invariants through the vtable seems harder than it should be. This function is only called if valistOnStack returns true; I've changed the base implementation to a __builtin_unreachable.
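A minimal sketch of the invariant under discussion, assuming only what the quoted snippets show (the real Interface has more members and full parameter lists): initializeVAList only has meaning when valistOnStack() returns true, so the base implementation guards itself and is otherwise unreachable.

#include <cassert>

struct Interface {
  virtual ~Interface() = default;

  // Targets that build the va_list in a stack slot return true and override
  // initializeVAList; for everyone else the base version must never run.
  virtual bool valistOnStack() { return false; }

  virtual void initializeVAList(/* LLVMContext &, IRBuilder<> &, AllocaInst *, Value * */) {
    assert(!valistOnStack());
    __builtin_unreachable(); // the review below prefers llvm_unreachable here
  }
};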

};

std::unique_ptr<Interface> create(Module &M) {
Triple Trip = Triple(M.getTargetTriple());
Contributor

Nit: kind of a non-standard name. I prefer:

Suggested change
Triple Trip = Triple(M.getTargetTriple());
llvm::Triple Triple(M.getTargetTriple());

uint32_t minimum_slot_align() override { return 4; }
uint32_t maximum_slot_align() override { return 0; }
};
return std::make_unique<X86Windows>();
Contributor

Do these need to be special types? The usage seems like it could just be a config struct to the base class with some pair.

Collaborator Author

They were a struct with two named fields previously; could go with something like:

struct slotAlignTy {
  uint32_t min;
  uint32_t max;
};

slotAlignTy slot_align() override { return {0, 4}; }

Or put state in the base class; that probably optimises better than virtual functions that return constants.

Collaborator Author

I like the state in the base class approach. Const fields at fixed offset, initialisation specified where it was before. Would be clearer with named parameters but the min, max sequence convention is probably good enough.
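A hedged sketch of that shape, with illustrative field and constructor names (not taken from the final patch): the per-target slot alignments become const fields set by each derived constructor instead of virtual functions returning constants.

#include <cstdint>

struct VAListInterface {
  // Per-target constants fixed at construction time.
  const uint32_t MinSlotAlign;
  const uint32_t MaxSlotAlign; // 0 meaning "no maximum", as in the snippet above

  VAListInterface(uint32_t Min, uint32_t Max)
      : MinSlotAlign(Min), MaxSlotAlign(Max) {}
  virtual ~VAListInterface() = default;
};

// The values quoted above for X86Windows: minimum 4, maximum 0.
struct X86Windows final : VAListInterface {
  X86Windows() : VAListInterface(/*Min=*/4, /*Max=*/0) {}
};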

static char ID;
std::unique_ptr<VariadicABIInfo::Interface> ABI;

// todo: drop the boolean
Contributor

Suggested change
// todo: drop the boolean
// TODO: drop the boolean

}
};

Function *ExpandVariadics::isFunctionInlinable(Module &M, Function *F) {
Contributor

I feel a lot of these arguments should be references since we're not checking for null.

Collaborator Author

There are a fair number of pointer comparisons: the target of a call instruction against a function, and similar. I could go with references everywhere, but the &reference == pointer construct reads poorly to me.
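A small illustration of the trade-off; the helper functions are invented, but Function and CallBase are the real LLVM types involved in this kind of identity check.

#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Pointer parameter: comparing against the call target is direct.
static bool callsByPtr(const CallBase &CB, const Function *F) {
  return CB.getCalledFunction() == F;
}

// Reference parameter: the same check needs an explicit address-of, which is
// the "&reference == pointer" construct mentioned above.
static bool callsByRef(const CallBase &CB, const Function &F) {
  return CB.getCalledFunction() == &F;
}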

@JonChesterfield JonChesterfield force-pushed the jc_varargs_inliner_component branch 2 times, most recently from 54b09d5 to b9e21af on February 8, 2024 01:24
Contributor

@jhuber6 jhuber6 left a comment

Some more comments

AllocaInst * /*va_list*/, Value * /*buffer*/) {
// Function needs to be implemented if valist is on the stack
assert(!valistOnStack());
__builtin_unreachable();
Contributor

This is confusing. If this function is never to be called, why not make it pure virtual? __builtin_unreachable() is an optimization hint, while llvm_unreachable is a runtime assertion that the code should not be executed.

Contributor

Yeah, llvm_unreachable is usually used inside LLVM, so definitely better to use llvm_unreachable here.

Collaborator Author

I've gone with more comments, but maybe to make this clear enough I need to separate the field alignment from whether va_list is a void* or something that needs more work. The function is called when it is overridden and it's not great to provide overrides that are themselves not called.
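For reference, the difference in play, with invented free-function names: llvm_unreachable (from llvm/Support/ErrorHandling.h) reports and aborts in asserts builds, while __builtin_unreachable is purely an optimisation hint.

#include "llvm/Support/ErrorHandling.h"

void baseHookA() {
  // Runtime assertion in +Asserts builds (prints the message and aborts), and
  // still an unreachability hint in release builds. The usual choice in LLVM.
  llvm_unreachable("only valid when valistOnStack() returns true");
}

void baseHookB() {
  // Pure optimisation hint: undefined behaviour if ever executed, with no
  // diagnostic. This is what the patch currently uses.
  __builtin_unreachable();
}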

Comment on lines 382 to 384
if (!isa<ReturnInst>(BB.getTerminator())) {
return nullptr;
}
Contributor

Style nit: braces around a single-line block.

return nullptr;
}

// Walk the block in order checking for specific instructions, some of them
Contributor

Nit: style. LLVM uses PascalCase for variables in all locations below.

Collaborator Author

Advice sought on how to PascalCase variables with va_list in the name

Contributor

If it's a well-understood thing then we usually ignore it, like argc. But otherwise probably VAList.

// with it, such that target specific va_arg instructions will correctly
// iterate over it. Primarily this means getting the alignment right.

class {
Contributor

Not sure how I feel about a big anonymous class in the middle of this function.

Collaborator Author

I'm expecting some resistance to large functions. The cyclomatic complexity is very low - they read linearly from top to bottom - but nevertheless people might prefer blocks of code with a single use to be moved further from it.

Contributor

I personally don't like it; I'd rather see the class moved outside the function. I understand that it's only used by one function, but IMO a class def in the middle of a function breaks the "flow" and makes it harder to read.

Contributor

Yeah, I don't think I've encountered any anonymous classes in LLVM. I've only ever seen anonymous structs that are POD without any members. You call this Frame, so just move it out and call it FrameTy or something.
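A hedged sketch of the suggested restructuring: the class gets a name and moves to namespace scope. The members follow the snippet quoted in the next thread (fieldTypes, maybeValueIsByval, append, empty), renamed per the PascalCase nit above; the real interface in the patch may differ.

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include <utility>

using namespace llvm;

struct FrameTy {
  SmallVector<Type *> FieldTypes;
  SmallVector<std::pair<Value *, bool>> MaybeValueIsByval;

  void append(Type *T, Value *V, bool IsByval) {
    FieldTypes.push_back(T);
    MaybeValueIsByval.push_back({V, IsByval});
  }

  bool empty() const { return FieldTypes.empty(); }
};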

Comment on lines 514 to 520
enum { N = 4 };
SmallVector<Type *, N> fieldTypes;
SmallVector<std::pair<Value *, bool>, N> maybeValueIsByval;
Contributor
Usually it's easiest to leave off the template argument and let the size default to something sane unless there's an explicit reason not to. I think by default you get 64 bytes, so that'll likely be 8 elements or so.

Collaborator Author
The vectors are always the same length as each other, does that affect your recommendation to use a default?

Contributor
Well, the size only matters for the small-size optimization before we fall back to malloc. If you know ahead of time that 4 on both makes malloc less likely, it could be argued for, but if they'd otherwise be different sizes and it's not a performance issue the default is usually cleaner.
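For reference, the no-explicit-size spelling being suggested is just this (field names copied from the snippet above, so purely illustrative):

// SmallVector picks a sensible default inline capacity when the second
// template argument is omitted.
SmallVector<Type *> FieldTypes;
SmallVector<std::pair<Value *, bool>> MaybeValueIsByval;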

Comment on lines +588 to +597
if (Frame.empty()) {
// Not passing anything, hopefully va_arg won't try to dereference it
// Might be a target specific thing whether one can pass nullptr instead
// of undef i32
Frame.append(Type::getInt32Ty(Ctx), nullptr, false);
}
Contributor
Suggested change
if (Frame.empty()) {
// Not passing anything, hopefully va_arg won't try to dereference it
// Might be a target specific thing whether one can pass nullptr instead
// of undef i32
Frame.append(Type::getInt32Ty(Ctx), nullptr, false);
}
// Not passing anything, hopefully va_arg won't try to dereference it
// Might be a target specific thing whether one can pass nullptr instead
// of undef i32
if (Frame.empty())
Frame.append(Type::getInt32Ty(Ctx), nullptr, false);

Collaborator Author
Not keen - the comment is true within the scope of the if and false outside of the scope. If deleting the braces is important I can rephrase the comment (probably by writing 'if' in the comment as well as in the code)

@@ -0,0 +1,17 @@
#ifndef LLVM_TRANSFORMS_IPO_EXPANDVARIADICS_H
Contributor
Needs the LLVM copyright header as well as (brief) documentation of the pass.
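For reference, the standard LLVM header block looks roughly like this (the one-line description here is illustrative):

//===- ExpandVariadics.h - Expand variadic functions ----------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//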

AllocaInst * /*va_list*/, Value * /*buffer*/) {
// Function needs to be implemented if valist is on the stack
assert(!valistOnStack());
__builtin_unreachable();
Contributor
Yeah, llvm_unreachable is usually used inside LLVM, so definitely better to use llvm_unreachable here.
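A minimal sketch of the change being asked for, mirroring the quoted fragment (the enclosing function's name and return type are not shown in the snippet, so this is illustrative only):

AllocaInst * /*va_list*/, Value * /*buffer*/) {
  // Only ABIs whose va_list lives on the stack implement this hook, so the
  // default body must never be reached.
  llvm_unreachable("Only implemented when valistOnStack() is true");
}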

bool VACopyIsMemcpy() { return true; }
};

struct X64SystemV final : public Interface {
Contributor
I wonder if this part needs to be in the backend.

Collaborator Author
@JonChesterfield JonChesterfield Feb 8, 2024
I really hope not. It doesn't duplicate information that the backend already knows, and IR passes reaching into the backend is a mess. Likewise, reaching into the front end to try to deduplicate against clang, which has the complement of this information, would have a severe adverse effect on legibility.

Contributor
But it doesn't look like a good idea to handle multiple targets in one single file. Here it just includes the x86 family, and there are many more targets supporting variadic functions out there. FWIW, I don't think the information needed for this pass becomes "messy" in the backend.

Collaborator Author
It scales sublinearly in number of targets and x86 is about the worst case. Scattering the information across all the backends does nothing to decrease the amount of information needed.

I don't think we should try to put an abstraction over this until the switch has become unwieldy. Partly because that day may not come, partly because the details are likely to change (further) under review, and partly because the last time I tried to export information from the backend the infra didn't work with the new pass manager.

Contributor
I think I fixed your target information issue, you just need to add the TargetMachine to the pass constructor: f7dcabe

Collaborator Author
Thanks for patching the LDS pass; the commit is a useful reference on the pass manager boilerplate.

Do we have a TargetMachine instance available in the middle of opt? I thought that was only created once we hit codegen, in which case depending on one would prevent this from being run as an optimisation in the opt O1 or O2 pipeline.

Contributor
The TargetMachine in PassBuilder seems optional, but I'm wondering if we'd even want to run this pass if there is no TM. I don't especially like seeing target-specific logic in an optimization, but I don't contribute enough to the optimization passes to assert whether that's a bad thing or not.


// All targets currently implemented use a ptr for the valist parameter
Type *vaListParameterType(LLVMContext &Ctx) {
return PointerType::getUnqual(Ctx);
Contributor
I'd expect this to use the alloca addrspace?

Collaborator Author
Unclear - va_list as a type is declared in clang as a void* (or maybe a char * for amdgpu) at present. None of them have addrspace annotations. The IR works out nicer if we can change that to an addrspace(5) void*. Semi-related, amdgpu needs a patch to a pointer rounding function that uses ptrmask to tolerate addrspace. Missing from this patch as amdgpu isn't in the MVP.

Collaborator Author
If we can change that, it doesn't have to be in the MVP

Contributor
Yes, we need to change those to use the right address space
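One possible shape for that, as a sketch only - it assumes the pass can see the module's DataLayout, and the clang-side declaration of va_list would still need updating separately:

Type *vaListParameterType(Module &M) {
  const DataLayout &DL = M.getDataLayout();
  // Use the address space that allocas live in rather than addrspace(0).
  return PointerType::get(M.getContext(), DL.getAllocaAddrSpace());
}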

const bool IsLinuxABI = Triple.isOSLinux() || Triple.isOSCygMing();

switch (Triple.getArch()) {

Contributor
Extra empty line

Comment on lines +268 to +274
if (GlobalValue::isInterposableLinkage(F->getLinkage()))
return false;
Contributor
Simpler as F->isDefinitionExact?

Don't we want to do this unconditionally for AMDGPU?

Collaborator Author
There are a few conditional statements needed when the later patch lands that permits changing the function interfaces from codegen. This is indeed one of them. I'll check isDefinitionExact.
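For comparison, the suggested spelling would read as follows (whether the exact-definition check is the right predicate here is still being confirmed):

  // Interposable definitions are not exact, so this subsumes the linkage test.
  if (!F->isDefinitionExact())
    return false;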

}
}

// Branch funnels look like variadic functions but arent:
Contributor
Suggested change
// Branch funnels look like variadic functions but arent:
// Branch funnels look like variadic functions but aren't:

NewCB->setCallingConv(CB->getCallingConv());
NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg});

if (!CB->use_empty()) // dead branch?
Contributor
you can just replaceAllUsesWith, it shouldn't die on empty uses
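i.e. the guard can be dropped entirely; replaceAllUsesWith is well-defined when the use list is empty:

  // No-op if CB has no uses.
  CB->replaceAllUsesWith(NewCB);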

NewCB->setAttributes(PAL);
NewCB->takeName(CB);
NewCB->setCallingConv(CB->getCallingConv());
NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg});
Contributor
I'd expect this to be able to preserve all metadata?

Collaborator Author
Unsure - dead argument elimination preserves exactly these two

Comment on lines +573 to +582
attributes #0 = { nounwind "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #2 = { mustprogress nocallback nofree nosync nounwind willreturn }
attributes #3 = { "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #4 = { nounwind "min-legal-vector-width"="128" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #5 = { nounwind }
attributes #6 = { nobuiltin nounwind "no-builtins" }
attributes #7 = { nobuiltin "no-builtins" }
Contributor
Should be able to drop all of the attributes

attributes #6 = { nobuiltin nounwind "no-builtins" }
attributes #7 = { nobuiltin "no-builtins" }

!llvm.module.flags = !{!0, !1, !2}
Contributor
At most you need whatever is required to show the debug info preservation

Collaborator Author
Debug info is something I haven't really thought about. That's probably something that should have its own test case.

These tests aspire to show that the call frame is correct for the given architecture - variables are at the right offset and so forth. Other things like metadata / attribute propagation are target independent and should have their own tests.

I'll look at removing the attributes from these; that would give a better signal-to-noise ratio.

ret void
}

define dso_local void @v32f32_i32(ptr nocapture noundef readonly byval(<32 x float>) align 128 %0, i32 noundef %y) local_unnamed_addr #0 {
Contributor
Don't need all the local_unnamed_addr / dso_local

Collaborator Author
Yep, gone.

@@ -0,0 +1,117 @@
// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default<O1>" -S | FileCheck %s --check-prefixes=CHECK,X86Linux
Contributor
Can this be done in one opt run?

Collaborator Author
Nope. There's something weird/broken with opt here; the failure reproduces with other passes. Issue #81128.

void v32f32_i32(v32f32 x, i32 y)
{
vararg(x, y);
}
Contributor
Also cover the half and bfloat cases.

@JonChesterfield
Collaborator Author

Patch run through clang-tidy --checks=readability-identifier-naming with the config file in tree and recommendations applied. Some of the choices seem poor but it's presumably acceptable.

@JonChesterfield JonChesterfield force-pushed the jc_varargs_inliner_component branch from b9e21af to 122a305 on February 8, 2024 12:52
@JonChesterfield JonChesterfield force-pushed the jc_varargs_inliner_component branch from 122a305 to 956f4f9 on February 8, 2024 13:08
Contributor
@Pierre-vh Pierre-vh left a comment
My comments are mostly about style; I haven't done a deep dive into the logic of the pass yet.

if (!Equivalent)
return Changed;

for (User *U : llvm::make_early_inc_range(F->users()))
Contributor
do you actually need llvm:: for those?

case VariadicABIInfo::ValistCc::value: {
// If it's being passed by value, need a load
// TODO: Check it's loading the right thing
auto *load = dyn_cast<LoadInst>(&*Iter);
Contributor
CamelCase

if (!Call)
return nullptr;

if (auto *end = dyn_cast<VAEndInst>(&*Iter)) {
Contributor
CamelCase

// Not totally convinced this is necessary but dead store elimination
// decides to discard the stores to the alloca and pass uninitialised
// memory along instead when the function is marked tailcall
if (TCK == CallInst::TCK_Tail) {
Contributor
can drop {}

NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg});

if (!CB->use_empty()) // dead branch?
{
Contributor
can drop {} too

Comment on lines +282 to +288
if (CB->isMustTailCall()) {
return false;
}

if (!CB->isCallee(&U) || CB->getFunctionType() != F->getFunctionType()) {
return false;
}
Contributor
both can drop {}

Comment on lines +229 to +235
if (Triple.isOSDarwin()) {
return std::make_unique<VariadicABIInfo::X64SystemV>();
}

if (IsLinuxABI) {
return std::make_unique<VariadicABIInfo::X64SystemV>();
}
Contributor
both can drop {}

// aarch64 uses byval
enum class ValistCc { value, pointer, /*byval*/ };

struct Interface {
Contributor
Interface is a very generic name, can you make it a bit more specific and add docs?
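Something along these lines would address both points - the name and the member set here are illustrative, not a proposal for the exact API:

/// Describes how a target's ABI builds and passes a va_list: the layout and
/// alignment of the frame the caller constructs, and whether the va_list is
/// passed by value or by pointer.
struct VariadicABIInterface {
  virtual ~VariadicABIInterface() = default;

  /// True if va_copy can be lowered as a memcpy of the va_list storage.
  virtual bool vaCopyIsMemcpy() { return true; }
};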


if (IsLinuxABI) {
struct X86Linux final : public Interface {
X86Linux() : Interface(4, 0) {}
Contributor
I don't understand - why can't this just be a std::make_unique<Interface>(4, 0)? Why the need for a derived type that does not add any feature/override?
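The simplification being suggested, assuming the Interface constructor taking (size, alignment) shown above is sufficient on its own for this target:

  if (IsLinuxABI)
    return std::make_unique<Interface>(4, 0);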

Collaborator
@efriedma-quic efriedma-quic left a comment
I don't really like the whole "sufficiently simple function" thing. It seems fragile. You should be able to just take an arbitrary internal varargs function, rewrite its signature to take a va_list argument, rewrite calls to va_start to make a copy of that va_list, and rewrite the callers to construct that va_list. If that function turns out to be inlinable, great; if not, you haven't really lost anything.

(Rewriting the signature of a function is complicated in its own way because you need to allocate a new Function, then transplant the original function's body into it. But it's not uncharted territory: we should be able to refactor code out of llvm/lib/Transforms/IPO/ArgumentPromotion.cpp .)
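A rough sketch of the signature rewrite described above, in the spirit of ArgumentPromotion (all names are illustrative, and details such as attribute and metadata transfer are omitted):

FunctionType *OldTy = F->getFunctionType();
SmallVector<Type *> Params(OldTy->params().begin(), OldTy->params().end());
Params.push_back(PointerType::getUnqual(F->getContext())); // the va_list
FunctionType *NewTy =
    FunctionType::get(OldTy->getReturnType(), Params, /*isVarArg=*/false);

Function *NewF = Function::Create(NewTy, F->getLinkage(), F->getAddressSpace(),
                                  F->getName() + ".valist", F->getParent());
// Move the body across and rewire the fixed arguments.
NewF->splice(NewF->begin(), F);
for (auto [Old, New] : llvm::zip(F->args(), NewF->args())) {
  Old.replaceAllUsesWith(&New);
  New.takeName(&Old);
}
// Remaining work: rewrite va_start/va_end in NewF in terms of the new
// argument, and rewrite each caller of F to build the va_list and call NewF.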


Do we have a testing plan for this? Messing up calling convention stuff tends to lead to extremely subtle bugs.

// varargs can't be tail called anyway
// Not totally convinced this is necessary but dead store elimination
// decides to discard the stores to the alloca and pass uninitialised
// memory along instead when the function is marked tailcall
Collaborator
As defined in LangRef, "tail" actually just means "doesn't reference any memory allocated in the current call's stack". It's a prerequisite for tail-calling a function, but doesn't indicate the function is actually tail-callable.

Collaborator Author
I'll have to recheck dead store elimination. If the tail call flag isn't removed, DSE turns the IR into undef, which seems wrong to me - but maybe it's fine and I need to reword this comment

@jdoerfert
Member

High level question:
Does this patch eliminate the variadic call edge, or does it perform inlining on very special variadic function definitions?
I thought the former, but isFunctionInlinable sufficiently confused me.

@RKSimon
Collaborator

RKSimon commented Feb 14, 2024

Why have the x86 tests been placed in test\CodeGen\X86 instead of something like test\Transforms\ExpandVariadics\X86 ?

@JonChesterfield
Collaborator Author

JonChesterfield commented Feb 14, 2024

I don't really like the whole "sufficiently simple function" thing. It seems fragile. You should be able to just take a arbitrary internal varargs function, rewrite its signature to take a va_list argument, rewrite calls to va_start to make a copy of that va_list, and rewrite the callers to construct that va_list. If that function turns out to be inlinable, great; if not, you haven't really lost anything.

Yes, you can and I do. That's patch 2 of the series, numbered 1 in the list in the commit message (this one being patch 0).

(Rewriting the signature of a function is complicated in its own way because you need to allocate a new Function, then transplant the original function's body into it. But it's not uncharted territory: we should be able to refactor code out of llvm/lib/Transforms/IPO/ArgumentPromotion.cpp .)

And this is why it's a separate patch.

The rewrite-call-instruction is the target dependent bit, in this patch for an initial target.

The rewrite-function is (almost) target agnostic and involves a surprisingly large amount of book keeping. We have duplicated code in ArgumentPromotion, dead argument removal, the function cloning in attributor and probably elsewhere.

Do we have a testing plan for this? Messing up calling convention stuff tends to lead to extremely subtle bugs.

This would be why I want to do it all in IR, staying away from the backend lowering. It's also why it's structured as orthogonal transforms with independent testing. Fun story, the whole reason this exists for x64 is that I had such a rough time debugging it natively on amdgpu that it was easier to generalise to x64.

@JonChesterfield
Collaborator Author

JonChesterfield commented Feb 14, 2024

High level question: Does this patch eliminate the variadic call edge, or, does it perform inlining on very special variadic function definitions? I thought the former but isFunctionInlinable, sufficiently confused me.

This patch will rewrite calls to a variadic function into calls to a function taking a va_list. Later patches expand that to cover the rest of the cases: more complicated functions, other targets, etc. Note that calls to variadic function pointers cannot generally be rewritten without permitting ABI changes, which is what I plan to do for nvptx and amdgpu.

@efriedma-quic
Collaborator

I don't really like the whole "sufficiently simple function" thing. It seems fragile. You should be able to just take a arbitrary internal varargs function, rewrite its signature to take a va_list argument, rewrite calls to va_start to make a copy of that va_list, and rewrite the callers to construct that va_list. If that function turns out to be inlinable, great; if not, you haven't really lost anything.

Yes, you can and I do. That's patch 2 of the series, numbered 1. in the list in the commit message (for this is 0)

Not sure if this means isFunctionInlinable will go away in the followup patch, or if you plan to rewrite functions in a way that satisfies isFunctionInlinable. I think the end state should be that all functions go down the same codepath, not conditionally do something different based on whether they're "simple". I guess I don't have a strong preference for how you get there, though.

@JonChesterfield
Collaborator Author

Not sure if this means isFunctionInlinable will go away in the followup patch, or if you plan to rewrite functions in a way that satisfies isFunctionInlinable. I think the end state should be that all functions go down the same codepath, not conditionally do something different based on whether they're "simple". I guess I don't have a strong preference for how you get there, though.

The logic I've got at present (which includes the ABI rewriting) is

    bool usefulToSplit =
        splitFunctions() && (!F->isDeclaration() || rewriteABI());

    // F may already be a single basic block calling a known function
    // that takes a va_list, in which case it doesn't need to be split.
    Function *Equivalent = isFunctionInlinable(M, F);

    if (usefulToSplit && !Equivalent) {
      Equivalent = DeriveInlinableVariadicFunctionPair(M, *F);
      assert(Equivalent);
      assert(isFunctionInlinable(M, *F)); // branch doesn't do this presently but it could do
      changed = true;
      functionToInliningTarget[F] = Equivalent;
    }

I'm not especially attached to the specific control flow.

The two transforms - inlining/call-rewrite and splitting an entry block off the variadic function so that it can be inlined - are genuinely orthogonal, and that is a really good property to preserve. It took some thought to get to that structure.

If we ignore that design and run functions through the block splitting unnecessarily, we take on a combinatorial increase in required testing, a decrease in emitted code quality (spurious extra functions), and an inability to pattern match on fprintf->vfprintf style code that is already in the application. We would get to delete the isFunctionInlinable predicate.

The independent-transform pipeline pattern is more important than the no-special-case branching heuristic. If it helps, view it as two complementary transforms where one is skipped when it would be a no-op.

Related - if there's an objection to landing this as an inactive pass (only exercised by test code) we can put it into an optimisation pipeline immediately, it'll still remove some real world variadic calls even if the later patches don't make it.

@efriedma-quic
Collaborator

If we ignore that design and run functions through the block splitting unnecessarily, we win a combinatorial increase in required testing, a decrease in emitted code quality (spurious extra functions), an inability to pattern match on fprintf->vfprintf style code that happens to be in the application already. We would get to delete the isFunctionInlinable predicate.

If we do unnecessary splitting, then run the inliner, we should end up with exactly the same thing you'd get by special-casing isFunctionInlinable functions. So that's... very slightly slower compile-time, in exchange for dropping the delicate isFunctionInlinable check. That seems worthwhile to me.

There's really a lot of stuff you can easily get wrong in a check like that... for example, it looks like you forgot to check the type of the load, or whether the load is volatile.

Not sure what you're referring to with the "combinatorial increase in required testing".

Related - if there's an objection to landing this as an inactive pass (only exercised by test code) we can put it into an optimisation pipeline immediately, it'll still remove some real world variadic calls even if the later patches don't make it.

I'm fine with off-by-default as long as we have a roadmap for enabling it at some point.

@JonChesterfield
Collaborator Author

JonChesterfield commented Feb 20, 2024

Ah OK, so split every variadic definition and let the inliner sort it out afterwards. Yes, I'm good with that. Tests either get messier or add a call to the inliner. Will update the PR correspondingly - solid simplification, thanks!

Discard the combinatorial testing comment - I misunderstood the structure you had in mind.

Unconditional splitting means the transforms need to land in the other order: first the splitter/outliner, then the call instruction rewrite. With the function pattern recogniser deleted, the test cases for this won't fire. After the varargs splitter/outliner lands, the tests for this PR will make sense again.

Contributor
@jhuber6 jhuber6 left a comment
Some nits, mostly just formatting and naming that hasn't been updated.

I agree overall that we should just put this in some canonical form and rely on other LLVM passes to take care of things like inlining. Eager to have this functionality in, so hopefully we can keep this moving.

Comment on lines +229 to +235
if (Triple.isOSDarwin()) {
return std::make_unique<VariadicABIInfo::X64SystemV>();
}

if (IsLinuxABI) {
return std::make_unique<VariadicABIInfo::X64SystemV>();
}
Contributor
Suggested change
if (Triple.isOSDarwin()) {
return std::make_unique<VariadicABIInfo::X64SystemV>();
}
if (IsLinuxABI) {
return std::make_unique<VariadicABIInfo::X64SystemV>();
}
if (Triple.isOSDarwin())
return std::make_unique<VariadicABIInfo::X64SystemV>();
if (IsLinuxABI)
return std::make_unique<VariadicABIInfo::X64SystemV>();

Nit: no braces around single-statement blocks.


// Initialise a va_list pointing to that struct and pass it as the last
// argument
{
Contributor
Why does this need to be in its own block? I'd assume that if the VoidPtr names conflict it would be the same thing. If you don't want VoidBuffer escaping the scope, I'd just pass it directly to its uses in both cases since it's short enough.

Contributor
@arsenm arsenm left a comment
Title should be rephrased; this doesn't have anything to do with inlining

@JonChesterfield
Collaborator Author

Feedback is merged into #89007, which represents this transform with the pattern match Eli recommended against removed (so some test cases now need to run the inliner), the anonymous class moved out, and the multiple-patch sequence discarded in favour of one patch that does both the target-specific and the target-agnostic mangling.

@JonChesterfield JonChesterfield marked this pull request as draft April 17, 2024 00:38
@JonChesterfield
Collaborator Author

Marked as "draft" to approximate the state of "available for reference but not a candidate for merging"

@JonChesterfield
Collaborator Author

The majority of this landed as #93362.

Labels: backend:X86, clang, llvm:transforms