[LLVM] Fix incorrect alignment on AMDGPU variadics #96370

Conversation
Summary: The variadics lowering for AMDGPU puts all the arguments into a void pointer struct. The current logic dictates that the minimum alignment is four regardless of what the underlying type is. This is incorrect in the following case:

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```

Here, because the minimum alignment is 4, we will only increment the buffer by 4, resulting in an incorrect alignment when we then try to access the void pointer. We need to set a minimum of 4, but increase it to 8 in cases like this.
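To make the layout issue concrete, here is a minimal C sketch (illustrative only, not the pass's actual output; the struct names are hypothetical) contrasting the packed frame the current lowering produces with a naturally aligned one:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Packed frame, as the current lowering lays it out: the pointer
 * lands at offset 4, misaligned for an 8-byte access. */
struct __attribute__((packed)) packed_frame {
  int32_t x;
  void *p;
};

/* Naturally aligned frame, as this patch would lay it out: the
 * compiler inserts 4 bytes of padding so the pointer lands at 8. */
struct padded_frame {
  int32_t x;
  void *p;
};

int main(void) {
  printf("packed:  p at offset %zu\n", offsetof(struct packed_frame, p)); /* 4 */
  printf("aligned: p at offset %zu\n", offsetof(struct padded_frame, p)); /* 8 */
}
```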
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang-codegen

Author: Joseph Huber (jhuber6)

Changes

Summary: The variadics lowering for AMDGPU puts all the arguments into a void pointer struct. The current logic dictates that the minimum alignment is four regardless of what the underlying type is. This is incorrect in the following case:

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```

Here, because the minimum alignment is 4, we will only increment the buffer by 4, resulting in an incorrect alignment when we then try to access the void pointer. We need to set a minimum of 4, but increase it to 8 in cases like this.

Patch is 56.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96370.diff

4 Files Affected:
- clang/lib/CodeGen/Targets/AMDGPU.cpp
- clang/test/CodeGen/amdgpu-variadic-call.c
- llvm/lib/Transforms/IPO/ExpandVariadics.cpp
- llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 4d3275e17c386..a169a7d920456 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo &FI) const {
RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
const bool IsIndirect = false;
- const bool AllowHigherAlign = false;
+ const bool AllowHigherAlign = true;
return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect,
getContext().getTypeInfoInChars(Ty),
CharUnits::fromQuantity(4), AllowHigherAlign, Slot);
@@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty, bool Variadic,
Ty = useFirstFieldIfTransparentUnion(Ty);
- if (Variadic) {
- return ABIArgInfo::getDirect(/*T=*/nullptr,
- /*Offset=*/0,
- /*Padding=*/nullptr,
- /*CanBeFlattened=*/false,
- /*Align=*/0);
- }
+ if (Variadic)
+ return ABIArgInfo::getDirect();
if (isAggregateTypeForABI(Ty)) {
// Records with non-trivial destructors/copy-constructors should not be
diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c b/clang/test/CodeGen/amdgpu-variadic-call.c
index 17eda215211a2..0529d6b3171c8 100644
--- a/clang/test/CodeGen/amdgpu-variadic-call.c
+++ b/clang/test/CodeGen/amdgpu-variadic-call.c
@@ -1,4 +1,3 @@
-// REQUIRES: amdgpu-registered-target
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature
// RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s
@@ -179,11 +178,9 @@ typedef struct
// CHECK-LABEL: define {{[^@]+}}@one_pair_f64
// CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double [[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] {
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0
-// CHECK-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] [[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1
-// CHECK-NEXT: tail call void (...) @sink_0([[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (...) @sink_0(double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
// CHECK-NEXT: ret void
//
void one_pair_f64(int f0, double f1, pair_f64 v0)
@@ -220,10 +217,9 @@ typedef union
// CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], i32 [[V0_COERCE:%.*]]) local_unnamed_addr #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32 [[V0_COERCE]] to float
-// CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue [[UNION_UNION_F32_I32:%.*]] poison, float [[TMP0]], 0
-// CHECK-NEXT: tail call void (...) @sink_0([[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], [[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (...) @sink_0(float [[TMP0]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], float [[TMP0]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], float [[TMP0]]) #[[ATTR2]]
// CHECK-NEXT: ret void
//
void one_pair_union_f32_i32(int f0, double f1, union_f32_i32 v0)
@@ -242,10 +238,9 @@ typedef union
// CHECK-LABEL: define {{[^@]+}}@one_pair_transparent_union_f32_i32
// CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], i32 [[V0_COERCE:%.*]]) local_unnamed_addr #[[ATTR0]] {
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue [[UNION_TRANSPARENT_UNION_F32_I32:%.*]] poison, i32 [[V0_COERCE]], 0
-// CHECK-NEXT: tail call void (...) @sink_0([[UNION_TRANSPARENT_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[UNION_TRANSPARENT_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], [[UNION_TRANSPARENT_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (...) @sink_0(i32 [[V0_COERCE]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], i32 [[V0_COERCE]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], i32 [[V0_COERCE]]) #[[ATTR2]]
// CHECK-NEXT: ret void
//
void one_pair_transparent_union_f32_i32(int f0, double f1, transparent_union_f32_i32 v0)
@@ -277,12 +272,9 @@ void multiple_one(int f0, double f1, int v0, double v1)
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = bitcast i32 [[V2_COERCE]] to float
// CHECK-NEXT: [[CONV:%.*]] = fpext float [[V1]] to double
-// CHECK-NEXT: [[DOTFCA_0_INSERT16:%.*]] = insertvalue [[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0
-// CHECK-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] [[DOTFCA_0_INSERT16]], double [[V0_COERCE1]], 1
-// CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue [[UNION_UNION_F32_I32:%.*]] poison, float [[TMP0]], 0
-// CHECK-NEXT: tail call void (...) @sink_0([[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]], double noundef [[CONV]], [[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]], i32 noundef [[V3]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]], double noundef [[CONV]], [[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]], i32 noundef [[V3]]) #[[ATTR2]]
-// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]], double noundef [[CONV]], [[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]], i32 noundef [[V3]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (...) @sink_0(double [[V0_COERCE0]], double [[V0_COERCE1]], double noundef [[CONV]], float [[TMP0]], i32 noundef [[V3]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]], double noundef [[CONV]], float [[TMP0]], i32 noundef [[V3]]) #[[ATTR2]]
+// CHECK-NEXT: tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]], double noundef [[CONV]], float [[TMP0]], i32 noundef [[V3]]) #[[ATTR2]]
// CHECK-NEXT: ret void
//
void multiple_two(int f0, double f1, pair_f64 v0, float v1, union_f32_i32 v2, int v3)
diff --git a/llvm/lib/Transforms/IPO/ExpandVariadics.cpp b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
index d340bc041ccda..489e13410d3b1 100644
--- a/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
+++ b/llvm/lib/Transforms/IPO/ExpandVariadics.cpp
@@ -934,7 +934,11 @@ struct Amdgpu final : public VariadicABIInfo {
}
VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
- return {Align(4), false};
+ const unsigned MinAlign = 1;
+ Align A = DL.getABITypeAlign(Parameter);
+ if (A < MinAlign)
+ A = Align(MinAlign);
+ return {A, false};
}
};
diff --git a/llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll b/llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll
index ce55558dabaf1..e7450707ac94a 100644
--- a/llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll
@@ -1,50 +1,48 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --function-signature
-; RUN: opt -S --passes=expand-variadics --expand-variadics-override=lowering < %s | FileCheck %s
-; REQUIRES: amdgpu-registered-target
-target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
-target triple = "amdgcn-amd-amdhsa"
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-- --passes=expand-variadics \
+; RUN: --expand-variadics-override=lowering < %s | FileCheck %s
-; Check the variables are lowered to the locations this target expects
+%struct.libcS = type { i8, i16, i32, i64, float, double }
-; The types show the call frames
+; The types show the call frames.
; CHECK: %single_i32.vararg = type <{ i32 }>
; CHECK: %single_double.vararg = type <{ double }>
; CHECK: %single_v4f32.vararg = type <{ <4 x float> }>
; CHECK: %single_v8f32.vararg = type <{ <8 x float> }>
; CHECK: %single_v16f32.vararg = type <{ <16 x float> }>
; CHECK: %single_v32f32.vararg = type <{ <32 x float> }>
-; CHECK: %i32_double.vararg = type <{ i32, double }>
+; CHECK: %i32_double.vararg = type <{ i32, [4 x i8], double }>
; CHECK: %double_i32.vararg = type <{ double, i32 }>
-; CHECK: %i32_libcS.vararg = type <{ i32, %struct.libcS }>
+; CHECK: %i32_libcS.vararg = type <{ i32, [4 x i8], %struct.libcS }>
+; CHECK: %struct.libcS = type { i8, i16, i32, i64, float, double }
; CHECK: %libcS_i32.vararg = type <{ %struct.libcS, i32 }>
-; CHECK: %i32_v4f32.vararg = type <{ i32, <4 x float> }>
+; CHECK: %i32_v4f32.vararg = type <{ i32, [12 x i8], <4 x float> }>
; CHECK: %v4f32_i32.vararg = type <{ <4 x float>, i32 }>
-; CHECK: %i32_v8f32.vararg = type <{ i32, <8 x float> }>
+; CHECK: %i32_v8f32.vararg = type <{ i32, [28 x i8], <8 x float> }>
; CHECK: %v8f32_i32.vararg = type <{ <8 x float>, i32 }>
-; CHECK: %i32_v16f32.vararg = type <{ i32, <16 x float> }>
+; CHECK: %i32_v16f32.vararg = type <{ i32, [60 x i8], <16 x float> }>
; CHECK: %v16f32_i32.vararg = type <{ <16 x float>, i32 }>
-; CHECK: %i32_v32f32.vararg = type <{ i32, <32 x float> }>
+; CHECK: %i32_v32f32.vararg = type <{ i32, [124 x i8], <32 x float> }>
; CHECK: %v32f32_i32.vararg = type <{ <32 x float>, i32 }>
; CHECK: %fptr_single_i32.vararg = type <{ i32 }>
; CHECK: %fptr_libcS.vararg = type <{ %struct.libcS }>
-%struct.libcS = type { i8, i16, i32, i64, float, double }
-
@vararg_ptr = hidden addrspace(1) global ptr @vararg, align 8
define hidden void @copy(ptr noundef %va) {
-; CHECK-LABEL: define {{[^@]+}}@copy(ptr noundef %va) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: %va.addr = alloca ptr, align 8, addrspace(5)
-; CHECK-NEXT: %cp = alloca ptr, align 8, addrspace(5)
-; CHECK-NEXT: %va.addr.ascast = addrspacecast ptr addrspace(5) %va.addr to ptr
-; CHECK-NEXT: %cp.ascast = addrspacecast ptr addrspace(5) %cp to ptr
-; CHECK-NEXT: store ptr %va, ptr addrspace(5) %va.addr, align 8
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %cp)
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr %cp.ascast, ptr %va.addr.ascast, i32 8, i1 false)
-; CHECK-NEXT: %0 = load ptr, ptr addrspace(5) %cp, align 8
-; CHECK-NEXT: call void @valist(ptr noundef %0)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %cp)
+; CHECK-LABEL: define hidden void @copy(
+; CHECK-SAME: ptr noundef [[VA:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[VA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+; CHECK-NEXT: [[CP:%.*]] = alloca ptr, align 8, addrspace(5)
+; CHECK-NEXT: [[VA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[VA_ADDR]] to ptr
+; CHECK-NEXT: [[CP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CP]] to ptr
+; CHECK-NEXT: store ptr [[VA]], ptr addrspace(5) [[VA_ADDR]], align 8
+; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) [[CP]])
+; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr [[CP_ASCAST]], ptr [[VA_ADDR_ASCAST]], i32 8, i1 false)
+; CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr addrspace(5) [[CP]], align 8
+; CHECK-NEXT: call void @valist(ptr noundef [[TMP0]])
+; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) [[CP]])
; CHECK-NEXT: ret void
;
entry:
@@ -70,15 +68,16 @@ declare hidden void @valist(ptr noundef)
declare void @llvm.lifetime.end.p5(i64 immarg, ptr addrspace(5) nocapture)
define hidden void @start_once(...) {
-; CHECK-LABEL: define {{[^@]+}}@start_once(ptr %varargs) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: %s = alloca ptr, align 8, addrspace(5)
-; CHECK-NEXT: %s.ascast = addrspacecast ptr addrspace(5) %s to ptr
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %s)
-; CHECK-NEXT: store ptr %varargs, ptr %s.ascast, align 8
-; CHECK-NEXT: %0 = load ptr, ptr addrspace(5) %s, align 8
-; CHECK-NEXT: call void @valist(ptr noundef %0)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %s)
+; CHECK-LABEL: define hidden void @start_once(
+; CHECK-SAME: ptr [[VARARGS:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[S:%.*]] = alloca ptr, align 8, addrspace(5)
+; CHECK-NEXT: [[S_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[S]] to ptr
+; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) [[S]])
+; CHECK-NEXT: store ptr [[VARARGS]], ptr [[S_ASCAST]], align 8
+; CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr addrspace(5) [[S]], align 8
+; CHECK-NEXT: call void @valist(ptr noundef [[TMP0]])
+; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) [[S]])
; CHECK-NEXT: ret void
;
entry:
@@ -98,22 +97,23 @@ declare void @llvm.va_start.p0(ptr)
declare void @llvm.va_end.p0(ptr)
define hidden void @start_twice(...) {
-; CHECK-LABEL: define {{[^@]+}}@start_twice(ptr %varargs) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: %s0 = alloca ptr, align 8, addrspace(5)
-; CHECK-NEXT: %s1 = alloca ptr, align 8, addrspace(5)
-; CHECK-NEXT: %s0.ascast = addrspacecast ptr addrspace(5) %s0 to ptr
-; CHECK-NEXT: %s1.ascast = addrspacecast ptr addrspace(5) %s1 to ptr
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %s0)
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %s1)
-; CHECK-NEXT: store ptr %varargs, ptr %s0.ascast, align 8
-; CHECK-NEXT: %0 = load ptr, ptr addrspace(5) %s0, align 8
-; CHECK-NEXT: call void @valist(ptr noundef %0)
-; CHECK-NEXT: store ptr %varargs, ptr %s1.ascast, align 8
-; CHECK-NEXT: %1 = load ptr, ptr addrspace(5) %s1, align 8
-; CHECK-NEXT: call void @valist(ptr noundef %1)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %s1)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %s0)
+; CHECK-LABEL: define hidden void @start_twice(
+; CHECK-SAME: ptr [[VARARGS:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[S0:%.*]] = alloca ptr, align 8, addrspace(5)
+; CHECK-NEXT: [[S1:%.*]] = alloca ptr, align 8, addrspace(5)
+; CHECK-NEXT: [[S0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[S0]] to ptr
+; CHECK-NEXT: [[S1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[S1]] to ptr
+; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) [[S0]])
+; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) [[S1]])
+; CHECK-NEXT: store ptr [[VARARGS]], ptr [[S0_ASCAST]], align 8
+; CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr addrspace(5) [[S0]], align 8
+; CHECK-NEXT: call void @valist(ptr noundef [[TMP0]])
+; CHECK-NEXT: store ptr [[VARARGS]], ptr [[S1_ASCAST]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr addrspace(5) [[S1]], align 8
+; CHECK-NEXT: call void @valist(ptr noundef [[TMP1]])
+; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) [[S1]])
+; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) [[S0]])
; CHECK-NEXT: ret void
;
entry:
@@ -137,15 +137,16 @@ entry:
}
define hidden void @single_i32(i32 noundef %x) {
-; CHECK-LABEL: define {{[^@]+}}@single_i32(i32 noundef %x) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: %vararg_buffer = alloca %single_i32.vararg, align 4, addrspace(5)
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) %vararg_buffer)
-; CHECK-NEXT: %0 = getelementptr inbounds %single_i32.vararg, ptr addrspace(5) %vararg_buffer, i32 0, i32 0
-; CHECK-NEXT: store i32 %x, ptr addrspace(5) %0, align 4
-; CHECK-NEXT: %1 = addrspacecast ptr addrspace(5) %vararg_buffer to ptr
-; CHECK-NEXT: call void @vararg(ptr %1)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %vararg_buffer)
+; CHECK-LABEL: define hidden void @single_i32(
+; CHECK-SAME: i32 noundef [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[VARARG_BUFFER:%.*]] = alloca [[SINGLE_I32_VARARG:%.*]], align 4, addrspace(5)
+; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) [[VARARG_BUFFER]])
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[SINGLE_I32_VARARG]], ptr addrspace(5) [[VARARG_BUFFER]], i32 0, i32 0
+; CHECK-NEXT: store i32 [[X]], ptr addrspace(5) [[TMP0]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[VARARG_BUFFER]] to ptr
+; CHECK-NEXT: call void @vararg(ptr [[TMP1]])
+; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[VARARG_BUFFER]])
; CHECK-NEXT: ret void
;
entry:
@@ -156,15 +157,16 @@ entry:
declare hidden void @vararg(...)
define hidden void @single_double(double noundef %x) {
-; CHECK-LABEL: define {{[^@]+}}@single_double(double noundef %x) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: %vararg_buffer = alloca %single_double.vararg, align 4, addrspace(5)
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) %vararg_buffer)
-; CHECK-NEXT: %0 = getelementptr inbounds %single_double.vararg, ptr addrspace(5) %vararg_buffer, i32 0, i32 0
-; CHECK-NEXT: store double %x, ptr addrspace(5) %0, align 8
-; CHECK-NEXT: %1 = addrspacecast ptr addrspace(5) %vararg_buffer to ptr
-; CHECK-NEXT: call void @vararg(ptr %1)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) %vararg_buffer)
+; CHECK-LABEL: define hidden void @single_double(
+; CHECK-SAME: double noundef [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[VARARG_BUFFER:%.*]] = alloca [[SINGLE_DOUBLE_VARARG:%.*]], align 8, addrspace(5)
+; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) [[VARARG_BUFFER]])
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[SINGLE_DOUBLE_VARARG]], ptr addrspace(5) [[VARARG_BUFFER]], i32 0, i32 0
+; CHECK-NEXT: store double [[X]], ptr addrspace(5) [[TMP0]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[VARARG_BUFFER]] to ptr
+; CHECK-NEXT: call void @vararg(ptr [[TMP1]])
+; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 8, ptr addrspace(5) [[VARARG_BUFFER]])
; CHECK-NEXT: ret void
;
entry:
@@ -173,15 +175,16 @@ entry:
}
define hidden void @single_v4f32(<4 x float> noundef %x) {
-; CHECK-LABEL: define {{[^@]+}}@single_v4f32(<4 x float> noundef %x) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: %vararg_buffer = alloca %single_v4f32.vararg, align 4, addrspace(5)
-; CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 16, ptr addrspace(5) %vararg_buffer)
-; CHECK-NEXT: %0 = getelementptr inbounds %single_v4f32.vararg, ptr addrspace(5) %vararg_buffer, i32 0, i32 0
-; CHECK-NEXT: store <4 x float> %x, ptr addrspace(5) %0, align 16
-; CHECK-NEXT: %1 = addrspacecast ptr addrspace(5) %vararg_buffer to ptr
-; CHECK-NEXT: call void @vararg(ptr %1)
-; CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 16, ptr addr...
[truncated]
It should be incrementing by the size? A 4-byte aligned access of an 8-byte type should work fine.
Guess that's an AMD thing, so I'm going to assume that @JonChesterfield wrote this intentionally to save on stack space? I suppose the issue I'm having with my …
Incrementing by align is just a bug; of course the size is the real value. Whether we want to continue wasting space is another, non-correctness discussion.
Struct padding is pretty universal, AMDGPU seems the odd one out here. I wouldn't mind it so much if it didn't require me to know which vendor I was dealing with in the RPC implementation, but I suppose I could store that information somewhere if we want to use a compressed option and we know it works.
It's not about struct padding, but the base alignment. Any pointer increment should be alignTo(ptr + size, align), not += align. The += align won't even work for large structs.
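A minimal C sketch of the increment being described (align_to and advance are illustrative names, not from the pass):

```c
#include <stddef.h>
#include <stdint.h>

/* Round v up to the next multiple of align (align must be a power of two). */
static inline uintptr_t align_to(uintptr_t v, uintptr_t align) {
  return (v + align - 1) & ~(align - 1);
}

/* Step the va_arg-style iterator: move past the current value's size,
 * then round up for the next value's alignment. A plain `ptr += align`
 * under-advances for any type larger than its alignment. */
static void *advance(void *ptr, size_t size, size_t align) {
  return (void *)align_to((uintptr_t)ptr + size, align);
}
```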
Hm, that's what I'm doing in the
Without this patch, I get this. As you can see there's no struct padding so the 8 byte value is right next to the 4 byte one.
For what IR? Is the small struct getting expanded into individual scalar pieces? |
The implementation intentionally states that the alignment is always four.
This PR is invalid. First, the alignment on the eight-byte pointer is supposed to be four; increasing it to 8 makes things worse. Second, I can't see any support for the claim that the code is incrementing by the alignment of the value, as opposed to the size. The frame is set up as a struct instance with explicit padding by Int8Tys, and the calculation there is correct. The va_arg increment is done in CodeGen::emitVoidPtrVAArg, where DirectSize is ValueInfo.Width, aligned to the 4-byte slot size, then stored. It does not increment the iterator by the alignment of the type. The lowering pass is doing exactly what was intended.
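A sketch of the slot rounding that comment describes (the helper name is illustrative; the real logic lives in emitVoidPtrVAArg):

```c
#include <stddef.h>

/* Each variadic argument occupies its size rounded up to the 4-byte
 * slot, so an 8-byte value at a 4-byte boundary is stepped over by 8,
 * not by its alignment. Illustrative only. */
static size_t slot_bytes(size_t value_size) {
  const size_t slot = 4;
  return (value_size + slot - 1) & ~(slot - 1);
}
```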
Patch should not land. Need to know what bug this was trying to address to guess at what the right fix would be.
My understanding was that the variadics lowering produced a struct with a minimum alignment of four. It currently doesn't do that, hence my confusion. The current lowering provides no padding, which I now see is presumably a deliberate choice to save on stack. The issue I had was that I laid out my …
Ah yes, libc code doing the equivalent of va_arg assuming natural alignment when the underlying buffer is a packed struct with fields padded to four bytes would not work. That would be "fixed" by changing the compiler to match the assumption made by libc, but it seems much better for libc to do the misaligned load instead. Then there's no wavesize*align-padding stack space burned at runtime. |
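A sketch of the misaligned-load approach suggested here, assuming a libc-side reader of the packed buffer (the function name is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Read an 8-byte pointer from a buffer that is only guaranteed to be
 * 4-byte aligned. memcpy avoids a misaligned dereference in C; the
 * compiler lowers it to whatever unaligned access the target permits. */
static void *read_ptr_unaligned(const char *buf, size_t offset) {
  void *out;
  memcpy(&out, buf + offset, sizeof(out));
  return out;
}
```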
Lower than native alignment is legal in AMDGPU hardware and it's possible to work around in the |