[AArch64] Change the coercion type of structs with pointer members. #135064

Merged

davemgreen merged 1 commit into llvm:main from gh-a64-coerceptr on Jun 10, 2025

Conversation

davemgreen
Collaborator

The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct that is
<= 128 bits in size and contains only pointers (so either 1 or 2 of them), we
convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or
[2 x i64]. This helps alias analysis produce more accurate results.
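For illustration, here is a minimal sketch of the effect (hypothetical names; the IR shapes mirror the test expectations further down):

struct TwoPtrs {
  int *x, *y; // two 64-bit pointers, 128 bits total, every field is a pointer
};
void take(TwoPtrs s) { *s.x = 1; } // passed by value

// Old coercion:  the argument arrives as [2 x i64]; callers emit ptrtoint and
//                the callee emits inttoptr to rebuild the pointers.
// New coercion:  the argument arrives as [2 x ptr]; the pointers stay pointers,
//                so alias analysis keeps their provenance.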

@llvmbot added the clang, backend:AArch64, and clang:codegen labels on Apr 9, 2025
@llvmbot
Member

llvmbot commented Apr 9, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct that is
<= 128 bits in size and contains only pointers (so either 1 or 2 of them), we
convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or
[2 x i64]. This helps alias analysis produce more accurate results.


Full diff: https://github.com/llvm/llvm-project/pull/135064.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/AArch64.cpp (+18)
  • (added) clang/test/CodeGen/AArch64/struct-coerce-using-ptr.cpp (+93)
  • (modified) clang/test/CodeGenCXX/trivial_abi.cpp (+5-8)
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index 073ca3cc82690..9dc5f824254eb 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -485,6 +485,24 @@ ABIArgInfo AArch64ABIInfo::classifyArgumentType(QualType Ty, bool IsVariadicFn,
     }
     Size = llvm::alignTo(Size, Alignment);
 
+    // If the Aggregate is made up of pointers, use an array of pointers for the
+    // coerced type. This prevents having to convert ptr2int->int2ptr through
+    // the call, allowing alias analysis to produce better code.
+    if (const RecordType *RT = Ty->getAs<RecordType>()) {
+      if (const RecordDecl *RD = RT->getDecl()) {
+        if (all_of(RD->fields(), [](FieldDecl *FD) {
+              return FD->getType()->isPointerOrReferenceType();
+            })) {
+          assert((Size == 64 || Size == 128) &&
+                 "Expected a 64 or 128bit struct containing pointers");
+          llvm::Type *PtrTy = llvm::PointerType::getUnqual(getVMContext());
+          if (Size == 128)
+            PtrTy = llvm::ArrayType::get(PtrTy, 2);
+          return ABIArgInfo::getDirect(PtrTy);
+        }
+      }
+    }
+
     // We use a pair of i64 for 16-byte aggregate with 8-byte alignment.
     // For aggregates with 16-byte alignment, we use i128.
     llvm::Type *BaseTy = llvm::Type::getIntNTy(getVMContext(), Alignment);
diff --git a/clang/test/CodeGen/AArch64/struct-coerce-using-ptr.cpp b/clang/test/CodeGen/AArch64/struct-coerce-using-ptr.cpp
new file mode 100644
index 0000000000000..c2d68ae7ef6cd
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/struct-coerce-using-ptr.cpp
@@ -0,0 +1,93 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple aarch64-none-elf -fcxx-exceptions -fexceptions -emit-llvm -o - %s | FileCheck %s
+
+struct Sp {
+    int *x;
+};
+// CHECK-LABEL: define dso_local void @_Z2Tp2Sp(
+// CHECK-SAME: ptr [[S_COERCE:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_SP:%.*]], align 8
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_SP]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    store ptr [[S_COERCE]], ptr [[COERCE_DIVE]], align 8
+// CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_SP]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X]], align 8
+// CHECK-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-NEXT:    ret void
+//
+void Tp(Sp s) { *s.x = 1; }
+
+struct Spp {
+    int *x, *y;
+};
+// CHECK-LABEL: define dso_local void @_Z3Tpp3Spp(
+// CHECK-SAME: [2 x ptr] [[S_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_SPP:%.*]], align 8
+// CHECK-NEXT:    store [2 x ptr] [[S_COERCE]], ptr [[S]], align 8
+// CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_SPP]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X]], align 8
+// CHECK-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-NEXT:    ret void
+//
+void Tpp(Spp s) { *s.x = 1; }
+
+struct Sppp {
+    int *x, *y, *z;
+};
+// CHECK-LABEL: define dso_local void @_Z4Tppp4Sppp(
+// CHECK-SAME: ptr noundef [[S:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[S_INDIRECT_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    store ptr [[S]], ptr [[S_INDIRECT_ADDR]], align 8
+// CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_SPPP:%.*]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X]], align 8
+// CHECK-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-NEXT:    ret void
+//
+void Tppp(Sppp s) { *s.x = 1; }
+
+struct Spi {
+    int *x, y;
+};
+// CHECK-LABEL: define dso_local void @_Z3Tpi3Spi(
+// CHECK-SAME: [2 x i64] [[S_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_SPI:%.*]], align 8
+// CHECK-NEXT:    store [2 x i64] [[S_COERCE]], ptr [[S]], align 8
+// CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_SPI]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X]], align 8
+// CHECK-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-NEXT:    ret void
+//
+void Tpi(Spi s) { *s.x = 1; }
+
+struct Srp {
+    int &x, *y;
+};
+// CHECK-LABEL: define dso_local void @_Z3Trp3Srp(
+// CHECK-SAME: [2 x ptr] [[S_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_SRP:%.*]], align 8
+// CHECK-NEXT:    store [2 x ptr] [[S_COERCE]], ptr [[S]], align 8
+// CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_SRP]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X]], align 8
+// CHECK-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-NEXT:    ret void
+//
+void Trp(Srp s) { s.x = 1; }
+
+struct __attribute__((__packed__)) Spp_packed {
+    int *x, *y;
+};
+// CHECK-LABEL: define dso_local void @_Z10Tpp_packed10Spp_packed(
+// CHECK-SAME: [2 x ptr] [[S_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[S:%.*]] = alloca [[STRUCT_SPP_PACKED:%.*]], align 1
+// CHECK-NEXT:    store [2 x ptr] [[S_COERCE]], ptr [[S]], align 1
+// CHECK-NEXT:    [[X:%.*]] = getelementptr inbounds nuw [[STRUCT_SPP_PACKED]], ptr [[S]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X]], align 1
+// CHECK-NEXT:    store i32 1, ptr [[TMP0]], align 4
+// CHECK-NEXT:    ret void
+//
+void Tpp_packed(Spp_packed s) { *s.x = 1; }
diff --git a/clang/test/CodeGenCXX/trivial_abi.cpp b/clang/test/CodeGenCXX/trivial_abi.cpp
index 90054dbf37ae3..b8cc0d1cc6528 100644
--- a/clang/test/CodeGenCXX/trivial_abi.cpp
+++ b/clang/test/CodeGenCXX/trivial_abi.cpp
@@ -68,11 +68,10 @@ struct D0 : B0, B1 {
 
 Small D0::m0() { return {}; }
 
-// CHECK: define{{.*}} void @_Z14testParamSmall5Small(i64 %[[A_COERCE:.*]])
+// CHECK: define{{.*}} void @_Z14testParamSmall5Small(ptr %[[A_COERCE:.*]])
 // CHECK: %[[A:.*]] = alloca %[[STRUCT_SMALL]], align 8
 // CHECK: %[[COERCE_DIVE:.*]] = getelementptr inbounds nuw %[[STRUCT_SMALL]], ptr %[[A]], i32 0, i32 0
-// CHECK: %[[COERCE_VAL_IP:.*]] = inttoptr i64 %[[A_COERCE]] to ptr
-// CHECK: store ptr %[[COERCE_VAL_IP]], ptr %[[COERCE_DIVE]], align 8
+// CHECK: store ptr %[[A_COERCE]], ptr %[[COERCE_DIVE]], align 8
 // CHECK: %[[CALL:.*]] = call noundef ptr @_ZN5SmallD1Ev(ptr {{[^,]*}} %[[A]])
 // CHECK: ret void
 // CHECK: }
@@ -101,8 +100,7 @@ Small testReturnSmall() {
 // CHECK: %[[CALL1:.*]] = call noundef ptr @_ZN5SmallC1ERKS_(ptr {{[^,]*}} %[[AGG_TMP]], ptr noundef nonnull align 8 dereferenceable(8) %[[T]])
 // CHECK: %[[COERCE_DIVE:.*]] = getelementptr inbounds nuw %[[STRUCT_SMALL]], ptr %[[AGG_TMP]], i32 0, i32 0
 // CHECK: %[[V0:.*]] = load ptr, ptr %[[COERCE_DIVE]], align 8
-// CHECK: %[[COERCE_VAL_PI:.*]] = ptrtoint ptr %[[V0]] to i64
-// CHECK: call void @_Z14testParamSmall5Small(i64 %[[COERCE_VAL_PI]])
+// CHECK: call void @_Z14testParamSmall5Small(ptr %[[V0]])
 // CHECK: %[[CALL2:.*]] = call noundef ptr @_ZN5SmallD1Ev(ptr {{[^,]*}} %[[T]])
 // CHECK: ret void
 // CHECK: }
@@ -120,8 +118,7 @@ void testCallSmall0() {
 // CHECK: store ptr %[[COERCE_VAL_IP]], ptr %[[COERCE_DIVE]], align 8
 // CHECK: %[[COERCE_DIVE1:.*]] = getelementptr inbounds nuw %[[STRUCT_SMALL]], ptr %[[AGG_TMP]], i32 0, i32 0
 // CHECK: %[[V0:.*]] = load ptr, ptr %[[COERCE_DIVE1]], align 8
-// CHECK: %[[COERCE_VAL_PI:.*]] = ptrtoint ptr %[[V0]] to i64
-// CHECK: call void @_Z14testParamSmall5Small(i64 %[[COERCE_VAL_PI]])
+// CHECK: call void @_Z14testParamSmall5Small(ptr %[[V0]])
 // CHECK: ret void
 // CHECK: }
 
@@ -226,7 +223,7 @@ NonTrivial testReturnHasNonTrivial() {
 // CHECK: call noundef ptr @_ZN5SmallC1Ev(ptr {{[^,]*}} %[[AGG_TMP]])
 // CHECK: invoke noundef ptr @_ZN5SmallC1Ev(ptr {{[^,]*}} %[[AGG_TMP1]])
 
-// CHECK: call void @_Z20calleeExceptionSmall5SmallS_(i64 %{{.*}}, i64 %{{.*}})
+// CHECK: call void @_Z20calleeExceptionSmall5SmallS_(ptr %{{.*}}, ptr %{{.*}})
 // CHECK-NEXT: ret void
 
 // CHECK: landingpad { ptr, i32 }

@llvmbot
Member

llvmbot commented Apr 9, 2025

@llvm/pr-subscribers-clang

Collaborator

@efriedma-quic left a comment


The general idea here makes sense.

davemgreen force-pushed the gh-a64-coerceptr branch 3 times, most recently from 84e87d5 to 9f23881 on April 29, 2025 at 08:39
@davemgreen
Collaborator Author

Rebase and ping - thanks.

The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct that is
<= 128 bits in size and contains only pointers (so either 1 or 2 of them), we
convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or
[2 x i64].
@davemgreen
Collaborator Author

Rebase and ping - it feels like this is a decent compromise for keeping the code simple.

Collaborator

@efriedma-quic left a comment


LGTM

davemgreen merged commit 5f648c3 into llvm:main on Jun 10, 2025
11 checks passed
davemgreen deleted the gh-a64-coerceptr branch on June 10, 2025 at 06:04
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
…lvm#135064)

DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
…lvm#135064)

@Caslyn
Contributor

Caslyn commented Jun 12, 2025

Hi @davemgreen - after this commit, we’ve observed clang++ crashes compiling existing code with:

clang++: llvm/lib/IR/Instructions.cpp:3038: static CastInst *llvm::CastInst::Create(Instruction::CastOps, Value *, Type *, const Twine &, InsertPosition): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.

Here’s a reduced reproducer:

// bin/clang -cc1 -triple aarch64-unknown-fuchsia -target-feature -aes -target-feature -crypto -target-feature -fp-armv8 -target-feature -neon -target-feature -sha2 -fno-rtti -emit-llvm

struct RomTable {
  template <typename IoProvider, typename ComponentCallback>
  static void Walk(IoProvider, int, ComponentCallback) {};
};
struct RegisterMmioScaled {
  RegisterMmioScaled(void *);
  __attribute__((address_space(100))) char *mmio_;
};
long addr;
int size;
void WalkRomTable() {
  RegisterMmioScaled mmio(reinterpret_cast<void *>(addr));
  RomTable::Walk(mmio, size, [] {});
}

Could you please take a look and revert this change if this will require a non-trivial fix?

@davemgreen
Collaborator Author

Thanks for the report - I'll put together a fix to exclude address spaces for the moment.

davemgreen added a commit that referenced this pull request Jun 12, 2025
…pes.

As reported on #135064, the generic pointer coercion code in
CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast
the pointers). This bails out if an address space qualifier is found on the
pointer.
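A minimal sketch of the kind of struct the follow-up excludes (assumed behaviour inferred from the description above, not the committed test):

struct Qual {
  __attribute__((address_space(100))) char *p; // pointer member in a non-default address space
};
void g(Qual s) {} // passed by value

// The member lowers to `ptr addrspace(100)`, which cannot be bitcast to a generic
// `ptr`, so the new ptr/[2 x ptr] coercion is skipped for structs like this and
// the argument keeps the previous integer coercion.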
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…lvm#135064)

tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…pes.

@eaeltsin
Contributor

Heads up - this change causes tag-mismatch failures under hwasan on a bunch of our internal tests.

@eaeltsin
Contributor

The issue seems to be with pointers to functions, adding

           FDTy = getContext().getBaseElementType(FDTy);
         return (FDTy->isPointerOrReferenceType() &&
                 getContext().getTypeSize(FDTy) == 64 &&
-                !FDTy->getPointeeType().hasAddressSpace()) ||
+                !FDTy->getPointeeType().hasAddressSpace() &&
+                !FDTy->getPointeeType()->isFunctionType()) ||
                Self(Self, FDTy);
       });
     };

to clang/lib/CodeGen/Targets/AArch64.cpp:512 makes the tests pass.

The problematic type in our case seems to be absl::FunctionRef.

Please let me know if this is sufficient; reducing the internal test to a standalone reproducer might take some time.
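For context, a FunctionRef-style type is essentially two words: a type-erased pointer to the callable's state plus an invoker function pointer. A simplified stand-in (an approximation, not the real absl::FunctionRef layout) that now takes the new coercion:

struct FnRef {
  void *obj;              // type-erased pointer to the wrapped callable
  void (*invoke)(void *); // function pointer used to call it
};
void run(FnRef f) { f.invoke(f.obj); }
// All fields are pointers and the struct is 128 bits, so with this patch it is
// passed as [2 x ptr]; previously it was passed as [2 x i64] via ptrtoint/inttoptr.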

@davemgreen
Collaborator Author

Hi - that might break some of my motivating examples (they were pointers passed in structs from lambda captures; I don't remember if the pointers were to functions or not, but there were definitely a lot of function pointers flying around). In general I would have expected it to be better to avoid the ptrtoint->inttoptr round-trip when passing through a function argument, though. Can we make it bail out for hwasan only with function pointers?

I was trying code like https://godbolt.org/z/xfW3vcjeo to see if I could hit the same issue (but imagine it running on an AArch64 machine). Any ideas what it might take to trigger it (a different option, a certain code pattern)?
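A rough sketch of that motivating pattern (hypothetical code, not the original source):

void sink(int *a, int *b) { *a = *b; }
template <typename F> void apply(F f) { f(); } // the closure object is passed by value

void caller(int *a, int *b) {
  // The capture struct holds two pointers (128 bits, all pointer fields), so with
  // this patch it is coerced to [2 x ptr] instead of being round-tripped through [2 x i64].
  apply([a, b] { sink(a, b); });
}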

@eaeltsin
Contributor

The tests fail with SIGSEGV without hwasan, so there is some problem revealed by this change either in the compiler or in the tests. Bailing out under hwasan is not a fix.

Disallowing function pointers makes the problem go away, so that might be a useful hint. Another hint is that the issue seems to be limited to 128-bit structs.

We are trying to cook a standalone reproducer.

@vitalybuka
Collaborator

@eaeltsin Isn't the original issue a new SIGSEGV on non-hwasan arm builds?

@davemgreen
Collaborator Author

I don't believe I've seen anything else like that, and my understanding is that the new IR should be simpler than before. Let me know if you have anything that can show the issue you are running into. Thanks

akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…pes.

@eaeltsin
Contributor

eaeltsin commented Jun 24, 2025

Sorry, I don't have much to add so far. I can reproduce the problem in a pretty complex context (co_await inside a loop; the LICM pass seems to make the tests fail), but I cannot provide a reduced public reproducer yet. Still looking.
