-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Flang] Adjust the trampoline size for AArch64 and PPC #118678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
@llvm/pr-subscribers-flang-codegen Author: None (ssijaric-nv) ChangesThe trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes Full diff: https://github.com/llvm/llvm-project/pull/118678.diff 2 Files Affected:
diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
// Create the thunk.
auto module = embox->getParentOfType<mlir::ModuleOp>();
FirOpBuilder builder(rewriter, module);
+ const auto triple{fir::getTargetTriple(module)};
auto loc = embox.getLoc();
mlir::Type i8Ty = builder.getI8Type();
mlir::Type i8Ptr = builder.getRefType(i8Ty);
- mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+ fir::SequenceType::Extent thunkSize = 32;
+ if (triple.isPPC32())
+ thunkSize = 40;
+ else if (triple.isPPC64())
+ thunkSize = 48;
+ else if (triple.isAArch64())
+ thunkSize = 36;
+ mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
auto buffer = builder.create<AllocaOp>(loc, buffTy);
mlir::Value closure =
builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
// CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK: %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86: %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC: %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
// CHECK: %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
// CHECK: %[[VAL_0:.*]] = alloca i32, i64 1, align 4
// CHECK: %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
}
// CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK: %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86: %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC: %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
// CHECK: %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
// CHECK: %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
// CHECK: %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1
|
@llvm/pr-subscribers-flang-fir-hlfir Author: None (ssijaric-nv) ChangesThe trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes Full diff: https://github.com/llvm/llvm-project/pull/118678.diff 2 Files Affected:
diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
// Create the thunk.
auto module = embox->getParentOfType<mlir::ModuleOp>();
FirOpBuilder builder(rewriter, module);
+ const auto triple{fir::getTargetTriple(module)};
auto loc = embox.getLoc();
mlir::Type i8Ty = builder.getI8Type();
mlir::Type i8Ptr = builder.getRefType(i8Ty);
- mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+ fir::SequenceType::Extent thunkSize = 32;
+ if (triple.isPPC32())
+ thunkSize = 40;
+ else if (triple.isPPC64())
+ thunkSize = 48;
+ else if (triple.isAArch64())
+ thunkSize = 36;
+ mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
auto buffer = builder.create<AllocaOp>(loc, buffTy);
mlir::Value closure =
builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
// CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK: %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86: %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC: %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
// CHECK: %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
// CHECK: %[[VAL_0:.*]] = alloca i32, i64 1, align 4
// CHECK: %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
}
// CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK: %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86: %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC: %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
// CHECK: %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
// CHECK: %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
// CHECK: %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1
|
There is a separate PR under #118234 to pass the correct size to __trampoline_setup during AArch64 lowering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Please could you provide links to the documentation you used, preferably both in code comments and the commit summary |
Not a problem for this patch, but it would be REALLY nice if there was a function that could be called in LLVM to get the correct size for a trampoline, rather than each different user of LLVM having to implement their own variant of
Subject to providing some documentation for "why these values" (I understand what they represent and sort of why they need to be different, but it'd be good to have a source for what they are on different architectures). |
Thanks, all. Added a comment. I'll add a function to LLVM to retrieve the trampoline size, and will then push a follow-on patch to replace the if...then...else checks. |
My colleagues found that the size is 32 for |
Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32-bytes for all PPC targets, with the exception of ppc32 at 40 bytes? |
I think it works for Linux. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
5e5b89e
to
273c5ed
Compare
The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes for PPC64. During AArch64 and PPC lowering, init.trampoline is lowered to a call to __trampoline_setup, with the corresponding trampoline sizes passed to it.
273c5ed
to
2191d54
Compare
The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64, defined in compiler-rt/lib/builtins/trampoline_setup.c. During AArch64
and PPC lowering, init.trampoline is lowered to a call to __trampoline_setup, with the
corresponding trampoline sizes passed to it. On x86_64 and RISCV, __trampoline_setup
isn't used, and the setup fits within 32 bytes.