Skip to content

[Flang] Adjust the trampoline size for AArch64 and PPC #118678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 27, 2025

Conversation

ssijaric-nv
Copy link
Contributor

@ssijaric-nv ssijaric-nv commented Dec 4, 2024

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64, defined in compiler-rt/lib/builtins/trampoline_setup.c. During AArch64
and PPC lowering, init.trampoline is lowered to a call to __trampoline_setup, with the
corresponding trampoline sizes passed to it. On x86_64 and RISCV, __trampoline_setup
isn't used, and the setup fits within 32 bytes.

Copy link

github-actions bot commented Dec 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ssijaric-nv ssijaric-nv marked this pull request as ready for review December 5, 2024 00:50
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:codegen labels Dec 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 5, 2024

@llvm/pr-subscribers-flang-codegen

Author: None (ssijaric-nv)

Changes

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64. During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.


Full diff: https://github.com/llvm/llvm-project/pull/118678.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp (+9-1)
  • (modified) flang/test/Fir/boxproc.fir (+9-3)
diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
             // Create the thunk.
             auto module = embox->getParentOfType<mlir::ModuleOp>();
             FirOpBuilder builder(rewriter, module);
+            const auto triple{fir::getTargetTriple(module)};
             auto loc = embox.getLoc();
             mlir::Type i8Ty = builder.getI8Type();
             mlir::Type i8Ptr = builder.getRefType(i8Ty);
-            mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+            fir::SequenceType::Extent thunkSize = 32;
+            if (triple.isPPC32())
+              thunkSize = 40;
+            else if (triple.isPPC64())
+              thunkSize = 48;
+            else if (triple.isAArch64())
+              thunkSize = 36;
+            mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
             auto buffer = builder.create<AllocaOp>(loc, buffTy);
             mlir::Value closure =
                 builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK:         %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
 // CHECK:         %[[VAL_0:.*]] = alloca i32, i64 1, align 4
 // CHECK:         %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 }
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK:         %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
 // CHECK:         %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
 // CHECK:         %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1

@llvmbot
Copy link
Member

llvmbot commented Dec 5, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: None (ssijaric-nv)

Changes

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64. During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.


Full diff: https://github.com/llvm/llvm-project/pull/118678.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp (+9-1)
  • (modified) flang/test/Fir/boxproc.fir (+9-3)
diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
             // Create the thunk.
             auto module = embox->getParentOfType<mlir::ModuleOp>();
             FirOpBuilder builder(rewriter, module);
+            const auto triple{fir::getTargetTriple(module)};
             auto loc = embox.getLoc();
             mlir::Type i8Ty = builder.getI8Type();
             mlir::Type i8Ptr = builder.getRefType(i8Ty);
-            mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+            fir::SequenceType::Extent thunkSize = 32;
+            if (triple.isPPC32())
+              thunkSize = 40;
+            else if (triple.isPPC64())
+              thunkSize = 48;
+            else if (triple.isAArch64())
+              thunkSize = 36;
+            mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
             auto buffer = builder.create<AllocaOp>(loc, buffTy);
             mlir::Value closure =
                 builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK:         %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
 // CHECK:         %[[VAL_0:.*]] = alloca i32, i64 1, align 4
 // CHECK:         %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 }
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK:         %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
 // CHECK:         %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
 // CHECK:         %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1

@ssijaric-nv
Copy link
Contributor Author

ssijaric-nv commented Dec 5, 2024

There is a separate PR under #118234 to pass the correct size to __trampoline_setup during AArch64 lowering.

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@tblah
Copy link
Contributor

tblah commented Dec 5, 2024

Please could you provide links to the documentation you used, preferably both in code comments and the commit summary

@Leporacanthicus
Copy link
Contributor

Leporacanthicus commented Dec 5, 2024

Not a problem for this patch, but it would be REALLY nice if there was a function that could be called in LLVM to get the correct size for a trampoline, rather than each different user of LLVM having to implement their own variant of

if processorX size = A; 
else if processorY size = B; ...

Subject to providing some documentation for "why these values" (I understand what they represent and sort of why they need to be different, but it'd be good to have a source for what they are on different architectures).

@ssijaric-nv
Copy link
Contributor Author

Thanks, all. Added a comment. I'll add a function to LLVM to retrieve the trampoline size, and will then push a follow-on patch to replace the if...then...else checks.

@kkwli
Copy link
Collaborator

kkwli commented Dec 10, 2024

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*.
@RolandF77 @mandlebug

@ssijaric-nv
Copy link
Contributor Author

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*. @RolandF77 @mandlebug

Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32-bytes for all PPC targets, with the exception of ppc32 at 40 bytes?

@kkwli
Copy link
Collaborator

kkwli commented Dec 10, 2024

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*. @RolandF77 @mandlebug

Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32-bytes for all PPC targets, with the exception of ppc32 at 40 bytes?

I think it works for Linux. Thanks.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ssijaric-nv ssijaric-nv force-pushed the flang_trampoline_size branch from 5e5b89e to 273c5ed Compare January 22, 2025 04:17
The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64.  During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.
@ssijaric-nv ssijaric-nv force-pushed the flang_trampoline_size branch from 273c5ed to 2191d54 Compare January 22, 2025 19:09
@ssijaric-nv ssijaric-nv merged commit 16e9601 into llvm:main Jan 27, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:codegen flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants