Skip to content

[clang][SME] Ignore flatten/clang::always_inline statements for callees with mismatched streaming attributes #116391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 26, 2024

Conversation

MacDue
Copy link
Member

@MacDue MacDue commented Nov 15, 2024

If __attribute__((flatten)) is used on a function, or [[clang::always_inline]] on a statement, don't inline any callees with incompatible streaming attributes. Without this check, clang may produce incorrect code when these attributes are used in code with streaming functions.

Note: The docs for flatten say it can be ignored when inlining is impossible: "causes calls within the attributed function to be inlined unless it is impossible to do so".

Similarly, the (clang-only) [[clang::always_inline]] statement attribute is more relaxed than the GNU __attribute__((always_inline)) (which says it should error it if it can't inline), saying only "If a statement is marked [[clang::always_inline]] and contains calls, the compiler attempts to inline those calls.". The docs also go on to show an example of where [[clang::always_inline]] has no effect.

If `__attribute__((flatten))` is used on a function don't inline any
callees with incompatible streaming attributes. Without this check,
clang may produce incorrect code when `flatten` is used in code with
streaming functions.

Note: The docs for flatten say it can be ignored when inlining is
impossible: "causes calls within the attributed function to be inlined
unless it is impossible to do so".
@MacDue MacDue changed the title [clang][SME] Ignore flatten for callees mismatched streaming attributes [clang][SME] Ignore flatten for callees with mismatched streaming attributes Nov 15, 2024
@MacDue MacDue marked this pull request as ready for review November 18, 2024 17:30
@MacDue MacDue requested a review from SamTebbs33 November 18, 2024 17:30
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AArch64 clang:codegen IR generation bugs: mangling, exceptions, etc. labels Nov 18, 2024
@MacDue MacDue requested a review from sdesmalen-arm November 18, 2024 17:30
@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-clang-codegen

Author: Benjamin Maxwell (MacDue)

Changes

If __attribute__((flatten)) is used on a function don't inline any callees with incompatible streaming attributes. Without this check, clang may produce incorrect code when flatten is used in code with streaming functions.

Note: The docs for flatten say it can be ignored when inlining is impossible: "causes calls within the attributed function to be inlined unless it is impossible to do so".


Full diff: https://github.com/llvm/llvm-project/pull/116391.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+7-4)
  • (modified) clang/lib/CodeGen/TargetInfo.h (+9)
  • (modified) clang/lib/CodeGen/Targets/AArch64.cpp (+53-11)
  • (added) clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c (+74)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 8f4f5d3ed81601..b8a968fdf4e9eb 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5112,9 +5112,10 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
 
   // Some architectures (such as x86-64) have the ABI changed based on
   // attribute-target/features. Give them a chance to diagnose.
-  CGM.getTargetCodeGenInfo().checkFunctionCallABI(
-      CGM, Loc, dyn_cast_or_null<FunctionDecl>(CurCodeDecl),
-      dyn_cast_or_null<FunctionDecl>(TargetDecl), CallArgs, RetTy);
+  const FunctionDecl *CallerDecl = dyn_cast_or_null<FunctionDecl>(CurCodeDecl);
+  const FunctionDecl *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl);
+  CGM.getTargetCodeGenInfo().checkFunctionCallABI(CGM, Loc, CallerDecl,
+                                                  CalleeDecl, CallArgs, RetTy);
 
   // 1. Set up the arguments.
 
@@ -5705,7 +5706,9 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   // FIXME: should this really take priority over __try, below?
   if (CurCodeDecl && CurCodeDecl->hasAttr<FlattenAttr>() &&
       !InNoInlineAttributedStmt &&
-      !(TargetDecl && TargetDecl->hasAttr<NoInlineAttr>())) {
+      !(TargetDecl && TargetDecl->hasAttr<NoInlineAttr>()) &&
+      !CGM.getTargetCodeGenInfo().wouldInliningViolateFunctionCallABI(
+          CallerDecl, CalleeDecl)) {
     Attrs =
         Attrs.addFnAttribute(getLLVMContext(), llvm::Attribute::AlwaysInline);
   }
diff --git a/clang/lib/CodeGen/TargetInfo.h b/clang/lib/CodeGen/TargetInfo.h
index 373f8b8a80fdb1..23ff476b0e33ce 100644
--- a/clang/lib/CodeGen/TargetInfo.h
+++ b/clang/lib/CodeGen/TargetInfo.h
@@ -98,6 +98,15 @@ class TargetCodeGenInfo {
                                     const CallArgList &Args,
                                     QualType ReturnType) const {}
 
+  /// Returns true if inlining the function call would produce incorrect code
+  /// for the current target and should be ignored (even with the always_inline
+  /// or flatten attributes).
+  virtual bool
+  wouldInliningViolateFunctionCallABI(const FunctionDecl *Caller,
+                                      const FunctionDecl *Callee) const {
+    return false;
+  }
+
   /// Determines the size of struct _Unwind_Exception on this platform,
   /// in 8-bit units.  The Itanium ABI defines this as:
   ///   struct _Unwind_Exception {
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index 9320c6ef06efab..a9ea84b6575f92 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -177,6 +177,9 @@ class AArch64TargetCodeGenInfo : public TargetCodeGenInfo {
                             const FunctionDecl *Callee, const CallArgList &Args,
                             QualType ReturnType) const override;
 
+  bool wouldInliningViolateFunctionCallABI(
+      const FunctionDecl *Caller, const FunctionDecl *Callee) const override;
+
 private:
   // Diagnose calls between functions with incompatible Streaming SVE
   // attributes.
@@ -1143,12 +1146,20 @@ void AArch64TargetCodeGenInfo::checkFunctionABI(
   }
 }
 
-void AArch64TargetCodeGenInfo::checkFunctionCallABIStreaming(
-    CodeGenModule &CGM, SourceLocation CallLoc, const FunctionDecl *Caller,
-    const FunctionDecl *Callee) const {
-  if (!Caller || !Callee || !Callee->hasAttr<AlwaysInlineAttr>())
-    return;
+enum class ArmStreamingInlinability : uint8_t {
+  Ok = 0,
+  IncompatibleStreamingModes = 1,
+  MismatchedStreamingCompatibility = 1 << 1,
+  CalleeHasNewZA = 1 << 2,
+  LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/CalleeHasNewZA),
+};
 
+/// Determines if there are any streaming ABI issues with inlining \p Callee
+/// into \p Caller. Returns the issues in the ArmStreamingInlinability bit enum
+/// (multiple bits can be set).
+static ArmStreamingInlinability
+GetArmStreamingInlinability(const FunctionDecl *Caller,
+                            const FunctionDecl *Callee) {
   bool CallerIsStreaming =
       IsArmStreamingFunction(Caller, /*IncludeLocallyStreaming=*/true);
   bool CalleeIsStreaming =
@@ -1156,17 +1167,41 @@ void AArch64TargetCodeGenInfo::checkFunctionCallABIStreaming(
   bool CallerIsStreamingCompatible = isStreamingCompatible(Caller);
   bool CalleeIsStreamingCompatible = isStreamingCompatible(Callee);
 
+  ArmStreamingInlinability Inlinability = ArmStreamingInlinability::Ok;
+
   if (!CalleeIsStreamingCompatible &&
-      (CallerIsStreaming != CalleeIsStreaming || CallerIsStreamingCompatible))
+      (CallerIsStreaming != CalleeIsStreaming || CallerIsStreamingCompatible)) {
+    Inlinability |= ArmStreamingInlinability::MismatchedStreamingCompatibility;
+    if (CalleeIsStreaming)
+      Inlinability |= ArmStreamingInlinability::IncompatibleStreamingModes;
+  }
+  if (auto *NewAttr = Callee->getAttr<ArmNewAttr>())
+    if (NewAttr->isNewZA())
+      Inlinability |= ArmStreamingInlinability::CalleeHasNewZA;
+
+  return Inlinability;
+}
+
+void AArch64TargetCodeGenInfo::checkFunctionCallABIStreaming(
+    CodeGenModule &CGM, SourceLocation CallLoc, const FunctionDecl *Caller,
+    const FunctionDecl *Callee) const {
+  if (!Caller || !Callee || !Callee->hasAttr<AlwaysInlineAttr>())
+    return;
+
+  ArmStreamingInlinability Inlinability =
+      GetArmStreamingInlinability(Caller, Callee);
+
+  if (bool(Inlinability &
+           ArmStreamingInlinability::MismatchedStreamingCompatibility))
     CGM.getDiags().Report(
-        CallLoc, CalleeIsStreaming
+        CallLoc, bool(Inlinability &
+                      ArmStreamingInlinability::IncompatibleStreamingModes)
                      ? diag::err_function_always_inline_attribute_mismatch
                      : diag::warn_function_always_inline_attribute_mismatch)
         << Caller->getDeclName() << Callee->getDeclName() << "streaming";
-  if (auto *NewAttr = Callee->getAttr<ArmNewAttr>())
-    if (NewAttr->isNewZA())
-      CGM.getDiags().Report(CallLoc, diag::err_function_always_inline_new_za)
-          << Callee->getDeclName();
+  if (bool(Inlinability & ArmStreamingInlinability::CalleeHasNewZA))
+    CGM.getDiags().Report(CallLoc, diag::err_function_always_inline_new_za)
+        << Callee->getDeclName();
 }
 
 // If the target does not have floating-point registers, but we are using a
@@ -1200,6 +1235,13 @@ void AArch64TargetCodeGenInfo::checkFunctionCallABI(CodeGenModule &CGM,
   checkFunctionCallABISoftFloat(CGM, CallLoc, Caller, Callee, Args, ReturnType);
 }
 
+bool AArch64TargetCodeGenInfo::wouldInliningViolateFunctionCallABI(
+    const FunctionDecl *Caller, const FunctionDecl *Callee) const {
+  return Caller && Callee &&
+         GetArmStreamingInlinability(Caller, Callee) !=
+             ArmStreamingInlinability::Ok;
+}
+
 void AArch64ABIInfo::appendAttributeMangling(TargetClonesAttr *Attr,
                                              unsigned Index,
                                              raw_ostream &Out) const {
diff --git a/clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c b/clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c
new file mode 100644
index 00000000000000..33ff8f33ca43f3
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c
@@ -0,0 +1,74 @@
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -target-feature +sme %s -o - | FileCheck %s
+
+// REQUIRES: aarch64-registered-target
+
+extern void was_inlined(void);
+
+#define __flatten  __attribute__((flatten))
+void fn(void) { was_inlined(); }
+void fn_streaming_compatible(void) __arm_streaming_compatible { was_inlined(); }
+void fn_streaming(void) __arm_streaming { was_inlined(); }
+__arm_locally_streaming void fn_locally_streaming(void) { was_inlined(); }
+__arm_new("za") void fn_streaming_new_za(void) __arm_streaming { was_inlined(); }
+
+__flatten
+void caller(void) {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming
+//  CHECK-NEXT:   call void @fn_locally_streaming
+//  CHECK-NEXT:   call void @fn_streaming_new_za
+
+__flatten void caller_streaming_compatible(void) __arm_streaming_compatible {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller_streaming_compatible()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @fn
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming
+//  CHECK-NEXT:   call void @fn_locally_streaming
+//  CHECK-NEXT:   call void @fn_streaming_new_za
+
+__flatten void caller_streaming(void) __arm_streaming {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller_streaming()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @fn
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming_new_za
+
+__flatten __arm_locally_streaming
+void caller_locally_streaming(void) {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller_locally_streaming()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @fn
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming_new_za

@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-clang

Author: Benjamin Maxwell (MacDue)

Changes

If __attribute__((flatten)) is used on a function don't inline any callees with incompatible streaming attributes. Without this check, clang may produce incorrect code when flatten is used in code with streaming functions.

Note: The docs for flatten say it can be ignored when inlining is impossible: "causes calls within the attributed function to be inlined unless it is impossible to do so".


Full diff: https://github.com/llvm/llvm-project/pull/116391.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+7-4)
  • (modified) clang/lib/CodeGen/TargetInfo.h (+9)
  • (modified) clang/lib/CodeGen/Targets/AArch64.cpp (+53-11)
  • (added) clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c (+74)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 8f4f5d3ed81601..b8a968fdf4e9eb 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5112,9 +5112,10 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
 
   // Some architectures (such as x86-64) have the ABI changed based on
   // attribute-target/features. Give them a chance to diagnose.
-  CGM.getTargetCodeGenInfo().checkFunctionCallABI(
-      CGM, Loc, dyn_cast_or_null<FunctionDecl>(CurCodeDecl),
-      dyn_cast_or_null<FunctionDecl>(TargetDecl), CallArgs, RetTy);
+  const FunctionDecl *CallerDecl = dyn_cast_or_null<FunctionDecl>(CurCodeDecl);
+  const FunctionDecl *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl);
+  CGM.getTargetCodeGenInfo().checkFunctionCallABI(CGM, Loc, CallerDecl,
+                                                  CalleeDecl, CallArgs, RetTy);
 
   // 1. Set up the arguments.
 
@@ -5705,7 +5706,9 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   // FIXME: should this really take priority over __try, below?
   if (CurCodeDecl && CurCodeDecl->hasAttr<FlattenAttr>() &&
       !InNoInlineAttributedStmt &&
-      !(TargetDecl && TargetDecl->hasAttr<NoInlineAttr>())) {
+      !(TargetDecl && TargetDecl->hasAttr<NoInlineAttr>()) &&
+      !CGM.getTargetCodeGenInfo().wouldInliningViolateFunctionCallABI(
+          CallerDecl, CalleeDecl)) {
     Attrs =
         Attrs.addFnAttribute(getLLVMContext(), llvm::Attribute::AlwaysInline);
   }
diff --git a/clang/lib/CodeGen/TargetInfo.h b/clang/lib/CodeGen/TargetInfo.h
index 373f8b8a80fdb1..23ff476b0e33ce 100644
--- a/clang/lib/CodeGen/TargetInfo.h
+++ b/clang/lib/CodeGen/TargetInfo.h
@@ -98,6 +98,15 @@ class TargetCodeGenInfo {
                                     const CallArgList &Args,
                                     QualType ReturnType) const {}
 
+  /// Returns true if inlining the function call would produce incorrect code
+  /// for the current target and should be ignored (even with the always_inline
+  /// or flatten attributes).
+  virtual bool
+  wouldInliningViolateFunctionCallABI(const FunctionDecl *Caller,
+                                      const FunctionDecl *Callee) const {
+    return false;
+  }
+
   /// Determines the size of struct _Unwind_Exception on this platform,
   /// in 8-bit units.  The Itanium ABI defines this as:
   ///   struct _Unwind_Exception {
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index 9320c6ef06efab..a9ea84b6575f92 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -177,6 +177,9 @@ class AArch64TargetCodeGenInfo : public TargetCodeGenInfo {
                             const FunctionDecl *Callee, const CallArgList &Args,
                             QualType ReturnType) const override;
 
+  bool wouldInliningViolateFunctionCallABI(
+      const FunctionDecl *Caller, const FunctionDecl *Callee) const override;
+
 private:
   // Diagnose calls between functions with incompatible Streaming SVE
   // attributes.
@@ -1143,12 +1146,20 @@ void AArch64TargetCodeGenInfo::checkFunctionABI(
   }
 }
 
-void AArch64TargetCodeGenInfo::checkFunctionCallABIStreaming(
-    CodeGenModule &CGM, SourceLocation CallLoc, const FunctionDecl *Caller,
-    const FunctionDecl *Callee) const {
-  if (!Caller || !Callee || !Callee->hasAttr<AlwaysInlineAttr>())
-    return;
+enum class ArmStreamingInlinability : uint8_t {
+  Ok = 0,
+  IncompatibleStreamingModes = 1,
+  MismatchedStreamingCompatibility = 1 << 1,
+  CalleeHasNewZA = 1 << 2,
+  LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/CalleeHasNewZA),
+};
 
+/// Determines if there are any streaming ABI issues with inlining \p Callee
+/// into \p Caller. Returns the issues in the ArmStreamingInlinability bit enum
+/// (multiple bits can be set).
+static ArmStreamingInlinability
+GetArmStreamingInlinability(const FunctionDecl *Caller,
+                            const FunctionDecl *Callee) {
   bool CallerIsStreaming =
       IsArmStreamingFunction(Caller, /*IncludeLocallyStreaming=*/true);
   bool CalleeIsStreaming =
@@ -1156,17 +1167,41 @@ void AArch64TargetCodeGenInfo::checkFunctionCallABIStreaming(
   bool CallerIsStreamingCompatible = isStreamingCompatible(Caller);
   bool CalleeIsStreamingCompatible = isStreamingCompatible(Callee);
 
+  ArmStreamingInlinability Inlinability = ArmStreamingInlinability::Ok;
+
   if (!CalleeIsStreamingCompatible &&
-      (CallerIsStreaming != CalleeIsStreaming || CallerIsStreamingCompatible))
+      (CallerIsStreaming != CalleeIsStreaming || CallerIsStreamingCompatible)) {
+    Inlinability |= ArmStreamingInlinability::MismatchedStreamingCompatibility;
+    if (CalleeIsStreaming)
+      Inlinability |= ArmStreamingInlinability::IncompatibleStreamingModes;
+  }
+  if (auto *NewAttr = Callee->getAttr<ArmNewAttr>())
+    if (NewAttr->isNewZA())
+      Inlinability |= ArmStreamingInlinability::CalleeHasNewZA;
+
+  return Inlinability;
+}
+
+void AArch64TargetCodeGenInfo::checkFunctionCallABIStreaming(
+    CodeGenModule &CGM, SourceLocation CallLoc, const FunctionDecl *Caller,
+    const FunctionDecl *Callee) const {
+  if (!Caller || !Callee || !Callee->hasAttr<AlwaysInlineAttr>())
+    return;
+
+  ArmStreamingInlinability Inlinability =
+      GetArmStreamingInlinability(Caller, Callee);
+
+  if (bool(Inlinability &
+           ArmStreamingInlinability::MismatchedStreamingCompatibility))
     CGM.getDiags().Report(
-        CallLoc, CalleeIsStreaming
+        CallLoc, bool(Inlinability &
+                      ArmStreamingInlinability::IncompatibleStreamingModes)
                      ? diag::err_function_always_inline_attribute_mismatch
                      : diag::warn_function_always_inline_attribute_mismatch)
         << Caller->getDeclName() << Callee->getDeclName() << "streaming";
-  if (auto *NewAttr = Callee->getAttr<ArmNewAttr>())
-    if (NewAttr->isNewZA())
-      CGM.getDiags().Report(CallLoc, diag::err_function_always_inline_new_za)
-          << Callee->getDeclName();
+  if (bool(Inlinability & ArmStreamingInlinability::CalleeHasNewZA))
+    CGM.getDiags().Report(CallLoc, diag::err_function_always_inline_new_za)
+        << Callee->getDeclName();
 }
 
 // If the target does not have floating-point registers, but we are using a
@@ -1200,6 +1235,13 @@ void AArch64TargetCodeGenInfo::checkFunctionCallABI(CodeGenModule &CGM,
   checkFunctionCallABISoftFloat(CGM, CallLoc, Caller, Callee, Args, ReturnType);
 }
 
+bool AArch64TargetCodeGenInfo::wouldInliningViolateFunctionCallABI(
+    const FunctionDecl *Caller, const FunctionDecl *Callee) const {
+  return Caller && Callee &&
+         GetArmStreamingInlinability(Caller, Callee) !=
+             ArmStreamingInlinability::Ok;
+}
+
 void AArch64ABIInfo::appendAttributeMangling(TargetClonesAttr *Attr,
                                              unsigned Index,
                                              raw_ostream &Out) const {
diff --git a/clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c b/clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c
new file mode 100644
index 00000000000000..33ff8f33ca43f3
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/sme-flatten-streaming-attrs.c
@@ -0,0 +1,74 @@
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -target-feature +sme %s -o - | FileCheck %s
+
+// REQUIRES: aarch64-registered-target
+
+extern void was_inlined(void);
+
+#define __flatten  __attribute__((flatten))
+void fn(void) { was_inlined(); }
+void fn_streaming_compatible(void) __arm_streaming_compatible { was_inlined(); }
+void fn_streaming(void) __arm_streaming { was_inlined(); }
+__arm_locally_streaming void fn_locally_streaming(void) { was_inlined(); }
+__arm_new("za") void fn_streaming_new_za(void) __arm_streaming { was_inlined(); }
+
+__flatten
+void caller(void) {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming
+//  CHECK-NEXT:   call void @fn_locally_streaming
+//  CHECK-NEXT:   call void @fn_streaming_new_za
+
+__flatten void caller_streaming_compatible(void) __arm_streaming_compatible {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller_streaming_compatible()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @fn
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming
+//  CHECK-NEXT:   call void @fn_locally_streaming
+//  CHECK-NEXT:   call void @fn_streaming_new_za
+
+__flatten void caller_streaming(void) __arm_streaming {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller_streaming()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @fn
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming_new_za
+
+__flatten __arm_locally_streaming
+void caller_locally_streaming(void) {
+    fn();
+    fn_streaming_compatible();
+    fn_streaming();
+    fn_locally_streaming();
+    fn_streaming_new_za();
+}
+// CHECK-LABEL: void @caller_locally_streaming()
+//  CHECK-NEXT: entry:
+//  CHECK-NEXT:   call void @fn
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @was_inlined
+//  CHECK-NEXT:   call void @fn_streaming_new_za

@efriedma-quic
Copy link
Collaborator

Do we really need to do this in the frontend? The inliner itself should be doing safety checks.

@sdesmalen-arm
Copy link
Collaborator

Do we really need to do this in the frontend? The inliner itself should be doing safety checks.

The inliner avoids any safety checks when alwaysinline is specified, and this can't easily be changed (I previously raised this on Discourse and did some experimentation refactoring LLVM inlining to distinguish between the two cases, but never had a chance to properly follow this up)

At the moment, LLVM has three ways to do inlining:

  • <no specific attribute> means LLVM will try to inline only if profitable.
  • inlinehint, means that LLVM should favour inlining in its cost model, but doens't inline when areInlineCompatible returns false)
  • alwaysinline, means that LLVM disregards the cost-model and always inlines a function (even when areInlineCompatible would otherwise return false).

The last one, alwaysinline has different semantics from the corresponding C++ attribute. The C++ attribute means "must inline" where if it cannot inline, the compiler must error. Unfortunately alwaysinline is chosen to implement the C++ attribute as well, which leads to issues that we had to work around in Clang (see #77936 and #100740 that emit a diagnostic/warning in Clang if inlining may not be possible, which has to be conservative). We actually want to be able to distinguish "always inline if possible (ignoring any cost-model)" and "always inline and fail if not possible", but I think ideally this would require a new attribute to LLVM.

For the immediate bug addressed in this PR, the route chosen is to avoid adding alwaysinline on calls when we know that the called functions are not (at compiletime known to be) compatible to be inlined.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, we have isInlineViable(), but attributes aren't part of it for... reasons. (I haven't tried to dig into https://discourse.llvm.org/t/rfc-avoid-inlining-alwaysinline-functions-when-they-cannot-be-inlined ; I'll believe you that it's non-trivial to fix.) Hence, shoving this into the frontend.

I guess this is fine, then. But please add a note describing the current situation, so the next person looking at this has context. (Put it in TargetInfo.h, maybe?)

LGTM with the note.

@MacDue
Copy link
Member Author

MacDue commented Nov 21, 2024

I guess this is fine, then. But please add a note describing the current situation, so the next person looking at this has context. (Put it in TargetInfo.h, maybe?)

LGTM with the note.

Thanks 👍 Added a note to TargetInfo.h

Change-Id: I48ea9dc200cbb60cb1bacd2873cc015f200b0f32
@MacDue MacDue changed the title [clang][SME] Ignore flatten for callees with mismatched streaming attributes [clang][SME] Ignore flatten/clang::always_inline statements for callees with mismatched streaming attributes Nov 21, 2024
@MacDue
Copy link
Member Author

MacDue commented Nov 21, 2024

Note: I've extended this PR to cover the clang-only [[clang::always_inline]] statement attribute (which is currently broken for streaming functions too). It's documentation is more relaxed than the GNU __attribute__((always_inline)) attribute (which says failing to inline is an error) -- relevant snippets included above.

Copy link
Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the clang::always_inline statement attribute as well.

@MacDue MacDue merged commit db6f627 into llvm:main Nov 26, 2024
8 checks passed
@MacDue MacDue deleted the sme_flatten branch November 26, 2024 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants