Skip to content

[clang] Support per-function [[clang::code_align(N)]] attribute. #80765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AntonBikineev
Copy link
Contributor

The existing attribute works only for loop headers. This can unfortunately be quite limiting, especially as a protection against the "Jump Conditional Code Erratum" [0] (for example, if there is an unaligned check in a hot loop). The command line option -malign-branch-boundary can help with that, but it's too coarse-grained and can significantly increase the binary size.

This change introduces a per-function [[clang::code_align(N)]] attribute that aligns all the basic blocks of a function by the given N.

[0] https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Feb 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Feb 5, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-x86

Author: Anton Bikineev (AntonBikineev)

Changes

The existing attribute works only for loop headers. This can unfortunately be quite limiting, especially as a protection against the "Jump Conditional Code Erratum" [0] (for example, if there is an unaligned check in a hot loop). The command line option -malign-branch-boundary can help with that, but it's too coarse-grained and can significantly increase the binary size.

This change introduces a per-function [[clang::code_align(N)]] attribute that aligns all the basic blocks of a function by the given N.

[0] https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf


Full diff: https://github.com/llvm/llvm-project/pull/80765.diff

8 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+4-3)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+11-5)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+6)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+10)
  • (added) clang/test/CodeGen/code_align_function.c (+42)
  • (modified) clang/test/Sema/code_align.c (+12-2)
  • (modified) llvm/lib/CodeGen/MachineBlockPlacement.cpp (+21-6)
  • (added) llvm/test/CodeGen/X86/code-align-basic-blocks.ll (+29)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index b2d5309e142c1..abce685e9f7a6 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -4435,10 +4435,11 @@ def PreferredType: InheritableAttr {
   let Documentation = [PreferredTypeDocumentation];
 }
 
-def CodeAlign: StmtAttr {
+def CodeAlign : InheritableAttr {
   let Spellings = [Clang<"code_align">];
-  let Subjects = SubjectList<[ForStmt, CXXForRangeStmt, WhileStmt, DoStmt],
-                              ErrorDiag, "'for', 'while', and 'do' statements">;
+  let Subjects =
+      SubjectList<[ForStmt, CXXForRangeStmt, WhileStmt, DoStmt, Function],
+                  ErrorDiag, "'for', 'while', 'do' statements and functions">;
   let Args = [ExprArgument<"Alignment">];
   let Documentation = [CodeAlignAttrDocs];
   let AdditionalMembers = [{
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 041786f37fb8a..57b63469f1573 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -7704,11 +7704,12 @@ def CodeAlignAttrDocs : Documentation {
   let Category = DocCatVariable;
   let Heading = "clang::code_align";
   let Content = [{
-The ``clang::code_align(N)`` attribute applies to a loop and specifies the byte
-alignment for a loop. The attribute accepts a positive integer constant
-initialization expression indicating the number of bytes for the minimum
-alignment boundary. Its value must be a power of 2, between 1 and 4096
-(inclusive).
+The ``clang::code_align(N)`` attribute applies to a loop or a function. When
+applied to a loop it specifies the byte alignment for the loop header. When
+applied to a function it aligns all the basic blocks of the function. The
+attribute accepts a positive integer constant initialization expression
+indicating the number of bytes for the minimum alignment boundary. Its value
+must be a power of 2, between 1 and 4096 (inclusive).
 
 .. code-block:: c++
 
@@ -7738,6 +7739,11 @@ alignment boundary. Its value must be a power of 2, between 1 and 4096
     [[clang::code_align(A)]] for(;;) { }
   }
 
+  [[clang:code_align(16)]] int foo(bool b) {
+    if (b) return 2;
+    return 3;
+  }
+
   }];
 }
 
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 36b63d78b06f8..87a822e91fff6 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2515,6 +2515,12 @@ void CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D,
       F->setAlignment(std::max(llvm::Align(2), F->getAlign().valueOrOne()));
   }
 
+  if (const auto *CodeAlign = D->getAttr<CodeAlignAttr>()) {
+    const auto *CE = cast<ConstantExpr>(CodeAlign->getAlignment());
+    std::string ArgValStr = llvm::toString(CE->getResultAsAPSInt(), 10);
+    F->addFnAttr("align-basic-blocks", ArgValStr);
+  }
+
   // In the cross-dso CFI mode with canonical jump tables, we want !type
   // attributes on definitions only.
   if (CodeGenOpts.SanitizeCfiCrossDso &&
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index d785714c4d811..40412801b632a 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -9036,6 +9036,12 @@ static void handleArmNewAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
                  ArmNewAttr(S.Context, AL, NewState.data(), NewState.size()));
 }
 
+static void handleCodeAlignAttr(Sema &S, Decl *D, const ParsedAttr &A) {
+  Expr *E = A.getArgAsExpr(0);
+  if (Attr *CodeAlignAttr = S.BuildCodeAlignAttr(A, E))
+    D->addAttr(CodeAlignAttr);
+}
+
 /// ProcessDeclAttribute - Apply the specific attribute to the specified decl if
 /// the attribute applies to decls.  If the attribute is a type attribute, just
 /// silently ignore it if a GNU attribute.
@@ -9855,6 +9861,10 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
   case ParsedAttr::AT_UsingIfExists:
     handleSimpleAttribute<UsingIfExistsAttr>(S, D, AL);
     break;
+
+  case ParsedAttr::AT_CodeAlign:
+    handleCodeAlignAttr(S, D, AL);
+    break;
   }
 }
 
diff --git a/clang/test/CodeGen/code_align_function.c b/clang/test/CodeGen/code_align_function.c
new file mode 100644
index 0000000000000..8289e1e307cd0
--- /dev/null
+++ b/clang/test/CodeGen/code_align_function.c
@@ -0,0 +1,42 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x c %s %s -o - | FileCheck -check-prefix=CHECK-C %s
+// RUN: %clang_cc1 -fsyntax-only -emit-llvm -x c++ -std=c++11 %s -o - | FileCheck %s --check-prefixes CHECK-CPP
+
+// CHECK-C: define dso_local i32 @code_align_function(i32 noundef %a) #[[A0:[0-9]+]]
+// CHECK-C: define dso_local i32 @code_align_function_with_aligned_loop(i32 noundef %a) #[[A1:[0-9]+]]
+
+// CHECK-C: attributes #[[A0]] = {{.*}} "align-basic-blocks"="32"
+[[clang::code_align(32)]] int code_align_function(int a) {
+  if (a) {
+    return 2;
+  }
+  return 3;
+}
+
+// CHECK-C: attributes #[[A1]] = {{.*}} "align-basic-blocks"="64"
+[[clang::code_align(64)]] int code_align_function_with_aligned_loop(int a) {
+  if (a) {
+    return 2;
+  }
+  int c = 0;
+  // CHECK-C: !{!"llvm.loop.align", i32 128}
+  [[clang::code_align(128)]] for (int i = 0; i < a; ++i) {
+    c += i;
+  }
+  return c;
+}
+
+#if __cplusplus >= 201103L
+struct S {
+  // CHECK-CPP: @_ZN1S19code_align_functionILi1EEEii({{.*}}) #[[A2:[0-9]+]]
+  // CHECK-CPP: attributes #[[A2]] = {{.*}} "align-basic-blocks"="16"
+  template <int A>
+  [[clang::code_align(16)]] int code_align_function(int a) {
+    if (a) {
+      return 2;
+    }
+    return 3;
+  }
+};
+
+template int S::code_align_function<1>(int);
+#endif
diff --git a/clang/test/Sema/code_align.c b/clang/test/Sema/code_align.c
index d494d5ea1558f..c56e716436c86 100644
--- a/clang/test/Sema/code_align.c
+++ b/clang/test/Sema/code_align.c
@@ -9,14 +9,14 @@ void foo() {
   for (i = 0; i < 10; ++i) {  // this is OK
     a[i] = b[i] = 0;
   }
-  // expected-error@+1{{'code_align' attribute only applies to 'for', 'while', and 'do' statements}}
+  // expected-error@+1{{'code_align' attribute only applies to 'for', 'while', 'do' statements and functions}}
   [[clang::code_align(4)]]
   i = 7;
   for (i = 0; i < 10; ++i) {
     a[i] = b[i] = 0;
   }
 
-  // expected-error@+1{{'code_align' attribute cannot be applied to a declaration}}
+  // expected-error@+1{{'code_align' attribute only applies to 'for', 'while', 'do' statements and functions}}
   [[clang::code_align(12)]] int n[10];
 }
 
@@ -121,6 +121,11 @@ void check_code_align_expression() {
 #endif
 }
 
+[[clang::code_align(32)]] int function_attribute();
+
+// expected-error@+1{{'code_align' attribute requires an integer argument which is a constant power of two}}
+[[clang::code_align(31)]] int function_attribute_misaligned();
+
 #if __cplusplus >= 201103L
 template <int A, int B, int C, int D, int E>
 void code_align_dependent() {
@@ -161,6 +166,11 @@ void bar4() {
   for(int I=0; I<128; ++I) { bar(I); }
 }
 
+struct S {
+  template <int A>
+  [[clang::code_align(32)]] int foo();
+};
+
 int main() {
   code_align_dependent<8, 23, 32, -10, 64>(); // #neg-instantiation
   bar3<4>();  // #temp-instantiation
diff --git a/llvm/lib/CodeGen/MachineBlockPlacement.cpp b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
index ef34e920aed50..ba4f161ef3250 100644
--- a/llvm/lib/CodeGen/MachineBlockPlacement.cpp
+++ b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
@@ -2940,8 +2940,12 @@ void MachineBlockPlacement::alignBlocks() {
       }
     }
 
-    // Use max of the TLIAlign and MDAlign
-    const Align LoopAlign = std::max(TLIAlign, Align(MDAlign));
+    unsigned FunctionMBBAlign =
+        F->getFunction().getFnAttributeAsParsedInteger("align-basic-blocks", 1);
+
+    // Use max of the TLIAlign, MDAlign or the function-level alignment.
+    const Align LoopAlign =
+        std::max(std::max(TLIAlign, Align(MDAlign)), Align(FunctionMBBAlign));
     if (LoopAlign == 1)
       continue; // Don't care about loop alignment.
 
@@ -3475,28 +3479,39 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
   bool HasMaxBytesOverride =
       MaxBytesForAlignmentOverride.getNumOccurrences() > 0;
 
+  unsigned long long MBBAlignment =
+      MF.getFunction().getFnAttributeAsParsedInteger("align-basic-blocks", 1);
+
+  // Respect BB alignment that could already be set by llvm.loop.align in
+  // alignBlocks() above.
   if (AlignAllBlock)
     // Align all of the blocks in the function to a specific alignment.
     for (MachineBasicBlock &MBB : MF) {
+      unsigned MaxAlignment = std::max(1ULL << AlignAllBlock, MBBAlignment);
       if (HasMaxBytesOverride)
-        MBB.setAlignment(Align(1ULL << AlignAllBlock),
+        MBB.setAlignment(std::max(Align(MaxAlignment), MBB.getAlignment()),
                          MaxBytesForAlignmentOverride);
       else
-        MBB.setAlignment(Align(1ULL << AlignAllBlock));
+        MBB.setAlignment(std::max(Align(MaxAlignment), MBB.getAlignment()));
     }
   else if (AlignAllNonFallThruBlocks) {
     // Align all of the blocks that have no fall-through predecessors to a
     // specific alignment.
     for (auto MBI = std::next(MF.begin()), MBE = MF.end(); MBI != MBE; ++MBI) {
       auto LayoutPred = std::prev(MBI);
+      unsigned MaxAlignment = std::max(1ULL << AlignAllNonFallThruBlocks, MBBAlignment);
       if (!LayoutPred->isSuccessor(&*MBI)) {
         if (HasMaxBytesOverride)
-          MBI->setAlignment(Align(1ULL << AlignAllNonFallThruBlocks),
+          MBI->setAlignment(std::max(Align(MaxAlignment), MBI->getAlignment()),
                             MaxBytesForAlignmentOverride);
         else
-          MBI->setAlignment(Align(1ULL << AlignAllNonFallThruBlocks));
+          MBI->setAlignment(std::max(Align(MaxAlignment), MBI->getAlignment()));
       }
     }
+  } else if (MBBAlignment != 1) {
+    for (MachineBasicBlock &MBB : MF) {
+      MBB.setAlignment(std::max(Align(MBBAlignment), MBB.getAlignment()));
+    }
   }
   if (ViewBlockLayoutWithBFI != GVDT_None &&
       (ViewBlockFreqFuncName.empty() ||
diff --git a/llvm/test/CodeGen/X86/code-align-basic-blocks.ll b/llvm/test/CodeGen/X86/code-align-basic-blocks.ll
new file mode 100644
index 0000000000000..7cc1a046aef5c
--- /dev/null
+++ b/llvm/test/CodeGen/X86/code-align-basic-blocks.ll
@@ -0,0 +1,29 @@
+; RUN: llc < %s -mtriple=x86_64-pc-linux-gnu | FileCheck %s -check-prefixes=CHECK
+
+declare void @bar();
+
+; CHECK-LABEL: foo:
+; CHECK-LABEL: # %bb.0:
+; CHECK: .p2align        5, 0x90
+; CHECK-LABEL: # %bb.1:
+; CHECK: .p2align        10, 0x90
+; CHECK-LABEL: .LBB0_2:
+; CHECK: .p2align        5, 0x90
+; CHECK-LABEL: # %bb.3:
+; CHECK: .p2align        5, 0x90
+
+define i32 @foo(i1 %1) "align-basic-blocks"="32" {
+  br i1 %1, label %7, label %3
+3:
+  %4 = phi i32 [ %5, %3 ], [ 0, %2 ]
+  call void @bar()
+  %5 = add nuw nsw i32 %4, 1
+  %6 = icmp eq i32 %5, 90
+  br i1 %6, label %7, label %3, !llvm.loop !0
+7:
+  %8 = phi i32 [ 2, %2 ], [ 3, %3 ]
+  ret i32 %8
+}
+
+!0 = distinct !{!0, !1}
+!1 = !{!"llvm.loop.align", i32 1024}

Copy link

github-actions bot commented Feb 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

The existing attribute works only for loop headers. This can
unfortunately be quite limiting, especially as a protection against the
"Jump Conditional Code Erratum" [0] (for example, if there is an
unaligned check in a hot loop). The command line option
-malign-branch-boundary can help with that, but it's too coarse-grained
and can significantly increase the binary size.

This change introduces a per-function [[clang::code_align(N)]] attribute
that aligns all the basic blocks of a function by the given N.

[0] https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
@MaskRay
Copy link
Member

MaskRay commented Feb 6, 2024

Do we want the Clang-specific attribute to apply to a function? How is it different from __attribute__((aligned(X)))?

GCC has a lot of -falign-* options: -falign-functions -falign-jumps -falign-labels -falign-loops, some of which are supported by Clang . I think they may happily accept a proposal to add an attribute, so some coordination can be useful before we restrict ourselves to a Clang-specific attribute.

@efriedma-quic
Copy link
Collaborator

I'm skeptical this actually makes sense in its current form. LLVM internally has a thing it calls a basic block, but what exactly counts is, to some extent, machine-specific, and can change from version to version. We don't make any guarantees about the general relationship between basic blocks and branch instructions.

Also, this is very different from per-function "-malign-branch-boundary"; if that's what you want, the implementation should be deeply integrated with the x86 backend, like the current implementation.

@AntonBikineev
Copy link
Contributor Author

How is it different from attribute((aligned(X)))?

__attribute__((aligned(X))) aligns the function entry, whereas the intent of the per-function [[clang::code_align(X)]] would be to align all the basic blocks of a function.

GCC has a lot of -falign-* options: -falign-functions -falign-jumps -falign-labels -falign-loops, some of which are supported by Clang . I think they may happily accept a proposal to add an attribute, so some coordination can be useful before we restrict ourselves to a Clang-specific attribute.

I think you can achieve the same on GCC with __attribute__((optimize("-falign-jumps=X"))), but we don't support the __attribute__((optimize("..."))).

I'm skeptical this actually makes sense in its current form. LLVM internally has a thing it calls a basic block, but what exactly counts is, to some extent, machine-specific, and can change from version to version.

I guess regardless of whether we call it BB alignment or label alignment, the concept is already supported on the frontend side for loops. My idea was that we could lift it to the function level, forcing all the function "labels" be aligned on a certain boundary.

@efriedma-quic
Copy link
Collaborator

I'm skeptical this actually makes sense in its current form. LLVM internally has a thing it calls a basic block, but what exactly counts is, to some extent, machine-specific, and can change from version to version.

I guess regardless of whether we call it BB alignment or label alignment, the concept is already supported on the frontend side for loops. My idea was that we could lift it to the function level, forcing all the function "labels" be aligned on a certain boundary.

Aligning the targets of branches is a different thing from what you've implemented. There are basic blocks which are not branch targets, and there are branch targets which are not at the beginning of a basic block. (Branch targets that aren't at the beginning of a basic block are rare on most targets, but it doesn't seem like something we should completely ignore.)

"Loop alignment" is specifically a request to override the normal compiler heuristics for aligning the loop header i.e. the target of the loop backedges. This is best-effort; a "loop" in C code might not lower to something that actually has a header we can align.

"-malign-branch-boundary" specifically inserts padding so branch instructions don't cross a 32-byte boundary; it isn't trying to align anything.

@AntonBikineev
Copy link
Contributor Author

AntonBikineev commented Feb 9, 2024

Aligning the targets of branches is a different thing from what you've implemented. There are basic blocks which are not branch targets, and there are branch targets which are not at the beginning of a basic block. (Branch targets that aren't at the beginning of a basic block are rare on most targets, but it doesn't seem like something we should completely ignore.)

This makes sense, thanks. Would there be an interest in something like [[clang::x86_align_branch_boundary]] that would align branch instructions in a function?

@MaskRay
Copy link
Member

MaskRay commented Feb 9, 2024

Aligning the targets of branches is a different thing from what you've implemented. There are basic blocks which are not branch targets, and there are branch targets which are not at the beginning of a basic block. (Branch targets that aren't at the beginning of a basic block are rare on most targets, but it doesn't seem like something we should completely ignore.)

This makes sense, thanks. Would there be an interest in something like [[clang::x86_align_branch_boundary]] that would align branch instructions in a function?

(I was involved in adding -mbranches-within-32B-boundaries and relevant assembler support.) I think the interest is low. A function attribute is for finer granularity so that you can mix some functions with unannotated functions. Users either specify -mbranches-within-32B-boundaries to mitigate performance penalty for the whole TU (or globally for LTO) when a microcode update is applied for certain Intel CPUs or not. I am not sure a case where people care for specific functions and want annotating them.

@AntonBikineev
Copy link
Contributor Author

I am not sure a case where people care for specific functions and want annotating them.

Our (V8 and Chromium) use case is that we know some hot functions that suffer from misaligned jumps. These functions would ideally be annotated with the attribute. We don't want to use the option for the entire binary as it'd significantly increase the binary size.

The alternative would be moving those functions into a separate TU(s), but it's just cumbersome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants