[BPF] introduce __attribute__((bpf_fastcall)) #101228

Merged
merged 6 commits into from
Aug 19, 2024

Conversation

eddyz87
Contributor

@eddyz87 eddyz87 commented Jul 30, 2024

This commit introduces attribute bpf_fastcall to declare BPF functions that do not clobber some of the caller saved registers (R0-R5).

The idea is to generate code that complies with the generic BPF ABI, but to allow a compatible Linux kernel to remove unnecessary spills and fills of non-scratched registers (given some compiler assistance).

For calls to such functions, do register allocation as if the caller-saved registers are not clobbered, but later wrap the calls with spill and fill patterns that are simple to recognize in the kernel.

For example, for the following C code:

 #define __bpf_fastcall __attribute__((bpf_fastcall))

 void bar(void) __bpf_fastcall;
 void buz(long i, long j, long k);

 void foo(long i, long j, long k) {
   bar();
   buz(i, j, k);
 }

First allocate registers as if:

foo:
  call bar    # note: no spills for i,j,k (r1,r2,r3)
  call buz
  exit

Then insert spills and fills in the peephole phase:

foo:
  *(u64 *)(r10 - 8) = r1;  # Such call pattern is
  *(u64 *)(r10 - 16) = r2; # correct when used with
  *(u64 *)(r10 - 24) = r3; # old kernels.
  call bar
  r3 = *(u64 *)(r10 - 24); # But also allows new
  r2 = *(u64 *)(r10 - 16); # kernels to recognize the
  r1 = *(u64 *)(r10 - 8);  # pattern and remove spills/fills.
  call buz
  exit

The offsets for generated spills/fills are picked as minimal stack offsets for the function. Allocated stack slots are not used for any other purposes, in order to simplify in-kernel analysis.
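
The offset selection described above can be illustrated with a small standalone sketch. It is not the code from the patch: `alignDownOffset` and `assignSpillSlots` are hypothetical names, registers are modeled as plain numbers 0..5 (bit N of the mask stands for rN), and the lowest offset already used by the function's stack objects is passed in directly rather than read from LLVM's frame information.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: round a non-positive frame offset down (away from
// zero) to a multiple of slotSize, using the same arithmetic as
// computeMinFixedObjOffset in the patch.
int64_t alignDownOffset(int64_t minOffset, int64_t slotSize) {
  assert(minOffset <= 0 && slotSize > 0 &&
         (slotSize & (slotSize - 1)) == 0);
  return minOffset - ((slotSize + minOffset % slotSize) & (slotSize - 1));
}

// Hypothetical helper: given the lowest offset used by existing stack
// objects and a bitmask of live caller-saved registers (bit N = rN,
// N in 0..5), return (register, offset) pairs for the spill/fill slots.
// Slots are handed out below everything the function already uses.
std::vector<std::pair<int, int64_t>> assignSpillSlots(int64_t minObjOffset,
                                                      unsigned liveMask) {
  const int64_t slotSize = 8;
  int64_t cur = alignDownOffset(std::min<int64_t>(minObjOffset, 0), slotSize);
  std::vector<std::pair<int, int64_t>> slots;
  for (int reg = 0; reg <= 5; ++reg) {
    if ((liveMask & (1u << reg)) == 0)
      continue;
    cur -= slotSize; // next free 8-byte slot
    slots.push_back({reg, cur});
  }
  return slots;
}
```

Under these assumptions, `assignSpillSlots(0, 0xE)` (r1, r2, r3 live, no other stack objects) yields offsets -8, -16, -24, matching the example above. `assignSpillSlots(-8, 0xC)` (one existing 8-byte local, r2 and r3 live) yields -16 and -24.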

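The corresponding functionality has been merged into the Linux kernel as this patch set: https://lore.kernel.org/bpf/172179364482.1919.9590705031832457529.git-patchwork-notify@kernel.org/ (the patch set assumed that LLVM would use a no_caller_saved_registers attribute; the naming does not matter to the kernel).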

github-actions bot commented Jul 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@eddyz87 eddyz87 marked this pull request as ready for review July 31, 2024 05:49
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jul 31, 2024
@llvmbot
Member

llvmbot commented Jul 31, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: None (eddyz87)

Changes

This commit introduces attribute bpf_fastcall to declare BPF functions that do not clobber some of the caller saved registers (R0-R5).

The idea is to generate code that complies with the generic BPF ABI, but to allow a compatible Linux kernel to remove unnecessary spills and fills of non-scratched registers (given some compiler assistance).

For calls to such functions, do register allocation as if the caller-saved registers are not clobbered, but later wrap the calls with spill and fill patterns that are simple to recognize in the kernel.

For example, for the following C code:

 #define __bpf_fastcall __attribute__((bpf_fastcall))

 void bar(void) __bpf_fastcall;
 void buz(long i, long j, long k);

 void foo(long i, long j, long k) {
   bar();
   buz(i, j, k);
 }

First allocate registers as if:

foo:
  call bar    # note: no spills for i,j,k (r1,r2,r3)
  call buz
  exit

Then insert spills and fills in the peephole phase:

foo:
  *(u64 *)(r10 - 8) = r1;  # Such call pattern is
  *(u64 *)(r10 - 16) = r2; # correct when used with
  *(u64 *)(r10 - 24) = r3; # old kernels.
  call bar
  r3 = *(u64 *)(r10 - 24); # But also allows new
  r2 = *(u64 *)(r10 - 16); # kernels to recognize the
  r1 = *(u64 *)(r10 - 8);  # pattern and remove spills/fills.
  call buz
  exit

The offsets for generated spills/fills are picked as minimal stack offsets for the function. Allocated stack slots are not used for any other purposes, in order to simplify in-kernel analysis.

The corresponding functionality has been merged into the Linux kernel as [this](https://lore.kernel.org/bpf/172179364482.1919.9590705031832457529.git-patchwork-notify@kernel.org/) patch set (the patch set assumed that LLVM would use a no_caller_saved_registers attribute; the naming does not matter to the kernel).


Patch is 23.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101228.diff

16 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+8)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+19)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+2)
  • (added) clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c (+24)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (added) clang/test/Sema/bpf-attr-bpf-fastcall.c (+7)
  • (modified) llvm/lib/Target/BPF/BPFCallingConv.td (+1)
  • (modified) llvm/lib/Target/BPF/BPFISelLowering.cpp (+31)
  • (modified) llvm/lib/Target/BPF/BPFInstrInfo.td (+1-3)
  • (modified) llvm/lib/Target/BPF/BPFMIPeephole.cpp (+88)
  • (modified) llvm/lib/Target/BPF/BPFRegisterInfo.cpp (+11)
  • (modified) llvm/lib/Target/BPF/BPFRegisterInfo.h (+3)
  • (added) llvm/test/CodeGen/BPF/bpf-fastcall-1.ll (+46)
  • (added) llvm/test/CodeGen/BPF/bpf-fastcall-2.ll (+68)
  • (added) llvm/test/CodeGen/BPF/bpf-fastcall-3.ll (+62)
  • (added) llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll (+110)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 46d0a66d59c37..437cf3999bc36 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2189,6 +2189,14 @@ def BTFTypeTag : TypeAttr {
   let LangOpts = [COnly];
 }
 
+def BPFFastCall : InheritableAttr,
+                  TargetSpecificAttr<TargetBPF> {
+  let Spellings = [GCC<"bpf_fastcall">];
+  let Subjects = SubjectList<[FunctionLike]>;
+  let Documentation = [BPFFastCallDocs];
+  let SimpleHandler = 1;
+}
+
 def WebAssemblyExportName : InheritableAttr,
                             TargetSpecificAttr<TargetWebAssembly> {
   let Spellings = [Clang<"export_name">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 4b8d520d73893..6f0f659f064d9 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -2317,6 +2317,25 @@ section.
   }];
 }
 
+def BPFFastCallDocs : Documentation {
+  let Category = DocCatType;
+  let Content = [{
+Functions annotated with this attribute are likely to be inlined by the BPF JIT.
+It is assumed that the inlined implementation uses fewer caller-saved registers
+than a regular function.
+Specifically, the following registers are likely to be preserved:
+- ``R0`` if function return value is ``void``;
+- ``R2-R5`` if function takes 1 argument;
+- ``R3-R5`` if function takes 2 arguments;
+- ``R4-R5`` if function takes 3 arguments;
+- ``R5`` if function takes 4 arguments;
+
+For such functions Clang generates code pattern that allows BPF JIT
+to recognize and remove unnecessary spills and fills of the preserved
+registers.
+  }];
+}
+
 def MipsInterruptDocs : Documentation {
   let Category = DocCatFunction;
   let Heading = "interrupt (MIPS)";
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 2f3dd5d01fa6c..dbddf49de964b 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2447,6 +2447,8 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
       FuncAttrs.addAttribute(llvm::Attribute::NoCfCheck);
     if (TargetDecl->hasAttr<LeafAttr>())
       FuncAttrs.addAttribute(llvm::Attribute::NoCallback);
+    if (TargetDecl->hasAttr<BPFFastCallAttr>())
+      FuncAttrs.addAttribute("bpf_fastcall");
 
     HasOptnone = TargetDecl->hasAttr<OptimizeNoneAttr>();
     if (auto *AllocSize = TargetDecl->getAttr<AllocSizeAttr>()) {
diff --git a/clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c b/clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c
new file mode 100644
index 0000000000000..bf49c3c9be086
--- /dev/null
+++ b/clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c
@@ -0,0 +1,24 @@
+// REQUIRES: bpf-registered-target
+// RUN: %clang_cc1 -triple bpf -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s
+
+#define __bpf_fastcall __attribute__((bpf_fastcall))
+
+void test(void) __bpf_fastcall;
+void (*ptr)(void) __bpf_fastcall;
+
+void foo(void) {
+  test();
+  (*ptr)();
+}
+
+// CHECK: @ptr = global ptr null
+// CHECK: define {{.*}} @foo()
+// CHECK: entry:
+// CHECK:   call void @test() #[[test_attr:[0-9]+]]
+// CHECK:   %[[ptr:.*]] = load ptr, ptr @ptr, align 8
+// CHECK:   call void %[[ptr]]() #[[test_attr]]
+// CHECK:   ret void
+
+// CHECK: declare void @test() #[[ptr_attr:[0-9]+]]
+// CHECK: attributes #1 = { {{.*}}"bpf_fastcall"{{.*}} }
+// CHECK: attributes #[[test_attr]] = { {{.*}}"bpf_fastcall"{{.*}} }
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index e082db698ef0c..b8f2b02d758d9 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -22,6 +22,7 @@
 // CHECK-NEXT: AssumeAligned (SubjectMatchRule_objc_method, SubjectMatchRule_function)
 // CHECK-NEXT: Availability ((SubjectMatchRule_record, SubjectMatchRule_enum, SubjectMatchRule_enum_constant, SubjectMatchRule_field, SubjectMatchRule_function, SubjectMatchRule_namespace, SubjectMatchRule_objc_category, SubjectMatchRule_objc_implementation, SubjectMatchRule_objc_interface, SubjectMatchRule_objc_method, SubjectMatchRule_objc_property, SubjectMatchRule_objc_protocol, SubjectMatchRule_record, SubjectMatchRule_type_alias, SubjectMatchRule_variable))
 // CHECK-NEXT: AvailableOnlyInDefaultEvalMethod (SubjectMatchRule_type_alias)
+// CHECK-NEXT: BPFFastCall (SubjectMatchRule_hasType_functionType)
 // CHECK-NEXT: BPFPreserveAccessIndex (SubjectMatchRule_record)
 // CHECK-NEXT: BPFPreserveStaticOffset (SubjectMatchRule_record)
 // CHECK-NEXT: BTFDeclTag (SubjectMatchRule_variable, SubjectMatchRule_function, SubjectMatchRule_record, SubjectMatchRule_field, SubjectMatchRule_type_alias)
diff --git a/clang/test/Sema/bpf-attr-bpf-fastcall.c b/clang/test/Sema/bpf-attr-bpf-fastcall.c
new file mode 100644
index 0000000000000..81a5cf68a8c06
--- /dev/null
+++ b/clang/test/Sema/bpf-attr-bpf-fastcall.c
@@ -0,0 +1,7 @@
+// REQUIRES: bpf-registered-target
+// RUN: %clang_cc1 %s -triple bpf -verify
+
+__attribute__((bpf_fastcall)) int var; // expected-warning {{'bpf_fastcall' attribute only applies to functions and function pointers}}
+
+__attribute__((bpf_fastcall)) void func();
+__attribute__((bpf_fastcall(1))) void func_invalid(); // expected-error {{'bpf_fastcall' attribute takes no arguments}}
diff --git a/llvm/lib/Target/BPF/BPFCallingConv.td b/llvm/lib/Target/BPF/BPFCallingConv.td
index ef4ef1930aa8f..a557211437e95 100644
--- a/llvm/lib/Target/BPF/BPFCallingConv.td
+++ b/llvm/lib/Target/BPF/BPFCallingConv.td
@@ -46,3 +46,4 @@ def CC_BPF32 : CallingConv<[
 ]>;
 
 def CSR : CalleeSavedRegs<(add R6, R7, R8, R9, R10)>;
+def CSR_PreserveAll : CalleeSavedRegs<(add R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10)>;
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index 071fe004806e3..ff23d3b055d0d 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -402,6 +402,21 @@ SDValue BPFTargetLowering::LowerFormalArguments(
 
 const size_t BPFTargetLowering::MaxArgs = 5;
 
+static void resetRegMaskBit(const TargetRegisterInfo *TRI, uint32_t *RegMask,
+                            MCRegister Reg) {
+  for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg))
+    RegMask[SubReg / 32] &= ~(1u << (SubReg % 32));
+}
+
+static uint32_t *regMaskFromTemplate(const TargetRegisterInfo *TRI,
+                                     MachineFunction &MF,
+                                     const uint32_t *BaseRegMask) {
+  uint32_t *RegMask = MF.allocateRegMask();
+  unsigned RegMaskSize = MachineOperand::getRegMaskSize(TRI->getNumRegs());
+  memcpy(RegMask, BaseRegMask, sizeof(RegMask[0]) * RegMaskSize);
+  return RegMask;
+}
+
 SDValue BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                                      SmallVectorImpl<SDValue> &InVals) const {
   SelectionDAG &DAG = CLI.DAG;
@@ -513,6 +528,22 @@ SDValue BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   for (auto &Reg : RegsToPass)
     Ops.push_back(DAG.getRegister(Reg.first, Reg.second.getValueType()));
 
+  bool HasFastCall =
+      (CLI.CB && isa<CallInst>(CLI.CB) && CLI.CB->hasFnAttr("bpf_fastcall"));
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  if (HasFastCall) {
+    uint32_t *RegMask = regMaskFromTemplate(
+        TRI, MF, TRI->getCallPreservedMask(MF, CallingConv::PreserveAll));
+    for (auto const &RegPair : RegsToPass)
+      resetRegMaskBit(TRI, RegMask, RegPair.first);
+    if (!CLI.CB->getType()->isVoidTy())
+      resetRegMaskBit(TRI, RegMask, BPF::R0);
+    Ops.push_back(DAG.getRegisterMask(RegMask));
+  } else {
+    Ops.push_back(
+        DAG.getRegisterMask(TRI->getCallPreservedMask(MF, CLI.CallConv)));
+  }
+
   if (InGlue.getNode())
     Ops.push_back(InGlue);
 
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 55989f5eb6a3c..cf9764ca62123 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -677,9 +677,7 @@ let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
 }
 
 // Jump and link
-let isCall=1, hasDelaySlot=0, Uses = [R11],
-    // Potentially clobbered registers
-    Defs = [R0, R1, R2, R3, R4, R5] in {
+let isCall=1, hasDelaySlot=0, Uses = [R11] in {
   def JAL  : CALL<"call">;
   def JALX  : CALLX<"callx">;
 }
diff --git a/llvm/lib/Target/BPF/BPFMIPeephole.cpp b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
index f0edf706bd8fd..5ada86a8bf1c2 100644
--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
@@ -24,6 +24,8 @@
 #include "BPFInstrInfo.h"
 #include "BPFTargetMachine.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/LivePhysRegs.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
@@ -319,6 +321,7 @@ struct BPFMIPreEmitPeephole : public MachineFunctionPass {
   bool in16BitRange(int Num);
   bool eliminateRedundantMov();
   bool adjustBranch();
+  bool insertMissingCallerSavedSpills();
 
 public:
 
@@ -333,6 +336,7 @@ struct BPFMIPreEmitPeephole : public MachineFunctionPass {
     Changed = eliminateRedundantMov();
     if (SupportGotol)
       Changed = adjustBranch() || Changed;
+    Changed |= insertMissingCallerSavedSpills();
     return Changed;
   }
 };
@@ -596,6 +600,90 @@ bool BPFMIPreEmitPeephole::adjustBranch() {
   return Changed;
 }
 
+static const unsigned CallerSavedRegs[] = {BPF::R0, BPF::R1, BPF::R2,
+                                           BPF::R3, BPF::R4, BPF::R5};
+
+struct BPFFastCall {
+  MachineInstr *MI;
+  unsigned LiveCallerSavedRegs;
+};
+
+static void collectBPFFastCalls(const TargetRegisterInfo *TRI,
+                                LivePhysRegs &LiveRegs, MachineBasicBlock &BB,
+                                SmallVectorImpl<BPFFastCall> &Calls) {
+  LiveRegs.init(*TRI);
+  LiveRegs.addLiveOuts(BB);
+  Calls.clear();
+  for (MachineInstr &MI : llvm::reverse(BB)) {
+    unsigned LiveCallerSavedRegs;
+    if (!MI.isCall())
+      goto NextInsn;
+    LiveCallerSavedRegs = 0;
+    for (MCRegister R : CallerSavedRegs) {
+      bool DoSpillFill = !MI.definesRegister(R, TRI) && LiveRegs.contains(R);
+      if (!DoSpillFill)
+        continue;
+      LiveCallerSavedRegs |= 1 << R;
+    }
+    if (LiveCallerSavedRegs)
+      Calls.push_back({&MI, LiveCallerSavedRegs});
+  NextInsn:
+    LiveRegs.stepBackward(MI);
+  }
+}
+
+static int64_t computeMinFixedObjOffset(MachineFrameInfo &MFI,
+                                        unsigned SlotSize) {
+  int64_t MinFixedObjOffset = 0;
+  // Same logic as in X86FrameLowering::adjustFrameForMsvcCxxEh()
+  for (int I = MFI.getObjectIndexBegin(); I < MFI.getObjectIndexEnd(); ++I) {
+    if (MFI.isDeadObjectIndex(I))
+      continue;
+    MinFixedObjOffset = std::min(MinFixedObjOffset, MFI.getObjectOffset(I));
+  }
+  MinFixedObjOffset -=
+      (SlotSize + MinFixedObjOffset % SlotSize) & (SlotSize - 1);
+  return MinFixedObjOffset;
+}
+
+bool BPFMIPreEmitPeephole::insertMissingCallerSavedSpills() {
+  MachineFrameInfo &MFI = MF->getFrameInfo();
+  SmallVector<BPFFastCall, 8> Calls;
+  LivePhysRegs LiveRegs;
+  const unsigned SlotSize = 8;
+  int64_t MinFixedObjOffset = computeMinFixedObjOffset(MFI, SlotSize);
+  bool Changed = false;
+  for (MachineBasicBlock &BB : *MF) {
+    collectBPFFastCalls(TRI, LiveRegs, BB, Calls);
+    Changed |= !Calls.empty();
+    for (BPFFastCall &Call : Calls) {
+      int64_t CurOffset = MinFixedObjOffset;
+      for (MCRegister Reg : CallerSavedRegs) {
+        if (((1 << Reg) & Call.LiveCallerSavedRegs) == 0)
+          continue;
+        // Allocate stack object
+        CurOffset -= SlotSize;
+        MFI.CreateFixedSpillStackObject(SlotSize, CurOffset);
+        // Generate spill
+        BuildMI(BB, Call.MI->getIterator(), Call.MI->getDebugLoc(),
+                TII->get(BPF::STD))
+            .addReg(Reg)
+            .addReg(BPF::R10)
+            .addImm(CurOffset)
+            .addImm(0);
+        // Generate fill
+        BuildMI(BB, ++Call.MI->getIterator(), Call.MI->getDebugLoc(),
+                TII->get(BPF::LDD))
+            .addReg(Reg)
+            .addReg(BPF::R10)
+            .addImm(CurOffset)
+            .addImm(0);
+      }
+    }
+  }
+  return Changed;
+}
+
 } // end default namespace
 
 INITIALIZE_PASS(BPFMIPreEmitPeephole, "bpf-mi-pemit-peephole",
diff --git a/llvm/lib/Target/BPF/BPFRegisterInfo.cpp b/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
index 84af6806abb36..69e1318954a97 100644
--- a/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
+++ b/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
@@ -40,6 +40,17 @@ BPFRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
   return CSR_SaveList;
 }
 
+const uint32_t *
+BPFRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
+                                      CallingConv::ID CC) const {
+  switch (CC) {
+  default:
+    return CSR_RegMask;
+  case CallingConv::PreserveAll:
+    return CSR_PreserveAll_RegMask;
+  }
+}
+
 BitVector BPFRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   BitVector Reserved(getNumRegs());
   markSuperRegs(Reserved, BPF::W10); // [W|R]10 is read only frame pointer
diff --git a/llvm/lib/Target/BPF/BPFRegisterInfo.h b/llvm/lib/Target/BPF/BPFRegisterInfo.h
index f7dea75ebea6f..db868769a1579 100644
--- a/llvm/lib/Target/BPF/BPFRegisterInfo.h
+++ b/llvm/lib/Target/BPF/BPFRegisterInfo.h
@@ -26,6 +26,9 @@ struct BPFRegisterInfo : public BPFGenRegisterInfo {
 
   const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
 
+  const uint32_t *getCallPreservedMask(const MachineFunction &MF,
+                                       CallingConv::ID) const override;
+
   BitVector getReservedRegs(const MachineFunction &MF) const override;
 
   bool eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-1.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-1.ll
new file mode 100644
index 0000000000000..fd81314a495ef
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-1.ll
@@ -0,0 +1,46 @@
+; RUN: llc -O2 --march=bpfel %s -o - | FileCheck %s
+
+; Generated from the following C code:
+;
+;   #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+;   void bar(void) __bpf_fastcall;
+;   void buz(long i, long j, long k);
+;
+;   void foo(long i, long j, long k) {
+;     bar();
+;     buz(i, j, k);
+;   }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check that function marked with bpf_fastcall does not clobber R1-R5.
+
+define dso_local void @foo(i64 noundef %i, i64 noundef %j, i64 noundef %k) {
+entry:
+  tail call void @bar() #1
+  tail call void @buz(i64 noundef %i, i64 noundef %j, i64 noundef %k)
+  ret void
+}
+
+; CHECK:      foo:
+; CHECK:      # %bb.0:
+; CHECK-NEXT:   *(u64 *)(r10 - 8) = r1
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = *(u64 *)(r10 - 8)
+; CHECK-NEXT:   call buz
+; CHECK-NEXT:   exit
+
+declare dso_local void @bar() #0
+declare dso_local void @buz(i64 noundef, i64 noundef, i64 noundef)
+
+attributes #0 = { "bpf_fastcall" }
+attributes #1 = { nounwind "bpf_fastcall" }
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-2.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-2.ll
new file mode 100644
index 0000000000000..e3e29cdddca8e
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-2.ll
@@ -0,0 +1,68 @@
+; RUN: llc -O2 --march=bpfel %s -o - | FileCheck %s
+
+; Generated from the following C code:
+;
+;   #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+;   void bar(void) __bpf_fastcall;
+;   void buz(long i, long j);
+;
+;   void foo(long i, long j, long k, long l) {
+;     bar();
+;     if (k > 42l)
+;       buz(i, 1);
+;     else
+;       buz(1, j);
+;   }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check that function marked with bpf_fastcall does not clobber R1-R5.
+; Use R1 in one branch following call and R2 in another branch following call.
+
+define dso_local void @foo(i64 noundef %i, i64 noundef %j, i64 noundef %k, i64 noundef %l) {
+entry:
+  tail call void @bar() #0
+  %cmp = icmp sgt i64 %k, 42
+  br i1 %cmp, label %if.then, label %if.else
+
+if.then:
+  tail call void @buz(i64 noundef %i, i64 noundef 1)
+  br label %if.end
+
+if.else:
+  tail call void @buz(i64 noundef 1, i64 noundef %j)
+  br label %if.end
+
+if.end:
+  ret void
+}
+
+; CHECK:      foo:                                    # @foo
+; CHECK:      # %bb.0:                                # %entry
+; CHECK-NEXT:   *(u64 *)(r10 - 8) = r1
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = *(u64 *)(r10 - 8)
+; CHECK-NEXT:   r4 = 43
+; CHECK-NEXT:   if r4 s> r3 goto [[ELSE:.*]]
+; CHECK-NEXT: # %bb.1:                                # %if.then
+; CHECK-NEXT:   r2 = 1
+; CHECK-NEXT:   goto [[END:.*]]
+; CHECK-NEXT: [[ELSE]]:                               # %if.else
+; CHECK-NEXT:   r1 = 1
+; CHECK-NEXT: [[END]]:                                # %if.end
+; CHECK-NEXT:   call buz
+; CHECK-NEXT:   exit
+
+declare dso_local void @bar() #0
+declare dso_local void @buz(i64 noundef, i64 noundef)
+
+attributes #0 = { "bpf_fastcall" }
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll
new file mode 100644
index 0000000000000..81ca4e1ac57bc
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll
@@ -0,0 +1,62 @@
+; RUN: llc -O2 --march=bpfel %s -o - | FileCheck %s
+
+; Generated from the following C code:
+;
+; #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+; void quux(void *);
+; void bar(long) __bpf_fastcall;
+; void buz(long i, long j);
+;
+; void foo(long i, long j) {
+;   long k;
+;   bar(i);
+;   bar(i);
+;   buz(i, j);
+;   quux(&k);
+; }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check that function marked with bpf_fastcall does not clobber R1-R5.
+; Check that spills/fills wrapping the call use and reuse lowest stack offsets.
+
+define dso_local void @foo(i64 noundef %i, i64 noundef %j) {
+entry:
+  %k = alloca i64, align 8
+  tail call void @bar(i64 noundef %i) #0
+  tail call void @bar(i64 noundef %i) #0
+  tail call void @buz(i64 noundef %i, i64 noundef %j)
+  call void @quux(ptr noundef nonnull %k)
+  ret void
+}
+
+; CHECK:      # %bb.0:
+; CHECK-NEXT:   r3 = r1
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = r3
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = r3
+; CHECK-NEXT:   call buz
+; CHECK-NEXT:   r1 = r10
+; CHECK-NEXT:   r1 += -8
+; CHECK-NEXT:   call quux
+; CHECK-NEXT:   exit
+
+declare dso_local void @bar(i64 noundef) #0
+declare dso_local void @buz(i64 noundef, i64 noundef)
+declare dso_local void @quux(ptr noundef)
+
+attributes #0 = { "bpf_fastcall" }
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll
new file mode 100644
index 0000000000000..857d2f000d1d5
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll
@@ -0,0 +1,110 @@
+; RUN: llc -O2 --march=bpfel \
+; RUN:   -print-after=stack-slot-coloring %s \
+; RUN:   -o /dev/null 2>&1 | FileCheck %s
+
+; Generated from the following C code:
+;
+;   #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+;   void bar1(void) __bpf_fastcall;
+;   void buz1(long i, long j, long k);
+;   void foo1(long i, long j, long k) {
+;     bar1();
+;     buz1(i, j, k);
+;   }
+;
+;   long bar2(void) __bpf_fastcall;
+;   void buz2(long i, long j, l...
[truncated]
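
As a reading aid for the peephole change in the diff above, here is a simplified, standalone sketch of how the set of registers to wrap can be derived per call: walk the block backwards, track liveness, and for each call keep the caller-saved registers that are live after it and not defined by it. The `Insn` model and `spillMasksForCalls` are hypothetical and stand in for LLVM's `MachineInstr` and `LivePhysRegs`. Register masks use bit N for rN, which is an assumption of this sketch rather than how the patch encodes registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction model: bitmasks over r0..r5.
struct Insn {
  bool isCall;
  uint32_t defs; // registers written (for a call: registers it may clobber)
  uint32_t uses; // registers read
};

// For every instruction, return the mask of caller-saved registers that
// would need a spill before and a fill after it. Non-call instructions
// always get 0. A register qualifies when it is live after the call and
// the call itself does not define it.
std::vector<uint32_t> spillMasksForCalls(const std::vector<Insn> &block,
                                         uint32_t liveOut) {
  const uint32_t callerSaved = 0x3F; // r0..r5
  std::vector<uint32_t> result(block.size(), 0);
  uint32_t live = liveOut; // registers live after the current instruction
  for (std::size_t i = block.size(); i-- > 0;) {
    const Insn &insn = block[i];
    if (insn.isCall)
      result[i] = live & ~insn.defs & callerSaved;
    // Standard backward liveness step: kill definitions, add uses.
    live = (live & ~insn.defs) | insn.uses;
  }
  return result;
}
```

For the `foo` example from the PR description, the block can be modeled as a `bpf_fastcall` call to `bar` (no defs, no uses) followed by a regular call to `buz` (clobbers r0-r5, reads r1-r3). The sketch then reports r1, r2, r3 for the first call and nothing for the second, which corresponds to the three spill/fill pairs in the expected assembly.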

@efriedma-quic
Collaborator

Is there some reason this is an attribute, and not a calling convention, at the IR level?

@AaronBallman
Collaborator

Is there some reason this is an attribute, and not a calling convention, at the IR level?

And why would it not be named __attribute__((fastcall)) when targeting BPF? (e.g., do we need a new calling convention at all?)

@eddyz87
Contributor Author

eddyz87 commented Aug 1, 2024

Is there some reason this is an attribute, and not a calling convention, at the IR level?

I thought about it and decided against that, but I agree that it is an option; my reasoning is below.
From a semantic point of view, the difference between the current attribute implementation and a calling convention would be whether the code below is reported as an error:

    void (*ptr1)(void) __bpf_fastcall;
    void (*ptr2)(void);
    void foo(void) {
      ptr2 = ptr1; // is this an error?
    }

From the kernel's point of view it really is not, as this "fast call" is an optimization hint that can be safely ignored by the BPF JIT. There is also the desire to keep Clang front-end changes to a minimum. Finally, the argument made for preserve_caller_saved_registers here sort of makes sense for bpf_fastcall as well.

And why would it not be named __attribute__((fastcall)) when targeting BPF

I wanted to avoid potential confusion, as bpf_fastcall operates in a manner completely different from what fastcall for x86 does (first two parameters in ecx, edx, other parameters on stack).

Contributor

@yonghong-song yonghong-song left a comment

I also tried a few examples and it looks good to me. Only a few minor comments.

@eddyz87
Contributor Author

eddyz87 commented Aug 3, 2024

@AaronBallman , @efriedma-quic , could you please check my last comment here?

@AaronBallman
Collaborator

Is there some reason this is an attribute, and not a calling convention, at the IR level?

I thought about it and decided against that, but I agree that it is an option; my reasoning is below. From a semantic point of view, the difference between the current attribute implementation and a calling convention would be whether the code below is reported as an error:

    void (*ptr1)(void) __bpf_fastcall;
    void (*ptr2)(void);
    void foo(void) {
      ptr2 = ptr1; // is this an error?
    }

From the kernel's point of view it really is not, as this "fast call" is an optimization hint that can be safely ignored by the BPF JIT. There is also the desire to keep Clang front-end changes to a minimum. Finally, the argument made for preserve_caller_saved_registers here sort of makes sense for bpf_fastcall as well.

Doesn't that kind of defeat the purpose of the calling convention? (A function designator decays into a function pointer basically any time you mention it.)

This commit introduces attribute bpf_fastcall to declare BPF functions that do not clobber some of the caller saved registers (R0-R5).

If the attribute can be dropped, then the caller side has to assume the caller-saved registers are still going to be clobbered even in the presence of the attribute, so it always has to save and restore those registers, doesn't it?

@eddyz87
Contributor Author

eddyz87 commented Aug 5, 2024

@AaronBallman,

    void (*ptr1)(void) __bpf_fastcall;
    void (*ptr2)(void);
    void foo(void) {
      ptr2 = ptr1; // is this an error?
    }

...
Doesn't that kind of defeat the purpose of the calling convention? (A function designator decays into a function pointer basically any time you mention it.)

This commit introduces attribute bpf_fastcall to declare BPF functions that do not clobber some of the caller saved registers (R0-R5).

If the attribute can be dropped, then the caller side has to assume the caller-saved registers are still going to be clobbered even in the presence of the attribute, so it always has to save and restore those registers, doesn't it?

If the attribute is dropped, then yes. E.g. if the function is called through ptr2 from the example above, the register allocator would assume that all caller-saved registers are clobbered. However, the generated code is still correct; nothing will break or behave unexpectedly. In the worst case, some performance would be left on the table. That's why I'm inclined to say that this is not an error.

@eddyz87
Contributor Author

eddyz87 commented Aug 5, 2024

Moreover, BPF has only one calling convention at the moment: all parameters are passed in registers (R1-R5), and the return value is in R0. If a new calling convention were added one day, e.g. with some parameters passed on the stack, the feature in question would still be applicable, so it behaves more like an attribute than a calling convention (same as the existing no_preserve_caller_saved_registers).

@AaronBallman
Collaborator

If the attribute is dropped, then yes. E.g. if the function is called through ptr2 from the example above, the register allocator would assume that all caller-saved registers are clobbered. However, the generated code is still correct; nothing will break or behave unexpectedly. Worst case, some performance would be left on the table. That's why I'm inclined to say that this is not an error.

But a mismatch can still potentially result in a miscompilation, right? e.g., you have a function in a header file with the bpf_fastcall attribute on it. The definition of the function is compiled into a library with Clang 16 and ignores the unknown attribute, so the callee will clobber registers. But the declaration is used by an application compiled with Clang 20 and assumes the attribute means the callee won't clobber registers, so doesn't generate the save/restore code. When the call resolves, the registers are clobbered unexpectedly, right? But if the attribute was part of the function type, this program presumably would not link due to type mismatch.

@eddyz87
Contributor Author

eddyz87 commented Aug 7, 2024

@AaronBallman,

But a mismatch can still potentially result in a miscompilation, right? e.g., you have a function in a header file with the bpf_fastcall attribute on it. The definition of the function is compiled into a library with Clang 16 and ignores the unknown attribute, so the callee will clobber registers. But the declaration is used by an application compiled with Clang 20 and assumes the attribute means the callee won't clobber registers, so doesn't generate the save/restore code. When the call resolves, the registers are clobbered unexpectedly, right? But if the attribute was part of the function type, this program presumably would not link due to type mismatch.

Not quite, the feature is a bit unusual and pursues several goals:

  • allow BPF programs running on new kernels (those that support bpf_fastcall) to gain some performance by clobbering fewer registers;
  • allow same code to run unmodified on old kernels (those that do not support bpf_fastcall);
  • allow bpf_fastcall only for "builtin" functions, implemented by kernel, not written in BPF;
  • allow new kernels to provide different sets of bpf_fastcall builtin functions depending on configuration and architecture;
  • do not complicate kernel side implementation much (read as: do not do register allocation in kernel jit).

Suppose there is a function void foo(void) marked with bpf_fastcall and the following IR:

%v = ... define %v somehow ...
call void @foo
... use %v somehow ...

In such a case:

  • only the @foo prototype is visible to the BPF program, not its body;
  • for the BPF backend the attribute serves as a hint for register allocation; BPF code like the following might be generated:
    r1 = ... definition of %v ...
    fp[-32] = r1
    call foo
    r1 = fp[-32]
    ... use r1 as %v ...

Kernels supporting bpf_fastcall for function foo would remove r1 spill and fill, kernels not supporting bpf_fastcall would just work as before.
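To make the kernel-side idea concrete, here is a simplified sketch of recognizing and removing the spill/fill pairs around a call. It is an illustration only, not the kernel verifier's code: `struct insn`, `enum insn_kind`, `match_pairs` and `remove_fastcall_pairs` are hypothetical names, and the sketch assumes the caller has already established that the callee supports bpf_fastcall. Additional conditions mentioned in the PR description (for example, that the stack slots are not used for anything else) are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction model. */
enum insn_kind { INSN_OTHER, INSN_SPILL, INSN_FILL, INSN_CALL, INSN_NOP };

struct insn {
  enum insn_kind kind;
  int reg; /* register spilled or filled; unused for other kinds */
  int off; /* stack offset relative to r10; unused for other kinds */
};

/* Count spill/fill pairs placed symmetrically around the call at index
 * `call`: the n-th instruction before the call must spill the same
 * register to the same offset that the n-th instruction after it fills. */
static size_t match_pairs(const struct insn *prog, size_t len, size_t call) {
  size_t n = 0;
  assert(call < len && prog[call].kind == INSN_CALL);
  while (n < call && call + n + 1 < len) {
    const struct insn *st = &prog[call - n - 1];
    const struct insn *ld = &prog[call + n + 1];
    if (st->kind != INSN_SPILL || ld->kind != INSN_FILL ||
        st->reg != ld->reg || st->off != ld->off)
      break;
    n++;
  }
  return n;
}

/* Turn every matched spill and fill into a no-op; returns the pair count. */
size_t remove_fastcall_pairs(struct insn *prog, size_t len, size_t call) {
  size_t n = match_pairs(prog, len, call);
  for (size_t i = 1; i <= n; i++) {
    prog[call - i].kind = INSN_NOP;
    prog[call + i].kind = INSN_NOP;
  }
  return n;
}
```

Applied to the five-instruction example above (definition, spill, call, fill, use), the sketch would match one pair and leave only the definition, the call and the use. A kernel that does not know about bpf_fastcall simply never runs such a pass, and the program still behaves correctly.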

@yonghong-song
Contributor

@AaronBallman Just want to clarify about linking. For the BPF ecosystem, we do not link with LLVM lld. We link with bpftool (see https://www.mankier.com/8/bpftool-gen), as there are special requirements for gluing BPF programs together.

In the bpf_fastcall use case, the function to be inlined is not in the BPF program but in the kernel; this applies to some specific kernel functions called from BPF programs. On old kernels, where bpf_fastcall is not available, nothing changes: the BPF program calls these functions as usual. On new kernels that know how to process bpf_fastcall-generated code, the kernel can implement inlining, since it knows exactly what the function is intended to do. Of course, we only target simple kernel functions at the moment.

So your concern about a func in bpf lib compiled with bpf_fastcall attribute ignored won't be an issue here.

In the future, if we indeed intend to inline bpf functions in library, I guess those functions can be defined in header files.

@eddyz87
Contributor Author

eddyz87 commented Aug 13, 2024

@AaronBallman, @efriedma-quic, could you please comment?
We are eager to use this feature on the BPF side (e.g. here), and landing it on main would simplify the cooperation.

@AaronBallman
Collaborator

Thank you (all) for the explanations, that was helpful -- it addressed my concerns with the design.

As discussed in the pull request thread, it is not an error to assign
pointer with bpf_fastcall attribute to a pointer w/o such attribute
and vice-versa.
(Hence, no need to track bpf_fastcall as a part of the type).
Collaborator

@AaronBallman AaronBallman left a comment

Clang bits LGTM; accepting but please wait to land until someone accepts the LLVM changes as well.

@eddyz87
Contributor Author

eddyz87 commented Aug 16, 2024

Clang bits LGTM; accepting but please wait to land until someone accepts the LLVM changes as well.

@AaronBallman , thank you for the review!
On the LLVM side the changes are only for BPF backend and these were already approved by @yonghong-song, or do you have someone else in mind?

eddyz87 added a commit to eddyz87/bpf that referenced this pull request Aug 17, 2024
Attribute used by LLVM implementation of the feature had been changed
from no_caller_saved_registers to bpf_fastcall (see [1]).
This commit replaces references to nocsr by references to bpf_fastcall
to keep LLVM and Kernel parts in sync.

[1] llvm/llvm-project#101228

Signed-off-by: Eduard Zingerman <[email protected]>
eddyz87 added a commit to eddyz87/bpf that referenced this pull request Aug 17, 2024
Attribute used by LLVM implementation of the feature had been changed
from no_caller_saved_registers to bpf_fastcall (see [1]).
This commit replaces references to nocsr by references to bpf_fastcall
to keep LLVM and selftests parts in sync.

[1] llvm/llvm-project#101228

Signed-off-by: Eduard Zingerman <[email protected]>
@yonghong-song
Contributor

@AaronBallman the llvm side of change looks good to me!

@AaronBallman
Collaborator

@AaronBallman the llvm side of change looks good to me!

Then I think we're good to land! @eddyz87 do you need someone to land on your behalf?

@eddyz87
Contributor Author

eddyz87 commented Aug 19, 2024

Then I think we're good to land! @eddyz87 do you need someone to land on your behalf?

@AaronBallman, I have commit rights, will merge this change shortly, thank you!

@eddyz87 eddyz87 merged commit e9b2e16 into llvm:main Aug 19, 2024
9 checks passed
eddyz87 added a commit that referenced this pull request Aug 19, 2024
@eddyz87
Contributor Author

eddyz87 commented Aug 19, 2024

There was a failure for expensive checks:
https://lab.llvm.org/buildbot/#/builders/187/builds/509

I reverted the commit, will investigate.

@eddyz87
Contributor Author

eddyz87 commented Aug 19, 2024

@AaronBallman, could you please help me with a procedural question:
I reverted the commit and the current pull request is now closed. What should I do once I figure out the fix?
There seems to be no option to re-open the pull request, so it looks like I will have to open a new one.

@llvm-ci
Collaborator

llvm-ci commented Aug 19, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/3736

Here is the relevant piece of the build log for the reference:

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/BPF/bpf-fastcall-3.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -O2 --march=bpfel /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll -o - | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -O2 --march=bpfel /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll -o -
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll

# After BPF PreEmit Peephole Optimization
# Machine code for function foo: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Frame Objects:
  fi#-4: size=8, align=8, fixed, at location [SP-24]
  fi#-3: size=8, align=8, fixed, at location [SP-16]
  fi#-2: size=8, align=8, fixed, at location [SP-24]
  fi#-1: size=8, align=8, fixed, at location [SP-16]
  fi#0: size=8, align=8, at location [SP-8]
Function Live Ins: $r1, $r2

bb.0.entry:
  liveins: $r1, $r2
  $r3 = MOV_rr $r1
  STD $r2, $r10, -16, 0
  STD $r3, $r10, -24, 0
  JAL @bar, <regmask $r0 $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $w0 $w2 $w3 $w4 $w5 $w6 $w7 $w8 $w9 $w10>, implicit $r11, implicit $r1, implicit-def $r11
  LDD $r3, $r10, -24, 0
  LDD $r2, $r10, -16, 0
  $r1 = MOV_rr $r3
  STD $r2, $r10, -16, 0
  STD $r3, $r10, -24, 0
  JAL @bar, <regmask $r0 $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $w0 $w2 $w3 $w4 $w5 $w6 $w7 $w8 $w9 $w10>, implicit $r11, implicit $r1, implicit-def $r11
  LDD $r3, $r10, -24, 0
  LDD $r2, $r10, -16, 0
  $r1 = MOV_rr killed $r3
  JAL @buz, <regmask $r6 $r7 $r8 $r9 $r10 $w6 $w7 $w8 $w9 $w10>, implicit $r11, implicit $r1, implicit $r2, implicit-def $r11
  $r1 = MOV_rr $r10
  $r1 = ADD_ri $r1(tied-def 0), -8
  JAL @quux, <regmask $r6 $r7 $r8 $r9 $r10 $w6 $w7 $w8 $w9 $w10>, implicit $r11, implicit $r1, implicit-def $r11
  RET

# End machine code for function foo.

*** Bad machine code: Extra explicit operand on non-variadic instruction ***
- function:    foo
- basic block: %bb.0 entry (0x9a5ced8)
- instruction: STD $r2, $r10, -16, 0
- operand 3:   0

*** Bad machine code: Extra explicit operand on non-variadic instruction ***
- function:    foo
...

@vvereschaka
Contributor

Hi @eddyz87 ,

there are a few failed fastcall tests on `llvm-clang-x86_64-expensive-checks-ubuntu` as well:

https://lab.llvm.org/buildbot/#/builders/187/builds/509

  • LLVM::bpf-fastcall-regmask-1.ll
  • LLVM::bpf-fastcall-1.ll
  • LLVM::bpf-fastcall-3.ll
  • LLVM::bpf-fastcall-2.ll

@eddyz87
Contributor Author

eddyz87 commented Aug 19, 2024

Hi @vvereschaka ,

Yes, I received the email notifications; the commit was reverted an hour ago.
While at it, could you please help me understand the current process: should I open a new pull request once I figure out the fix?

@vvereschaka
Contributor

@eddyz87 ,

just create a new PR with the changes and add cross-references between this PR and the new one so that people can easily find the original PR when necessary.

@AaronBallman
Collaborator

@eddyz87 ,

just create a new PR with the changes and add cross-references between this PR and the new one so that people can easily find the original PR when necessary.

Yup, this is the typical process (GitHub doesn't let you reopen a merged PR).

@eddyz87
Contributor Author

eddyz87 commented Aug 20, 2024

@eddyz87 ,
just create a new PR with the changes and add cross-references between this PR and the new one so that people can easily find the original PR when necessary.

Yup, this is the typical process (GitHub doesn't let you reopen a merged PR).

Understood, thank you.
Will post the new pr.

kutemeikito added a commit to kutemeikito/llvm-project that referenced this pull request Aug 23, 2024
* 'main' of https://github.com/llvm/llvm-project: (1385 commits)
  [BPF] introduce __attribute__((bpf_fastcall)) (#105417)
  [llvm][DWARFLinker] Don't attach DW_AT_dwo_id to CUs (#105186)
  [lldb-dap] Mark hidden frames as "subtle" (#105457)
  [clang][bytecode] Fix diagnostic in final ltor cast (#105292)
  [clang-repl] [codegen] Reduce the state in TBAA. NFC for static compilation. (#98138)
  [CMake] Update CMake cache file for the ARM/Aarch64 cross toolchain builds. NFC. (#103552)
  Revert "[FunctionAttrs] deduce attr `cold` on functions if all CG paths call a `cold` function"
  [AMDGPU] Update instrumentAddress method to support aligned size and unusual size accesses. (#104804)
  [BOLT] Improve BinaryFunction::inferFallThroughCounts() (#105450)
  [lldb][test] Workaround older systems that lack gettid (#104831)
  [LTO] Teach computeLTOCacheKey to return std::string (NFC) (#105331)
  [gn build] Port c8a678b1e486
  [gn build] Port 55d744eea361
  [ELF,test] Improve error-handling-script-linux.test
  [gn] tblgen opts for llvm-cgdata
  [MLIR][MathDialect] fix fp32 promotion crash when encounters scf.if (#104451)
  Reland "[gn build] Port d3fb41dddc11 (llvm-cgdata)"
  RISC-V: Add fminimumnum and fmaximumnum support (#104411)
  [mlir] Fix -Wunused-result in ElementwiseOpFusion.cpp (NFC)
  [RISCV][GISel] Merge RISCVCallLowering::lowerReturnVal into RISCVCallLowering::lowerReturn. NFC
  [AArch64] Basic SVE PCS support for handling scalable vectors on Darwin.
  Fix KCFI types for generated functions with integer normalization (#104826)
  [RISCV] Add coverage for int reductions of <3 x i8> vectors
  Revert "[RISCV][GISel] Allow >2*XLen integers in isSupportedReturnType."
  [DirectX] Register a few DXIL passes with the new PM
  [RISCV][GISel] Allow >2*XLen integers in isSupportedReturnType.
  [mlir][linalg] Improve getPreservedProducerResults estimation in ElementwiseOpFusion (#104409)
  [lldb] Extend frame recognizers to hide frames from backtraces (#104523)
  [RISCV][GISel] Split LoadStoreActions in LoadActions and StoreActions.
  [lldb][test] XFAIL TestAnonNamespaceParamFunc.cpp on Windows
  [FunctionAttrs] deduce attr `cold` on functions if all CG paths call a `cold` function
  [FunctionAttrs] Add tests for deducing attr `cold` on functions; NFC
  [DXIL][Analysis] Update test to match comment. NFC (#105409)
  [flang] Fix test on ppc64le & aarch64 (#105439)
  [bazel] Add missing dependencies for c8a678b1e4863df2845b1305849534047f10caf1
  [RISCV][GISel] Remove s32 support for G_ABS on RV64.
  [TableGen] Rework `EmitIntrinsicToBuiltinMap` (#104681)
  [libc] move newheadergen back to safe_load (#105374)
  [cmake] Set up llvm-ml as ASM_MASM tool in WinMsvc.cmake (#104903)
  [libc] Include startup code when installing all (#105203)
  [DAG][RISCV] Use vp.<binop> when widening illegal types for binops which can trap (#105214)
  [BOLT] Reduce CFI warning verbosity (#105336)
  [flang] Disable part of failing test (temporary) (#105350)
  AMDGPU: Temporarily stop adding AtomicExpand to new PM passes
  [OpenMP] Temporarily disable test to keep bots green
  [Clang] Re-land Overflow Pattern Exclusions (#104889)
  [RISCV][GISel] Remove s32 support on RV64 for DIV, and REM. (#102519)
  [flang] Disable failing test (#105327)
  [NFC] Fix a typo in InternalsManual: ActOnCXX -> ActOnXXX (#105207)
  [NFC] Fixed two typos: "__builin_" --> "__builtin_" (#98782)
  [flang] Re-enable date_and_time intrinsic test (NFC) (#104967)
  [clang] Support -Wa, options -mmsa and -mno-msa (#99615)
  AMDGPU/NewPM: Start filling out addIRPasses (#102884)
  AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (#102867)
  [SandboxIR] Implement CatchSwitchInst (#104652)
  clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (#96876)
  clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (#96875)
  clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (#96874)
  [Driver,DXIL] Fix build
  [Attributor] Improve AAUnderlyingObjects (#104835)
  [flang] Fix IEEE_NEAREST_AFTER folding edge cases (#104846)
  [flang] Silence spurious error (#104821)
  [flang] Silence an inappropriate warning (#104685)
  [flang] Fix inheritance of IMPLICIT typing rules (#102692)
  [flang] More support for anonymous parent components in struct constr… (#102642)
  clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (#96873)
  [lldb][test] Change unsupported cat -e to cat -v to work with lit internal shell (#104878)
  [llvm-lit][test] Updated built-in cat command tests (#104473)
  [mlir][gpu] Add extra value types for gpu::ShuffleOp (#104605)
  [AArch64][MachO] Add ptrauth ABI version to arm64e cpusubtype. (#104650)
  [libc++] Fix several double-moves in the code base (#104616)
  [lldb] Disable the API test TestCppBitfields on Windows (#105037)
  llvm.lround: Update verifier to validate support of vector types. (#98950)
  [mlir][sparse] support sparsification to coiterate operations. (#102546)
  Fix post-104491 (#105191)
  [mlir][tablegen] Fix tablegen bug with `Complex` class (#104974)
  [DirectX] Encapsulate DXILOpLowering's state into a class. NFC
  [ctx_prof] Add analysis utility to fetch ID of a callsite (#104491)
  [lldb] Fix windows debug build after 9d07f43 (#104896)
  [lldb][ClangExpressionParser] Implement ExternalSemaSource::ReadUndefinedButUsed (#104817)
  Revert "[compiler-rt][fuzzer] implements SetThreadName for fuchsia." (#105162)
  [lldb][ClangExpressionParser] Don't leak memory when multiplexing ExternalASTSources (#104799)
  [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)
  [mlir][spirv] Support `gpu` in `convert-to-spirv` pass (#105010)
  [libc++][chrono] Use hidden friends for leap_second comparison. (#104713)
  [OpenMP] Map `omp_default_mem_alloc` to global memory (#104790)
  [NFC][TableGen] Eliminate use of isalpha/isdigit from TGLexer (#104837)
  [HLSL] Implement support for HLSL intrinsic - saturate (#104619)
  [RISCV] Add isel optimization for (and (sra y, c2), c1) to recover regression from #101751. (#104114)
  [bazel] Add missing deps in {Arith,DLTI}DialectTdFiles (#105091)
  [bazel] Port bf68e9047f62c22ca87f9a4a7c59a46b3de06abb (#104907)
  [Clang] CWG722: nullptr to ellipses (#104704)
  [RISCV] Add coverage for VP div[u]/rem[u] with non-power-of-2 vectors
  Recommit "[CodeGenPrepare] Folding `urem` with loop invariant value"
  [CodeGenPrepare][X86] Add tests for fixing `urem` transform; NFC
  Fix a warning for -Wcovered-switch-default (#105054)
  [OpenMP][FIX] Check for requirements early (#104836)
  [mlir] [irdl] Improve IRDL documentation (#104928)
  [CMake] Remove HAVE_LINK_H
  [Support] Remove unneeded __has_include fallback
  [docs] Fix typo in llvm.experimental.vector.compress code-block snippet
  [clang][ASTMatcher] Fix execution order of hasOperands submatchers (#104148)
  InferAddressSpaces: Factor replacement loop into function [NFC] (#104430)
  [DXIL][Analysis] Delete unnecessary test (#105025)
  [MLIR][EmitC] Allow ptrdiff_t as result in sub op (#104921)
  [NFC] Remove explicit bitcode enumeration from BitCodeFormat.rst (#102618)
  [NVPTX] Add elect.sync Intrinsic (#104780)
  [AMDGPU] Move AMDGPUMemoryUtils out of Utils. NFC. (#104930)
  [clang][OpenMP] Fix typo in comment, NFC
  [AArch64] fix buildbot by removing dead code
  [llvm-cgdata] Fix -Wcovered-switch-default (NFC)
  Reenable anon structs (#104922)
  [DXIL][Analysis] Add validator version to info collected by Module Metadata Analysis (#104828)
  Reland [CGData] llvm-cgdata #89884 (#101461)
  [CostModel][X86] Add missing costkinds for scalar CTLZ/CTTZ instructions
  [Driver] Make ffp-model=fast honor non-finite-values, introduce ffp-model=aggressive (#100453)
  [InstCombine] Thwart complexity-based canonicalization in test (NFC)
  [AArch64] Extend sxtw peephole to uxtw. (#104516)
  Reapply "[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)"
  [AArch64] Optimize when storing symmetry constants (#93717)
  [lldb][Windows] Fixed the API test breakpoint_with_realpath_and_source_map (#104918)
  [SPARC] Remove assertions in printOperand for inline asm operands (#104692)
  [llvm][offload] Move AMDGPU offload utilities to LLVM (#102487)
  [AArch64][NEON] Extend faminmax patterns with fminnm/fmaxnm (#104766)
  [AArch64] Remove TargetParser CPU/Arch feature tests (#104587)
  [InstCombine] Adjust fixpoint error message (NFC)
  [LLVM] Add a C API for creating instructions with custom syncscopes. (#104775)
  [llvm-c] Add getters for LLVMContextRef for various types (#99087)
  [clang][NFC] Split invalid-cpu-note tests (#104601)
  [X86][AVX10] Fix unexpected error and warning when using intrinsic (#104781)
  [ScheduleDAG] Dirty height/depth in addPred/removePred even for latency zero (#102915)
  [gn build] Port 42067f26cd08
  [X86] Use correct fp immediate types in _mm_set_ss/sd
  [X86] Add clang codegen test coverage for #104848
  [SimplifyCFG] Add support for hoisting commutative instructions (#104805)
  [clang][bytecode] Fix discarding CompoundLiteralExprs (#104909)
  Revert "[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)"
  [LLVM-Reduce] - Distinct Metadata Reduction (#104624)
  [clang][modules] Built-in modules are not correctly enabled for Mac Catalyst (#104872)
  [MLIR][DLTI] Introduce DLTIQueryInterface and impl for DLTI attrs (#104595)
  [Flang][OpenMP] Prevent re-composition of composite constructs (#102613)
  [BasicAA] Use nuw attribute of GEPs (#98608)
  [CycleAnalysis] Methods to verify cycles and their nesting. (#102300)
  [mlir][EmitC] Model lvalues as a type in EmitC (#91475)
  [mlir][EmitC] Do not convert illegal types in EmitC (#104571)
  [Clang][test] Add bytecode interpreter tests for floating comparison functions (#104703)
  [clang][bytecode] Fix initializing base casts (#104901)
  [mlir][ArmSME][docs] Update example (NFC)
  [llvm][GitHub] Fix formatting of new contributor comments
  [Coroutines] Salvage the debug information for coroutine frames within optimizations
  [lldb][AIX] 1. Avoid namespace collision on other platforms (#104679)
  [MLIR][Bufferize][NFC] Fix documentation typo (#104881)
  [LV] Simplify !UserVF.isZero() -> UserVF (NFC).
  [DataLayout] Refactor the rest of `parseSpecification` (#104545)
  [LLD][COFF] Detect weak reference cycles. (#104463)
  [MLIR][Python] remove unused init python file (#104890)
  [clang-doc] add support for block commands in clang-doc html output (#101108)
  [Coroutines] Fix -Wunused-variable in CoroFrame.cpp (NFC)
  [IR] Check that arguments of naked function are not used (#104757)
  [Coroutines] [NFCI] Don't search the DILocalVariable for __promise when constructing the debug variable for __coro_frame
  [MLIR] Introduce a SelectLikeOpInterface (#104751)
  Revert "[scudo] Add partial chunk heuristic to retrieval algorithm." (#104894)
  [NVPTX] Fix bugs involving maximum/minimum and bf16
  [SelectionDAG] Fix lowering of IEEE 754 2019 minimum/maximum
  [llvm-objcopy][WebAssembly] Allow --strip-debug to operate on relocatable files. (#102978)
  [lld][WebAssembly] Ignore local symbols when parsing lazy object files. (#104876)
  [clang][bytecode] Support ObjC blocks (#104551)
  Revert "[mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)"
  [ADT] Fix a minor build error (#104840)
  [Driver] Default -msmall-data-limit= to 0 and clean up code
  [docs] Revise the doc for __builtin_allow_runtime_check
  [MLIR][Transforms] Fix dialect conversion inverse mapping (#104648)
  [scudo] Add partial chunk heuristic to retrieval algorithm. (#104807)
  [mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)
  [offload] - Fix issue with standalone debug offload build (#104647)
  [ValueTracking] Handle incompatible types instead of asserting in `isKnownNonEqual`; NFC
  [AMDGPU] Add VOPD combine dependency tests. NFC. (#104841)
  [compiler-rt][fuzzer] implements SetThreadName for fuchsia. (#99953)
  [Support] Do not ignore unterminated open { in formatv (#104688)
  Reapply "[HWASan] symbolize stack overflows" (#102951) (#104036)
  Fix StartDebuggingRequestHandler/ReplModeRequestHandler in lldb-dap (#104824)
  Emit `BeginSourceFile` failure with `elog`. (#104845)
  [libc][NFC] Add sollya script to compute worst case range reduction. (#104803)
  Reland "[asan] Catch `initialization-order-fiasco` in modules without…" (#104730)
  [NFC][asan] Create `ModuleName` lazily (#104729)
  [asan] Better `___asan_gen_` names (#104728)
  [NFC][ADT] Add range wrapper for std::mismatch (#104838)
  [Clang] Fix ICE in SemaOpenMP with structured binding (#104822)
  [MC] Remove duplicate getFixupKindInfo calls. NFC
  [C++23] Fix infinite recursion (Clang 19.x regression) (#104829)
  AMDGPU/NewPM: Start implementing addCodeGenPrepare (#102816)
  [AMDGPU][Docs] DWARF aspace-aware base types
  Pre-commit AMDGPU tests for masked load/store/scatter/gather (#104645)
  [ADT] Add a missing call to a unique_function destructor after move (#98747)
  [ADT] Minor code cleanup in STLExtras.h (#104808)
  [libc++abi] Remove unnecessary dependency on std::unique_ptr (#73277)
  [clang] Increase the default expression nesting limit (#104717)
  [mlir][spirv] Fix incorrect metadata in SPIR-V Header (#104242)
  [ADT] Fix alignment check in unique_function constructor (#99403)
  LSV: fix style after cursory reading (NFC) (#104793)
  Revert "[BPF] introduce `__attribute__((bpf_fastcall))` (#101228)"
  [NFC][asan] Don't `cd` after `split-file` (#104727)
  [NFC][Instrumentation] Use `Twine` in `createPrivateGlobalForString` (#104726)
  [mlir][spirv] Add `GroupNonUniformBallotFindLSB` and `GroupNonUniformBallotFindMSB` ops (#104791)
  [GlobalISel] Bail out early for big-endian (#103310)
  [compiler-rt][nsan] Add more tests for shadow memory (#100906)
  [Flang] Fix test case for AIX(big-endian) system for issuing an extra message. (#104792)
  [asan] Change Apple back to fixed allocator base address (#104818)
  [NVPTX] Add conversion intrinsics from/to fp8 types (e4m3, e5m2) (#102969)
  [RISCV] Improve BCLRITwoBitsMaskHigh SDNodeXForm. NFC
  [clang][dataflow] Collect local variables referenced within a functio… (#104459)
  [AMDGPU][GlobalISel] Save a copy in one case of addrspacecast (#104789)
  [AMDGPU] Simplify, fix and improve known bits for mbcnt (#104768)
  [TableGen] Detect invalid -D arguments and fail (#102813)
  [DirectX] Disentangle DXIL.td's op types from LLVMType. NFC
  [Clang] Check constraints for an explicit instantiation of a member function (#104438)
  [DirectX] Differentiate between 0/1 overloads in the OpBuilder. NFC
  [docs] Add note about "Re-request review" (#104735)
  [lld][ELF] Combine uniqued small data sections (#104485)
  [BPF] introduce `__attribute__((bpf_fastcall))` (#101228)
  [SmallPtrSet] Optimize find/erase
  [PowerPC] Fix codegen for transparent_union function params (#101738)
  [llvm-mca] Add bottle-neck analysis to JSON output. (#90056)
  [lldb][Python] Silence GCC warning for modules error workaround
  [gn build] Port a56663591573
  [gn build] Port a449b857241d
  [clang][bytecode] Discard NullToPointer cast SubExpr (#104782)
  [lldb] PopulatePrpsInfoTest can fail due to hardcoded priority value (#104617)
  [mlir][spirv] Add support for math.log2 and math.log10 to GLSL/OpenCL SPIRV Backends (#104608)
  [lldb][test] Fix GCC warnings in TestGetControlFlowKindX86.cpp
  [TableGen] Resolve References at top level (#104578)
  [LLVM] [X86] Fix integer overflows in frame layout for huge frames (#101840)
  [lldb][ASTUtils] Remove unused SemaSourceWithPriorities::addSource API
  [lldb][test] Fix cast dropping const warning in TestBreakpointSetCallback.cpp
  [SimplifyCFG] Add tests for hoisting of commutative instructions (NFC)
  [AMDGPU][R600] Move R600CodeGenPassBuilder into R600TargetMachine(NFC). (#103721)
  Revert "[clang][ExtractAPI] Stop dropping fields of nested anonymous record types when they aren't attached to variable declaration (#104600)"
  MathExtras: template'ize alignToPowerOf2 (#97814)
  [AMDGPU] Move AMDGPUCodeGenPassBuilder into AMDGPUTargetMachine(NFC) (#103720)
  [clang][ExtractAPI] Stop dropping fields of nested anonymous record types when they aren't attached to variable declaration (#104600)
  [Clang][NFC] Fix potential null dereference in encodeTypeForFunctionPointerAuth (#104737)
  [DebugInfo] Make tests SimplifyCFG-independent (NFC)
  [mlir][ArmSME] Remove XFAILs (#104758)
  [RISCV] Add vector and vector crypto to SiFiveP400 scheduler model (#102155)
  [clang][OpenMP] Diagnose badly-formed collapsed imperfect loop nests (#60678) (#101305)
  Require !windows instead of XFAIL'ing ubsan/TestCases/Integer/bit-int.c
  [clang][bytecode] Fix member pointers to IndirectFieldDecls (#104756)
  [AArch64] Add fneg(fmul) and fmul(fneg) tests. NFC
  [clang][bytecode] Use first FieldDecl instead of asserting (#104760)
  [DataLayout] Refactor parsing of i/f/v/a specifications (#104699)
  [X86] LowerABD - simplify i32/i64 to use sub+sub+cmov instead of repeating nodes via abs (#102174)
  [docs] Update a filename, fix indentation (#103018)
  [CostModel][X86] Add cost tests for scmp/ucmp intrinsics
  [NFC][SLP] Remove useless code of the schedule (#104697)
  [VPlan] Rename getBestPlanFor -> getPlanFor (NFC).
  [InstCombine] Fold `(x < y) ? -1 : zext(x != y)` into `u/scmp(x,y)` (#101049)
  [VPlan] Emit note when UserVF > MaxUserVF (NFCI).
  [LLVM][NewPM] Add C API for running the pipeline on a single function. (#103773)
  [mlir][vector] Populate sink patterns in apply_patterns.vector.reduction_to_contract (#104754)
  [lld][MachO] Fix a suspicious assert in SyntheticSections.cpp
  [PowerPC] Support -mno-red-zone option (#94581)
  [PAC][ELF][AArch64] Encode several ptrauth features in PAuth core info (#102508)
  [VPlan] Rename getBestVF -> computeBestVF (NFC).
  [MLIR][LLVM] Improve the noalias propagation during inlining (#104750)
  [LoongArch] Fix the assertion for atomic store with 'ptr' type
  [AArch64][SME] Return false from produceCompactUnwindFrame if VG save required. (#104588)
  [X86] Cleanup lowerShuffleWithUNPCK/PACK signatures to match (most) other lowerShuffle* methods. NFC.
  [X86] VPERM2*128 instructions aren't microcoded on znver1
  [X86] VPERM2*128 instructions aren't microcoded on znver2
  [VPlan] Move some LoopVectorizationPlanner helpers to VPlan.cpp (NFC).
  [mlir][docs] Update Bytecode documentation (#99854)
  [SimplifyCFG] Don't block sinking for allocas if no phi created (#104579)
  [LoongArch] Merge base and offset for LSX/LASX memory accesses (#104452)
  [RISCV] Make extension names lower case in RISCVISAInfo::checkDependency() error messages.
  [RISCV] Add helper functions to exploit similarity of some RISCVISAInfo::checkDependency() error strings. NFC
  [RISCV] Merge some ISA error reporting together and make some errors more precise.
  [RISCV] Simplify reserse fixed regs (#104736)
  [RISCV] Add more tests for RISCVISAInfo::checkDependency(). NFC
  [Sparc] Add errata workaround pass for GR712RC and UT700 (#103843)
  [TableGen] Print Error and not crash on dumping non-string values (#104568)
  [RISCV][MC] Support experimental extensions Zvbc32e and Zvkgs (#103709)
  Revert "[CodeGenPrepare] Folding `urem` with loop invariant value"
  [SelectionDAG][X86] Preserve unpredictable metadata for conditional branches in SelectionDAG, as well as JCCs generated by X86 backend. (#102101)
  [MLIR][Python] enhance python api for tensor.empty (#103087)
  [AMDGPU][NFC] Fix preload-kernarg.ll test after attributor move (#98840)
  [CodeGenPrepare] Folding `urem` with loop invariant value
  [CodeGenPrepare][X86] Add tests for folding `urem` with loop invariant value; NFC
  [MC] Remove ELFRelocationEntry::OriginalAddend
  [TLI] Add support for inferring attr `cold`/`noreturn` on `std::terminate` and `__cxa_throw`
  [DAG][PatternMatch] Add support for matchers with flags; NFC
  Update Clang version from 19 to 20 in scan-build.1.
  [clang-format] Change GNU style language standard to LS_Latest (#104669)
  [MIPS] Remove expensive LLVM_DEBUG relocation dump
  [MC] Add test that requires multiple relaxation steps
  [libc][gpu] Add Atan2 Benchmarks (#104708)
  [libc] Add single threaded kernel attributes to AMDGPU startup utility (#104651)
  [HIP] search fatbin symbols for libs passed by -l (#104638)
  [gn build] Port 0d150db214e2
  [llvm][clang] Move RewriterBuffer to ADT. (#99770)
  [Clang] Do not allow `[[clang::lifetimebound]]` on explicit object member functions (#96113)
  [clang][OpenMP] Change /* ParamName */ to /*ParamName=*/, NFC
  [clang-tidy] Support member functions with modernize-use-std-print/format (#104675)
  [clang] fix divide by zero in ComplexExprEvaluator (#104666)
  [clang][OpenMP] Avoid multiple calls to getCurrentDirective in DSAChecker, NFC
  [clang][bytecode] Only booleans can be inverted
  [Flang]: Use actual endianness for Integer<80> (#103928)
  [libc++][docs] Fixing hyperlink for mathematical special function documentation (#104444)
  [InstSimplify] Simplify `uadd.sat(X, Y) u>= X + Y` and `usub.sat(X, Y) u<= X, Y` (#104698)
  [LV] Don't cost branches and conditions to empty blocks.
  [clang][test] Remove bytecode interpreter RUN line from test
  [Clang] warn on discarded [[nodiscard]] function results after casting in C (#104677)
  [GlobalISel] Add and use an Opcode variable and update match-table-cxx.td checks. NFC
  [Clang] `constexpr` builtin floating point classification / comparison functions (#94118)
  [clang][bytecode] IntPointer::atOffset() should append (#104686)
  [clang][bytecode][NFC] Improve Pointer::print()
  [RISCV] Remove unused tablegen classes from unratified Zbp instructions. NFC
  [PowerPC] Use MathExtras helpers to simplify code. NFC (#104691)
  [clang-tidy] Correct typo in ReleaseNotes.rst (#104674)
  [APInt] Replace enum with static constexpr member variables. NFC
  [MLIR][OpenMP] Fix MLIR->LLVM value matching in privatization logic (#103718)
  [VE] Use SelectionDAG::getSignedConstant/getAllOnesConstant.
  [gn build] Port 27a62ec72aed
  [LSR] Split the -lsr-term-fold transformation into its own pass (#104234)
  [AArch64] Use SelectionDAG::getSignedConstant/getAllOnesConstant.
  [ARM] Use SelectionDAG::getSignedConstant.
  [SelectionDAG] Use getAllOnesConstant.
  [LLD] [MinGW] Recognize the -rpath option (#102886)
  [clang][bytecode] Fix shifting negative values (#104663)
  [flang] Handle Hollerith in data statement initialization in big endian (#103451)
  [clang][bytecode] Classify 1-bit unsigned integers as bool (#104662)
  [RISCV][MC] Make error message of CSR with wrong extension more detailed (#104424)
  [X86] Don't save/restore fp around longjmp instructions (#102556)
  AMDGPU: Add tonearest and towardzero roundings for intrinsic llvm.fptrunc.round (#104486)
  [libc] Fix type signature for strlcpy and strlcat (#104643)
  [AArch64] Add a check for invalid default features (#104435)
  [clang][NFC] Clean up `Sema` headers
  [NFC] Cleanup in ADT and Analysis headers. (#104484)
  [InstCombine] Avoid infinite loop when negating phi nodes (#104581)
  Add non-temporal support for LLVM masked loads (#104598)
  [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395)
  [mlir][Transforms] Dialect conversion: Fix bug in `computeNecessaryMaterializations` (#104630)
  [RISCV] Use getAllOnesConstant/getSignedConstant.
  [SelectionDAG] Use getSignedConstant/getAllOnesConstant.
  [NFC][asan] Make 'Module &M' class member
  [AMDGPU][NFC] Remove duplicate code by using getAddressableLocalMemorySize (#104604)
  [CodeGen][asan] Use `%t` instead of `cd` in test
  Revert "[asan] Catch `initialization-order-fiasco` in modules without globals" (#104665)
  [SelectionDAG][X86] Use getAllOnesConstant. NFC (#104640)
  [LLVM][NVPTX] Add support for brkpt instruction (#104470)
  [asan] Catch `initialization-order-fiasco` in modules without globals (#104621)
  [RISCV] Remove feature implication from Zvknhb.
  [clang-format] Adjust requires clause wrapping (#101550) (#102078)
  [MC,AArch64] Remove unneeded STT_NOTYPE/STB_LOCAL code for mapping symbols and improve tests
  [NFC][DXIL] move replace/erase in DXIL intrinsic expansion to caller (#104626)
  [flang] Allow flexible name in llvm.ident (NFC) (#104543)
  [SandboxIR] Implement SwitchInst (#104641)
  [Clang] Fix sema checks thinking kernels aren't kernels (#104460)
  [asan] Pre-commit test with global constructor without any global (#104620)
  [clang-doc] add support for enums comments in html generation (#101282)
  Revert "[AArch64] Fold more load.x into load.i with large offset"
  [NFC][cxxabi] Apply `cp-to-llvm.sh` (#101970)
  [Clang] fix crash by avoiding invalidation of extern main declaration during strictness checks (#104594)
  [Mips] Fix fast isel for i16 bswap. (#103398)
  [libc] Add missing math definitions for round and scal for GPU (#104636)
  [ScalarizeMaskedMemIntr] Optimize splat non-constant masks (#104537)
  [SandboxIR] Implement ConstantInt (#104639)
  [SLP]Fix PR104637: do not create new nodes for fully overlapped non-schedulable nodes
  [DataLayout] Refactor parsing of "p" specification (#104583)
  [flang][cuda] Remove run line
  Reland "[flang][cuda][driver] Make sure flang does not switch to cc1 (#104613)"
  Revert "Reland "[flang][cuda][driver] Make sure flang does not switch to cc1 (#104613)""
  [SandboxIR][Tracker][NFC] GenericSetterWithIdx (#104615)
  Reland "[flang][cuda][driver] Make sure flang does not switch to cc1 (#104613)"
  [MC] Drop whitespace padding in AMDGPU combined asm/disasm tests. (#104433)
  [gn build] Port 7ff377ba60bf
  [InstrProf] Support conditional counter updates (#102542)
  [Analysis] Fix null ptr dereference when using WriteGraph without branch probability info (#104102)
  [DirectX] Revert specialized createOp methods part of #101250
  [VPlan] Compute cost for most opcodes in VPWidenRecipe (NFCI). (#98764)
  [PowerPC] Do not merge TLS constants within PPCMergeStringPool.cpp (#94059)
  Revert "[flang][cuda][driver] Make sure flang does not switch to cc1" (#104632)
  [AArch64][MachO] Encode @AUTH to ARM64_RELOC_AUTHENTICATED_POINTER.
  [flang][cuda][driver] Make sure flang does not switch to cc1 (#104613)
  AMDGPU: Rename type helper functions in atomic handling
  [libc] Fix generated header definitions in cmake (#104628)
  [libcxx][fix] Rename incorrect filename variable
  [SDAG] Read-only intrinsics must have WillReturn and !Throws attributes to be treated as loads (#99999)
  Re-Apply "[DXIL][Analysis] Implement enough of DXILResourceAnalysis for buffers" (#104517)
  [SelectionDAGISel] Use getSignedConstant for OPC_EmitInteger.
  [DirectX] Add missing Analysis usage to DXILResourceMDWrapper
  [AArch64] Remove apple-a7-sysreg. (#102709)
  Revert "[libc] Disable old headergen checks unless enabled" (#104627)
  [LLD, MachO] Default objc_relative_method_lists on MacOS10.16+/iOS14+ (#104519)
  [Clang][OMPX] Add the code generation for multi-dim `thread_limit` clause (#102717)
  [lldb][test] Mark gtest cases as XFAIL if the test suite is XFAIL (#102986)
  [APINotes] Support fields of C/C++ structs
  [Attributor] Enable `AAAddressSpace` in `OpenMPOpt` (#104363)
  [HLSL] Change default linkage of HLSL functions to internal (#95331)
  [bazel] Fix cyclic dependencies for macos (#104528)
  [libc] Disable old headergen checks unless enabled (#104522)
  [SandboxIR] Implement AtomicRMWInst (#104529)
  [RISCV] Move vmv.v.v peephole from SelectionDAG to RISCVVectorPeephole (#100367)
  [nfc] Improve testability of PGOInstrumentationGen (#104490)
  [test] Prevent generation of the bigendian code inside clang test CodeGen/bit-int-ubsan.c (#104607)
  [TableGen] Refactor Intrinsic handling in TableGen (#103980)
  [mlir][emitc] Add 'emitc.switch' op to the dialect (#102331)
  [SelectionDAG][X86] Add SelectionDAG::getSignedConstant and use it in a few places. (#104555)
  [mlir][AMDGPU] Implement AMDGPU DPP operation in MLIR. (#89233)
  [RISCV] Allow YAML file to control multilib selection (#98856)
  [mlir][vector] Group re-order patterns together (#102856)
  [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (#104109)
  [HLSL] Flesh out basic type typedefs (#104479)
  [mlir][vector] Add more tests for ConvertVectorToLLVM (4/n) (#103391)
  [TableGen] Sign extend constants based on size for EmitIntegerMatcher. (#104550)
  [gn] Port AST/ByteCode #104552
  [DAGCombiner] Remove TRUNCATE_(S/U)SAT_(S/U) from an assert that isn't tested. NFC (#104466)
  [RISCV] Don't support TRUNCATE_SSAT_U. (#104468)
  [Hexagon] Use range-based for loops (NFC) (#104538)
  [CodeGen] Use range-based for loops (NFC) (#104536)
  [Bazel] Port AST/ByteCode #104552
  [mlir][linalg] Implement TilingInterface for winograd operators (#96184)
  [libc++][math] Fix acceptance of convertible types in `std::isnan()` and `std::isinf()` (#98952)
  [clang] Rename all AST/Interp stuff to AST/ByteCode (#104552)
  [mlir] [tosa] Bug fixes in shape inference pass (#104146)
  [libc++] Fix rejects-valid in std::span copy construction (#104500)
  [InstCombine] Handle commuted variant of sqrt transform
  [InstCombine] Thwart complexity-based canonicalization in sqrt test (NFC)
  [InstCombine] Preserve nsw in A + -B fold
  [InstCombine] Add nsw tests for A + -B fold (NFC)
  [include-cleaner] fix 32-bit buildbots after a426ffdee1ca7814f2684b6
  [PhaseOrdering] Regenerate test checks (NFC)
  [InstCombine] Regenerate test checks (NFC)
  [X86] Fold extract_subvector(int_to_fp(x)) vXi32/vXf32 cases to match existing fp_to_int folds
  [InstCombine] Regenerate test checks (NFC)
  [mlir][spirv] Update documentation. NFC (#104584)
  [GlobalIsel] Revisit ext of ext. (#102769)
  [libc++] Fix backslash as root dir breaks lexically_relative, lexically_proximate and hash_value on Windows (#99780)
  [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners
  [SLP][REVEC] Fix CreateInsertElement does not use the correct result if MinBWs applied. (#104558)
  Add FPMR register and update dependencies of FP8 instructions (#102910)
  [InstCombine] Fix incorrect zero ext in select of lshr/ashr fold
  [InstCombine] Add i128 test for select of lshr/ashr transform (NFC)
  [llvm-c] Add non-cstring versions of LLVMGetNamedFunction and LLVMGetNamedGlobal (#103396)
  [InstCombine] Fold an unsigned icmp of ucmp/scmp with a constant to an icmp of the original arguments (#104471)
  [clang][Interp] Fix classifying enum types (#104582)
  [clang] Add a new test for CWG2091 (#104573)
  [mlir][ArmSME][docs] Fix broken link (NFC)
  [compiler-rt] Stop using x86 builtin on AArch64 with GCC (#93890)
  [DataLayout] Refactor parsing of "ni" specification (#104546)
  [X86] SimplifyDemandedVectorEltsForTargetNode - reduce width of X86 conversions nodes when upper elements are not demanded. (#102882)
  [include-cleaner] Add handling for new/delete expressions (#104033)
  InferAddressSpaces: Convert test to generated checks
  [LAA] Use computeConstantDifference() (#103725)
  [SimplifyCFG] Add test for #104567 (NFC)
  [bazel] Port for 75cb9edf09fdc091e5bc0f3d46a96c2877735a39
  [AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (#104419)
  [lldb][NFC] Moved FindSchemeByProtocol() from Acceptor to Socket (#104439)
  [X86] lowerShuffleAsDecomposedShuffleMerge - don't lower to unpack+permute if either source is zero.
  [X86] Add shuffle tests for #104482
  [clang][Interp][NFC] Remove Function::Loc
  [clang][NFC] Update `cxx_dr_status.html`
  [MLIR][GPU-LLVM] Add GPU to LLVM-SPV address space mapping (#102621)
  [DAG] SD Pattern Match: Operands patterns with VP Context  (#103308)
  Revert "[clang][driver] Fix -print-target-triple OS version for apple targets" (#104563)
  [NFC][X86] Refactor: merge avx512_binop_all2 into avx512_binop_all (#104561)
  [RISCV] Merge bitrotate crash test into shuffle reverse tests. NFC
  [Passes] clang-format initialization files (NFC)
  [mlir][IR] Fix `checkFoldResult` error message (#104559)
  [RISCV] Merge shuffle reverse tests. NFC
  [RISCV] Use shufflevector in shuffle reverse tests. NFC
  [RISCV] Remove -riscv-v-vector-bits-max from reverse tests. NFC
  [flang][stack-arrays] Collect analysis results for OMP ws loops (#103590)
  [clang][Interp] Add scopes to conditional operator subexpressions (#104418)
  [RISCV] Simplify (srl (and X, Mask), Const) to TH_EXTU (#102802)
  [RISCV][NFC] Fix typo: "wererenamed" to "were renamed" (#104530)
  [RISCV] Lower fixed reverse vector_shuffles through vector_reverse (#104461)
  [asan] Fix build breakage from report_globals change
  [MLIR][test] Run SVE and SME Integration tests using qemu-aarch64 (#101568)
  [DAGCombiner] Don't let scalarizeBinOpOfSplats create illegal scalar MULHS/MULHU (#104518)
  [flang][cuda] Add version in libCufRuntime name (#104506)
  [mlir][tosa] Add missing check for new_shape of `tosa.reshape` (#104394)
  [Bitcode] Use range-based for loops (NFC) (#104534)
  [HLSL] update default validator version to 1.8. (#104040)
  [ScalarizeMaskedMemIntr] Pre-commit tests for splat optimizations (#104527)
  [Sparc] Remove dead code (NFC) (#104264)
  [Clang] [Sema] Error on reference types inside a union with msvc 1900+ (#102851)
  [Driver] Reject -Wa,-mrelax-relocations= for non-ELF
  [Analysis] Use a range-based for loop (NFC) (#104445)
  [llvm] Use llvm::any_of (NFC) (#104443)
  [PowerPC] Use range-based for loops (NFC) (#104410)
  [CodeGen] Use a range-based for loop (NFC) (#104408)
  [ORC] Gate testcase for 3e1d4ec671c on x86-64 and aarch64 target support.
  [builitins] Only try to use getauxval on Linux (#104047)
  [ORC] Add missing dependence on BinaryFormat library.
  [flang] Inline minval/maxval over elemental/designate (#103503)
  [Driver] Correctly handle -Wa,--crel -Wa,--no-crel
  [lldb] Correctly fix a usage of `PATH_MAX`, and fix unit tests (#104502)
  [gn build] Port 3e1d4ec671c5
  [asan] Remove debug tracing from `report_globals` (#104404)
  [workflows] Add a new workflow for checking commit access qualifications (#93301)
  [Driver] Improve error message for -Wa,-x=unknown
  [SandboxIR] Implement UnaryOperator (#104509)
  [ORC] loadRelocatableObject: universal binary support, clearer errors (#104406)
  [RISCV] Use significant bits helpers in narrowing of build vectors [nfc] (#104511)
  [LLDB] Reapply #100443 SBSaveCore Thread list (#104497)
  [Driver] Reject -Wa,-mrelax-relocations= for non-x86
  [docs] Stress out the branch naming scheme for Graphite. (#104499)
  [NFC][sanitizer] Use `UNLIKELY` in VReport/VPrintf (#104403)
  [asan] Reduce priority of "contiguous_container:" VPrintf (#104402)
  [libc] Make sure we have RISC-V f or d extension before using it (#104476)
  [Driver] Make CodeGenOptions name match MCTargetOptions names
  [Attributor][FIX] Ensure we do not use stale references (#104495)
  [libclang/python] Expose `clang_isBeforeInTranslationUnit` for `SourceRange.__contains__`
  [Clang] Add target triple to fix failing test (#104513)
  [clang][NFC] Fix table of contents in `Sema.h`
  [-Wunsafe-buffer-usage] Fix warning after #102953
  [flang] Make sure range is valid (#104281)
  [MC] Replace hasAltEntry() with isMachO()
  MCAsmInfo: Replace some Mach-O specific check with isMachO(). NFC
  [asan] De-prioritize VReport `DTLS_Find` (#104401)
  Revert "[DXIL][Analysis] Implement enough of DXILResourceAnalysis for buffers" (#104504)
  [ubsan] Limit _BitInt ubsan tests to x86-64 platform only (#104494)
  Update load intrinsic attributes (#101562)
  [MC] Replace HasAggressiveSymbolFolding with SetDirectiveSuppressesReloc. NFC
  [SandboxIR] Implement BinaryOperator (#104121)
  [RISCV][GISel] Support nxv16p0 for RV32. (#101573)
  [nfc][ctx_prof] Remove the need for `PassBuilder` to know about `UseCtxProfile` (#104492)
  [Clang] [NFC] Rewrite constexpr vectors test to use element access (#102757)
  (lldb) Fix PATH_MAX for Windows (#104493)
  [libc] Add definition for `atan2l` on 64-bit long double platforms (#104489)
  Revert "[sanitizer] Remove GetCurrentThread nullness checks from Allocate"
  Reapply "Fix prctl to handle PR_GET_PDEATHSIG. (#101749)" (#104469)
  [-Wunsafe-buffer-usage] Fix a small bug recently found (#102953)
  [TargetLowering] Don't call SelectionDAG::getTargetLoweringInfo() from TargetLowering methods. NFC (#104197)
  [PowerPC][GlobalMerge] Enable GlobalMerge by default on AIX (#101226)
  [Clang] Implement C++26’s P2893R3 ‘Variadic friends’ (#101448)
  clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (#96872)
  [llvm-objdump] Fix a warning
  [bazel] Port 47721d46187f89c12a13d07b5857496301cf5d6e (#104481)
  [libc++] Remove the allocator<const T> extension (#102655)
  [Clang] handle both gnu and cpp11 attributes to ensure correct parsing inside extern block (#102864)
  [gn build] Port 47721d46187f
  [lldb] Realpath symlinks for breakpoints (#102223)
  llvm-objdump: ensure a MachO symbol isn't STAB before looking up secion (#86667)
  [test]Fix test error due to CRT dependency (#104462)
  [clang][Interp] Call move function for certain primitive types (#104437)
  [llvm-objdump] Print out  xcoff file header for xcoff object file with option private-headers (#96350)
  [Clang] prevent null explicit object argument from being deduced (#104328)
  Revert "[Clang] Overflow Pattern Exclusions (#100272)"
  [flang][OpenMP] Fix 2 more regressions after #101009 (#101538)
  [InstCombine] Fold `ucmp/scmp(x, y) >> N` to `zext/sext(x < y)` when N is one less than the width of the result of `ucmp/scmp` (#104009)
  [bazel] Enable more lit self tests (#104285)
  Fix single thread stepping timeout race condition (#104195)
  [SPARC][Utilities] Add names for SPARC ELF flags in LLVM binary utilities (#102843)
  [SPARC][Driver] Add -m(no-)v8plus flags handling (#98713)
  [OpenMP] Add support for pause with omp_pause_stop_tool (#97100)
  Revert "[SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC"
  [ValueTracking] Fix f16 fptosi range for large integers
  [InstSimplify] Add tests for f16 to i128 range (NFC)
  Revert "[Object][x86-64] Add support for `R_X86_64_GLOB_DAT` relocations. (#103029)" (#103497)
  [NFC] Fix spelling of "definitely". (#104455)
  [InstCombine][NFC] Add tests for shifts of constants by common factor (#103471)
  [OpenMP] Miscellaneous small code improvements (#95603)
  [clang][ExtractAPI] Emit environment component of target triple in SGF (#103273)
  [RISCV] Narrow indices to e16 for LMUL > 1 when lowering vector_reverse (#104427)
  [NFC] Fix code line exceeding 80 columns (#104428)
  [SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC
  [Clang] Check explicit object parameter for defaulted operators properly (#100419)
  [LegalizeTypes][AMDGPU]: Allow for scalarization of insert_subvector (#104236)
  Allow optimization of __size_returning_new variants. (#102258)
  [SLP]Fix PR104422: Wrong value truncation
  [GlobalISel] Combiner: Fix warning after #102163
  [SLP][NFC]Add a test with incorrect minbitwidth analysis for reduced operands
  [ubsan] Display correct runtime messages for negative _BitInt (#96240)
  Revert "[SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC"
  [DataLayout] Extract loop body into a function to reduce nesting (NFC) (#104420)
  [clang][ExtractAPI] Compute inherited availability information (#103040)
  [CodeGen] Fix -Wcovered-switch-default in Combiner.cpp (NFC)
  [CompilerRT][Tests] Fix profile/darwin-proof-of-concept.c (#104237)
  [mlir][gpu] Fix typo in test filename (#104053)
  [LoongArch] Pre-commit tests for validating the merge base offset in vecotrs. NFC
  [AArch64] optimise SVE prefetch intrinsics with no active lanes (#103052)
  [AMDGPU] MCExpr printing helper with KnownBits support (#95951)
  [GlobalISel] Combiner: Observer-based DCE and retrying of combines
  [libcxx] Use `aligned_alloc` for testing instead of `posix_memalign` (#101748)
  [VPlan] Run VPlan optimizations on plans in native path.
  [clang][Interp] Use first field decl for Record field lookup (#104412)
  InferAddressSpaces: Restore non-instruction user check
  [AMDGPU][llvm-split] Fix another division by zero (#104421)
  Reapply "[lldb] Tolerate multiple compile units with the same DWO ID (#100577)" (#104041)
  [lldb-dap] Expose log path in extension settings (#103482)
  [clang][Interp] Pass callee decl to null_callee diagnostics (#104426)
  [llvm][CodeGen] Resolve issues when updating live intervals in window scheduler (#101945)
  [DataLayout] Add helper predicates to sort specifications (NFC) (#104417)
  InferAddressSpaces: Make getPredicatedAddrSpace less confusing (#104052)
  [AArch64] Fold more load.x into load.i with large offset
  [AArch64] merge index address with large offset into base address
  [AArch64] Add verification for MemOp immediate ranges (#97561)
  Revert "[Clang] [AST] Fix placeholder return type name mangling for MSVC 1920+ / VS2019+ (#102848)"
  [analyzer] Do not reason about locations passed as inline asm input (#103714)
  [NFC][mlir][scf] Fix misspelling of replace (#101683)
  Revert "Remove empty line."
  [mlir][Transforms] Dialect conversion: Build unresolved materialization for replaced ops (#101514)
  Remove empty line.
  [DirectX] Use a more consistent pass name for DXILTranslateMetadata
  [Flang][OpenMP] Move assert for wrapper syms and block args to genLoopNestOp (#103731)
  [clang][driver] Fix -print-target-triple OS version for apple targets (#104037)
  [bazel] Port for 141536544f4ec1d1bf24256157f4ff1a3bc07dae
  [DAG] Adding m_FPToUI and m_FPToSI to SDPatternMatch.h (#104044)
  [llvm][Docs] `_or_null` -> `_if_present` in Programmer's Manual (#98586)
  [MLIR][LLVM]: Add an IR utility to perform slice walking (#103053)
  [lldb][test] Mark sys_info zdump test unsupported on 32 bit Arm Linux
  [flang][test] Run Driver/fveclib-codegen.f90 for aarch64 and x86_64 (#103730)
  [lldb] Remove Phabricator usernames from Code Owners file (#102590)
  [DataLayout] Move '*AlignElem' structs and enum inside DataLayout (NFC) (#103723)
  [flang][test] Fix Lower/default-initialization-globals.f90 on SPARC (#103722)
  [mlir][test] XFAIL little-endian-only tests on SPARC (#103726)
  [UnitTests] Convert some data layout parsing tests to GTest (#104346)
  Fix warnings in #102848 [-Wunused-but-set-variable]
  [VPlan] Move VPWidenStoreRecipe::execute to VPlanRecipes.cpp (NFC).
  [include-cleaner] Remove two commented-out lines of code.
  [mlir][tosa] Add verifier for `tosa.table` (#103708)
  [X86][MC] Remove CMPCCXADD's CondCode flavor. (#103898)
  [ctx_prof] Remove an unneeded include in CtxProfAnalysis.cpp
  Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
  Remove failing test until it can be fixed properly.
  [Clang][NFC] Move FindCountedByField into FieldDecl (#104235)
  Fix testcases. Use -emit-llvm and not -S. Use LABEL checking.
  [Clang] [AST] Fix placeholder return type name mangling for MSVC 1920+ / VS2019+ (#102848)
  [LLDB][OSX] Removed semi colon generating a warning during build (#104398)
  [OpenMP] Use range-based for loops (NFC) (#103511)
  [RISCV] Implement RISCVTTIImpl::shouldConsiderAddressTypePromotion for RISCV (#102560)
  [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081)
  [ELF][NFC] Allow non-GotSection for addAddendOnlyRelocIfNonPreemptible (#104228)
  [ctx_prof] CtxProfAnalysis: populate module data (#102930)
  [sanitizer] Remove GetCurrentThread nullness checks from Allocate
  Remove '-emit-llvm' and use '-triple'
  Use clang_cc1 and specify the target explicitly.
  utils/git: Add linkify script.
  [mlir][MemRef] Add more ops to narrow type support, strided metadata expansion (#102228)
  [Clang] Overflow Pattern Exclusions (#100272)
  [Clang] Error on extraneous template headers by default. (#104046)
  [Sanitizers] Disable prctl test on Android.
  [RISCV] Don't combine (sext_inreg (fmv_x_anyexth X), i16) with Zhinx.
  Remove unused variable, and unneeded extract element instruction (#103489)
  [bazel] Port 4bac8fd8904904bc7d502f39851eef50b5afff73 (#104278)
  Reland "[flang][cuda] Use cuda runtime API #103488"
  [Clang] Add `__CLANG_GPU_DISABLE_MATH_WRAPPERS` macro for offloading math (#98234)
  [llvm-lit] Fix Unhashable TypeError when using lit's internal shell (#101590)
  [llvm-lit][test][NFC] Moved cat command tests into separate lit test file (#102366)
  [RISCV] Add signext attribute to return of fmv_x_w test in float-convert.ll. NFC
  [DXIL][Analysis] Implement enough of DXILResourceAnalysis for buffers
  Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
  [DXIL][Analysis] Boilerplate for DXILResourceAnalysis pass
  [mlir] Add bubbling patterns for non intersecting reshapes (#103401)
  Revert "[flang][cuda] Use cuda runtime API" (#104232)
  [libc++] Remove non-existent LWG issue from the .csv files
  [RISCV][GISel] Remove support for s32 G_VAARG on RV64. (#102533)
  [NVPTX] Add idp2a, idp4a intrinsics (#102763)
  [X86] Check if an invoked function clobbers fp or bp (#103446)
  [flang][cuda] Use cuda runtime API (#103488)
  [SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC
  [TargetLowering] Remove unncessary null check. NFC
  [OpenMP] Fix buildbot failing on allocator test
  [clang] Turn -Wenum-constexpr-conversion into a hard error (#102364)
  [libcxx] Adjust inline assembly constraints for the AMDGPU target (#101747)
  [lld-macho] Make relative method lists work on x86-64 (#103905)
  [libcxx] Disable invalid `__start/__stop` reference on NVPTX (#99381)
  [libcxx] Add fallback to standard C when `unistd` is unavailable (#102005)
  [Clang] Fix 'nvlink-wrapper' not ignoring `-plugin` like lld does (#104056)
  [OpenMP] Implement 'omp_alloc' on the device (#102526)
  [vscode-mlir] Added per-LSP-server executable arguments (#79671)
  [flang] Read the extra field from the in box when doing reboxing (#102992)
  [HLSL] Split out the ROV attribute from the resource attribute, make it a new spellable attribute. (#102414)
  [libc++] Fix ambiguous constructors for std::complex and std::optional (#103409)
  AMDGPU: Avoid manually reconstructing atomicrmw (#103769)
  [libc] Fix 'float type' incorrectly being used as the return type
  [Clang] Adjust concept definition locus (#103867)
  [SandboxIR] Implement Instruction flags (#103343)
  [AArch64] Add some uxtw peephole tests. NFC
  AMDGPU: Stop promoting allocas with addrspacecast users (#104051)
  [NVPTX] Fix typo causing GCC warning (#103045)
  [attributes][-Wunsafe-buffer-usage] Support adding unsafe_buffer_usage attribute to struct fields (#101585)
  [RISCV][GISel] Support G_SEXT_INREG for Zbb. (#102682)
  [SystemZ][z/OS] Continuation of __ptr32 support (#103393)
  [X86] concat(permv3(x0,m0,y0),permv3(x0,m1,y0)) -> permv3(concat(x0,u),m3,concat(y0,u))
  [X86] Add test coverage for #103564
  [X86] combineEXTRACT_SUBVECTOR - treat oneuse extractions from loads as free
  [libcxx] Set `_LIBCPP_HAS_CLOCK_GETTIME` for GPU targets (#99243)
  Fix bazel build (#104054)
  CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes (#102815)
  AMDGPU/NewPM: Fill out addPreISelPasses (#102814)
  [libc++] Add mechanical update to CxxPapers.rst to git-blame-ignore-revs
  [libc++] Mechanical adjustments for the C++14 Paper status files
  [LLDB][OSX] Add a fallback support exe directory (#103458)
  [TextAPI] Use range-based for loops (NFC) (#103530)
  [mlir][vector] Add tests for `populateSinkVectorBroadcastPatterns` (1/n) (#102286)
  [libc++] Remove duplicate C++17 LWG issues from the CSVs
  [clang] Implement `__builtin_is_implicit_lifetime()` (#101807)
  Fix prctl test to execute all test cases if the first condition fails. (#102987)
  Revert "[scudo] Separated committed and decommitted entries." (#104045)
  [SelectionDAG] Scalarize binar…
Labels
clang:codegen: IR generation bugs: mangling, exceptions, etc.
clang:frontend: Language frontend issues, e.g. anything involving "Sema"
clang: Clang issues not falling into any other category
7 participants