[CodeGen] Inline stack guard check on Windows #136290


Merged: 11 commits merged into llvm:main from inline_guard_check_windows on Jun 12, 2025

Conversation

omjavaid
Contributor

@omjavaid omjavaid commented Apr 18, 2025

This patch optimizes the Windows security cookie check mechanism by moving the comparison inline and only calling __security_check_cookie when the check fails. This reduces the overhead of making a DLL call for every function return.

Previously, we implemented this optimization through a dedicated machine pass (X86WinFixupBufferSecurityCheckPass) in PR #95904, submitted by @mahesh-attarde, but this new solution is more generic and efficient. We have reverted that pass in favor of the new approach.

X86WinFixupBufferSecurityCheckPass was a machine-level pass that:

  • Scanned through the generated code to find __security_check_cookie calls
  • Modified these calls by splitting basic blocks
  • Added comparison logic and conditional branching
  • Required complex block management and live register computation

This new approach is simpler and more maintainable:

  • Implements the optimization at instruction selection time when generating the security check sequence
  • Directly emits the comparison and conditional branching
  • No need for post-processing or basic block manipulation
  • Generates more efficient code by design

We will abandon the AArch64-specific implementation (the AArch64WinFixupBufferSecurityCheck pass) proposed in PR #121938 in favor of this more general solution.

Note: This solution is inspired by #121938 (comment) by @tamaspetz.

@llvmbot
Member

llvmbot commented Apr 18, 2025

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-aarch64

Author: Omair Javaid (omjavaid)


Patch is 34.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136290.diff

11 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+59-34)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp (+4-25)
  • (modified) llvm/lib/Target/X86/CMakeLists.txt (-1)
  • (modified) llvm/lib/Target/X86/X86.h (-4)
  • (modified) llvm/lib/Target/X86/X86TargetMachine.cpp (-1)
  • (removed) llvm/lib/Target/X86/X86WinFixupBufferSecurityCheck.cpp (-245)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-stack-protector-windows.ll (+10-2)
  • (modified) llvm/test/CodeGen/X86/opt-pipeline.ll (-1)
  • (modified) llvm/test/CodeGen/X86/stack-protector-msvc.ll (+102-27)
  • (modified) llvm/test/CodeGen/X86/tailcc-ssp.ll (+28-13)
  • (modified) llvm/test/DebugInfo/COFF/fpo-stack-protect.ll (+5-2)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 8cae34d06c8ba..f2265c7f68e12 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3059,33 +3059,6 @@ void SelectionDAGBuilder::visitSPDescriptorParent(StackProtectorDescriptor &SPD,
   if (TLI.useStackGuardXorFP())
     GuardVal = TLI.emitStackGuardXorFP(DAG, GuardVal, dl);
 
-  // Retrieve guard check function, nullptr if instrumentation is inlined.
-  if (const Function *GuardCheckFn = TLI.getSSPStackGuardCheck(M)) {
-    // The target provides a guard check function to validate the guard value.
-    // Generate a call to that function with the content of the guard slot as
-    // argument.
-    FunctionType *FnTy = GuardCheckFn->getFunctionType();
-    assert(FnTy->getNumParams() == 1 && "Invalid function signature");
-
-    TargetLowering::ArgListTy Args;
-    TargetLowering::ArgListEntry Entry;
-    Entry.Node = GuardVal;
-    Entry.Ty = FnTy->getParamType(0);
-    if (GuardCheckFn->hasParamAttribute(0, Attribute::AttrKind::InReg))
-      Entry.IsInReg = true;
-    Args.push_back(Entry);
-
-    TargetLowering::CallLoweringInfo CLI(DAG);
-    CLI.setDebugLoc(getCurSDLoc())
-        .setChain(DAG.getEntryNode())
-        .setCallee(GuardCheckFn->getCallingConv(), FnTy->getReturnType(),
-                   getValue(GuardCheckFn), std::move(Args));
-
-    std::pair<SDValue, SDValue> Result = TLI.LowerCallTo(CLI);
-    DAG.setRoot(Result.second);
-    return;
-  }
-
   // If useLoadStackGuardNode returns true, generate LOAD_STACK_GUARD.
   // Otherwise, emit a volatile load to retrieve the stack guard value.
   SDValue Chain = DAG.getEntryNode();
@@ -3126,14 +3099,66 @@ void SelectionDAGBuilder::visitSPDescriptorParent(StackProtectorDescriptor &SPD,
 /// For a high level explanation of how this fits into the stack protector
 /// generation see the comment on the declaration of class
 /// StackProtectorDescriptor.
-void
-SelectionDAGBuilder::visitSPDescriptorFailure(StackProtectorDescriptor &SPD) {
+void SelectionDAGBuilder::visitSPDescriptorFailure(
+    StackProtectorDescriptor &SPD) {
+
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
-  TargetLowering::MakeLibCallOptions CallOptions;
-  CallOptions.setDiscardResult(true);
-  SDValue Chain = TLI.makeLibCall(DAG, RTLIB::STACKPROTECTOR_CHECK_FAIL,
-                                  MVT::isVoid, {}, CallOptions, getCurSDLoc())
-                      .second;
+  MachineBasicBlock *ParentBB = SPD.getParentMBB();
+  const Module &M = *ParentBB->getParent()->getFunction().getParent();
+  SDValue Chain;
+
+  // Retrieve guard check function, nullptr if instrumentation is inlined.
+  if (const Function *GuardCheckFn = TLI.getSSPStackGuardCheck(M)) {
+
+    // First create the loads to the guard/stack slot for the comparison.
+    EVT PtrTy = TLI.getPointerTy(DAG.getDataLayout());
+    EVT PtrMemTy = TLI.getPointerMemTy(DAG.getDataLayout());
+
+    MachineFrameInfo &MFI = ParentBB->getParent()->getFrameInfo();
+    int FI = MFI.getStackProtectorIndex();
+
+    SDLoc dl = getCurSDLoc();
+    SDValue StackSlotPtr = DAG.getFrameIndex(FI, PtrTy);
+    Align Align = DAG.getDataLayout().getPrefTypeAlign(
+        PointerType::get(M.getContext(), 0));
+
+    // Generate code to load the content of the guard slot.
+    SDValue GuardVal = DAG.getLoad(
+        PtrMemTy, dl, DAG.getEntryNode(), StackSlotPtr,
+        MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI), Align,
+        MachineMemOperand::MOVolatile);
+
+    if (TLI.useStackGuardXorFP())
+      GuardVal = TLI.emitStackGuardXorFP(DAG, GuardVal, dl);
+
+    // The target provides a guard check function to validate the guard value.
+    // Generate a call to that function with the content of the guard slot as
+    // argument.
+    FunctionType *FnTy = GuardCheckFn->getFunctionType();
+    assert(FnTy->getNumParams() == 1 && "Invalid function signature");
+
+    TargetLowering::ArgListTy Args;
+    TargetLowering::ArgListEntry Entry;
+    Entry.Node = GuardVal;
+    Entry.Ty = FnTy->getParamType(0);
+    if (GuardCheckFn->hasParamAttribute(0, Attribute::AttrKind::InReg))
+      Entry.IsInReg = true;
+    Args.push_back(Entry);
+
+    TargetLowering::CallLoweringInfo CLI(DAG);
+    CLI.setDebugLoc(getCurSDLoc())
+        .setChain(DAG.getEntryNode())
+        .setCallee(GuardCheckFn->getCallingConv(), FnTy->getReturnType(),
+                   getValue(GuardCheckFn), std::move(Args));
+
+    Chain = TLI.LowerCallTo(CLI).second;
+  } else {
+    TargetLowering::MakeLibCallOptions CallOptions;
+    CallOptions.setDiscardResult(true);
+    Chain = TLI.makeLibCall(DAG, RTLIB::STACKPROTECTOR_CHECK_FAIL, MVT::isVoid,
+                            {}, CallOptions, getCurSDLoc())
+                .second;
+  }
 
   // Emit a trap instruction if we are required to do so.
   const TargetOptions &TargetOpts = DAG.getTarget().Options;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
index 598de6b207754..2a75dd7d8419b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -1881,12 +1881,9 @@ void SelectionDAGISel::SelectAllBasicBlocks(const Function &Fn) {
       FastIS->recomputeInsertPt();
     }
 
-    if (SP->shouldEmitSDCheck(*LLVMBB)) {
-      bool FunctionBasedInstrumentation =
-          TLI->getSSPStackGuardCheck(*Fn.getParent());
+    if (SP->shouldEmitSDCheck(*LLVMBB))
       SDB->SPDescriptor.initialize(LLVMBB, FuncInfo->getMBB(LLVMBB),
-                                   FunctionBasedInstrumentation);
-    }
+                                   false);
 
     if (Begin != BI)
       ++NumDAGBlocks;
@@ -1948,24 +1945,7 @@ SelectionDAGISel::FinishBasicBlock() {
     PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);
   }
 
-  // Handle stack protector.
-  if (SDB->SPDescriptor.shouldEmitFunctionBasedCheckStackProtector()) {
-    // The target provides a guard check function. There is no need to
-    // generate error handling code or to split current basic block.
-    MachineBasicBlock *ParentMBB = SDB->SPDescriptor.getParentMBB();
-
-    // Add load and check to the basicblock.
-    FuncInfo->MBB = ParentMBB;
-    FuncInfo->InsertPt =
-        findSplitPointForStackProtector(ParentMBB, *TII);
-    SDB->visitSPDescriptorParent(SDB->SPDescriptor, ParentMBB);
-    CurDAG->setRoot(SDB->getRoot());
-    SDB->clear();
-    CodeGenAndEmitDAG();
-
-    // Clear the Per-BB State.
-    SDB->SPDescriptor.resetPerBBState();
-  } else if (SDB->SPDescriptor.shouldEmitStackProtector()) {
+  if (SDB->SPDescriptor.shouldEmitStackProtector()) {
     MachineBasicBlock *ParentMBB = SDB->SPDescriptor.getParentMBB();
     MachineBasicBlock *SuccessMBB = SDB->SPDescriptor.getSuccessMBB();
 
@@ -1979,8 +1959,7 @@ SelectionDAGISel::FinishBasicBlock() {
         findSplitPointForStackProtector(ParentMBB, *TII);
 
     // Splice the terminator of ParentMBB into SuccessMBB.
-    SuccessMBB->splice(SuccessMBB->end(), ParentMBB,
-                       SplitPoint,
+    SuccessMBB->splice(SuccessMBB->end(), ParentMBB, SplitPoint,
                        ParentMBB->end());
 
     // Add compare/jump on neq/jump to the parent BB.
diff --git a/llvm/lib/Target/X86/CMakeLists.txt b/llvm/lib/Target/X86/CMakeLists.txt
index 9553a8619feb5..44a54c8ec62cb 100644
--- a/llvm/lib/Target/X86/CMakeLists.txt
+++ b/llvm/lib/Target/X86/CMakeLists.txt
@@ -83,7 +83,6 @@ set(sources
   X86TargetTransformInfo.cpp
   X86VZeroUpper.cpp
   X86WinEHState.cpp
-  X86WinFixupBufferSecurityCheck.cpp
   X86InsertWait.cpp
   GISel/X86CallLowering.cpp
   GISel/X86InstructionSelector.cpp
diff --git a/llvm/lib/Target/X86/X86.h b/llvm/lib/Target/X86/X86.h
index 48a3fe1934a96..98faae9f14dbc 100644
--- a/llvm/lib/Target/X86/X86.h
+++ b/llvm/lib/Target/X86/X86.h
@@ -73,9 +73,6 @@ FunctionPass *createX86OptimizeLEAs();
 /// Return a pass that transforms setcc + movzx pairs into xor + setcc.
 FunctionPass *createX86FixupSetCC();
 
-/// Return a pass that transform inline buffer security check into seperate bb
-FunctionPass *createX86WinFixupBufferSecurityCheckPass();
-
 /// Return a pass that avoids creating store forward block issues in the hardware.
 FunctionPass *createX86AvoidStoreForwardingBlocks();
 
@@ -190,7 +187,6 @@ void initializeX86ExpandPseudoPass(PassRegistry &);
 void initializeX86FastPreTileConfigPass(PassRegistry &);
 void initializeX86FastTileConfigPass(PassRegistry &);
 void initializeX86FixupSetCCPassPass(PassRegistry &);
-void initializeX86WinFixupBufferSecurityCheckPassPass(PassRegistry &);
 void initializeX86FlagsCopyLoweringPassPass(PassRegistry &);
 void initializeX86LoadValueInjectionLoadHardeningPassPass(PassRegistry &);
 void initializeX86LoadValueInjectionRetHardeningPassPass(PassRegistry &);
diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp
index 4cecbbf27aa30..c5e76b1c38d26 100644
--- a/llvm/lib/Target/X86/X86TargetMachine.cpp
+++ b/llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -552,7 +552,6 @@ bool X86PassConfig::addPreISel() {
 void X86PassConfig::addPreRegAlloc() {
   if (getOptLevel() != CodeGenOptLevel::None) {
     addPass(&LiveRangeShrinkID);
-    addPass(createX86WinFixupBufferSecurityCheckPass());
     addPass(createX86FixupSetCC());
     addPass(createX86OptimizeLEAs());
     addPass(createX86CallFrameOptimization());
diff --git a/llvm/lib/Target/X86/X86WinFixupBufferSecurityCheck.cpp b/llvm/lib/Target/X86/X86WinFixupBufferSecurityCheck.cpp
deleted file mode 100644
index 5c12af1fee637..0000000000000
--- a/llvm/lib/Target/X86/X86WinFixupBufferSecurityCheck.cpp
+++ /dev/null
@@ -1,245 +0,0 @@
-//===- X86WinFixupBufferSecurityCheck.cpp Fix Buffer Security Check Call -===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-// Buffer Security Check implementation inserts windows specific callback into
-// code. On windows, __security_check_cookie call gets call everytime function
-// is return without fixup. Since this function is defined in runtime library,
-// it incures cost of call in dll which simply does comparison and returns most
-// time. With Fixup, We selective move to call in DLL only if comparison fails.
-//===----------------------------------------------------------------------===//
-
-#include "X86.h"
-#include "X86FrameLowering.h"
-#include "X86InstrInfo.h"
-#include "X86Subtarget.h"
-#include "llvm/CodeGen/LivePhysRegs.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
-#include "llvm/CodeGen/MachineInstrBuilder.h"
-#include "llvm/IR/Module.h"
-
-using namespace llvm;
-
-#define DEBUG_TYPE "x86-win-fixup-bscheck"
-
-namespace {
-
-class X86WinFixupBufferSecurityCheckPass : public MachineFunctionPass {
-public:
-  static char ID;
-
-  X86WinFixupBufferSecurityCheckPass() : MachineFunctionPass(ID) {}
-
-  StringRef getPassName() const override {
-    return "X86 Windows Fixup Buffer Security Check";
-  }
-
-  bool runOnMachineFunction(MachineFunction &MF) override;
-
-  std::pair<MachineBasicBlock *, MachineInstr *>
-  getSecurityCheckerBasicBlock(MachineFunction &MF);
-
-  void getGuardCheckSequence(MachineBasicBlock *CurMBB, MachineInstr *CheckCall,
-                             MachineInstr *SeqMI[5]);
-
-  void SplitBasicBlock(MachineBasicBlock *CurMBB, MachineBasicBlock *NewRetMBB,
-                       MachineBasicBlock::iterator SplitIt);
-
-  void FinishBlock(MachineBasicBlock *MBB);
-
-  void FinishFunction(MachineBasicBlock *FailMBB, MachineBasicBlock *NewRetMBB);
-
-  std::pair<MachineInstr *, MachineInstr *>
-  CreateFailCheckSequence(MachineBasicBlock *CurMBB, MachineBasicBlock *FailMBB,
-                          MachineInstr *SeqMI[5]);
-};
-} // end anonymous namespace
-
-char X86WinFixupBufferSecurityCheckPass::ID = 0;
-
-INITIALIZE_PASS(X86WinFixupBufferSecurityCheckPass, DEBUG_TYPE, DEBUG_TYPE,
-                false, false)
-
-FunctionPass *llvm::createX86WinFixupBufferSecurityCheckPass() {
-  return new X86WinFixupBufferSecurityCheckPass();
-}
-
-void X86WinFixupBufferSecurityCheckPass::SplitBasicBlock(
-    MachineBasicBlock *CurMBB, MachineBasicBlock *NewRetMBB,
-    MachineBasicBlock::iterator SplitIt) {
-  NewRetMBB->splice(NewRetMBB->end(), CurMBB, SplitIt, CurMBB->end());
-}
-
-std::pair<MachineBasicBlock *, MachineInstr *>
-X86WinFixupBufferSecurityCheckPass::getSecurityCheckerBasicBlock(
-    MachineFunction &MF) {
-  MachineBasicBlock::reverse_iterator RBegin, REnd;
-
-  for (auto &MBB : llvm::reverse(MF)) {
-    for (RBegin = MBB.rbegin(), REnd = MBB.rend(); RBegin != REnd; ++RBegin) {
-      auto &MI = *RBegin;
-      if (MI.getOpcode() == X86::CALL64pcrel32 &&
-          MI.getNumExplicitOperands() == 1) {
-        auto MO = MI.getOperand(0);
-        if (MO.isGlobal()) {
-          auto Callee = dyn_cast<Function>(MO.getGlobal());
-          if (Callee && Callee->getName() == "__security_check_cookie") {
-            return std::make_pair(&MBB, &MI);
-            break;
-          }
-        }
-      }
-    }
-  }
-  return std::make_pair(nullptr, nullptr);
-}
-
-void X86WinFixupBufferSecurityCheckPass::getGuardCheckSequence(
-    MachineBasicBlock *CurMBB, MachineInstr *CheckCall,
-    MachineInstr *SeqMI[5]) {
-
-  MachineBasicBlock::iterator UIt(CheckCall);
-  MachineBasicBlock::reverse_iterator DIt(CheckCall);
-  // Seq From StackUp to Stack Down Is fixed.
-  // ADJCALLSTACKUP64
-  ++UIt;
-  SeqMI[4] = &*UIt;
-
-  // CALL __security_check_cookie
-  SeqMI[3] = CheckCall;
-
-  // COPY function slot cookie
-  ++DIt;
-  SeqMI[2] = &*DIt;
-
-  // ADJCALLSTACKDOWN64
-  ++DIt;
-  SeqMI[1] = &*DIt;
-
-  MachineBasicBlock::reverse_iterator XIt(SeqMI[1]);
-  for (; XIt != CurMBB->rbegin(); ++XIt) {
-    auto &CI = *XIt;
-    if ((CI.getOpcode() == X86::XOR64_FP) || (CI.getOpcode() == X86::XOR32_FP))
-      break;
-  }
-  SeqMI[0] = &*XIt;
-}
-
-std::pair<MachineInstr *, MachineInstr *>
-X86WinFixupBufferSecurityCheckPass::CreateFailCheckSequence(
-    MachineBasicBlock *CurMBB, MachineBasicBlock *FailMBB,
-    MachineInstr *SeqMI[5]) {
-
-  auto MF = CurMBB->getParent();
-
-  Module &M = *MF->getFunction().getParent();
-  GlobalVariable *GV = M.getGlobalVariable("__security_cookie");
-  assert(GV && " Security Cookie was not installed!");
-
-  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
-
-  MachineInstr *GuardXor = SeqMI[0];
-  MachineBasicBlock::iterator InsertPt(GuardXor);
-  ++InsertPt;
-
-  // Compare security_Cookie with XOR_Val, if not same, we have violation
-  auto CMI = BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::CMP64rm))
-                 .addReg(GuardXor->getOperand(0).getReg())
-                 .addReg(X86::RIP)
-                 .addImm(1)
-                 .addReg(X86::NoRegister)
-                 .addGlobalAddress(GV)
-                 .addReg(X86::NoRegister);
-
-  BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::JCC_1))
-      .addMBB(FailMBB)
-      .addImm(X86::COND_NE);
-
-  auto JMI = BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::JMP_1));
-
-  return std::make_pair(CMI.getInstr(), JMI.getInstr());
-}
-
-void X86WinFixupBufferSecurityCheckPass::FinishBlock(MachineBasicBlock *MBB) {
-  LivePhysRegs LiveRegs;
-  computeAndAddLiveIns(LiveRegs, *MBB);
-}
-
-void X86WinFixupBufferSecurityCheckPass::FinishFunction(
-    MachineBasicBlock *FailMBB, MachineBasicBlock *NewRetMBB) {
-  FailMBB->getParent()->RenumberBlocks();
-  // FailMBB includes call to MSCV RT  where is __security_check_cookie
-  // function is called. This function uses regcall and it expects cookie
-  // value from stack slot.( even if this is modified)
-  // Before going further we compute back livein for this block to make sure
-  // it is live and provided.
-  FinishBlock(FailMBB);
-  FinishBlock(NewRetMBB);
-}
-
-bool X86WinFixupBufferSecurityCheckPass::runOnMachineFunction(
-    MachineFunction &MF) {
-  bool Changed = false;
-  const X86Subtarget &STI = MF.getSubtarget<X86Subtarget>();
-
-  if (!(STI.isTargetWindowsItanium() || STI.isTargetWindowsMSVC()))
-    return Changed;
-
-  // Check if security cookie was installed or not
-  Module &M = *MF.getFunction().getParent();
-  GlobalVariable *GV = M.getGlobalVariable("__security_cookie");
-  if (!GV)
-    return Changed;
-
-  const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
-
-  // Check if security check cookie was installed or not
-  auto [CurMBB, CheckCall] = getSecurityCheckerBasicBlock(MF);
-
-  if (!CheckCall)
-    return Changed;
-
-  MachineBasicBlock *FailMBB = MF.CreateMachineBasicBlock();
-  MachineBasicBlock *NewRetMBB = MF.CreateMachineBasicBlock();
-
-  MF.insert(MF.end(), NewRetMBB);
-  MF.insert(MF.end(), FailMBB);
-
-  MachineInstr *SeqMI[5];
-  getGuardCheckSequence(CurMBB, CheckCall, SeqMI);
-  // MachineInstr * GuardXor  = SeqMI[0];
-
-  auto FailSeqRange = CreateFailCheckSequence(CurMBB, FailMBB, SeqMI);
-  MachineInstrBuilder JMI(MF, FailSeqRange.second);
-
-  // After Inserting JMP_1, we can not have two terminators
-  // in same block, split CurrentMBB after JMP_1
-  MachineBasicBlock::iterator SplitIt(SeqMI[4]);
-  ++SplitIt;
-  SplitBasicBlock(CurMBB, NewRetMBB, SplitIt);
-
-  // Fill up Failure Routine, move Fail Check Squence from CurMBB to FailMBB
-  MachineBasicBlock::iterator U1It(SeqMI[1]);
-  MachineBasicBlock::iterator U2It(SeqMI[4]);
-  ++U2It;
-  FailMBB->splice(FailMBB->end(), CurMBB, U1It, U2It);
-  BuildMI(*FailMBB, FailMBB->end(), DebugLoc(), TII->get(X86::INT3));
-
-  // Move left over instruction after StackUp
-  // from Current Basic BLocks into New Return Block
-  JMI.addMBB(NewRetMBB);
-  MachineBasicBlock::iterator SplicePt(JMI.getInstr());
-  ++SplicePt;
-  if (SplicePt != CurMBB->end())
-    NewRetMBB->splice(NewRetMBB->end(), CurMBB, SplicePt);
-
-  // Restructure Basic Blocks
-  CurMBB->addSuccessor(NewRetMBB);
-  CurMBB->addSuccessor(FailMBB);
-
-  FinishFunction(FailMBB, NewRetMBB);
-  return !Changed;
-}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-stack-protector-windows.ll b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-stack-protector-windows.ll
index 6aefc5341da07..83b964a89d44d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-stack-protector-windows.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-stack-protector-windows.ll
@@ -17,8 +17,12 @@ define void @caller() sspreq {
 ; CHECK-NEXT:    ldr x8, [x8, :lo12:__security_cookie]
 ; CHECK-NEXT:    str x8, [sp, #8]
 ; CHECK-NEXT:    bl callee
-; CHECK-NEXT:    ldr x0, [sp, #8]
-; CHECK-NEXT:    bl __security_check_cookie
+; CHECK-NEXT:    adrp x8, __security_cookie
+; CHECK-NEXT:    ldr x9, [sp, #8]
+; CHECK-NEXT:    ldr x8, [x8, :lo12:__security_cookie]
+; CHECK-NEXT:    cmp x8, x9
+; CHECK-NEXT:    b.ne    .LBB0_2
+; CHECK-NEXT:  // %bb.1: // %entry
 ; CHECK-NEXT:    .seh_startepilogue
 ; CHECK-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
 ; CHECK-NEXT:    .seh_save_reg x30, 16
@@ -26,6 +30,10 @@ define void @caller() sspreq {
 ; CHECK-NEXT:    .seh_stackalloc 32
 ; CHECK-NEXT:    .seh_endepilogue
 ; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB0_2: // %entry
+; CHECK-NEXT:    ldr x0, [sp, #8]
+; CHECK-NEXT:    bl __security_check_cookie
+; CHECK-NEXT:    brk #0x1
 ; CHECK-NEXT:    .seh_endfunclet
 ; CHECK-NEXT:    .seh_endproc
 entry:
diff --git a/llvm/test/CodeGen/X86/opt-pipeline.ll b/llvm/test/CodeGen/X86/opt-pipeline.ll
index d72f517cfb603..5dd368a9e1439 100644
--- a/llvm/test/CodeGen/X86/opt-pipeline.ll
+++ b/llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -121,7 +121,6 @@
 ; CHECK-NEXT:       Peephole Optimizations
 ; CHECK-NEXT:       Remove dead machine instructions
 ; CHECK-NEXT:       Live Range Shrink
-; CHECK-NEXT:       X86 Windows Fixup Buffer Security Check
 ; CHECK-NEXT:       X86 Fixup SetCC
 ; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
 ; CHECK-NEXT:       X86 LEA Optimize
diff --git a/llvm/test/CodeGen/X86/stack-protector-msvc.ll b/llvm/test/CodeGen/X86/stack-protector-msvc.ll
index d718062d2c485..142c7d7b57b4b 100644
--- a/llvm/test/CodeGen/X86/st...
[truncated]


github-actions bot commented Apr 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

if (const Function *GuardCheckFn = TLI.getSSPStackGuardCheck(M)) {

// First create the loads to the guard/stack slot for the comparison.
EVT PtrTy = TLI.getPointerTy(DAG.getDataLayout());
Contributor


You should essentially never use getPointerTy. The correct pointer type should be implied by some value in context

Collaborator


The stack protector is an invented value which doesn't have any inherent width from the IR or datalayout. The width is part of the ABI. In practice, on CPU targets it's "pointer width".

We could add a method to TargetLowering, I guess. Something like virtual MVT getStackProtectorTy(const DataLayout &DL) { return getPointerTy(DL, 0); }.

Contributor


Looks like it's really just getFrameIndexTy?

Contributor Author


Fixed in latest update.

Comment on lines 3122 to 3123
Align Align = DAG.getDataLayout().getPrefTypeAlign(
PointerType::get(M.getContext(), 0));
Contributor


Should use the correct address space pointer, implied from the value

Contributor Author


Fixed in latest update.

Collaborator

@efriedma-quic efriedma-quic left a comment


This all makes sense to me; my only high-level question is if we care about preserving the old codegen at -Oz.


@omjavaid omjavaid force-pushed the inline_guard_check_windows branch from 32e051b to 86ffb24 Compare May 8, 2025 11:58
@omjavaid
Contributor Author

omjavaid commented May 9, 2025

This all makes sense to me; my only high-level question is if we care about preserving the old codegen at -Oz.

Thanks! I think we can skip this behavior at -Oz; I am making a change to skip this optimization when MinSize is enabled.

@efriedma-quic
Collaborator

Is there anything I can do to help with this?

@omjavaid
Contributor Author

omjavaid commented Jun 3, 2025

Is there anything I can do to help with this?

Sorry for the delay; I was OOO. Will post an update during the week. Thanks

@omjavaid omjavaid force-pushed the inline_guard_check_windows branch from 86ffb24 to e935988 Compare June 4, 2025 22:36
@omjavaid
Contributor Author

omjavaid commented Jun 4, 2025

This all makes sense to me; my only high-level question is if we care about preserving the old codegen at -Oz.

Fixed in latest update.

Collaborator

@efriedma-quic efriedma-quic left a comment


LGTM

omjavaid added a commit that referenced this pull request Jun 12, 2025
…#143500)

The assertions in llvm/test/CodeGen/X86/tailcc-ssp.ll were outdated. The
initial comment indicated they were generated with
`utils/update_llc_test_checks.py UTC_ARGS: --version 5`, but this was
not accurate based on the file's content.

Running `utils/update_llc_test_checks.py` regenerated the assertions,
aligning them with the current `llc` output.
This commit ensures that the test's claimed behavior accurately reflects
the actual `llc` output, even though the tests were already passing.

This was identified by @efriedma-quic during review of #136290.

Submitting a separate PR to make sure these changes stay isolated.
@omjavaid omjavaid force-pushed the inline_guard_check_windows branch from 16d3cb5 to 6a7151f Compare June 12, 2025 12:17
@omjavaid omjavaid merged commit e1e1836 into llvm:main Jun 12, 2025
7 checks passed
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…llvm#143500)

tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call 
for every function return.

Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR llvm#95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new
approach. We have also abandoned the AArch64-specific implementation
of the same pass in PR llvm#121938 in favor of this more general solution.

The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation

The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables optimization at -Oz.

Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…llvm#143500)
