
[X86][CodeGen] security check cookie execute only when needed #95904


Merged

merged 19 commits into llvm:main from sec_cookie_bb_restruct on Jul 8, 2024

Conversation

mahesh-attarde (Contributor)

On Windows, without this fixup, the __security_check_cookie call is made on
every function return. Since that function is defined in the runtime library,
each return incurs the cost of a call into a DLL that in most cases just
performs a comparison and returns. With the fixup, we call into the DLL only
when the comparison fails.
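
A minimal C sketch of the control flow this change produces (illustrative stand-ins only: security_cookie_value, report_and_trap, and check_cookie_inline model __security_cookie, the out-of-line __security_check_cookie-plus-int3 path, and the emitted epilogue check; none of these are the real runtime symbols):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the per-module __security_cookie value. */
static uint64_t security_cookie_value = 0x2B992DDFA232ULL;

/* Stand-in for the out-of-line failure path (__security_check_cookie + int3). */
static void report_and_trap(void) {
    fprintf(stderr, "stack cookie corrupted\n");
    abort();
}

/* Before this patch every return paid for the out-of-line (DLL) call;
   after it, the comparison is inline (cmp + jne) and the out-of-line
   path runs only when the cookie does not match. */
static void check_cookie_inline(uint64_t slot_cookie) {
    if (slot_cookie != security_cookie_value)
        report_and_trap();
}

int main(void) {
    check_cookie_inline(security_cookie_value); /* fast path: falls through */
    puts("cookie ok");
    return 0;
}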

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jun 18, 2024
llvmbot (Member) commented Jun 18, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang-codegen

Author: None (mahesh-attarde)

Changes

On Windows, without this fixup, the __security_check_cookie call is made on
every function return. Since that function is defined in the runtime library,
each return incurs the cost of a call into a DLL that in most cases just
performs a comparison and returns. With the fixup, we call into the DLL only
when the comparison fails.


Full diff: https://github.com/llvm/llvm-project/pull/95904.diff

9 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/X86.cpp (+20)
  • (added) clang/test/CodeGen/regcall3.c (+53)
  • (modified) llvm/lib/Target/X86/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/X86/X86.h (+4)
  • (added) llvm/lib/Target/X86/X86FixupStackProtector.cpp (+249)
  • (modified) llvm/lib/Target/X86/X86TargetMachine.cpp (+1)
  • (modified) llvm/test/CodeGen/X86/opt-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/X86/stack-protector-msvc.ll (+17-5)
  • (modified) llvm/test/CodeGen/X86/tailcc-ssp.ll (+23-4)
diff --git a/clang/lib/CodeGen/Targets/X86.cpp b/clang/lib/CodeGen/Targets/X86.cpp
index 43dadf5e724ac..506d106ad65b0 100644
--- a/clang/lib/CodeGen/Targets/X86.cpp
+++ b/clang/lib/CodeGen/Targets/X86.cpp
@@ -148,6 +148,7 @@ class X86_32ABIInfo : public ABIInfo {
 
   Class classify(QualType Ty) const;
   ABIArgInfo classifyReturnType(QualType RetTy, CCState &State) const;
+
   ABIArgInfo classifyArgumentType(QualType RetTy, CCState &State,
                                   unsigned ArgIndex) const;
 
@@ -1306,6 +1307,8 @@ class X86_64ABIInfo : public ABIInfo {
                                            unsigned &NeededSSE,
                                            unsigned &MaxVectorWidth) const;
 
+  bool DoesRegcallStructFitInReg(QualType Ty) const;
+
   bool IsIllegalVectorType(QualType Ty) const;
 
   /// The 0.98 ABI revision clarified a lot of ambiguities,
@@ -2830,6 +2833,20 @@ X86_64ABIInfo::classifyArgumentType(QualType Ty, unsigned freeIntRegs,
   return ABIArgInfo::getDirect(ResType);
 }
 
+bool X86_64ABIInfo::DoesRegcallStructFitInReg(QualType Ty) const {
+  auto RT = Ty->castAs<RecordType>();
+  // For the integer class, the maximum GPR size is 64 bits.
+  if (getContext().getTypeSize(Ty) > 64)
+    return false;
+  // The struct at hand must contain only builtin-typed fields.
+  for (const auto *FD : RT->getDecl()->fields()) {
+    QualType MTy = FD->getType();
+    if (!MTy->isBuiltinType())
+      return false;
+  }
+  return true;
+}
+
 ABIArgInfo
 X86_64ABIInfo::classifyRegCallStructTypeImpl(QualType Ty, unsigned &NeededInt,
                                              unsigned &NeededSSE,
@@ -2837,6 +2854,9 @@ X86_64ABIInfo::classifyRegCallStructTypeImpl(QualType Ty, unsigned &NeededInt,
   auto RT = Ty->getAs<RecordType>();
   assert(RT && "classifyRegCallStructType only valid with struct types");
 
+  if (DoesRegcallStructFitInReg(Ty))
+    return classifyArgumentType(Ty, UINT_MAX, NeededInt, NeededSSE, true, true);
+
   if (RT->getDecl()->hasFlexibleArrayMember())
     return getIndirectReturnResult(Ty);
 
diff --git a/clang/test/CodeGen/regcall3.c b/clang/test/CodeGen/regcall3.c
new file mode 100644
index 0000000000000..1c83407220861
--- /dev/null
+++ b/clang/test/CodeGen/regcall3.c
@@ -0,0 +1,53 @@
+// RUN: %clang_cc1 -S %s -o - -ffreestanding -triple=x86_64-unknown-linux-gnu | FileCheck %s --check-prefixes=LINUX64
+
+#include <xmmintrin.h>
+struct struct1 { int x; int y; };
+void __regcall v6(int a, float b, struct struct1 c) {}
+
+void v6_caller(){
+    struct struct1 c0;
+    c0.x = 0xa0a0; c0.y = 0xb0b0;
+    int x= 0xf0f0, y = 0x0f0f;
+    v6(x,y,c0);
+}
+
+// LINUX64-LABEL: __regcall3__v6
+// LINUX64: movq	%rcx, -8(%rsp)
+// LINUX64: movl	%eax, -12(%rsp)
+// LINUX64: movss	%xmm0, -16(%rsp)
+
+// LINUX64-LABEL: v6_caller
+// LINUX64: movl	$41120, 16(%rsp)                # imm = 0xA0A0
+// LINUX64: movl	$45232, 20(%rsp)                # imm = 0xB0B0
+// LINUX64: movl	$61680, 12(%rsp)                # imm = 0xF0F0
+// LINUX64: movl	$3855, 8(%rsp)                  # imm = 0xF0F
+// LINUX64: movl	12(%rsp), %eax
+// LINUX64: cvtsi2ssl	8(%rsp), %xmm0
+// LINUX64: movq	16(%rsp), %rcx
+// LINUX64: callq	.L__regcall3__v6$local
+
+
+struct struct2 { int x; float y; };
+void __regcall v31(int a, float b, struct struct2 c) {}
+
+void v31_caller(){
+    struct struct2 c0;
+    c0.x = 0xa0a0; c0.y = 0xb0b0;
+    int x= 0xf0f0, y = 0x0f0f;
+    v31(x,y,c0);
+}
+
+// LINUX64: __regcall3__v31:                        # @__regcall3__v31
+// LINUX64: 	movq	%rcx, -8(%rsp)
+// LINUX64: 	movl	%eax, -12(%rsp)
+// LINUX64: 	movss	%xmm0, -16(%rsp)
+// LINUX64: v31_caller:                             # @v31_caller
+// LINUX64: 	movl	$41120, 16(%rsp)                # imm = 0xA0A0
+// LINUX64: 	movss	.LCPI3_0(%rip), %xmm0           # xmm0 = [4.5232E+4,0.0E+0,0.0E+0,0.0E+0]
+// LINUX64: 	movss	%xmm0, 20(%rsp)
+// LINUX64: 	movl	$61680, 12(%rsp)                # imm = 0xF0F0
+// LINUX64: 	movl	$3855, 8(%rsp)                  # imm = 0xF0F
+// LINUX64: 	movl	12(%rsp), %eax
+// LINUX64: 	cvtsi2ssl	8(%rsp), %xmm0
+// LINUX64: 	movq	16(%rsp), %rcx
+// LINUX64: 	callq	.L__regcall3__v31$local
diff --git a/llvm/lib/Target/X86/CMakeLists.txt b/llvm/lib/Target/X86/CMakeLists.txt
index 44a54c8ec62cb..5303758ff8a2e 100644
--- a/llvm/lib/Target/X86/CMakeLists.txt
+++ b/llvm/lib/Target/X86/CMakeLists.txt
@@ -48,6 +48,7 @@ set(sources
   X86AvoidStoreForwardingBlocks.cpp
   X86DynAllocaExpander.cpp
   X86FixupSetCC.cpp
+  X86FixupStackProtector.cpp
   X86FlagsCopyLowering.cpp
   X86FloatingPoint.cpp
   X86FrameLowering.cpp
diff --git a/llvm/lib/Target/X86/X86.h b/llvm/lib/Target/X86/X86.h
index fdb9e4cad5e89..b4432f45987cd 100644
--- a/llvm/lib/Target/X86/X86.h
+++ b/llvm/lib/Target/X86/X86.h
@@ -73,6 +73,9 @@ FunctionPass *createX86OptimizeLEAs();
 /// Return a pass that transforms setcc + movzx pairs into xor + setcc.
 FunctionPass *createX86FixupSetCC();
 
+/// Return a pass that transforms the inline stack protector check into a separate basic block.
+FunctionPass *createX86FixupStackProtectorPass();
+
 /// Return a pass that avoids creating store forward block issues in the hardware.
 FunctionPass *createX86AvoidStoreForwardingBlocks();
 
@@ -186,6 +189,7 @@ void initializeX86ExpandPseudoPass(PassRegistry &);
 void initializeX86FastPreTileConfigPass(PassRegistry &);
 void initializeX86FastTileConfigPass(PassRegistry &);
 void initializeX86FixupSetCCPassPass(PassRegistry &);
+void initializeX86FixupStackProtectorPassPass(PassRegistry &);
 void initializeX86FlagsCopyLoweringPassPass(PassRegistry &);
 void initializeX86LoadValueInjectionLoadHardeningPassPass(PassRegistry &);
 void initializeX86LoadValueInjectionRetHardeningPassPass(PassRegistry &);
diff --git a/llvm/lib/Target/X86/X86FixupStackProtector.cpp b/llvm/lib/Target/X86/X86FixupStackProtector.cpp
new file mode 100644
index 0000000000000..f1355c62cc2c1
--- /dev/null
+++ b/llvm/lib/Target/X86/X86FixupStackProtector.cpp
@@ -0,0 +1,249 @@
+//===---- X86FixupStackProtector.cpp Fix Stack Protector Call ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// The stack protector implementation inserts a platform-specific callback
+// into the code. On Windows, without this fixup, __security_check_cookie is
+// called on every function return. Since that function is defined in the
+// runtime library, each return incurs a DLL call that usually just compares
+// the cookie and returns; with the fixup, the call is made only on mismatch.
+//===----------------------------------------------------------------------===//
+
+#include "X86.h"
+#include "X86FrameLowering.h"
+#include "X86InstrInfo.h"
+#include "X86Subtarget.h"
+#include "llvm/CodeGen/LivePhysRegs.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/IR/Module.h"
+#include <iterator>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "x86-fixup-spcall"
+
+namespace {
+
+class X86FixupStackProtectorPass : public MachineFunctionPass {
+public:
+  static char ID;
+
+  X86FixupStackProtectorPass() : MachineFunctionPass(ID) {}
+
+  StringRef getPassName() const override { return "X86 Fixup Stack Protector"; }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  std::pair<MachineBasicBlock *, MachineInstr *>
+  getSecurityCheckerBasicBlock(MachineFunction &MF);
+
+  void getGuardCheckSequence(MachineBasicBlock *CurMBB, MachineInstr *CheckCall,
+                             MachineInstr *SeqMI[5]);
+ 
+  void SplitBasicBlock(MachineBasicBlock *CurMBB, MachineBasicBlock *NewRetMBB,
+                       MachineBasicBlock::iterator SplitIt);
+
+  void FinishBlock(MachineBasicBlock *MBB);
+
+  void FinishFunction(MachineBasicBlock *FailMBB, MachineBasicBlock *NewRetMBB);
+
+  std::pair<MachineInstr *, MachineInstr *>
+  CreateFailCheckSequence(MachineBasicBlock *CurMBB, MachineBasicBlock *FailMBB,
+                          MachineInstr *SeqMI[5]);
+};
+} // end anonymous namespace
+
+char X86FixupStackProtectorPass::ID = 0;
+
+INITIALIZE_PASS(X86FixupStackProtectorPass, DEBUG_TYPE, DEBUG_TYPE, false,
+                false)
+
+FunctionPass *llvm::createX86FixupStackProtectorPass() {
+  return new X86FixupStackProtectorPass();
+}
+
+void X86FixupStackProtectorPass::SplitBasicBlock(
+    MachineBasicBlock *CurMBB, MachineBasicBlock *NewRetMBB,
+    MachineBasicBlock::iterator SplitIt) {
+  NewRetMBB->splice(NewRetMBB->end(), CurMBB, SplitIt, CurMBB->end());
+}
+
+std::pair<MachineBasicBlock *, MachineInstr *>
+X86FixupStackProtectorPass::getSecurityCheckerBasicBlock(MachineFunction &MF) {
+  MachineBasicBlock::reverse_iterator RBegin, REnd;
+
+  for (auto &MBB : llvm::reverse(MF)) {
+    for (RBegin = MBB.rbegin(), REnd = MBB.rend(); RBegin != REnd; RBegin++) {
+      auto &MI = *RBegin;
+      if (MI.getOpcode() == X86::CALL64pcrel32 &&
+          MI.getNumExplicitOperands() == 1) {
+        auto MO = MI.getOperand(0);
+        if (MO.isGlobal()) {
+          auto Callee = dyn_cast<Function>(MO.getGlobal());
+          if (Callee && Callee->getName() == "__security_check_cookie") {
+            return std::make_pair(&MBB, &MI);
+            break;
+          }
+        }
+      }
+    }
+  }
+  return std::make_pair(nullptr, nullptr);
+}
+
+void X86FixupStackProtectorPass::getGuardCheckSequence(
+    MachineBasicBlock *CurMBB, MachineInstr *CheckCall,
+    MachineInstr *SeqMI[5]) {
+
+  MachineBasicBlock::iterator UIt(CheckCall);
+  MachineBasicBlock::reverse_iterator DIt(CheckCall);
+  // The sequence from stack-up to stack-down is fixed:
+  // ADJCALLSTACKUP64
+  UIt++;
+  SeqMI[4] = &*UIt;
+
+  // CALL __security_check_cookie
+  SeqMI[3] = CheckCall;
+
+  // COPY function slot cookie
+  DIt++;
+  SeqMI[2] = &*DIt;
+
+  // ADJCALLSTACKDOWN64
+  DIt++;
+  SeqMI[1] = &*DIt;
+
+  MachineBasicBlock::reverse_iterator XIt(SeqMI[1]);
+  for (; XIt != CurMBB->rbegin(); XIt++) {
+    auto &CI = *XIt;
+    if ((CI.getOpcode() == X86::XOR64_FP) || (CI.getOpcode() == X86::XOR32_FP))
+      break;
+  }
+  SeqMI[0] = &*XIt;
+}
+
+std::pair<MachineInstr *, MachineInstr *>
+X86FixupStackProtectorPass::CreateFailCheckSequence(MachineBasicBlock *CurMBB,
+                                                    MachineBasicBlock *FailMBB,
+                                                    MachineInstr *SeqMI[5]) {
+
+  auto MF = CurMBB->getParent();
+
+  Module &M = *MF->getFunction().getParent();
+  GlobalVariable *GV = M.getGlobalVariable("__security_cookie");
+  assert(GV && " Security Cookie was not installed!");
+
+  MachineRegisterInfo &MRI = MF->getRegInfo();
+  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
+
+  MachineInstr *GuardXor = SeqMI[0];
+  MachineBasicBlock::iterator InsertPt(GuardXor);
+  InsertPt++;
+  unsigned DestReg = MRI.createVirtualRegister(&X86::GR64RegClass);
+  // MOV security_cookie value into register
+  auto CMI =
+      BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::MOV64rm), DestReg)
+          .addReg(X86::RIP)
+          .addImm(1)
+          .addReg(X86::NoRegister)
+          .addGlobalAddress(GV)
+          .addReg(X86::NoRegister);
+
+  // Compare __security_cookie with the XORed value; if they differ, we have a violation.
+  BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::CMP64rr))
+      .addReg(DestReg)
+      .addReg(GuardXor->getOperand(0).getReg());
+
+  BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::JCC_1))
+      .addMBB(FailMBB)
+      .addImm(X86::COND_NE);
+
+  auto JMI = BuildMI(*CurMBB, InsertPt, DebugLoc(), TII->get(X86::JMP_1));
+
+  return std::make_pair(CMI.getInstr(), JMI.getInstr());
+}
+
+void X86FixupStackProtectorPass::FinishBlock(MachineBasicBlock *MBB) {
+  LivePhysRegs LiveRegs;
+  computeAndAddLiveIns(LiveRegs, *MBB);
+}
+
+void X86FixupStackProtectorPass::FinishFunction(MachineBasicBlock *FailMBB,
+                                                MachineBasicBlock *NewRetMBB) {
+  FailMBB->getParent()->RenumberBlocks();
+  // FailMBB contains the call into the MSVC runtime where the
+  // __security_check_cookie function is called. That function uses regcall
+  // and expects the cookie value from its stack slot (even if modified).
+  // Before going further we recompute live-ins for this block to make sure
+  // the value is live and provided.
+  FinishBlock(FailMBB);
+  FinishBlock(NewRetMBB);
+}
+
+bool X86FixupStackProtectorPass::runOnMachineFunction(MachineFunction &MF) {
+  bool Changed = false;
+  const X86Subtarget &STI = MF.getSubtarget<X86Subtarget>();
+
+  if (!(STI.isTargetWindowsItanium() || STI.isTargetWindowsMSVC()))
+    return Changed;
+
+  // Check if security cookie was installed or not
+  Module &M = *MF.getFunction().getParent();
+  GlobalVariable *GV = M.getGlobalVariable("__security_cookie");
+  if (!GV)
+    return Changed;
+
+  const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
+
+  // Check if security check cookie was installed or not
+  auto [CurMBB, CheckCall] = getSecurityCheckerBasicBlock(MF);
+
+  if (!CheckCall)
+    return Changed;
+
+  MachineBasicBlock *FailMBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock *NewRetMBB = MF.CreateMachineBasicBlock();
+
+  MF.insert(MF.end(), NewRetMBB);
+  MF.insert(MF.end(), FailMBB);
+
+  MachineInstr *SeqMI[5];
+  getGuardCheckSequence(CurMBB, CheckCall, SeqMI);
+  // MachineInstr * GuardXor  = SeqMI[0];
+
+  auto FailSeqRange = CreateFailCheckSequence(CurMBB, FailMBB, SeqMI);
+  MachineInstrBuilder JMI(MF, FailSeqRange.second);
+
+  // After inserting JMP_1 we cannot have two terminators in the same
+  // block, so split CurMBB after the check sequence.
+  MachineBasicBlock::iterator SplitIt(SeqMI[4]);
+  SplitIt++;
+  SplitBasicBlock(CurMBB, NewRetMBB, SplitIt);
+
+  // Fill the failure routine: move the fail-check sequence from CurMBB to FailMBB.
+  MachineBasicBlock::iterator U1It(SeqMI[1]);
+  MachineBasicBlock::iterator U2It(SeqMI[4]);
+  U2It++;
+  FailMBB->splice(FailMBB->end(), CurMBB, U1It, U2It);
+  BuildMI(*FailMBB, FailMBB->end(), DebugLoc(), TII->get(X86::INT3));
+
+  // Move the leftover instructions after the stack-up from the current
+  // basic block into the new return block.
+  JMI.addMBB(NewRetMBB);
+  MachineBasicBlock::iterator SplicePt(JMI.getInstr());
+  SplicePt++;
+  if (SplicePt != CurMBB->end())
+    NewRetMBB->splice(NewRetMBB->end(), CurMBB, SplicePt);
+
+  // Restructure Basic Blocks
+  CurMBB->addSuccessor(NewRetMBB);
+  CurMBB->addSuccessor(FailMBB);
+
+  FinishFunction(FailMBB, NewRetMBB);
+  return !Changed;
+}
diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp
index d4e642c7df9cf..b245e80ad18dc 100644
--- a/llvm/lib/Target/X86/X86TargetMachine.cpp
+++ b/llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -550,6 +550,7 @@ bool X86PassConfig::addPreISel() {
 void X86PassConfig::addPreRegAlloc() {
   if (getOptLevel() != CodeGenOptLevel::None) {
     addPass(&LiveRangeShrinkID);
+    addPass(createX86FixupStackProtectorPass());
     addPass(createX86FixupSetCC());
     addPass(createX86OptimizeLEAs());
     addPass(createX86CallFrameOptimization());
diff --git a/llvm/test/CodeGen/X86/opt-pipeline.ll b/llvm/test/CodeGen/X86/opt-pipeline.ll
index 15c496bfb7f66..631f955ee6cc0 100644
--- a/llvm/test/CodeGen/X86/opt-pipeline.ll
+++ b/llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -119,6 +119,7 @@
 ; CHECK-NEXT:       Peephole Optimizations
 ; CHECK-NEXT:       Remove dead machine instructions
 ; CHECK-NEXT:       Live Range Shrink
+; CHECK-NEXT:       X86 Fixup Stack Protector
 ; CHECK-NEXT:       X86 Fixup SetCC
 ; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
 ; CHECK-NEXT:       X86 LEA Optimize
diff --git a/llvm/test/CodeGen/X86/stack-protector-msvc.ll b/llvm/test/CodeGen/X86/stack-protector-msvc.ll
index d0b6585f40ffe..f8eb47663fb18 100644
--- a/llvm/test/CodeGen/X86/stack-protector-msvc.ll
+++ b/llvm/test/CodeGen/X86/stack-protector-msvc.ll
@@ -38,8 +38,13 @@ return:    ; preds = %entry
 ; MSVC-X64: callq strcpy
 ; MSVC-X64: movq [[SLOT]](%rsp), %rcx
 ; MSVC-X64: xorq %rsp, %rcx
-; MSVC-X64: callq __security_check_cookie
+; MSVC-X64: movq	__security_cookie(%rip), %rax
+; MSVC-X64: cmpq	%rcx, %rax
+; MSVC-X64: jne	.LBB0_2
 ; MSVC-X64: retq
+; MSVC-X64: LBB0_2:
+; MSVC-X64: callq __security_check_cookie
+; MSVC-X64: int3
 
 ; MSVC-X86-O0-LABEL: _test:
 ; MSVC-X86-O0: movl ___security_cookie, %[[REG1:[^ ]*]]
@@ -97,9 +102,13 @@ define void @test_vla(i32 %n) nounwind ssp {
 ; MSVC-X64: callq escape
 ; MSVC-X64: movq [[SLOT]](%rbp), %rcx
 ; MSVC-X64: xorq %rbp, %rcx
-; MSVC-X64: callq __security_check_cookie
+; MSVC-X64:	movq	__security_cookie(%rip), %rax
+; MSVC-X64:	cmpq	%rcx, %rax
+; MSVC-X64:	jne	.LBB1_2
 ; MSVC-X64: retq
-
+; MSVC-X64: LBB1_2
+; MSVC-X64: callq __security_check_cookie
+; MSVC-X64: int3
 
 ; This case is interesting because we address local variables with RBX but XOR
 ; the guard value with RBP. That's fine, either value will do, as long as they
@@ -148,11 +157,14 @@ define void @test_vla_realign(i32 %n) nounwind ssp {
 ; MSVC-X64: callq escape
 ; MSVC-X64: movq [[SLOT]](%rbx), %rcx
 ; MSVC-X64: xorq %rbp, %rcx
+; MSVC-X64: movq	__security_cookie(%rip), %rax
+; MSVC-X64: cmpq	%rcx, %rax
+; MSVC-X64: jne	.LBB2_2
+; MSVC-X64: retq 
 ; MSVC-X64: callq __security_check_cookie
-; MSVC-X64: retq
+; MSVC-X64: int3
 
 
 declare ptr @strcpy(ptr, ptr) nounwind
 
 declare i32 @printf(ptr, ...) nounwind
-
diff --git a/llvm/test/CodeGen/X86/tailcc-ssp.ll b/llvm/test/CodeGen/X86/tailcc-ssp.ll
index bb9b4429c0761..ad9f4b9d6a4b6 100644
--- a/llvm/test/CodeGen/X86/tailcc-ssp.ll
+++ b/llvm/test/CodeGen/X86/tailcc-ssp.ll
@@ -5,9 +5,20 @@ declare void @h(ptr, i64, ptr)
 
 define tailcc void @tailcall_frame(ptr %0, i64 %1) sspreq {
 ; WINDOWS-LABEL: tailcall_frame:
-; WINDOWS: callq __security_check_cookie
+; WINDOWS: movq	__security_cookie(%rip), %rax
+; WINDOWS: xorq	%rsp, %rax
+; WINDOWS: movq	%rax, {{[0-9]*}}(%rsp)
+; WINDOWS: movq	 {{[0-9]*}}(%rsp), %rcx
+; WINDOWS: xorq	%rsp, %rcx
+; WINDOWS: movq	__security_cookie(%rip), %rax
+; WINDOWS: cmpq	%rcx, %rax
+; WINDOWS: jne	.LBB0_1
 ; WINDOWS: xorl %ecx, %ecx
 ; WINDOWS: jmp h
+; WINDOWS: .LBB0_1
+; WINDOWS: callq __security_check_cookie
+; WINDOWS: int3
+
 
 ; LINUX-LABEL: tailcall_frame:
 ; LINUX: jne
@@ -22,10 +33,18 @@ declare void @bar()
 define void @tailcall_unrelated_frame() sspreq {
 ; WINDOWS-LABEL: tailcall_unrelated_frame:
 ; WINDOWS: subq [[STACK:\$.*]], %rsp
+; WINDOWS: movq	__security_cookie(%rip), %rax
+; WINDOWS: xorq	%rsp, %rax
 ; WINDOWS: callq bar
-; WINDOWS: callq __security_check_cookie
-; WINDOWS: addq [[STACK]], %rsp
-; WINDOWS: jmp bar
+; WINDOWS: movq	 {{[0-9]*}}(%rsp), %rcx
+; WINDOWS: xorq	%rsp, %rcx
+; WINDOWS: movq	__security_cookie(%rip), %rax
+; WINDOWS: cmpq	%rcx, %rax
+; WINDOWS: jne	.LBB1_1
+; WINDOWS: jmp	bar
+; WINDOWS: .LBB1_1
+; WINDOWS: callq	__security_check_cookie
+; WINDOWS: int3
 
 ; LINUX-LABEL: tailcall_unrelated_frame:
 ; LINUX: callq bar


github-actions bot commented Jun 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

mahesh-attarde (Contributor, Author)

This PR is not related to targets other than X86_64 Windows.
The MC/AMDGPU/hsa-sym-expr-failure.s failure seems unrelated.

@@ -5,9 +5,20 @@ declare void @h(ptr, i64, ptr)

Collaborator

Can we use update_llc_test_checks.py for this test?

Contributor Author

done

@@ -97,9 +102,13 @@ define void @test_vla(i32 %n) nounwind ssp {
; MSVC-X64: callq escape
; MSVC-X64: movq [[SLOT]](%rbp), %rcx
; MSVC-X64: xorq %rbp, %rcx
; MSVC-X64: callq __security_check_cookie
; MSVC-X64: movq __security_cookie(%rip), %rax
Collaborator

Can we avoid loading the cookie into a register?

Contributor Author

done

mahesh-attarde (Contributor, Author)

@efriedma-quic do you have any more comments?

mahesh-attarde (Contributor, Author)

@phoebewang
My current PR is a MachineFunctionPass; it addresses the limitations of the SelectionDAG-based approach.

mahesh-attarde (Contributor, Author)

@efriedma-quic do you have any more comments?

ping @efriedma-quic

mahesh-attarde (Contributor, Author)

@RKSimon @asl can you help with review or reviewers for stack protector in x86?

@mahesh-attarde mahesh-attarde changed the title security check cookie execute only when needed [x86][Codegen] security check cookie execute only when needed Jun 24, 2024
@mahesh-attarde mahesh-attarde changed the title [x86][Codegen] security check cookie execute only when needed [x86][CodeGen] security check cookie execute only when needed Jun 24, 2024
@mahesh-attarde mahesh-attarde changed the title [x86][CodeGen] security check cookie execute only when needed [X86][CodeGen] security check cookie execute only when needed Jun 24, 2024
RKSimon (Collaborator) commented Jun 24, 2024

@mahesh-attarde please can you rebase against trunk - I've cleaned up the test checks to help with the codegen diff

mahesh-attarde (Contributor, Author)

@mahesh-attarde please can you rebase against trunk - I've cleaned up the test checks to help with the codegen diff

done.

mahesh-attarde (Contributor, Author)

ping @MaskRay @RKSimon

@@ -9,95 +9,6 @@
@"\01LC" = internal constant [11 x i8] c"buf == %s\0A\00" ; <ptr> [#uses=1]

define void @test(ptr %a) nounwind ssp {
; MSVC-X86-LABEL: test:
Collaborator

where did the test checks go?

; MSVC-X64: movq [[SLOT]](%rbp), %rcx
; MSVC-X64: xorq %rbp, %rcx
; MSVC-X64: callq __security_check_cookie
; MSVC-X64: retq
Collaborator

These look to have been manually re-added instead of using the update_llc_test_checks script

Contributor Author

These were carried over from an older edition of this file. It looks like update_llc_test_checks had added separate content here earlier.
Fixed it.

mahesh-attarde (Contributor, Author)

ping @RKSimon @MaskRay

@@ -0,0 +1,247 @@
//===---- X86FixupBufferSecurityCheck.cpp Fix Buffer Security Check Call---===//
Collaborator

X86WinFixupBufferSecurityCheck.cpp

Contributor Author

done


using namespace llvm;

#define DEBUG_TYPE "x86-fixup-bscheck"
Collaborator

Update this name?

Contributor Author

done

X86WinFixupBufferSecurityCheckPass() : MachineFunctionPass(ID) {}

StringRef getPassName() const override {
return "X86 Fixup Buffer Security Check";
Collaborator

"X86 Windows Fixup Buffer Security Check" ?

Contributor Author

done.

RKSimon (Collaborator) left a comment

SGTM with a few minors

mahesh-attarde (Contributor, Author)

ping @MaskRay

mahesh-attarde (Contributor, Author)

@RKSimon @MaskRay @efriedma-quic
can we merge please?

RKSimon (Collaborator) left a comment

LGTM

@RKSimon RKSimon merged commit 854bbc5 into llvm:main Jul 8, 2024
7 checks passed
@mahesh-attarde mahesh-attarde deleted the sec_cookie_bb_restruct branch July 8, 2024 17:31
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
omjavaid added a commit to omjavaid/llvm-project that referenced this pull request Jan 7, 2025
This patch adds the AArch64WinFixupBufferSecurityCheckPass to optimize
the handling of buffer security checks on AArch64 Windows targets.

The pass selectively replaces __security_check_cookie calls with inline
comparisons, transferring control to the runtime library only on failure.

A similar implementation for the X86 Windows target was added in llvm#95904.
omjavaid added a commit to omjavaid/llvm-project that referenced this pull request Mar 28, 2025
This patch adds the AArch64WinFixupBufferSecurityCheckPass to optimize
the handling of buffer security checks on AArch64 Windows targets.

The pass selectively replaces __security_check_cookie calls with inline
comparisons, transferring control to the runtime library only on failure.

A similar implementation for the X86 Windows target was added in llvm#95904.
omjavaid added a commit to omjavaid/llvm-project that referenced this pull request Apr 18, 2025
omjavaid added a commit to omjavaid/llvm-project that referenced this pull request Jun 4, 2025
omjavaid added a commit to omjavaid/llvm-project that referenced this pull request Jun 12, 2025
omjavaid added a commit that referenced this pull request Jun 12, 2025
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call 
for every function return.

Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new 
approach. We have also abandoned the AArch64-specific implementation
of the same pass in PR #121938 in favor of this more general solution.

The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation

The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables optimization at -Oz.

Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
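
For readers who want to observe the difference, a reproducer along these lines should show the inline cmp/jne before ret instead of an unconditional callq __security_check_cookie (a sketch under assumptions: a clang that includes this change, the MSVC target triple, and a hypothetical external sink escape that keeps the buffer alive):

/* repro.c -- compile to assembly only, no link step needed:
     clang --target=x86_64-pc-windows-msvc -O2 -fstack-protector -S repro.c
   then inspect repro.s for the inline cookie comparison. */
#include <string.h>

void escape(char *buf); /* hypothetical external sink, defined elsewhere */

void copy_to_local(const char *s) {
    char buf[16];   /* character array large enough to get a stack protector */
    strcpy(buf, s); /* unbounded copy forces the protector on this frame */
    escape(buf);    /* keep buf observable so it is not optimized away */
}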
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025