
[StackProtector] Fix phi handling in HasAddressTaken() #129248


Merged
merged 2 commits into llvm:main from stack-protector-phi on Mar 5, 2025

Conversation

nikic
Contributor

@nikic commented on Feb 28, 2025

Despite the name, the HasAddressTaken() heuristic identifies not only allocas that have their address taken, but also those that have accesses that cannot be proven to be in-bounds.

However, the current handling for phi nodes is incorrect. Phi nodes are only visited once, and the analysis is performed with whichever (remaining) allocation size is passed the first time the phi node is visited. If the phi node is later reached with a smaller remaining size, which may lead to out-of-bounds accesses, this will not be detected.

Fix this by keeping track of the smallest remaining allocation size seen for each phi node and redoing the analysis when it decreases. To avoid degenerate cases (including via loops), limit the number of allowed decreases to a small number.
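
For illustration, here is a hypothetical C++ analogue of the pattern that triggers the problem (not taken from the test suite; the new IR test test_phi_diff_size below exercises the same idea directly). The point is a pointer phi whose incoming values sit at different offsets into the same allocation:

void store_through_phi(bool c) {
  long buf[1];     // 8-byte stack allocation
  long *p;
  if (c)
    p = buf + 1;   // after this GEP, 0 bytes of the allocation remain
  else
    p = buf;       // the full 8 bytes remain
  *p = 0;          // 8-byte store through the phi of the two pointers
}

If the phi is first analyzed along the in-bounds path (8 bytes remaining), the path with 0 remaining bytes was previously never rechecked, so the potentially out-of-bounds store went unnoticed.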

@llvmbot
Member

llvmbot commented Feb 28, 2025

@llvm/pr-subscribers-backend-x86

Author: Nikita Popov (nikic)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/129248.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/StackProtector.cpp (+26-4)
  • (modified) llvm/test/CodeGen/X86/stack-protector-phi.ll (+26-4)
diff --git a/llvm/lib/CodeGen/StackProtector.cpp b/llvm/lib/CodeGen/StackProtector.cpp
index 232e84fb672e0..83f90b4ffd3db 100644
--- a/llvm/lib/CodeGen/StackProtector.cpp
+++ b/llvm/lib/CodeGen/StackProtector.cpp
@@ -251,10 +251,21 @@ static bool ContainsProtectableArray(Type *Ty, Module *M, unsigned SSPBufferSize
   return NeedsProtector;
 }
 
+/// Maximum remaining allocation size observed for a phi node, and how often
+/// the allocation size has already been decreased. We only allow a limited
+/// number of decreases.
+struct PhiInfo {
+  TypeSize AllocSize;
+  unsigned NumDecreased = 0;
+  static constexpr unsigned MaxNumDecreased = 3;
+  PhiInfo(TypeSize AllocSize) : AllocSize(AllocSize) {}
+};
+using PhiMap = SmallDenseMap<const PHINode *, PhiInfo, 16>;
+
 /// Check whether a stack allocation has its address taken.
 static bool HasAddressTaken(const Instruction *AI, TypeSize AllocSize,
                             Module *M,
-                            SmallPtrSet<const PHINode *, 16> &VisitedPHIs) {
+                            PhiMap &VisitedPHIs) {
   const DataLayout &DL = M->getDataLayout();
   for (const User *U : AI->users()) {
     const auto *I = cast<Instruction>(U);
@@ -325,9 +336,20 @@ static bool HasAddressTaken(const Instruction *AI, TypeSize AllocSize,
       // Keep track of what PHI nodes we have already visited to ensure
       // they are only visited once.
       const auto *PN = cast<PHINode>(I);
-      if (VisitedPHIs.insert(PN).second)
-        if (HasAddressTaken(PN, AllocSize, M, VisitedPHIs))
+      auto [It, Inserted] = VisitedPHIs.try_emplace(PN, AllocSize);
+      if (!Inserted) {
+        if (TypeSize::isKnownGE(AllocSize, It->second.AllocSize))
+          break;
+
+        // Check again with smaller size.
+        if (It->second.NumDecreased == PhiInfo::MaxNumDecreased)
           return true;
+
+        It->second.AllocSize = AllocSize;
+        ++It->second.NumDecreased;
+      }
+      if (HasAddressTaken(PN, AllocSize, M, VisitedPHIs))
+        return true;
       break;
     }
     case Instruction::Load:
@@ -377,7 +399,7 @@ bool SSPLayoutAnalysis::requiresStackProtector(Function *F,
   // The set of PHI nodes visited when determining if a variable's reference has
   // been taken.  This set is maintained to ensure we don't visit the same PHI
   // node multiple times.
-  SmallPtrSet<const PHINode *, 16> VisitedPHIs;
+  PhiMap VisitedPHIs;
 
   unsigned SSPBufferSize = F->getFnAttributeAsParsedInteger(
       "stack-protector-buffer-size", SSPLayoutInfo::DefaultSSPBufferSize);
diff --git a/llvm/test/CodeGen/X86/stack-protector-phi.ll b/llvm/test/CodeGen/X86/stack-protector-phi.ll
index bf0442dbf47a1..8041869ad8783 100644
--- a/llvm/test/CodeGen/X86/stack-protector-phi.ll
+++ b/llvm/test/CodeGen/X86/stack-protector-phi.ll
@@ -4,16 +4,29 @@
 define void @test_phi_diff_size(i1 %c) sspstrong {
 ; CHECK-LABEL: test_phi_diff_size:
 ; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movq %fs:40, %rax
+; CHECK-NEXT:    movq %rax, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    testb $1, %dil
 ; CHECK-NEXT:    je .LBB0_1
 ; CHECK-NEXT:  # %bb.2: # %if
-; CHECK-NEXT:    leaq -{{[0-9]+}}(%rsp), %rax
-; CHECK-NEXT:    movq $0, (%rax)
-; CHECK-NEXT:    retq
+; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rax
+; CHECK-NEXT:    jmp .LBB0_3
 ; CHECK-NEXT:  .LBB0_1:
-; CHECK-NEXT:    leaq -{{[0-9]+}}(%rsp), %rax
+; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rax
+; CHECK-NEXT:  .LBB0_3: # %join
 ; CHECK-NEXT:    movq $0, (%rax)
+; CHECK-NEXT:    movq %fs:40, %rax
+; CHECK-NEXT:    cmpq {{[0-9]+}}(%rsp), %rax
+; CHECK-NEXT:    jne .LBB0_5
+; CHECK-NEXT:  # %bb.4: # %SP_return
+; CHECK-NEXT:    addq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB0_5: # %CallStackCheckFailBlk
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    callq __stack_chk_fail@PLT
 entry:
   %a = alloca i64
   br i1 %c, label %if, label %join
@@ -38,6 +51,8 @@ define void @test_phi_loop(i1 %c) sspstrong {
 ; CHECK-NEXT:    .cfi_def_cfa_register %rbp
 ; CHECK-NEXT:    andq $-131072, %rsp # imm = 0xFFFE0000
 ; CHECK-NEXT:    subq $262144, %rsp # imm = 0x40000
+; CHECK-NEXT:    movq %fs:40, %rax
+; CHECK-NEXT:    movq %rax, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    movq %rsp, %rax
 ; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB1_1: # %loop
@@ -47,10 +62,17 @@ define void @test_phi_loop(i1 %c) sspstrong {
 ; CHECK-NEXT:    testb $1, %dil
 ; CHECK-NEXT:    jne .LBB1_1
 ; CHECK-NEXT:  # %bb.2: # %exit
+; CHECK-NEXT:    movq %fs:40, %rax
+; CHECK-NEXT:    cmpq {{[0-9]+}}(%rsp), %rax
+; CHECK-NEXT:    jne .LBB1_4
+; CHECK-NEXT:  # %bb.3: # %SP_return
 ; CHECK-NEXT:    movq %rbp, %rsp
 ; CHECK-NEXT:    popq %rbp
 ; CHECK-NEXT:    .cfi_def_cfa %rsp, 8
 ; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB1_4: # %CallStackCheckFailBlk
+; CHECK-NEXT:    .cfi_def_cfa %rbp, 16
+; CHECK-NEXT:    callq __stack_chk_fail@PLT
 entry:
   %a = alloca <10000 x i64>
   br label %loop


github-actions bot commented Feb 28, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 5d89123a3962016216e377463b4b3c97df927016 3ac7edca94bae3957f19bb6e181cb9949e7874ff --extensions cpp -- llvm/lib/CodeGen/StackProtector.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/StackProtector.cpp b/llvm/lib/CodeGen/StackProtector.cpp
index 8aa3f9d4f4..e001b3a912 100644
--- a/llvm/lib/CodeGen/StackProtector.cpp
+++ b/llvm/lib/CodeGen/StackProtector.cpp
@@ -264,8 +264,7 @@ using PhiMap = SmallDenseMap<const PHINode *, PhiInfo, 16>;
 
 /// Check whether a stack allocation has its address taken.
 static bool HasAddressTaken(const Instruction *AI, TypeSize AllocSize,
-                            Module *M,
-                            PhiMap &VisitedPHIs) {
+                            Module *M, PhiMap &VisitedPHIs) {
   const DataLayout &DL = M->getDataLayout();
   for (const User *U : AI->users()) {
     const auto *I = cast<Instruction>(U);

@nikic force-pushed the stack-protector-phi branch from 3ea48c7 to 9f7c0c2 on February 28, 2025 14:16
@john-brawn-arm
Collaborator

Repeatedly traversing the same phi until we hit an arbitrary recursion limit doesn't seem like a great solution. If we have a gep that reduces the remaining allocation size, and that gep is based on a phi which has the gep as an input, then we can assume that the remaining allocation size will go to zero. I have a feeling that we can probably figure this out for sure in two passes (forward pass from alloca to meminst, backward pass from meminst to alloca), I'll think about this some more and see if I can figure out something.

@john-brawn-arm
Collaborator

I think that something like this should work (see the sketches after this list and after the worked example below):

  • Switch to iterating over a worklist instead of recursing
  • First pass working forward from the alloca which:
    • Keeps track of how much geps have reduced the effective allocation size, and store this per-instruction in a map. The value resets to zero when we see a phi.
    • Keeps track of the most recent phi and uses that to construct a map of instruction to most recent phi (null when the instruction is itself a phi).
    • Finds the memory instructions and notes most recent phi and accumulated size reduction.
  • Second pass goes through each memory instruction:
    • If it doesn't have a most-recent-phi, then use the size reduction at that point to determine if it's out-of-bounds
    • If it does then add the inputs of the most-recent-phi, with current effective size reduction, to the worklist
    • If the worklist entry is not a phi, then increase the current size reduction by that instruction's amount. If phi[entry] != null, add phi[entry] to the worklist; otherwise add the current size reduction to the terminal size-reduction values.
    • If worklist entry is a phi, then add its inputs to the worklist.
    • If we see a phi more than once then we have a loop and assume the size goes to zero, so return true.
    • Otherwise once worklist is empty take the largest terminal size reduction value and check memory access size <= alloca size - size reduction
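
A rough C++ sketch of how the first pass could be structured (this is only an illustration of the idea above, not an actual patch; names like PathState, State, MemInsts and the GepByteOffset helper are made up, and handling of selects, calls and non-constant geps is omitted):

struct PathState {
  uint64_t SizeReduction = 0;            // bytes stripped by geps since the last phi
  const PHINode *MostRecentPhi = nullptr;
};

DenseMap<const Instruction *, PathState> State;  // the sr[] and phi[] maps above
SmallVector<const Instruction *, 8> MemInsts;    // accesses to check in the second pass
SmallVector<std::pair<const Instruction *, PathState>, 16> Worklist;

Worklist.push_back({AI, PathState()});
while (!Worklist.empty()) {
  auto [I, In] = Worklist.pop_back_val();
  auto [It, Inserted] = State.try_emplace(I, In);
  if (!Inserted)
    continue;                                    // already seen, skip

  PathState Out = In;
  if (const auto *PN = dyn_cast<PHINode>(I)) {
    It->second = PathState();                    // sr[PN] = 0, phi[PN] = null
    Out = {0, PN};                               // users see PN as their most recent phi
  } else if (const auto *GEP = dyn_cast<GetElementPtrInst>(I)) {
    Out.SizeReduction += GepByteOffset(GEP);     // hypothetical constant-offset helper
    It->second = Out;
  } else if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
    MemInsts.push_back(I);                       // terminal; handled in the second pass
    continue;
  }

  for (const User *U : I->users())
    Worklist.push_back({cast<Instruction>(U), Out});
}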

As an example of how I think this would work, if we have the following function pseudo-IR:

start:
a = alloca alloc_size
x = gep a, 1
br cond if1, middle

if1:
y = gep x, 1
br middle

middle:
p1 = phi (x, y)
br loop

loop:
p2 = phi (p1, z)
br cond exit1, if2

exit1:
store val1, p1

if2:
z = gep p2, 1
br cond loop, exit2

exit2:
store val2, z

Then it would proceed like so:

First pass:
a:   sr[a]=0,   phi[a]=null, add (x,0,null) to worklist
x:   sr[x]=1,   phi[x]=null, add [(y,1,null),(p1,1,null)] to worklist
y:   sr[y]=2,   phi[y]=null, add (p1,2,null) to worklist
p1:  sr[p1]=0,  phi[p1]=null, add [(store val1,0,p1),(p2,0,p1)] to worklist
p1:  already seen, skip
sv1: sr[sv1]=0, phi[sv1]=p1, add sv1 to list for second pass
p2:  sr[p2]=0,  phi[p2]=null, add (z,0,p2) to worklist
z:   sr[z]=1,   phi[z]=p2, add [(p2,1,p2), (store val2,1,p2)] to worklist
p2:  already seen, skip
sv2: sr[sv2]=1, phi[sv2]=p2, add sv2 to list for second pass

Second Pass:
sv1 has phi[sv1]=p1, initial sr=sr[sv1]=0, add [(x,0),(y,0)] to worklist
x: phi[x]=null, add 0+sr[x]=1 to terminal values
y: phi[y]=null, add 0+sr[y]=2 to terminal values
End of worklist, greatest terminal value is 2, so check store size <= alloc_size-2

sv2 has phi[sv2]=p2, initial sr=sr[sv2]=1, add [(p1,1),(z,1)] to worklist
p1: is a phi, add [(x,1), (y,1)] to worklist
z:  phi[z]=p2, add (p2,2) to worklist
x:  phi[x]=null, add 1+sr[x]=2 to terminal values
y:  phi[y]=null, add 1+sr[y]=3 to terminal values
p2: already seen, so we have a loop and return true
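
A matching rough sketch of the second pass, again purely illustrative (it assumes the State and MemInsts from the first-pass sketch above, treats AllocSize and the hypothetical AccessSize() as plain byte counts, assumes phi inputs are instructions, and is meant to sit inside a helper that returns whether a protector is needed):

bool NeedsProtector = false;
for (const Instruction *MI : MemInsts) {
  PathState MS = State.lookup(MI);
  if (!MS.MostRecentPhi) {
    // No phi on the path: compare against the remaining size directly.
    NeedsProtector |= AccessSize(MI) > AllocSize - MS.SizeReduction;
    continue;
  }

  uint64_t MaxReduction = 0;
  SmallPtrSet<const PHINode *, 8> SeenPhis;
  SeenPhis.insert(MS.MostRecentPhi);
  SmallVector<std::pair<const Value *, uint64_t>, 16> Worklist;
  for (const Value *In : MS.MostRecentPhi->incoming_values())
    Worklist.push_back({In, MS.SizeReduction});

  while (!Worklist.empty()) {
    auto [V, SR] = Worklist.pop_back_val();
    if (const auto *PN = dyn_cast<PHINode>(V)) {
      if (!SeenPhis.insert(PN).second)
        return true;                              // phi seen twice: loop, assume OOB
      for (const Value *In : PN->incoming_values())
        Worklist.push_back({In, SR});
      continue;
    }
    PathState VS = State.lookup(cast<Instruction>(V));
    SR += VS.SizeReduction;
    if (VS.MostRecentPhi)
      Worklist.push_back({VS.MostRecentPhi, SR}); // keep walking back through phis
    else
      MaxReduction = std::max(MaxReduction, SR);  // terminal size reduction
  }
  NeedsProtector |= AccessSize(MI) > AllocSize - MaxReduction;
}

On the example above this reproduces the hand trace: sv1 ends with a largest terminal value of 2, and sv2 reaches p2 a second time and returns true.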

Collaborator

@john-brawn-arm left a comment


I've gotten most of the way through implementing my idea to check that it works, so I think that it makes sense to approve this so you can merge it (because it does fix this bug) and then I'll push a patch for review.

@nikic merged commit 53c1579 into llvm:main on Mar 5, 2025
10 of 11 checks passed
@nikic deleted the stack-protector-phi branch on March 5, 2025 at 11:45
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025