
[llvm] Improve llvm.objectsize computation by computing GEP, alloca a… #117849


Merged
merged 1 commit into llvm:main on Dec 10, 2024

Conversation

serge-sans-paille
Collaborator

Improve llvm.objectsize computation by computing GEP, alloca and malloc parameters bound

Using a naive expression walker, it is possible to compute valuable information for allocation functions, GEPs and allocas, even in the presence of some dynamic information.

We don't rely on computeConstantRange, to avoid taking advantage of undefined behavior, which would be counter-productive with respect to the usual llvm.objectsize usage.

llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions, so improving its diagnostics in turn improves the security of compiled applications.

As a side note, as a result of recent optimization improvements, clang no longer passes https://github.com/serge-sans-paille/builtin_object_size-test-suite. This commit restores the situation and greatly broadens the scope of code handled by the static version of __builtin_object_size.

This is a recommit of #115522 with fix applied.
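To illustrate the kind of code this now handles, here is a hedged sketch of my own (not taken from the patch or its test suite): the allocation size is a select between two constants, so the static __builtin_object_size can report a bound instead of giving up.

#include <cstddef>
#include <cstdlib>

// Sketch only (my example): before this change the size operand below was
// not a ConstantInt, so the static __builtin_object_size gave up and
// returned -1 for mode 0.
std::size_t object_size_of_select(bool cond) {
    char *p = static_cast<char *>(std::malloc(cond ? 3 : 4));
    return __builtin_object_size(p, 0); // with the patch: may fold to 4 (upper bound)
}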

@llvmbot added the flang:runtime, flang (Flang issues not falling into any other category), llvm:ir, llvm:analysis (Includes value tracking, cost tables and constant folding) and llvm:transforms labels on Nov 27, 2024
@llvmbot
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-flang-runtime

@llvm/pr-subscribers-llvm-transforms

Author: None (serge-sans-paille)

Changes

Improve llvm.objectsize computation by computing GEP, alloca and malloc parameters bound

Using a naive expression walker, it is possible to compute valuable information for allocation functions, GEPs and allocas, even in the presence of some dynamic information.

We don't rely on computeConstantRange, to avoid taking advantage of undefined behavior, which would be counter-productive with respect to the usual llvm.objectsize usage.

llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions, so improving its diagnostics in turn improves the security of compiled applications.

As a side note, as a result of recent optimization improvements, clang no longer passes https://github.com/serge-sans-paille/builtin_object_size-test-suite. This commit restores the situation and greatly broadens the scope of code handled by the static version of __builtin_object_size.

This is a recommit of #115522 with fix applied.


Full diff: https://github.com/llvm/llvm-project/pull/117849.diff

5 Files Affected:

  • (modified) flang/include/flang/Runtime/descriptor.h (+1)
  • (modified) flang/runtime/io-api-minimal.cpp (+1-1)
  • (modified) llvm/include/llvm/IR/Value.h (+8-4)
  • (modified) llvm/lib/Analysis/MemoryBuiltins.cpp (+103-4)
  • (added) llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll (+109)
diff --git a/flang/include/flang/Runtime/descriptor.h b/flang/include/flang/Runtime/descriptor.h
index 030d0c1031fbaa..d674a8e2229a04 100644
--- a/flang/include/flang/Runtime/descriptor.h
+++ b/flang/include/flang/Runtime/descriptor.h
@@ -467,6 +467,7 @@ class alignas(Descriptor) StaticDescriptor {
   RT_OFFLOAD_VAR_GROUP_END
 
   RT_API_ATTRS Descriptor &descriptor() {
+
     return *reinterpret_cast<Descriptor *>(storage_);
   }
   RT_API_ATTRS const Descriptor &descriptor() const {
diff --git a/flang/runtime/io-api-minimal.cpp b/flang/runtime/io-api-minimal.cpp
index ad76fe3de0324c..a06d3d12f8f005 100644
--- a/flang/runtime/io-api-minimal.cpp
+++ b/flang/runtime/io-api-minimal.cpp
@@ -150,7 +150,7 @@ bool IODEF(OutputLogical)(Cookie cookie, bool truth) {
 // Provide own definition for `std::__libcpp_verbose_abort` to avoid dependency
 // on the version provided by libc++.
 
-void std::__libcpp_verbose_abort(char const *format, ...) {
+void std::__libcpp_verbose_abort(char const *format, ...) noexcept {
   va_list list;
   va_start(list, format);
   std::vfprintf(stderr, format, list);
diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h
index 945081b77e9536..d444a768a65436 100644
--- a/llvm/include/llvm/IR/Value.h
+++ b/llvm/include/llvm/IR/Value.h
@@ -723,12 +723,16 @@ class Value {
       bool AllowInvariantGroup = false,
       function_ref<bool(Value &Value, APInt &Offset)> ExternalAnalysis =
           nullptr) const;
-  Value *stripAndAccumulateConstantOffsets(const DataLayout &DL, APInt &Offset,
-                                           bool AllowNonInbounds,
-                                           bool AllowInvariantGroup = false) {
+
+  Value *stripAndAccumulateConstantOffsets(
+      const DataLayout &DL, APInt &Offset, bool AllowNonInbounds,
+      bool AllowInvariantGroup = false,
+      function_ref<bool(Value &Value, APInt &Offset)> ExternalAnalysis =
+          nullptr) {
     return const_cast<Value *>(
         static_cast<const Value *>(this)->stripAndAccumulateConstantOffsets(
-            DL, Offset, AllowNonInbounds, AllowInvariantGroup));
+            DL, Offset, AllowNonInbounds, AllowInvariantGroup,
+            ExternalAnalysis));
   }
 
   /// This is a wrapper around stripAndAccumulateConstantOffsets with the
diff --git a/llvm/lib/Analysis/MemoryBuiltins.cpp b/llvm/lib/Analysis/MemoryBuiltins.cpp
index 4028d5f4e2e1b8..93dfdfadff11be 100644
--- a/llvm/lib/Analysis/MemoryBuiltins.cpp
+++ b/llvm/lib/Analysis/MemoryBuiltins.cpp
@@ -670,6 +670,68 @@ STATISTIC(ObjectVisitorArgument,
 STATISTIC(ObjectVisitorLoad,
           "Number of load instructions with unsolved size and offset");
 
+static std::optional<APInt>
+combinePossibleConstantValues(std::optional<APInt> LHS,
+                              std::optional<APInt> RHS,
+                              ObjectSizeOpts::Mode EvalMode) {
+  if (!LHS || !RHS)
+    return std::nullopt;
+  if (EvalMode == ObjectSizeOpts::Mode::Max)
+    return LHS->sge(*RHS) ? *LHS : *RHS;
+  else
+    return LHS->sle(*RHS) ? *LHS : *RHS;
+}
+
+static std::optional<APInt> aggregatePossibleConstantValuesImpl(
+    const Value *V, ObjectSizeOpts::Mode EvalMode, unsigned recursionDepth) {
+  constexpr unsigned maxRecursionDepth = 4;
+  if (recursionDepth == maxRecursionDepth)
+    return std::nullopt;
+
+  if (const auto *CI = dyn_cast<ConstantInt>(V)) {
+    return CI->getValue();
+  }
+
+  else if (const auto *SI = dyn_cast<SelectInst>(V)) {
+    return combinePossibleConstantValues(
+        aggregatePossibleConstantValuesImpl(SI->getTrueValue(), EvalMode,
+                                            recursionDepth + 1),
+        aggregatePossibleConstantValuesImpl(SI->getFalseValue(), EvalMode,
+                                            recursionDepth + 1),
+        EvalMode);
+  }
+  else if (const auto *PN = dyn_cast<PHINode>(V)) {
+    unsigned Count = PN->getNumIncomingValues();
+    if (Count == 0)
+      return std::nullopt;
+    auto Acc = aggregatePossibleConstantValuesImpl(
+        PN->getIncomingValue(0), EvalMode, recursionDepth + 1);
+    for (unsigned I = 1; Acc && I < Count; ++I) {
+      auto Tmp = aggregatePossibleConstantValuesImpl(
+          PN->getIncomingValue(I), EvalMode, recursionDepth + 1);
+      Acc = combinePossibleConstantValues(Acc, Tmp, EvalMode);
+    }
+    return Acc;
+  }
+
+  return std::nullopt;
+}
+
+static std::optional<APInt>
+aggregatePossibleConstantValues(const Value *V, ObjectSizeOpts::Mode EvalMode) {
+  if (auto *CI = dyn_cast<ConstantInt>(V))
+    return CI->getValue();
+
+  if (EvalMode != ObjectSizeOpts::Mode::Min &&
+      EvalMode != ObjectSizeOpts::Mode::Max)
+    return std::nullopt;
+
+  // Not using computeConstantRange here because we cannot guarantee it's not
+  // doing optimization based on UB which we want to avoid when expanding
+  // __builtin_object_size.
+  return aggregatePossibleConstantValuesImpl(V, EvalMode, 0u);
+}
+
 /// Align \p Size according to \p Alignment. If \p Size is greater than
 /// getSignedMaxValue(), set it as unknown as we can only represent signed value
 /// in OffsetSpan.
@@ -717,11 +779,36 @@ OffsetSpan ObjectSizeOffsetVisitor::computeImpl(Value *V) {
   V = V->stripAndAccumulateConstantOffsets(
       DL, Offset, /* AllowNonInbounds */ true, /* AllowInvariantGroup */ true);
 
+  // Give it another try with approximated analysis. We don't start with this
+  // one because stripAndAccumulateConstantOffsets behaves differently wrt.
+  // overflows if we provide an external Analysis.
+  if ((Options.EvalMode == ObjectSizeOpts::Mode::Min ||
+       Options.EvalMode == ObjectSizeOpts::Mode::Max) &&
+      isa<GEPOperator>(V)) {
+    // External Analysis used to compute the Min/Max value of individual Offsets
+    // within a GEP.
+    ObjectSizeOpts::Mode EvalMode =
+        Options.EvalMode == ObjectSizeOpts::Mode::Min
+            ? ObjectSizeOpts::Mode::Max
+            : ObjectSizeOpts::Mode::Min;
+    auto OffsetRangeAnalysis = [EvalMode](Value &VOffset, APInt &Offset) {
+      if (auto PossibleOffset =
+              aggregatePossibleConstantValues(&VOffset, EvalMode)) {
+        Offset = *PossibleOffset;
+        return true;
+      }
+      return false;
+    };
+
+    V = V->stripAndAccumulateConstantOffsets(
+        DL, Offset, /* AllowNonInbounds */ true, /* AllowInvariantGroup */ true,
+        /*ExternalAnalysis=*/OffsetRangeAnalysis);
+  }
+
   // Later we use the index type size and zero but it will match the type of the
   // value that is passed to computeImpl.
   IntTyBits = DL.getIndexTypeSizeInBits(V->getType());
   Zero = APInt::getZero(IntTyBits);
-
   OffsetSpan ORT = computeValue(V);
 
   bool IndexTypeSizeChanged = InitialIntTyBits != IntTyBits;
@@ -813,8 +900,9 @@ OffsetSpan ObjectSizeOffsetVisitor::visitAllocaInst(AllocaInst &I) {
     return OffsetSpan(Zero, align(Size, I.getAlign()));
 
   Value *ArraySize = I.getArraySize();
-  if (const ConstantInt *C = dyn_cast<ConstantInt>(ArraySize)) {
-    APInt NumElems = C->getValue();
+  if (auto PossibleSize =
+          aggregatePossibleConstantValues(ArraySize, Options.EvalMode)) {
+    APInt NumElems = *PossibleSize;
     if (!CheckedZextOrTrunc(NumElems))
       return ObjectSizeOffsetVisitor::unknown();
 
@@ -840,7 +928,18 @@ OffsetSpan ObjectSizeOffsetVisitor::visitArgument(Argument &A) {
 }
 
 OffsetSpan ObjectSizeOffsetVisitor::visitCallBase(CallBase &CB) {
-  if (std::optional<APInt> Size = getAllocSize(&CB, TLI)) {
+  auto Mapper = [this](const Value *V) -> const Value * {
+    if (!V->getType()->isIntegerTy())
+      return V;
+
+    if (auto PossibleBound =
+            aggregatePossibleConstantValues(V, Options.EvalMode))
+      return ConstantInt::get(V->getType(), *PossibleBound);
+
+    return V;
+  };
+
+  if (std::optional<APInt> Size = getAllocSize(&CB, TLI, Mapper)) {
     // Very large unsigned value cannot be represented as OffsetSpan.
     if (Size->isNegative())
       return ObjectSizeOffsetVisitor::unknown();
diff --git a/llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll b/llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll
new file mode 100644
index 00000000000000..f84ebee1442893
--- /dev/null
+++ b/llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll
@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=lower-constant-intrinsics -S < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+declare i64 @llvm.objectsize.i64.p0(ptr, i1 immarg, i1 immarg, i1 immarg)
+declare noalias ptr @malloc(i64 noundef) #0
+
+define i64 @select_alloc_size(i1 %cond) {
+; CHECK-LABEL: @select_alloc_size(
+; CHECK-NEXT:    [[SIZE:%.*]] = select i1 [[COND:%.*]], i64 3, i64 4
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 [[SIZE]], align 1
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 4, i64 3
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %size = select i1 %cond, i64 3, i64 4
+  %ptr = alloca i8, i64 %size
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_malloc_size(i1 %cond) {
+; CHECK-LABEL: @select_malloc_size(
+; CHECK-NEXT:    [[SIZE:%.*]] = select i1 [[COND:%.*]], i64 3, i64 4
+; CHECK-NEXT:    [[PTR:%.*]] = call noalias ptr @malloc(i64 noundef [[SIZE]])
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 4, i64 3
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %size = select i1 %cond, i64 3, i64 4
+  %ptr = call noalias ptr @malloc(i64 noundef %size)
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_gep_offset(i1 %cond) {
+; CHECK-LABEL: @select_gep_offset(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 10, align 1
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[COND:%.*]], i64 3, i64 4
+; CHECK-NEXT:    [[PTR_SLIDE:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[OFFSET]]
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 7, i64 6
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %ptr = alloca i8, i64 10
+  %offset = select i1 %cond, i64 3, i64 4
+  %ptr.slide = getelementptr inbounds i8, ptr %ptr, i64 %offset
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_gep_neg_offset(i1 %c0, i1 %c1) {
+; CHECK-LABEL: @select_gep_neg_offset(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 10, align 1
+; CHECK-NEXT:    [[PTR_SLIDE_1:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 5
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[COND:%.*]], i64 -3, i64 -4
+; CHECK-NEXT:    [[PTR_SLIDE_2:%.*]] = getelementptr inbounds i8, ptr [[PTR_SLIDE_1]], i64 [[OFFSET]]
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[C1:%.*]], i64 9, i64 8
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %ptr = alloca i8, i64 10
+  %ptr.slide.1 = getelementptr inbounds i8, ptr %ptr, i64 5
+  %offset = select i1 %c0, i64 -3, i64 -4
+  %ptr.slide.2 = getelementptr inbounds i8, ptr %ptr.slide.1, i64 %offset
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide.2, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide.2, i1 true, i1 true, i1 false)
+  %res = select i1 %c1, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_neg_oob_offset(i1 %c0, i1 %c1) {
+; CHECK-LABEL: @select_neg_oob_offset(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 10, align 1
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[C0:%.*]], i64 -3, i64 -4
+; CHECK-NEXT:    [[PTR_SLIDE:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[OFFSET]]
+; CHECK-NEXT:    ret i64 0
+;
+  %ptr = alloca i8, i64 10
+  %offset = select i1 %c0, i64 -3, i64 -4
+  %ptr.slide = getelementptr inbounds i8, ptr %ptr, i64 %offset
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 true, i1 true, i1 false)
+  %res = select i1 %c1, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_gep_offsets(i1 %cond) {
+; CHECK-LABEL: @select_gep_offsets(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca [10 x i8], i64 2, align 1
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[COND:%.*]], i32 0, i32 1
+; CHECK-NEXT:    [[PTR_SLIDE:%.*]] = getelementptr inbounds [10 x i8], ptr [[PTR]], i32 [[OFFSET]], i32 5
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 15, i64 5
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %ptr = alloca [10 x i8], i64 2
+  %offset = select i1 %cond, i32 0, i32 1
+  %ptr.slide = getelementptr inbounds [10 x i8], ptr %ptr, i32 %offset, i32 5
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+attributes #0 = { nounwind allocsize(0) }

@serge-sans-paille
Collaborator Author

@fmayer I think I fixed the issue we've been discussing in the previous attempt (there was a typo in the phi node handling). Would you mind giving this PR a try? Thanks!


github-actions bot commented Nov 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Improve llvm.objectsize computation by computing GEP, alloca and malloc parameters bound

Using a naive expression walker, it is possible to compute valuable
information for allocation functions, GEPs and allocas, even in the
presence of some dynamic information.

We don't rely on computeConstantRange, to avoid taking advantage of
undefined behavior, which would be counter-productive with respect to
the usual llvm.objectsize usage.

llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions,
so improving its diagnostics in turn improves the security of compiled
applications.

As a side note, as a result of recent optimization improvements, clang no
longer passes https://github.com/serge-sans-paille/builtin_object_size-test-suite.
This commit restores the situation and greatly broadens the scope of
code handled by the static version of __builtin_object_size.
@fmayer
Contributor

fmayer commented Nov 28, 2024

Thanks! I'm on vacation, @vitalybuka could you please take a look?

@serge-sans-paille
Collaborator Author

@vitalybuka / @fmayer: I've been able to reproduce the ppc issue on a third-party machine and can confirm this version of the patch fixes it \o/

@serge-sans-paille
Collaborator Author

gentle ping 🙇

@vitalybuka requested a review from nikic on December 3, 2024
@vitalybuka
Collaborator

If it's just a re-commit, then neither I nor @fmayer has access to the ppc bot; for that we had to rely on post-submit testing by that bot, as happened with the revert.

The patch itself does not look trivial, and I don't see that anyone approved the original. Maybe @nikic needs to take a look?

@vitalybuka requested review from vitalybuka and removed the request for fmayer on December 3, 2024
@serge-sans-paille
Collaborator Author

The original was actually #114673, which was approved by @hvdijk; maybe he could have a look at this one? I ran the manual testing on ppc and it was fine.

@serge-sans-paille
Collaborator Author

@hvdijk : gentle ping :-)

@serge-sans-paille
Collaborator Author

Given that this was previously accepted, and that I've reproduced and fixed the bug on PPC, I'll go ahead and merge this one tomorrow.

@serge-sans-paille merged commit f8c1a22 into llvm:main on Dec 10, 2024
8 checks passed
@hvdijk
Contributor

hvdijk commented Dec 16, 2024

@hvdijk : gentle ping :-)

Apologies for the late reply, I have been on holiday. In #115504 I commented that I was concerned it produced incorrect results in some cases, which you were intending to fix in a subsequent PR. I cannot tell whether the current implementation is meant to address those cases. If it is, great! If not, then I cannot offer a meaningful review. When reviewing PRs, I want to make sure that even if they are not perfect yet, they at least do not make things any worse; but this being the rather complicated bit of logic that it is, if there are known incorrect results in some cases, I cannot grasp the logic well enough to make sure that a PR is an improvement.

@mstorsjo
Member

This causes some buffer overflow checks to fail; we ran into this in llvm/llvm-test-suite#188.

I've managed to reduce the issue down to a small standalone reproducer:

void *checkasm_check_func(void *func, const char *name, ...);
void randomize_loopfilter_buffers(int lineoff, int str,
                                         int dir, int flim_E, int flim_I,
                                         int hev_thresh, unsigned char *buf,
                                         int force_hev);
void fill_loopfilter_buffers(unsigned char *buf, int stride, int w, int h);
void func(void *ptr);

#define memcpy(dst, src, n) __builtin___memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))

void check_loopfilter_16y(void)
{
    unsigned char base0[32 + 16 * 16];
    unsigned char base1[32 + 16 * 16];
    int dir, edge, force_hev;
    int flim_E = 20, flim_I = 10, hev_thresh = 7;

    for (dir = 0; dir < 2; dir++) {
        int midoff = dir ? 4 * 16 : 4;
        int midoff_aligned = dir ? 4 * 16 : 16;
        unsigned char *buf0 = base0 + midoff_aligned;
        unsigned char *buf1 = base1 + midoff_aligned;
        for (edge = 0; edge < 2; edge++) {
            if (checkasm_check_func(func, "vp8_loop_filter16y")) {
                for (force_hev = -1; force_hev <= 1; force_hev++) {
                    fill_loopfilter_buffers(buf0, 16, 16, 16);
                    randomize_loopfilter_buffers(0, 16, dir, flim_E, flim_I, hev_thresh, buf0, force_hev);
                    memcpy(buf1 - midoff, base0, 16 * 16);
                    func(buf1 - midoff);
                }
            }
        }
    }
}

If we compile this with e.g. clang --target=aarch64-linux-gnu -O2 repro.c -S -o - before this change, the whole memcpy / __builtin___memcpy_chk gets expanded into an inline memcpy with no call to __memcpy_chk.

After this change, we instead get this:

        add     x0, x23, x22
        add     x1, sp, #296
        mov     w2, #256                        // =0x100
        mov     x3, xzr
        bl      __memcpy_chk

Parameter x3 here is the buffer-size parameter to __memcpy_chk, which gets passed as zero, i.e. we're doing a memcpy to a zero-byte output buffer, which will fail as soon as we copy more than zero bytes into it.

The same reproduces for arm64-apple-darwin and x86_64-linux-gnu too, and presumably any target, just pick whichever you're most comfortable with reading :-)
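For context, here is a minimal sketch of the check __memcpy_chk performs (a simplification of my own, not the actual libc implementation; my_memcpy_chk is a hypothetical name): it aborts whenever the copy length exceeds the destination bound that __builtin_object_size reported, so a bound of zero traps on any nonzero copy.

#include <cstdlib>
#include <cstring>

// Hypothetical simplification of __memcpy_chk: abort when the copy
// length exceeds the compile-time destination bound.
extern "C" void *my_memcpy_chk(void *dst, const void *src,
                               std::size_t n, std::size_t bound) {
    if (n > bound)      // bound == 0 here, so any n > 0 aborts
        std::abort();
    return std::memcpy(dst, src, n);
}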

Can we revert this until this issue is fixed?

@serge-sans-paille
Collaborator Author

serge-sans-paille commented Dec 18, 2024

Simpler reproducer:

unsigned char check_loopfilter_16y(unsigned i, int dir, long * res)
{
  unsigned char base1[32 + 16 * 16];
  int midoff = dir ? 4 * 16 : 4;
  int midoff_aligned = dir ? 4 * 16 : 16;
  unsigned char *buf1 = base1 + midoff_aligned;
  *res = __builtin_object_size((buf1 - midoff), 0);
  return base1[i];
}

Basically what happens here is that buf1 is computed to be in [base1 + 16, base1 + 4 * 16] and -midoff in [-4 * 16, -4]. Because the analysis tracks no data dependency between the two selects, __builtin_object_size computes that its argument may point before base1 (through 16 - 4 * 16) and correctly detects that, but the corresponding action is probably not the best. Please give me a day to make sure my local patch is correct; otherwise we'll revert.
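To make the interval arithmetic concrete (my own worked reading of the analysis above, not output from the pass):

// midoff_aligned is in {16, 64} and midoff is in {4, 64}; the two
// selects are combined independently, ignoring their shared dependency
// on dir, so:
//   buf1 - midoff  is in  [base1 + 16 - 64, base1 + 64 - 4]
//                      =  [base1 - 48,      base1 + 60]
// The lower bound lies before base1, so the analysis concludes the
// pointer may be out of bounds and the objectsize result degrades.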

EDIT: the bug is not in the code added by this patch but in the way we handle the error, so reverting this patch would hide the issue but not fix it.

@mstorsjo
Member

Thanks! Makes sense, somewhat. In this case, when there's uncertainty whether we really are out of bounds or not, I think the more reasonable conclusion is not the worst case, but the best case (since we can't know if the worst case actually is hit), at least for calculating potential bounds. Anyway, it'll be interesting to see what you come up with tomorrow!

@mikaelholmen
Collaborator

mikaelholmen commented Dec 19, 2024

We have also started seeing problems with this patch. E.g. we hit an UBSan error like
load of address 0x811 with insufficient space for an object of type 'int'
on a variable access that looks perfectly fine.
Unfortunately I don't have a reproducer I can share at the moment.

Edit:
Well here's a reduced reproducer:
llc bbi-102493.ll -o -
In the output we erroneously get an unconditional call to "bad()" since
%0 = tail call i32 @llvm.objectsize.i32.p0(ptr nonnull %arrayidx.i, i1 false, i1 false, i1 false)
has been folded to 0 according to -debug printouts:
Folding %0 = tail call i32 @llvm.objectsize.i32.p0(ptr nonnull %arrayidx.i, i1 false, i1 false, i1 false) to i32 0

Note that the loop only does one iteration, so %p_5.0.i is only ever 0 in

  %arrayidx.i = getelementptr inbounds nuw [1 x i16], ptr @g_8, i16 0, i16 %p_5.0.i
  %0 = tail call i32 @llvm.objectsize.i32.p0(ptr nonnull %arrayidx.i, i1 false, i1 false, i1 false)

bbi-102493.ll.gz

@serge-sans-paille
Collaborator Author

@mikaelholmen and I can confirm #120424 fixes it :-)

@mikaelholmen
Collaborator

mikaelholmen commented Dec 19, 2024

@mikaelholmen and I can confirm #120424 fixes it :-)

Great!

Edit: I've verified that the original problem I saw with my full reproducer also goes away with #120424.

@bjope
Collaborator

bjope commented Mar 14, 2025

Hello @serge-sans-paille.

Found another problem that seems to be related to this patch.

Reduced IR test case looks like this:

target datalayout = "p:16:16"

declare i32 @llvm.objectsize.i32.p0(ptr, i1 immarg, i1 immarg, i1 immarg)

define i32 @test1() {
entry:
  %l_677 = alloca [5 x i8], align 1
  %31 = select i1 true, i32 0, i32 -1736099245
  %arrayidx99 = getelementptr inbounds nuw [5 x i8], ptr %l_677, i32 0, i32 %31
  %32 = call i32 @llvm.objectsize.i32.p0(ptr %arrayidx99, i1 false, i1 false, i1 false)
  ret i32 %32
}

lower-constant-intrinsics seems to derive a size of 0 for the above test case, while the correct result (given the true result from the select) would be 5. Or I guess lower-constant-intrinsics should return -1 to indicate that it doesn't know the size (similar to the earlier problem fixed in #120424).

Here is a godbolt link for the above: https://godbolt.org/z/o7YYxczYP
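For what it's worth, here is my own arithmetic on why the comparison width matters in this test case (an illustration, not output from the pass):

// With datalayout "p:16:16" the index type is 16 bits wide, so the GEP
// truncates its i32 index before the address computation:
//   -1736099245 = 0x98853A53 (i32)  ->  trunc to i16: 0x3A53 = 14931
// For the max-mode objectsize query the analysis takes the minimum of
// the candidate offsets. Compared at 16 bits the candidates are
// {0, 14931}, the minimum is 0, and the size is 5; compared at 32 bits
// the minimum is the huge negative value, which looks out of bounds and
// folds the size to 0.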

bjope added a commit to bjope/llvm-project that referenced this pull request Mar 21, 2025
Main goal here is to fix some bugs seen with LowerConstantIntrinsics
pass and the lowering of llvm.objectsize.

In ObjectSizeOffsetVisitor::computeImpl we are using an external
analysis together with stripAndAccumulateConstantOffsets. The idea
is to compute the Min/Max value of individual offsets within a GEP.
The bug solved here is that, when doing the Min/Max comparisons, the
external analysis wasn't considering the index type size (given by the
data layout); it was simply using the type from the IR. Since a GEP is
defined to sign-extend or truncate its indices to the index type, we
need to consider that width in the external analysis.

This solves a regression (false ubsan warnings) seen after commit
02b8ee2 (llvm#117849).
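The shape of that fix, as a rough sketch (illustrative name and code of mine, not the actual patch):

#include "llvm/ADT/APInt.h"

// Illustrative only: normalize a candidate GEP offset to the pointer
// index width from the DataLayout before any Min/Max comparison,
// mirroring how GEP sign-extends or truncates its indices.
static llvm::APInt normalizeToIndexWidth(const llvm::APInt &Offset,
                                         unsigned IndexTypeBits) {
  return Offset.sextOrTrunc(IndexTypeBits);
}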
bjope added a commit to bjope/llvm-project that referenced this pull request Mar 26, 2025