
[llvm] Improve llvm.objectsize computation by computing GEP, alloca a… #117849


Merged
merged 1 commit into llvm:main on Dec 10, 2024

Conversation

serge-sans-paille
Collaborator

Improve llvm.objectsize computation by computing GEP, alloca and malloc parameters bound

Using a naive expression walker, it is possible to compute valuable information for allocation functions, GEPs and allocas, even in the presence of some dynamic information.

We don't rely on computeConstantRange, to avoid taking advantage of undefined behavior, which would be counter-productive with respect to the usual llvm.objectsize usage.

llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions, so improving its diagnostics in turn improves the security of compiled applications.

As a side note, as a result of recent optimization improvements, clang no longer passes https://github.com/serge-sans-paille/builtin_object_size-test-suite. This commit restores the situation and greatly broadens the scope of code handled by the static version of __builtin_object_size.

This is a recommit of #115522 with fix applied.
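To illustrate the kind of code this now handles, here is a hedged sketch of my own (not taken from the patch or its test suite): the allocation size is a select between two constants, so the static __builtin_object_size can report a bound instead of giving up.

#include <cstddef>
#include <cstdlib>

// Sketch only (my example): before this change the size operand below was
// not a ConstantInt, so the static __builtin_object_size gave up and
// returned -1 for mode 0.
std::size_t object_size_of_select(bool cond) {
    char *p = static_cast<char *>(std::malloc(cond ? 3 : 4));
    return __builtin_object_size(p, 0); // with the patch: may fold to 4 (upper bound)
}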

@llvmbot added the flang:runtime, flang (Flang issues not falling into any other category), llvm:ir, llvm:analysis (Includes value tracking, cost tables and constant folding) and llvm:transforms labels on Nov 27, 2024
@llvmbot
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-flang-runtime

@llvm/pr-subscribers-llvm-transforms

Author: None (serge-sans-paille)

Changes

Improve llvm.objectsize computation by computing GEP, alloca and malloc parameters bound

Using a naive expression walker, it is possible to compute valuable information for allocation functions, GEPs and allocas, even in the presence of some dynamic information.

We don't rely on computeConstantRange, to avoid taking advantage of undefined behavior, which would be counter-productive with respect to the usual llvm.objectsize usage.

llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions, so improving its diagnostics in turn improves the security of compiled applications.

As a side note, as a result of recent optimization improvements, clang no longer passes https://github.com/serge-sans-paille/builtin_object_size-test-suite. This commit restores the situation and greatly broadens the scope of code handled by the static version of __builtin_object_size.

This is a recommit of #115522 with fix applied.


Full diff: https://github.com/llvm/llvm-project/pull/117849.diff

5 Files Affected:

  • (modified) flang/include/flang/Runtime/descriptor.h (+1)
  • (modified) flang/runtime/io-api-minimal.cpp (+1-1)
  • (modified) llvm/include/llvm/IR/Value.h (+8-4)
  • (modified) llvm/lib/Analysis/MemoryBuiltins.cpp (+103-4)
  • (added) llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll (+109)
diff --git a/flang/include/flang/Runtime/descriptor.h b/flang/include/flang/Runtime/descriptor.h
index 030d0c1031fbaa..d674a8e2229a04 100644
--- a/flang/include/flang/Runtime/descriptor.h
+++ b/flang/include/flang/Runtime/descriptor.h
@@ -467,6 +467,7 @@ class alignas(Descriptor) StaticDescriptor {
   RT_OFFLOAD_VAR_GROUP_END
 
   RT_API_ATTRS Descriptor &descriptor() {
+
     return *reinterpret_cast<Descriptor *>(storage_);
   }
   RT_API_ATTRS const Descriptor &descriptor() const {
diff --git a/flang/runtime/io-api-minimal.cpp b/flang/runtime/io-api-minimal.cpp
index ad76fe3de0324c..a06d3d12f8f005 100644
--- a/flang/runtime/io-api-minimal.cpp
+++ b/flang/runtime/io-api-minimal.cpp
@@ -150,7 +150,7 @@ bool IODEF(OutputLogical)(Cookie cookie, bool truth) {
 // Provide own definition for `std::__libcpp_verbose_abort` to avoid dependency
 // on the version provided by libc++.
 
-void std::__libcpp_verbose_abort(char const *format, ...) {
+void std::__libcpp_verbose_abort(char const *format, ...) noexcept {
   va_list list;
   va_start(list, format);
   std::vfprintf(stderr, format, list);
diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h
index 945081b77e9536..d444a768a65436 100644
--- a/llvm/include/llvm/IR/Value.h
+++ b/llvm/include/llvm/IR/Value.h
@@ -723,12 +723,16 @@ class Value {
       bool AllowInvariantGroup = false,
       function_ref<bool(Value &Value, APInt &Offset)> ExternalAnalysis =
           nullptr) const;
-  Value *stripAndAccumulateConstantOffsets(const DataLayout &DL, APInt &Offset,
-                                           bool AllowNonInbounds,
-                                           bool AllowInvariantGroup = false) {
+
+  Value *stripAndAccumulateConstantOffsets(
+      const DataLayout &DL, APInt &Offset, bool AllowNonInbounds,
+      bool AllowInvariantGroup = false,
+      function_ref<bool(Value &Value, APInt &Offset)> ExternalAnalysis =
+          nullptr) {
     return const_cast<Value *>(
         static_cast<const Value *>(this)->stripAndAccumulateConstantOffsets(
-            DL, Offset, AllowNonInbounds, AllowInvariantGroup));
+            DL, Offset, AllowNonInbounds, AllowInvariantGroup,
+            ExternalAnalysis));
   }
 
   /// This is a wrapper around stripAndAccumulateConstantOffsets with the
diff --git a/llvm/lib/Analysis/MemoryBuiltins.cpp b/llvm/lib/Analysis/MemoryBuiltins.cpp
index 4028d5f4e2e1b8..93dfdfadff11be 100644
--- a/llvm/lib/Analysis/MemoryBuiltins.cpp
+++ b/llvm/lib/Analysis/MemoryBuiltins.cpp
@@ -670,6 +670,68 @@ STATISTIC(ObjectVisitorArgument,
 STATISTIC(ObjectVisitorLoad,
           "Number of load instructions with unsolved size and offset");
 
+static std::optional<APInt>
+combinePossibleConstantValues(std::optional<APInt> LHS,
+                              std::optional<APInt> RHS,
+                              ObjectSizeOpts::Mode EvalMode) {
+  if (!LHS || !RHS)
+    return std::nullopt;
+  if (EvalMode == ObjectSizeOpts::Mode::Max)
+    return LHS->sge(*RHS) ? *LHS : *RHS;
+  else
+    return LHS->sle(*RHS) ? *LHS : *RHS;
+}
+
+static std::optional<APInt> aggregatePossibleConstantValuesImpl(
+    const Value *V, ObjectSizeOpts::Mode EvalMode, unsigned recursionDepth) {
+  constexpr unsigned maxRecursionDepth = 4;
+  if (recursionDepth == maxRecursionDepth)
+    return std::nullopt;
+
+  if (const auto *CI = dyn_cast<ConstantInt>(V)) {
+    return CI->getValue();
+  }
+
+  else if (const auto *SI = dyn_cast<SelectInst>(V)) {
+    return combinePossibleConstantValues(
+        aggregatePossibleConstantValuesImpl(SI->getTrueValue(), EvalMode,
+                                            recursionDepth + 1),
+        aggregatePossibleConstantValuesImpl(SI->getFalseValue(), EvalMode,
+                                            recursionDepth + 1),
+        EvalMode);
+  }
+  else if (const auto *PN = dyn_cast<PHINode>(V)) {
+    unsigned Count = PN->getNumIncomingValues();
+    if (Count == 0)
+      return std::nullopt;
+    auto Acc = aggregatePossibleConstantValuesImpl(
+        PN->getIncomingValue(0), EvalMode, recursionDepth + 1);
+    for (unsigned I = 1; Acc && I < Count; ++I) {
+      auto Tmp = aggregatePossibleConstantValuesImpl(
+          PN->getIncomingValue(I), EvalMode, recursionDepth + 1);
+      Acc = combinePossibleConstantValues(Acc, Tmp, EvalMode);
+    }
+    return Acc;
+  }
+
+  return std::nullopt;
+}
+
+static std::optional<APInt>
+aggregatePossibleConstantValues(const Value *V, ObjectSizeOpts::Mode EvalMode) {
+  if (auto *CI = dyn_cast<ConstantInt>(V))
+    return CI->getValue();
+
+  if (EvalMode != ObjectSizeOpts::Mode::Min &&
+      EvalMode != ObjectSizeOpts::Mode::Max)
+    return std::nullopt;
+
+  // Not using computeConstantRange here because we cannot guarantee it's not
+  // doing optimization based on UB which we want to avoid when expanding
+  // __builtin_object_size.
+  return aggregatePossibleConstantValuesImpl(V, EvalMode, 0u);
+}
+
 /// Align \p Size according to \p Alignment. If \p Size is greater than
 /// getSignedMaxValue(), set it as unknown as we can only represent signed value
 /// in OffsetSpan.
@@ -717,11 +779,36 @@ OffsetSpan ObjectSizeOffsetVisitor::computeImpl(Value *V) {
   V = V->stripAndAccumulateConstantOffsets(
       DL, Offset, /* AllowNonInbounds */ true, /* AllowInvariantGroup */ true);
 
+  // Give it another try with approximated analysis. We don't start with this
+  // one because stripAndAccumulateConstantOffsets behaves differently wrt.
+  // overflows if we provide an external Analysis.
+  if ((Options.EvalMode == ObjectSizeOpts::Mode::Min ||
+       Options.EvalMode == ObjectSizeOpts::Mode::Max) &&
+      isa<GEPOperator>(V)) {
+    // External Analysis used to compute the Min/Max value of individual Offsets
+    // within a GEP.
+    ObjectSizeOpts::Mode EvalMode =
+        Options.EvalMode == ObjectSizeOpts::Mode::Min
+            ? ObjectSizeOpts::Mode::Max
+            : ObjectSizeOpts::Mode::Min;
+    auto OffsetRangeAnalysis = [EvalMode](Value &VOffset, APInt &Offset) {
+      if (auto PossibleOffset =
+              aggregatePossibleConstantValues(&VOffset, EvalMode)) {
+        Offset = *PossibleOffset;
+        return true;
+      }
+      return false;
+    };
+
+    V = V->stripAndAccumulateConstantOffsets(
+        DL, Offset, /* AllowNonInbounds */ true, /* AllowInvariantGroup */ true,
+        /*ExternalAnalysis=*/OffsetRangeAnalysis);
+  }
+
   // Later we use the index type size and zero but it will match the type of the
   // value that is passed to computeImpl.
   IntTyBits = DL.getIndexTypeSizeInBits(V->getType());
   Zero = APInt::getZero(IntTyBits);
-
   OffsetSpan ORT = computeValue(V);
 
   bool IndexTypeSizeChanged = InitialIntTyBits != IntTyBits;
@@ -813,8 +900,9 @@ OffsetSpan ObjectSizeOffsetVisitor::visitAllocaInst(AllocaInst &I) {
     return OffsetSpan(Zero, align(Size, I.getAlign()));
 
   Value *ArraySize = I.getArraySize();
-  if (const ConstantInt *C = dyn_cast<ConstantInt>(ArraySize)) {
-    APInt NumElems = C->getValue();
+  if (auto PossibleSize =
+          aggregatePossibleConstantValues(ArraySize, Options.EvalMode)) {
+    APInt NumElems = *PossibleSize;
     if (!CheckedZextOrTrunc(NumElems))
       return ObjectSizeOffsetVisitor::unknown();
 
@@ -840,7 +928,18 @@ OffsetSpan ObjectSizeOffsetVisitor::visitArgument(Argument &A) {
 }
 
 OffsetSpan ObjectSizeOffsetVisitor::visitCallBase(CallBase &CB) {
-  if (std::optional<APInt> Size = getAllocSize(&CB, TLI)) {
+  auto Mapper = [this](const Value *V) -> const Value * {
+    if (!V->getType()->isIntegerTy())
+      return V;
+
+    if (auto PossibleBound =
+            aggregatePossibleConstantValues(V, Options.EvalMode))
+      return ConstantInt::get(V->getType(), *PossibleBound);
+
+    return V;
+  };
+
+  if (std::optional<APInt> Size = getAllocSize(&CB, TLI, Mapper)) {
     // Very large unsigned value cannot be represented as OffsetSpan.
     if (Size->isNegative())
       return ObjectSizeOffsetVisitor::unknown();
diff --git a/llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll b/llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll
new file mode 100644
index 00000000000000..f84ebee1442893
--- /dev/null
+++ b/llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-range.ll
@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=lower-constant-intrinsics -S < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+declare i64 @llvm.objectsize.i64.p0(ptr, i1 immarg, i1 immarg, i1 immarg)
+declare noalias ptr @malloc(i64 noundef) #0
+
+define i64 @select_alloc_size(i1 %cond) {
+; CHECK-LABEL: @select_alloc_size(
+; CHECK-NEXT:    [[SIZE:%.*]] = select i1 [[COND:%.*]], i64 3, i64 4
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 [[SIZE]], align 1
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 4, i64 3
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %size = select i1 %cond, i64 3, i64 4
+  %ptr = alloca i8, i64 %size
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_malloc_size(i1 %cond) {
+; CHECK-LABEL: @select_malloc_size(
+; CHECK-NEXT:    [[SIZE:%.*]] = select i1 [[COND:%.*]], i64 3, i64 4
+; CHECK-NEXT:    [[PTR:%.*]] = call noalias ptr @malloc(i64 noundef [[SIZE]])
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 4, i64 3
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %size = select i1 %cond, i64 3, i64 4
+  %ptr = call noalias ptr @malloc(i64 noundef %size)
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_gep_offset(i1 %cond) {
+; CHECK-LABEL: @select_gep_offset(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 10, align 1
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[COND:%.*]], i64 3, i64 4
+; CHECK-NEXT:    [[PTR_SLIDE:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[OFFSET]]
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 7, i64 6
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %ptr = alloca i8, i64 10
+  %offset = select i1 %cond, i64 3, i64 4
+  %ptr.slide = getelementptr inbounds i8, ptr %ptr, i64 %offset
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_gep_neg_offset(i1 %c0, i1 %c1) {
+; CHECK-LABEL: @select_gep_neg_offset(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 10, align 1
+; CHECK-NEXT:    [[PTR_SLIDE_1:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 5
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[COND:%.*]], i64 -3, i64 -4
+; CHECK-NEXT:    [[PTR_SLIDE_2:%.*]] = getelementptr inbounds i8, ptr [[PTR_SLIDE_1]], i64 [[OFFSET]]
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[C1:%.*]], i64 9, i64 8
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %ptr = alloca i8, i64 10
+  %ptr.slide.1 = getelementptr inbounds i8, ptr %ptr, i64 5
+  %offset = select i1 %c0, i64 -3, i64 -4
+  %ptr.slide.2 = getelementptr inbounds i8, ptr %ptr.slide.1, i64 %offset
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide.2, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide.2, i1 true, i1 true, i1 false)
+  %res = select i1 %c1, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_neg_oob_offset(i1 %c0, i1 %c1) {
+; CHECK-LABEL: @select_neg_oob_offset(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca i8, i64 10, align 1
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[C0:%.*]], i64 -3, i64 -4
+; CHECK-NEXT:    [[PTR_SLIDE:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[OFFSET]]
+; CHECK-NEXT:    ret i64 0
+;
+  %ptr = alloca i8, i64 10
+  %offset = select i1 %c0, i64 -3, i64 -4
+  %ptr.slide = getelementptr inbounds i8, ptr %ptr, i64 %offset
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 true, i1 true, i1 false)
+  %res = select i1 %c1, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+define i64 @select_gep_offsets(i1 %cond) {
+; CHECK-LABEL: @select_gep_offsets(
+; CHECK-NEXT:    [[PTR:%.*]] = alloca [10 x i8], i64 2, align 1
+; CHECK-NEXT:    [[OFFSET:%.*]] = select i1 [[COND:%.*]], i32 0, i32 1
+; CHECK-NEXT:    [[PTR_SLIDE:%.*]] = getelementptr inbounds [10 x i8], ptr [[PTR]], i32 [[OFFSET]], i32 5
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[COND]], i64 15, i64 5
+; CHECK-NEXT:    ret i64 [[RES]]
+;
+  %ptr = alloca [10 x i8], i64 2
+  %offset = select i1 %cond, i32 0, i32 1
+  %ptr.slide = getelementptr inbounds [10 x i8], ptr %ptr, i32 %offset, i32 5
+  %objsize_max = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 false, i1 true, i1 false)
+  %objsize_min = call i64 @llvm.objectsize.i64.p0(ptr %ptr.slide, i1 true, i1 true, i1 false)
+  %res = select i1 %cond, i64 %objsize_max, i64 %objsize_min
+  ret i64 %res
+}
+
+attributes #0 = { nounwind allocsize(0) }

@serge-sans-paille
Collaborator Author

@fmayer I think I fixed the issue we've been discussing in the previous attempt (there was a typo in the phi node handling). Would you mind giving this PR a try? Thanks!


github-actions bot commented Nov 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Improve llvm.objectsize computation by computing GEP, alloca and malloc parameters bound

Using a naive expression walker, it is possible to compute valuable
information for allocation functions, GEPs and allocas, even in the
presence of some dynamic information.

We don't rely on computeConstantRange, to avoid taking advantage of
undefined behavior, which would be counter-productive with respect to
the usual llvm.objectsize usage.

llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions,
so improving its diagnostics in turn improves the security of compiled
applications.

As a side note, as a result of recent optimization improvements, clang no
longer passes https://github.com/serge-sans-paille/builtin_object_size-test-suite.
This commit restores the situation and greatly broadens the scope of
code handled by the static version of __builtin_object_size.
@fmayer
Contributor

fmayer commented Nov 28, 2024

Thanks! I'm on vacation, @vitalybuka could you please take a look?

@serge-sans-paille
Collaborator Author

@vitalybuka / @fmayer: I've been able to reproduce the ppc issue on a third-party machine and can confirm this version of the patch fixes it \o/

@serge-sans-paille
Collaborator Author

gentle ping 🙇

@vitalybuka requested a review from nikic on December 3, 2024
@vitalybuka
Collaborator

If it's just a re-commit, then neither I nor @fmayer has access to the ppc bot; for that we had to rely on post-submit testing by that bot, as happened with the revert.

The patch itself does not look trivial, and I don't see that anyone approved the original. Maybe @nikic needs to take a look?

@vitalybuka requested review from vitalybuka and removed the request for fmayer on December 3, 2024
@serge-sans-paille
Collaborator Author

The original was actually #114673, which was approved by @hvdijk; maybe he could have a look at this one? I ran the manual testing on ppc and it was fine.

@serge-sans-paille
Collaborator Author

@hvdijk : gentle ping :-)

@serge-sans-paille
Collaborator Author

Given that this was previously accepted, and that I've reproduced and fixed the bug on PPC, I'll go ahead and merge this one tomorrow.

@serge-sans-paille merged commit f8c1a22 into llvm:main on Dec 10, 2024
8 checks passed
@hvdijk
Contributor

hvdijk commented Dec 16, 2024

@hvdijk : gentle ping :-)

Apologies for the late reply, I have been on holiday. In #115504 I commented that I was concerned it produced incorrect results in some cases, which you were intending to fix in a subsequent PR. I cannot tell whether the current implementation is meant to address those cases. If it is, great! If not, then I cannot offer a meaningful review. When reviewing PRs, I want to make sure that even if they are not perfect yet, they at least do not make things any worse; but this being the rather complicated bit of logic that it is, if there are known incorrect results in some cases, I cannot grasp the logic well enough to make sure that a PR is an improvement.

@mstorsjo
Member

This causes some buffer overflow checks to fail; we ran into this in llvm/llvm-test-suite#188.

I've managed to reduce the issue down to a small standalone reproducer:

void *checkasm_check_func(void *func, const char *name, ...);
void randomize_loopfilter_buffers(int lineoff, int str,
                                         int dir, int flim_E, int flim_I,
                                         int hev_thresh, unsigned char *buf,
                                         int force_hev);
void fill_loopfilter_buffers(unsigned char *buf, int stride, int w, int h);
void func(void *ptr);

#define memcpy(dst, src, n) __builtin___memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))

void check_loopfilter_16y(void)
{
    unsigned char base0[32 + 16 * 16];
    unsigned char base1[32 + 16 * 16];
    int dir, edge, force_hev;
    int flim_E = 20, flim_I = 10, hev_thresh = 7;

    for (dir = 0; dir < 2; dir++) {
        int midoff = dir ? 4 * 16 : 4;
        int midoff_aligned = dir ? 4 * 16 : 16;
        unsigned char *buf0 = base0 + midoff_aligned;
        unsigned char *buf1 = base1 + midoff_aligned;
        for (edge = 0; edge < 2; edge++) {
            if (checkasm_check_func(func, "vp8_loop_filter16y")) {
                for (force_hev = -1; force_hev <= 1; force_hev++) {
                    fill_loopfilter_buffers(buf0, 16, 16, 16);
                    randomize_loopfilter_buffers(0, 16, dir, flim_E, flim_I, hev_thresh, buf0, force_hev);
                    memcpy(buf1 - midoff, base0, 16 * 16);
                    func(buf1 - midoff);
                }
            }
        }
    }
}

If we compile this with e.g. clang --target=aarch64-linux-gnu -O2 repro.c -S -o - before this change, the whole memcpy / __builtin___memcpy_chk gets expanded into an inline memcpy with no call to __memcpy_chk.

After this change, we instead get this:

        add     x0, x23, x22
        add     x1, sp, #296
        mov     w2, #256                        // =0x100
        mov     x3, xzr
        bl      __memcpy_chk

Parameter x3 here is the buffer-size parameter to __memcpy_chk, which gets passed as zero, i.e. we're doing a memcpy to a zero-byte output buffer, which will fail as soon as we copy more than zero bytes into it.

The same reproduces for arm64-apple-darwin and x86_64-linux-gnu too, and presumably any target, just pick whichever you're most comfortable with reading :-)
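For context, here is a minimal sketch of the check __memcpy_chk performs (a simplification of my own, not the actual libc implementation; my_memcpy_chk is a hypothetical name): it aborts whenever the copy length exceeds the destination bound that __builtin_object_size reported, so a bound of zero traps on any nonzero copy.

#include <cstdlib>
#include <cstring>

// Hypothetical simplification of __memcpy_chk: abort when the copy
// length exceeds the compile-time destination bound.
extern "C" void *my_memcpy_chk(void *dst, const void *src,
                               std::size_t n, std::size_t bound) {
    if (n > bound)      // bound == 0 here, so any n > 0 aborts
        std::abort();
    return std::memcpy(dst, src, n);
}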

Can we revert this until this issue is fixed?

@serge-sans-paille
Collaborator Author

serge-sans-paille commented Dec 18, 2024

Simpler reproducer:

unsigned char check_loopfilter_16y(unsigned i, int dir, long * res)
{
  unsigned char base1[32 + 16 * 16];
  int midoff = dir ? 4 * 16 : 4;
  int midoff_aligned = dir ? 4 * 16 : 16;
  unsigned char *buf1 = base1 + midoff_aligned;
  *res = __builtin_object_size((buf1 - midoff), 0);
  return base1[i];
}

Basically what happens here is that buf1 is computed to be in [base1 + 16, base1 + 4 * 16] and -midoff in [-4 * 16, -4]. Because the analysis tracks no data dependency between the two selects, __builtin_object_size computes that its argument may point before base1 (through 16 - 4 * 16) and correctly detects that, but the corresponding action is probably not the best. Please give me a day to make sure my local patch is correct; otherwise we'll revert.
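To make the interval arithmetic concrete (my own worked reading of the analysis above, not output from the pass):

// midoff_aligned is in {16, 64} and midoff is in {4, 64}; the two
// selects are combined independently, ignoring their shared dependency
// on dir, so:
//   buf1 - midoff  is in  [base1 + 16 - 64, base1 + 64 - 4]
//                      =  [base1 - 48,      base1 + 60]
// The lower bound lies before base1, so the analysis concludes the
// pointer may be out of bounds and the objectsize result degrades.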

EDIT: the bug is not in the code added by this patch but in the way we handle the error, so reverting this patch would hide the issue but not fix it.

@mstorsjo
Member

Thanks! Makes sense, somewhat. In this case, when there's uncertainty whether we really are out of bounds or not, I think the more reasonable conclusion is not the worst case, but the best case (since we can't know if the worst case actually is hit), at least for calculating potential bounds. Anyway, it'll be interesting to see what you come up with tomorrow!

@mikaelholmen
Collaborator

mikaelholmen commented Dec 19, 2024

We have also started seeing problems with this patch. E.g. we hit an UBSan error like
load of address 0x811 with insufficient space for an object of type 'int'
on a variable access that looks perfectly fine.
Unfortunately I don't have a reproducer I can share at the moment.

Edit:
Well here's a reduced reproducer:
llc bbi-102493.ll -o -
In the output we erroneously get an unconditional call to "bad()" since
%0 = tail call i32 @llvm.objectsize.i32.p0(ptr nonnull %arrayidx.i, i1 false, i1 false, i1 false)
has been folded to 0 according to -debug printouts:
Folding %0 = tail call i32 @llvm.objectsize.i32.p0(ptr nonnull %arrayidx.i, i1 false, i1 false, i1 false) to i32 0

Note that the loop only does one iteration, so %p_5.0.i is only ever 0 in

  %arrayidx.i = getelementptr inbounds nuw [1 x i16], ptr @g_8, i16 0, i16 %p_5.0.i
  %0 = tail call i32 @llvm.objectsize.i32.p0(ptr nonnull %arrayidx.i, i1 false, i1 false, i1 false)

bbi-102493.ll.gz

@serge-sans-paille
Collaborator Author

@mikaelholmen and I can confirm #120424 fixes it :-)

@mikaelholmen
Collaborator

mikaelholmen commented Dec 19, 2024

@mikaelholmen and I can confirm #120424 fixes it :-)

Great!

Edit: I've verified that the original problem I saw with my full reproducer also goes away with #120424.

@bjope
Collaborator

bjope commented Mar 14, 2025

Hello @serge-sans-paille.

Found another problem that seems to be related to this patch.

Reduced IR test case looks like this:

target datalayout = "p:16:16"

declare i32 @llvm.objectsize.i32.p0(ptr, i1 immarg, i1 immarg, i1 immarg)

define i32 @test1() {
entry:
  %l_677 = alloca [5 x i8], align 1
  %31 = select i1 true, i32 0, i32 -1736099245
  %arrayidx99 = getelementptr inbounds nuw [5 x i8], ptr %l_677, i32 0, i32 %31
  %32 = call i32 @llvm.objectsize.i32.p0(ptr %arrayidx99, i1 false, i1 false, i1 false)
  ret i32 %32
}

lower-constant-intrinsics seems to derive a size of 0 for the above test case, while the correct result (given the true result from the select) would be 5. Or I guess lower-constant-intrinsics should return -1 to indicate that it doesn't know the size (similar to the earlier problem fixed in #120424).

Here is a godbolt link for the above: https://godbolt.org/z/o7YYxczYP
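For what it's worth, here is my own arithmetic on why the comparison width matters in this test case (an illustration, not output from the pass):

// With datalayout "p:16:16" the index type is 16 bits wide, so the GEP
// truncates its i32 index before the address computation:
//   -1736099245 = 0x98853A53 (i32)  ->  trunc to i16: 0x3A53 = 14931
// For the max-mode objectsize query the analysis takes the minimum of
// the candidate offsets. Compared at 16 bits the candidates are
// {0, 14931}, the minimum is 0, and the size is 5; compared at 32 bits
// the minimum is the huge negative value, which looks out of bounds and
// folds the size to 0.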

bjope added a commit to bjope/llvm-project that referenced this pull request Mar 21, 2025
Main goal here is to fix some bugs seen with LowerConstantIntrinsics
pass and the lowering of llvm.objectsize.

In ObjectSizeOffsetVisitor::computeImpl we are using an external
analysis together with stripAndAccumulateConstantOffsets. The idea
is to compute the Min/Max value of individual offsets within a GEP.
The bug solved here is that, when doing the Min/Max comparisons, the
external analysis wasn't considering the index type size (given by the
data layout); it was simply using the type from the IR. Since a GEP is
defined to sign-extend or truncate its indices to the index type, we
need to consider that width in the external analysis.

This solves a regression (false ubsan warnings) seen after commit
02b8ee2 (llvm#117849).
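The shape of that fix, as a rough sketch (illustrative name and code of mine, not the actual patch):

#include "llvm/ADT/APInt.h"

// Illustrative only: normalize a candidate GEP offset to the pointer
// index width from the DataLayout before any Min/Max comparison,
// mirroring how GEP sign-extends or truncates its indices.
static llvm::APInt normalizeToIndexWidth(const llvm::APInt &Offset,
                                         unsigned IndexTypeBits) {
  return Offset.sextOrTrunc(IndexTypeBits);
}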
bjope added a commit to bjope/llvm-project that referenced this pull request Mar 26, 2025