Skip to content

[analyzer] EvalBinOpLL should return Unknown less often #114222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 31, 2024
Merged

[analyzer] EvalBinOpLL should return Unknown less often #114222

merged 3 commits into from
Oct 31, 2024

Conversation

steakhal
Copy link
Contributor

SValBuilder::getKnownValue, getMinValue, getMaxValue use SValBuilder::simplifySVal.

simplifySVal does repeated simplification until a fixed-point is reached. A single step is done by SimpleSValBuilder::simplifySValOnce, using a Simplifier visitor. That will basically decompose SymSymExprs, and apply constant folding using the constraints we have in the State. Once it decomposes a SymSymExpr, it simplifies both sides and then uses the SValBuilder::evalBinOp to reconstruct the same - but now simpler - SymSymExpr, while applying some caching to remain performant.

This decomposition, and then the subsequent re-composition poses new challenges to the SValBuilder::evalBinOp, which is built to handle expressions coming from real C/C++ code, thus applying some implicit assumptions.

One previous assumption was that nobody would form an expression like "((int*)0) - q" (where q is an int pointer), because it doesn't really makes sense to write code like that.

However, during simplification, we may end up with a call to evalBinOp similar to this.

To me, simplifying a SymbolRef should never result in Unknown or Undef, unless it was Unknown or Undef initially or, during simplification we realized that it's a division by zero once we did the constant folding, etc.

In the following case the simplified SVal should not become UnknownVal:

void top(char *p, char *q) {
  int diff = p - q; // diff: reg<p> - reg<q>
  if (!p) // p: NULL
    simplify(diff); // diff after simplification should be: 0(loc) - reg<q>
}

Returning Unknown from the simplifySVal can weaken analysis precision in other places too, such as in SValBuilder::getKnownValue, getMinValue, or getMaxValue because we call simplifySVal before doing anything else.

For nonloc::SymbolVals, this loss of precision is critical, because for those the SymbolRef carries an accurate type of the encoded computation, thus we should at least have a conservative upper or lower bound that we could return from getMinValue or getMaxValue - yet we would just return nullptr.

const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                      SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                    SVal V) {
  V = simplifySVal(state, V);

  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;

  if (SymbolRef Sym = V.getAsSymbol())
    return state->getConstraintManager().getSymMinVal(state, Sym);

  return nullptr;
}

For now, I don't plan to make the simplification bullet-proof, I'm just explaining why I made this change and what you need to look out for in the future if you see a similar issue.

CPP-5750

SValBuilder::getKnownValue, getMinValue, getMaxValue use
SValBuilder::simplifySVal.

simplifySVal does repeated simplification until a fixed-point is
reached. A single step is done by SimpleSValBuilder::simplifySValOnce,
using a Simplifier visitor. That will basically decompose SymSymExprs,
and apply constant folding using the constraints we have in the State.
Once it decomposes a SymSymExpr, it simplifies both sides and then uses
the SValBuilder::evalBinOp to reconstruct the same - but now simpler -
SymSymExpr, while applying some caching to remain performant.

This decomposition, and then the subsequent re-composition poses new
challenges to the SValBuilder::evalBinOp, which is built to handle
expressions coming from real C/C++ code, thus applying some implicit
assumptions.

One previous assumption was that nobody would form an expression like
"((int*)0) - q" (where q is an int pointer), because it doesn't really
makes sense to write code like that.

However, during simplification, we may end up with a call to evalBinOp
similar to this.

To me, simplifying a SymbolRef should never result in Unknown or Undef,
unless it was Unknown or Undef initially or, during simplification we
realized that it's a division by zero once we did the constant folding,
etc.

In the following case the simplified SVal should not become UnknownVal:
```c++
void top(char *p, char *q) {
  int diff = p - q; // diff: reg<p> - reg<q>
  if (!p) // p: NULL
    simplify(diff); // diff after simplification should be: 0(loc) - reg<q>
}
```

Returning Unknown from the simplifySVal can weaken analysis precision in
other places too, such as in SValBuilder::getKnownValue, getMinValue, or
getMaxValue because we call simplifySVal before doing anything else.

For nonloc::SymbolVals, this loss of precision is critical, because for
those the SymbolRef carries an accurate type of the encoded computation,
thus we should at least have a conservative upper or lower bound that we
could return from getMinValue or getMaxValue - yet we would just return
nullptr.

```c++
const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                      SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                    SVal V) {
  V = simplifySVal(state, V);

  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;

  if (SymbolRef Sym = V.getAsSymbol())
    return state->getConstraintManager().getSymMinVal(state, Sym);

  return nullptr;
}
```

For now, I don't plan to make the simplification bullet-proof, I'm just
explaining why I made this change and what you need to look out for in
the future if you see a similar issue.

CPP-5750
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Oct 30, 2024
@steakhal
Copy link
Contributor Author

@necto

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

SValBuilder::getKnownValue, getMinValue, getMaxValue use SValBuilder::simplifySVal.

simplifySVal does repeated simplification until a fixed-point is reached. A single step is done by SimpleSValBuilder::simplifySValOnce, using a Simplifier visitor. That will basically decompose SymSymExprs, and apply constant folding using the constraints we have in the State. Once it decomposes a SymSymExpr, it simplifies both sides and then uses the SValBuilder::evalBinOp to reconstruct the same - but now simpler - SymSymExpr, while applying some caching to remain performant.

This decomposition, and then the subsequent re-composition poses new challenges to the SValBuilder::evalBinOp, which is built to handle expressions coming from real C/C++ code, thus applying some implicit assumptions.

One previous assumption was that nobody would form an expression like "((int*)0) - q" (where q is an int pointer), because it doesn't really makes sense to write code like that.

However, during simplification, we may end up with a call to evalBinOp similar to this.

To me, simplifying a SymbolRef should never result in Unknown or Undef, unless it was Unknown or Undef initially or, during simplification we realized that it's a division by zero once we did the constant folding, etc.

In the following case the simplified SVal should not become UnknownVal:

void top(char *p, char *q) {
  int diff = p - q; // diff: reg&lt;p&gt; - reg&lt;q&gt;
  if (!p) // p: NULL
    simplify(diff); // diff after simplification should be: 0(loc) - reg&lt;q&gt;
}

Returning Unknown from the simplifySVal can weaken analysis precision in other places too, such as in SValBuilder::getKnownValue, getMinValue, or getMaxValue because we call simplifySVal before doing anything else.

For nonloc::SymbolVals, this loss of precision is critical, because for those the SymbolRef carries an accurate type of the encoded computation, thus we should at least have a conservative upper or lower bound that we could return from getMinValue or getMaxValue - yet we would just return nullptr.

const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                      SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                    SVal V) {
  V = simplifySVal(state, V);

  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;

  if (SymbolRef Sym = V.getAsSymbol())
    return state-&gt;getConstraintManager().getSymMinVal(state, Sym);

  return nullptr;
}

For now, I don't plan to make the simplification bullet-proof, I'm just explaining why I made this change and what you need to look out for in the future if you see a similar issue.

CPP-5750


Full diff: https://github.com/llvm/llvm-project/pull/114222.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp (+4-3)
  • (modified) clang/unittests/StaticAnalyzer/CMakeLists.txt (+1)
  • (added) clang/unittests/StaticAnalyzer/SValSimplifyerTest.cpp (+103)
diff --git a/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp b/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 45e48d435aca6a..229169f848e228 100644
--- a/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -860,11 +860,12 @@ SVal SimpleSValBuilder::evalBinOpLL(ProgramStateRef state,
     // If one of the operands is a symbol and the other is a constant,
     // build an expression for use by the constraint manager.
     if (SymbolRef rSym = rhs.getAsLocSymbol()) {
-      // We can only build expressions with symbols on the left,
-      // so we need a reversible operator.
-      if (!BinaryOperator::isComparisonOp(op) || op == BO_Cmp)
+      if (op == BO_Cmp)
         return UnknownVal();
 
+      if (!BinaryOperator::isComparisonOp(op))
+        return makeNonLoc(L.getValue(), op, rSym, resultTy);
+
       op = BinaryOperator::reverseComparisonOp(op);
       return makeNonLoc(rSym, op, L.getValue(), resultTy);
     }
diff --git a/clang/unittests/StaticAnalyzer/CMakeLists.txt b/clang/unittests/StaticAnalyzer/CMakeLists.txt
index 5ef72cfaea4011..f5da86e5456030 100644
--- a/clang/unittests/StaticAnalyzer/CMakeLists.txt
+++ b/clang/unittests/StaticAnalyzer/CMakeLists.txt
@@ -20,6 +20,7 @@ add_clang_unittest(StaticAnalysisTests
   RegisterCustomCheckersTest.cpp
   StoreTest.cpp
   SymbolReaperTest.cpp
+  SValSimplifyerTest.cpp
   SValTest.cpp
   TestReturnValueUnderConstruction.cpp
   Z3CrosscheckOracleTest.cpp
diff --git a/clang/unittests/StaticAnalyzer/SValSimplifyerTest.cpp b/clang/unittests/StaticAnalyzer/SValSimplifyerTest.cpp
new file mode 100644
index 00000000000000..b3feb0e4cce231
--- /dev/null
+++ b/clang/unittests/StaticAnalyzer/SValSimplifyerTest.cpp
@@ -0,0 +1,103 @@
+//===- unittests/StaticAnalyzer/SValSimplifyerTest.cpp --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "CheckerRegistration.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include "llvm/ADT/Twine.h"
+#include "llvm/Support/raw_ostream.h"
+#include "gtest/gtest.h"
+
+using namespace clang;
+using namespace ento;
+
+static std::string toString(SVal V) {
+  std::string Result;
+  llvm::raw_string_ostream Stream(Result);
+  V.dumpToStream(Stream);
+  return Result;
+}
+
+static void replace(std::string &Content, StringRef Substr,
+                    StringRef Replacement) {
+  std::size_t Pos = 0;
+  while ((Pos = Content.find(Substr, Pos)) != std::string::npos) {
+    Content.replace(Pos, Substr.size(), Replacement);
+    Pos += Replacement.size();
+  }
+}
+
+namespace {
+
+class SimplifyChecker : public Checker<check::PreCall> {
+  const BugType Bug{this, "SimplifyChecker"};
+  const CallDescription SimplifyCall{CDM::SimpleFunc, {"simplify"}, 1};
+
+  void report(CheckerContext &C, const Expr *E, StringRef Description) const {
+    PathDiagnosticLocation Loc(E->getExprLoc(), C.getSourceManager(), E);
+    auto Report = std::make_unique<BasicBugReport>(Bug, Description, Loc);
+    C.emitReport(std::move(Report));
+  }
+
+public:
+  void checkPreCall(const CallEvent &Call, CheckerContext &C) const {
+    if (!SimplifyCall.matches(Call))
+      return;
+    const Expr *Arg = Call.getArgExpr(0);
+    SVal Val = C.getSVal(Arg);
+    SVal SimplifiedVal = C.getSValBuilder().simplifySVal(C.getState(), Val);
+    std::string Subject = toString(Val);
+    std::string Simplified = toString(SimplifiedVal);
+    std::string Message = (llvm::Twine{Subject} + " -> " + Simplified).str();
+    report(C, Arg, Message);
+  }
+};
+} // namespace
+
+static void addSimplifyChecker(AnalysisASTConsumer &AnalysisConsumer,
+                               AnalyzerOptions &AnOpts) {
+  AnOpts.CheckersAndPackages = {{"SimplifyChecker", true}};
+  AnalysisConsumer.AddCheckerRegistrationFn([](CheckerRegistry &Registry) {
+    Registry.addChecker<SimplifyChecker>("SimplifyChecker", "EmptyDescription",
+                                         "EmptyDocsUri");
+  });
+}
+
+static void runThisCheckerOnCode(const std::string &Code, std::string &Diags) {
+  ASSERT_TRUE(runCheckerOnCode<addSimplifyChecker>(Code, Diags,
+                                                   /*OnlyEmitWarnings=*/true));
+  ASSERT_FALSE(Diags.empty());
+  ASSERT_EQ(Diags.back(), '\n');
+  Diags.pop_back();
+}
+
+namespace {
+
+TEST(SValSimplifyerTest, LHSConstrainedNullPtrDiff) {
+  constexpr auto Code = R"cpp(
+template <class T> void simplify(T);
+void LHSConstrainedNullPtrDiff(char *p, char *q) {
+  int diff = p - q;
+  if (!p)
+    simplify(diff);
+})cpp";
+
+  std::string Diags;
+  runThisCheckerOnCode(Code, Diags);
+  replace(Diags, "(reg_$0<char * p>)", "reg_p");
+  replace(Diags, "(reg_$1<char * q>)", "reg_q");
+  // This should not be simplified to "Unknown".
+  EXPECT_EQ(Diags, "SimplifyChecker: reg_p - reg_q -> 0U - reg_q");
+}
+
+} // namespace

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense for me. It would be nice if we had a minimal reproducer for a regression test instead of maintaining a large-ish test.

@Xazax-hun
Copy link
Collaborator

To me, simplifying a SymbolRef should never result in Unknown or Undef, unless it was Unknown or Undef initially or, during simplification we realized that it's a division by zero once we did the constant folding, etc.

I understand that we might not be ready for this, but feels like at some point we should have this as a form of an assertion. I am not sure about the Undef part because there are many ways to get to undef during constant folding (signed overflows, division by zero, OOB access of an array, defer of an invalid pointer). Even if some of those cannot happen during the current simplification they might be modeled later. So, I think a simpler "the result can only be Unknown if the input was also Unknown" would make a lot of sense to me.

@steakhal
Copy link
Contributor Author

Makes sense for me. It would be nice if we had a minimal reproducer for a regression test instead of maintaining a large-ish test.

I wish I could add a LIT test instead.
Out of the get min/max val, or getKnownValue APIs, only the latter is used slightly more broadly, where the callsites can't assume it should return a valid result. Still, these APIs are barely used.
In my case I have additional context in my downstream checker, where I can make this assumption and I was surprised to see crashes. So I had no other options but to craft a unittest for this.

@steakhal steakhal merged commit e67e03a into llvm:main Oct 31, 2024
8 checks passed
@steakhal steakhal deleted the bb/fix-simplify-to-unknown branch October 31, 2024 10:01
@NagyDonat
Copy link
Contributor

Offtopic @Xazax-hun

there are many ways to get to undef during constant folding (signed overflows ...

I was surprised to read this because as far as I know (knew) the analyzer models signed operations as if overflow was completely natural for them. (See e.g. the method assumeInBound which IIRC relies on this to convert a "within this interval" check to a ">= some threshold" check. I also vaguely recall that there are many FPs where the analyzer assumes wraparond of signed values.)

I would be very grateful if somebody pointed out some logic in the analyzer source which ensures that a signed overflow produces an UndefinedVal (or the overflow is disallowed by some other mechanism). If I was right and we don't have such code yet, then it might be good to add some...

@steakhal
Copy link
Contributor Author

Offtopic @Xazax-hun

there are many ways to get to undef during constant folding (signed overflows ...

I was surprised to read this because as far as I know (knew) the analyzer models signed operations as if overflow was completely natural for them. (See e.g. the method assumeInBound which IIRC relies on this to convert a "within this interval" check to a ">= some threshold" check. I also vaguely recall that there are many FPs where the analyzer assumes wraparond of signed values.)

I would be very grateful if somebody pointed out some logic in the analyzer source which ensures that a signed overflow produces an UndefinedVal (or the overflow is disallowed by some other mechanism). If I was right and we don't have such code yet, then it might be good to add some...

I think we don't have any mechanisms like these. This was just a theoretical comment if I understood it right.

@Xazax-hun
Copy link
Collaborator

I think we don't have any mechanisms like these. This was just a theoretical comment if I understood it right.

Actually, both. Yes, we don't have many of these in the engine as of today, but we do model some overflows in some checkers, like the bitwise shifts checker. It would be nice if we could write that logic once, and reuse it in constant folding as well, but this is really theoretical at this point.

@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 31, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot1 while building clang at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/3353

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86709 of 86710 tests, 88 workers --
Testing:  0.. 10
FAIL: Clang :: Interpreter/inline-virtual.cpp (12586 of 86709)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 6: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
RUN: at line 8: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      -Xcc -O2 | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation -Xcc -O2
JIT session error: In graph incr_module_25-jitted-objectbuffer, section .text.startup: relocation target "__dso_handle" at address 0x76991e50e000 is out of range of Delta32 fixup at 0x7a991ee2f039 (<anonymous block> @ 0x7a991ee2f010 + 0x29)
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_25, a2, $.incr_module_25.__inits.0 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_25 }) }
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

Step 10 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86709 of 86710 tests, 88 workers --
Testing:  0.. 10
FAIL: Clang :: Interpreter/inline-virtual.cpp (12586 of 86709)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 6: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
RUN: at line 8: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      -Xcc -O2 | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation -Xcc -O2
JIT session error: In graph incr_module_25-jitted-objectbuffer, section .text.startup: relocation target "__dso_handle" at address 0x76991e50e000 is out of range of Delta32 fixup at 0x7a991ee2f039 (<anonymous block> @ 0x7a991ee2f010 + 0x29)
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_25, a2, $.incr_module_25.__inits.0 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_25 }) }
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--


smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
SValBuilder::getKnownValue, getMinValue, getMaxValue use
SValBuilder::simplifySVal.

simplifySVal does repeated simplification until a fixed-point is
reached. A single step is done by SimpleSValBuilder::simplifySValOnce,
using a Simplifier visitor. That will basically decompose SymSymExprs,
and apply constant folding using the constraints we have in the State.
Once it decomposes a SymSymExpr, it simplifies both sides and then uses
the SValBuilder::evalBinOp to reconstruct the same - but now simpler -
SymSymExpr, while applying some caching to remain performant.

This decomposition, and then the subsequent re-composition poses new
challenges to the SValBuilder::evalBinOp, which is built to handle
expressions coming from real C/C++ code, thus applying some implicit
assumptions.

One previous assumption was that nobody would form an expression like
"((int*)0) - q" (where q is an int pointer), because it doesn't really
makes sense to write code like that.

However, during simplification, we may end up with a call to evalBinOp
similar to this.

To me, simplifying a SymbolRef should never result in Unknown or Undef,
unless it was Unknown or Undef initially or, during simplification we
realized that it's a division by zero once we did the constant folding,
etc.

In the following case the simplified SVal should not become UnknownVal:
```c++
void top(char *p, char *q) {
  int diff = p - q; // diff: reg<p> - reg<q>
  if (!p) // p: NULL
    simplify(diff); // diff after simplification should be: 0(loc) - reg<q>
}
```

Returning Unknown from the simplifySVal can weaken analysis precision in
other places too, such as in SValBuilder::getKnownValue, getMinValue, or
getMaxValue because we call simplifySVal before doing anything else.

For nonloc::SymbolVals, this loss of precision is critical, because for
those the SymbolRef carries an accurate type of the encoded computation,
thus we should at least have a conservative upper or lower bound that we
could return from getMinValue or getMaxValue - yet we would just return
nullptr.

```c++
const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                      SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                    SVal V) {
  V = simplifySVal(state, V);

  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;

  if (SymbolRef Sym = V.getAsSymbol())
    return state->getConstraintManager().getSymMinVal(state, Sym);

  return nullptr;
}
```

For now, I don't plan to make the simplification bullet-proof, I'm just
explaining why I made this change and what you need to look out for in
the future if you see a similar issue.

CPP-5750
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
SValBuilder::getKnownValue, getMinValue, getMaxValue use
SValBuilder::simplifySVal.

simplifySVal does repeated simplification until a fixed-point is
reached. A single step is done by SimpleSValBuilder::simplifySValOnce,
using a Simplifier visitor. That will basically decompose SymSymExprs,
and apply constant folding using the constraints we have in the State.
Once it decomposes a SymSymExpr, it simplifies both sides and then uses
the SValBuilder::evalBinOp to reconstruct the same - but now simpler -
SymSymExpr, while applying some caching to remain performant.

This decomposition, and then the subsequent re-composition poses new
challenges to the SValBuilder::evalBinOp, which is built to handle
expressions coming from real C/C++ code, thus applying some implicit
assumptions.

One previous assumption was that nobody would form an expression like
"((int*)0) - q" (where q is an int pointer), because it doesn't really
makes sense to write code like that.

However, during simplification, we may end up with a call to evalBinOp
similar to this.

To me, simplifying a SymbolRef should never result in Unknown or Undef,
unless it was Unknown or Undef initially or, during simplification we
realized that it's a division by zero once we did the constant folding,
etc.

In the following case the simplified SVal should not become UnknownVal:
```c++
void top(char *p, char *q) {
  int diff = p - q; // diff: reg<p> - reg<q>
  if (!p) // p: NULL
    simplify(diff); // diff after simplification should be: 0(loc) - reg<q>
}
```

Returning Unknown from the simplifySVal can weaken analysis precision in
other places too, such as in SValBuilder::getKnownValue, getMinValue, or
getMaxValue because we call simplifySVal before doing anything else.

For nonloc::SymbolVals, this loss of precision is critical, because for
those the SymbolRef carries an accurate type of the encoded computation,
thus we should at least have a conservative upper or lower bound that we
could return from getMinValue or getMaxValue - yet we would just return
nullptr.

```c++
const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                      SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                    SVal V) {
  V = simplifySVal(state, V);

  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;

  if (SymbolRef Sym = V.getAsSymbol())
    return state->getConstraintManager().getSymMinVal(state, Sym);

  return nullptr;
}
```

For now, I don't plan to make the simplification bullet-proof, I'm just
explaining why I made this change and what you need to look out for in
the future if you see a similar issue.

CPP-5750
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants