Skip to content

[clang][dataflow] Process terminator condition within transferCFGBlock(). #78127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

martinboehme
Copy link
Contributor

In particular, it's important that we create the "fallback" atomic at this point
(which we produce if the transfer function didn't produce a value for the
expression) so that it is placed in the correct environment.

Previously, we processed the terminator condition in the TerminatorVisitor,
which put the fallback atomic in a copy of the environment that is produced as
input for the successor block, rather than the environment for the block
containing the expression for which we produce the fallback atomic.

As a result, we produce different fallback atomics every time we process the
successor block, and hence we don't have a consistent representation of the
terminator condition in the flow condition.

This patch includes a test (authored by ymand@) that fails without the fix.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Jan 15, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 15, 2024

@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

In particular, it's important that we create the "fallback" atomic at this point
(which we produce if the transfer function didn't produce a value for the
expression) so that it is placed in the correct environment.

Previously, we processed the terminator condition in the TerminatorVisitor,
which put the fallback atomic in a copy of the environment that is produced as
input for the successor block, rather than the environment for the block
containing the expression for which we produce the fallback atomic.

As a result, we produce different fallback atomics every time we process the
successor block, and hence we don't have a consistent representation of the
terminator condition in the flow condition.

This patch includes a test (authored by ymand@) that fails without the fix.


Full diff: https://github.com/llvm/llvm-project/pull/78127.diff

3 Files Affected:

  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+33-19)
  • (modified) clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp (+1)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+31)
diff --git a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
index faf83a8920d4ead..0b5371f1c601a43 100644
--- a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
+++ b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
@@ -75,9 +75,8 @@ using TerminatorVisitorRetTy = std::pair<const Expr *, bool>;
 class TerminatorVisitor
     : public ConstStmtVisitor<TerminatorVisitor, TerminatorVisitorRetTy> {
 public:
-  TerminatorVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env,
-                    int BlockSuccIdx)
-      : StmtToEnv(StmtToEnv), Env(Env), BlockSuccIdx(BlockSuccIdx) {}
+  TerminatorVisitor(Environment &Env, int BlockSuccIdx)
+      : Env(Env), BlockSuccIdx(BlockSuccIdx) {}
 
   TerminatorVisitorRetTy VisitIfStmt(const IfStmt *S) {
     auto *Cond = S->getCond();
@@ -126,19 +125,12 @@ class TerminatorVisitor
 
 private:
   TerminatorVisitorRetTy extendFlowCondition(const Expr &Cond) {
-    // The terminator sub-expression might not be evaluated.
-    if (Env.getValue(Cond) == nullptr)
-      transfer(StmtToEnv, Cond, Env);
-
     auto *Val = Env.get<BoolValue>(Cond);
-    // Value merging depends on flow conditions from different environments
-    // being mutually exclusive -- that is, they cannot both be true in their
-    // entirety (even if they may share some clauses). So, we need *some* value
-    // for the condition expression, even if just an atom.
-    if (Val == nullptr) {
-      Val = &Env.makeAtomicBoolValue();
-      Env.setValue(Cond, *Val);
-    }
+    // In transferCFGBlock(), we ensure that we always have a `Value` for the
+    // terminator condition, so assert this.
+    // We consciously assert ourselves instead of asserting via `cast()` so
+    // that we get a more meaningful line number if the assertion fails.
+    assert(Val != nullptr);
 
     bool ConditionValue = true;
     // The condition must be inverted for the successor that encompasses the
@@ -152,7 +144,6 @@ class TerminatorVisitor
     return {&Cond, ConditionValue};
   }
 
-  const StmtToEnvMap &StmtToEnv;
   Environment &Env;
   int BlockSuccIdx;
 };
@@ -335,10 +326,8 @@ computeBlockInputState(const CFGBlock &Block, AnalysisContext &AC) {
         // when the terminator is taken. Copy now.
         TypeErasedDataflowAnalysisState Copy = MaybePredState->fork();
 
-        const StmtToEnvMap StmtToEnv(AC.CFCtx, AC.BlockStates);
         auto [Cond, CondValue] =
-            TerminatorVisitor(StmtToEnv, Copy.Env,
-                              blockIndexInPredecessor(*Pred, Block))
+            TerminatorVisitor(Copy.Env, blockIndexInPredecessor(*Pred, Block))
                 .Visit(PredTerminatorStmt);
         if (Cond != nullptr)
           // FIXME: Call transferBranchTypeErased even if BuiltinTransferOpts
@@ -489,6 +478,31 @@ transferCFGBlock(const CFGBlock &Block, AnalysisContext &AC,
     }
     AC.Log.recordState(State);
   }
+
+  // If we have a terminator, evaluate its condition.
+  // This `Expr` may not appear as a `CFGElement` anywhere else, and it's
+  // important that we evaluate it here (rather than while processing the
+  // terminator) so that we put the corresponding value in the right
+  // environment.
+  if (const Expr *TerminatorCond =
+          dyn_cast_or_null<Expr>(Block.getTerminatorCondition())) {
+    if (State.Env.getValue(*TerminatorCond) == nullptr)
+      // FIXME: This only runs the builtin transfer, not the analysis-specific
+      // transfer. Fixing this isn't trivial, as the analysis-specific transfer
+      // takes a `CFGElement` as input, but some expressions only show up as a
+      // terminator condition, but not as a `CFGElement`. The condition of an if
+      // statement is one such example.
+      transfer(StmtToEnvMap(AC.CFCtx, AC.BlockStates), *TerminatorCond,
+               State.Env);
+
+    // If the transfer function didn't produce a value, create an atom so that
+    // we have *some* value for the condition expression. This ensures that
+    // when we extend the flow condition, it actually changes.
+    if (State.Env.getValue(*TerminatorCond) == nullptr)
+      State.Env.setValue(*TerminatorCond, State.Env.makeAtomicBoolValue());
+    AC.Log.recordState(State);
+  }
+
   return State;
 }
 
diff --git a/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp b/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
index a60dbe1f61f6d6e..c5594aa3fe9d1fe 100644
--- a/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
@@ -123,6 +123,7 @@ recordState(Elements=1, Branches=0, Joins=0)
 enterElement(b (ImplicitCastExpr, LValueToRValue, _Bool))
 transfer()
 recordState(Elements=2, Branches=0, Joins=0)
+recordState(Elements=2, Branches=0, Joins=0)
 
 enterBlock(3, false)
 transferBranch(0)
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 056c4f3383d8322..6d88e25f77c8075 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -6408,4 +6408,35 @@ TEST(TransferTest, DifferentReferenceLocInJoin) {
       });
 }
 
+// This test verifies correct modeling of a relational dependency that goes
+// through unmodeled functions (the simple `cond()` in this case).
+TEST(TransferTest, ConditionalRelation) {
+  std::string Code = R"(
+    bool cond();
+    void target() {
+       bool a = true;
+       bool b = true;
+       if (cond()) {
+         a = false;
+         if (cond()) {
+           b = false;
+         }
+       }
+       (void)0;
+       // [[p]]
+    }
+ )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        const Environment &Env = getEnvironmentAtAnnotation(Results, "p");
+        auto &A = Env.arena();
+        auto &VarA = getValueForDecl<BoolValue>(ASTCtx, Env, "a").formula();
+        auto &VarB = getValueForDecl<BoolValue>(ASTCtx, Env, "b").formula();
+
+        EXPECT_FALSE(Env.allows(A.makeAnd(VarA, A.makeNot(VarB))));
+      });
+}
+
 } // namespace

@martinboehme
Copy link
Contributor Author

This is a re-land of #77750, which I reverted; it caused build bots to fail because TerminatorVisitor::StmtToEnv became unused.

This version removes TerminatorVisitor::StmtToEnv entirely.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@martinboehme
Copy link
Contributor Author

The failing buildkite is merely a clang-format failure for a file not touched by this PR (clang/docs/HLSL/FunctionCalls.rst).

martinboehme added a commit to martinboehme/llvm-project that referenced this pull request Jan 16, 2024
This is to be consistent with `getValue()`, which also uses
`ignoreCFGOmittedNodes()`.

Before this fix, it was not possible to retrieve a `Value` from a "CFG omitted"
node that had previously been set using `setValue()`; see the accompanying test,
which fails without the fix.

I discovered this issue while running internal integration tests on
llvm#78127.
@martinboehme
Copy link
Contributor Author

Heads up: I'm holding off on merging this PR for the time being because I've discovered that it breaks some of our internal integration tests. The reason is a pre-existing bug, for which I have a fix in review at #78245.

The effect of the bug is that the following code in this PR does not guarantee that State.Env.getValue(*TerminatorCond) will return a non-null value after the code is run:

    if (State.Env.getValue(*TerminatorCond) == nullptr)
      State.Env.setValue(*TerminatorCond, State.Env.makeAtomicBoolValue());

As a result, the assertion assert(Val != nullptr); in extendFlowCondition() fails.

martinboehme added a commit that referenced this pull request Jan 16, 2024
)

This is to be consistent with `getValue()`, which also uses
`ignoreCFGOmittedNodes()`.

Before this fix, it was not possible to retrieve a `Value` from a "CFG
omitted"
node that had previously been set using `setValue()`; see the
accompanying test,
which fails without the fix.

I discovered this issue while running internal integration tests on
#78127.
copybara-service bot pushed a commit to google/crubit that referenced this pull request Jan 17, 2024
…#78127.

While working on the framework PR
[#78127](llvm/llvm-project#78127), I discovered that in
its current form it breaks an internal integration test. This patch adds an
equivalent of that internal test that is also broken by #78127 in its current
form.

I will modify #78127 to make this test pass before submitting.

PiperOrigin-RevId: 599139104
Change-Id: I2512b0aa733a585adab60ccf1a80d026ad15fe60
…ck()`.

In particular, it's important that we create the "fallback" atomic at this point
(which we produce if the transfer function didn't produce a value for the
expression) so that it is placed in the correct environment.

Previously, we processed the terminator condition in the `TerminatorVisitor`,
which put the fallback atomic in a copy of the environment that is produced as
input for the _successor_ block, rather than the environment for the block
containing the expression for which we produce the fallback atomic.

As a result, we produce different fallback atomics every time we process the
successor block, and hence we don't have a consistent representation of the
terminator condition in the flow condition.

This patch includes a test (authored by ymand@) that fails without the fix.
@martinboehme martinboehme force-pushed the piper_export_cl_565025613 branch from 2e104ab to d2fea00 Compare January 18, 2024 07:41
@martinboehme
Copy link
Contributor Author

PTAL

In internal testing, I discovered yet another issue: StmtToEnvMap::getEnvironment() was returning a “stale” version of the environment if the statement S is in the block currently being processed. We want getEnvironment() to return the “pending” environment that we’re currently mutating, but in fact, it was returning the environment for the last time we processed this block (because this is the environment stored in BlockToState).

This did not use to matter because a) StmtToEnvMap::getEnvironment() is only used when evaluating short-circuited boolean operators, b) these are almost always used in terminator conditions, and c) we used to evaluate terminator conditions outside of transferCfgBlock(), after the current state had already been written to BlockToState. But I believe this was always a bug and would have surfaced for short-circuited boolean operators used outside of terminator conditions.

Now that we evaluate the terminator condition within transferCfgBlock(), however, this bug has become very visible. I have added a test to SignAnalysisTest.cpp that triggers the bug; this test fails without the changes to StmtToEnvMap introduced in this latest iteration of the patch.

Most of the changes relative to the previously approved version of this patch are in Transfer.h, Transfer.cpp, and SignAnalysisTest.cpp. (Sorry for not breaking these changes out into a separate commit – my tooling makes this somewhat hard, and I hope the changes are easy enough to digest even without this.)

BlockToState)
: CFCtx(CFCtx), BlockToState(BlockToState) {}
BlockToState,
const CFGBlock &CurBlock,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about storing just the ID instead? That seems to more directly express the use of the data (which is just to check for identity rather than any deeper use).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- done.

auto [Cond, CondValue] =
TerminatorVisitor(StmtToEnv, Copy.Env,
blockIndexInPredecessor(*Pred, Block))
TerminatorVisitor(Copy.Env, blockIndexInPredecessor(*Pred, Block))
Copy link
Collaborator

@Xazax-hun Xazax-hun Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved the PR because I believe it is a step in the right direction, but I can't help but feel like something is a bit off. I wonder if we could reorganize things a bit to make errors like this impossible to happen. Specifically, I wonder if it is possible to make sure we never need these extra analysis state copies during transfer, and we can always work on the state we have in place associated with the corresponding block.

Here, we evaluate the side effect of the terminator stmt multiple times, once for each of the successors of the block. This is OK here, since each evaluation is happening on a separate copy of the state. While it is OK, it feels a bit redundant and error prone. I wonder if it would be possible to actually do the transfer for the terminator only once, when the basic block is processed, and here only do the transferBranchTypeErased without ever doing a regular transfer.

@martinboehme martinboehme merged commit ccf1e32 into llvm:main Jan 23, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…m#78245)

This is to be consistent with `getValue()`, which also uses
`ignoreCFGOmittedNodes()`.

Before this fix, it was not possible to retrieve a `Value` from a "CFG
omitted"
node that had previously been set using `setValue()`; see the
accompanying test,
which fails without the fix.

I discovered this issue while running internal integration tests on
llvm#78127.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants