Skip to content

[clang][dataflow] Check for backedges directly (instead of loop statements). #68923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 16, 2023

Conversation

ymand
Copy link
Collaborator

@ymand ymand commented Oct 12, 2023

Widen on backedge nodes, instead of nodes with a loop statement as terminator.
This fixes #67834 and a precision loss from assignment in a loop condition. The
commit contains tests for both of these issues.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Oct 12, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 12, 2023

@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: Yitzhak Mandelbaum (ymand)

Changes

Widen on backedge nodes, instead of nodes with a loop statement as terminator.
This fixes #67834 and a precision loss from assignment in a loop condition. The
commit contains tests for both of these issues.


Full diff: https://github.com/llvm/llvm-project/pull/68923.diff

3 Files Affected:

  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+10-20)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+14)
  • (modified) clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp (+28)
diff --git a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
index 6b167891c1a3ac8..c635edd12219f0e 100644
--- a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
+++ b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
@@ -12,7 +12,6 @@
 //===----------------------------------------------------------------------===//
 
 #include <algorithm>
-#include <memory>
 #include <optional>
 #include <system_error>
 #include <utility>
@@ -33,8 +32,8 @@
 #include "clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h"
 #include "clang/Analysis/FlowSensitive/Value.h"
 #include "llvm/ADT/ArrayRef.h"
-#include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SmallBitVector.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/Error.h"
 
@@ -53,19 +52,8 @@ static int blockIndexInPredecessor(const CFGBlock &Pred,
   return BlockPos - Pred.succ_begin();
 }
 
-static bool isLoopHead(const CFGBlock &B) {
-  if (const auto *T = B.getTerminatorStmt())
-    switch (T->getStmtClass()) {
-      case Stmt::WhileStmtClass:
-      case Stmt::DoStmtClass:
-      case Stmt::ForStmtClass:
-      case Stmt::CXXForRangeStmtClass:
-        return true;
-      default:
-        return false;
-    }
-
-  return false;
+static bool isBackedgeNode(const CFGBlock &B) {
+ return B.getLoopTarget() != nullptr;
 }
 
 namespace {
@@ -502,14 +490,16 @@ runTypeErasedDataflowAnalysis(
         PostVisitCFG) {
   PrettyStackTraceAnalysis CrashInfo(CFCtx, "runTypeErasedDataflowAnalysis");
 
-  PostOrderCFGView POV(&CFCtx.getCFG());
-  ForwardDataflowWorklist Worklist(CFCtx.getCFG(), &POV);
+  const clang::CFG &CFG = CFCtx.getCFG();
+  PostOrderCFGView POV(&CFG);
+  ForwardDataflowWorklist Worklist(CFG, &POV);
+
 
   std::vector<std::optional<TypeErasedDataflowAnalysisState>> BlockStates(
-      CFCtx.getCFG().size());
+      CFG.size());
 
   // The entry basic block doesn't contain statements so it can be skipped.
-  const CFGBlock &Entry = CFCtx.getCFG().getEntry();
+  const CFGBlock &Entry = CFG.getEntry();
   BlockStates[Entry.getBlockID()] = {Analysis.typeErasedInitialElement(),
                                      InitEnv.fork()};
   Worklist.enqueueSuccessors(&Entry);
@@ -553,7 +543,7 @@ runTypeErasedDataflowAnalysis(
         llvm::errs() << "Old Env:\n";
         OldBlockState->Env.dump();
       });
-      if (isLoopHead(*Block)) {
+      if (isBackedgeNode(*Block)) {
         LatticeJoinEffect Effect1 = Analysis.widenTypeErased(
             NewBlockState.Lattice, OldBlockState->Lattice);
         LatticeJoinEffect Effect2 =
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 632632a1b30e78b..7c697fa5f87d941 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -4099,6 +4099,20 @@ TEST(TransferTest, LoopDereferencingChangingRecordPointerConverges) {
   ASSERT_THAT_ERROR(checkDataflowWithNoopAnalysis(Code), llvm::Succeeded());
 }
 
+TEST(TransferTest, LoopWithDisjunctiveConditionConverges) {
+  std::string Code = R"cc(
+    bool foo();
+
+    void target() {
+      bool c = false;
+      while (foo() || foo()) {
+        c = true;
+      }
+    }
+  )cc";
+  ASSERT_THAT_ERROR(checkDataflowWithNoopAnalysis(Code), llvm::Succeeded());
+}
+
 TEST(TransferTest, DoesNotCrashOnUnionThisExpr) {
   std::string Code = R"(
     union Union {
diff --git a/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp b/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
index 2425bb8711bdbaf..6800d736afd9b08 100644
--- a/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
@@ -913,6 +913,34 @@ TEST_F(FlowConditionTest, WhileStmt) {
       });
 }
 
+TEST_F(FlowConditionTest, WhileStmtWithAssignmentInCondition) {
+  std::string Code = R"(
+    void target(bool Foo) {
+      // This test checks whether the analysis preserves the connection between
+      // the value of `Foo` and the assignment expression, despite widening.
+      // The equality operator generates a fresh boolean variable on each
+      // interpretation, which forces use of widening.
+      while ((Foo = (3 == 4))) {
+        (void)0;
+        /*[[p]]*/
+      }
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        const ValueDecl *FooDecl = findValueDecl(ASTCtx, "Foo");
+        ASSERT_THAT(FooDecl, NotNull());
+
+        ASSERT_THAT(Results.keys(), UnorderedElementsAre("p"));
+        const Environment &Env = getEnvironmentAtAnnotation(Results, "p");
+
+        auto &FooVal = cast<BoolValue>(Env.getValue(*FooDecl))->formula();
+        EXPECT_TRUE(Env.flowConditionImplies(FooVal));
+      });
+}
+
 TEST_F(FlowConditionTest, Conjunction) {
   std::string Code = R"(
     void target(bool Foo, bool Bar) {

…ments).

Widen on backedge nodes, instead of nodes with a loop statement as terminator.
This fixes llvm#67834 and a precision loss from assignment in a loop condition. The
commit contains tests for both of these issues.
@github-actions
Copy link

github-actions bot commented Oct 12, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! So, it looks like our definition of "loop head" was wrong.

@ymand
Copy link
Collaborator Author

ymand commented Oct 12, 2023

Nice fix! So, it looks like our definition of "loop head" was wrong.

Indeed. We didn't think of the way control-flow constructs inside the condition expression would be represented. Ironically (?), this is actually simpler and fixes another issue besides, which is nice. ;)

Copy link
Contributor

@martinboehme martinboehme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comments

Comment on lines -505 to +501
PostOrderCFGView POV(&CFCtx.getCFG());
ForwardDataflowWorklist Worklist(CFCtx.getCFG(), &POV);
const clang::CFG &CFG = CFCtx.getCFG();
PostOrderCFGView POV(&CFG);
ForwardDataflowWorklist Worklist(CFG, &POV);

std::vector<std::optional<TypeErasedDataflowAnalysisState>> BlockStates(
CFCtx.getCFG().size());
CFG.size());

// The entry basic block doesn't contain statements so it can be skipped.
const CFGBlock &Entry = CFCtx.getCFG().getEntry();
const CFGBlock &Entry = CFG.getEntry();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This refactoring is entirely independent of the rest of this PR. Suggest submitting changes like this as a separate "NFC" PR to minimize the size of PRs and ease review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was originally part of this patch, but I ended up undoing some of the new changes. I considered separating this out, but I think for such small NFC changes you run the real risk that you simply won't follow up. That is, it's a great idea, but if we try to make this the norm, I think the result will be many lost small cleanup opportunities. Maybe that's worth it, for review clarity. I don't really know where to draw the line.

}

return false;
static bool isBackedgeNode(const CFGBlock &B) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a short comment explaining what a "backedge node" is?

(I think the term "backedge node" isn't otherwise defined in Clang or the framework -- or if it is, it would be useful to add a link.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Some blocks are used to represent the "loop edge" to the start of a loop

Fwiw, I think this comment is better than the one in the CFG, so I don't know that linking is worth much, especially since it can be found pretty easily just by looking at getLoopTarget.

Copy link
Contributor

@martinboehme martinboehme Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but

  • It still requires work (not everyone has an IDE with great cross-linking, so this could be multiple clicks), and

  • We're introducing a new term here ("backedge node"), and it's not clear without an explanation whether a) we intend for this to be a synonym for "has loop target" (but then why aren't we reusing that term -- is it because we think "backedge node" is clearer?), or b) getLoopTarget() != nullptr happens to be the right implementation today, but we think it might change in the future, so we're wrapping this check in a function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Built-in analysis fails to converge on simple loops
4 participants