Skip to content

[clang][CFG] Change child order in Reverse Post Order (RPO) iteration. #80030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

ymand
Copy link
Collaborator

@ymand ymand commented Jan 30, 2024

The CFG orders the blocks of loop bodies before those of loop successors (both numerically, and in the successor order of the loop condition block). So, RPO necessarily reverses that order, placing the loop successor before the loop body. For many analyses, particularly those that converge to a fixpoint, this results in potentially significant extra work, because loop successors will necessarily need to be reconsidered once the algorithm has reached a fixpoint on the loop body.

This definition of CFG graph traits reverses the order of children, so that loop bodies will come first in an RPO. Then, the algorithm can fully evaluate the loop and only then consider successor blocks.

@ymand ymand requested a review from martinboehme January 30, 2024 16:45
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Jan 30, 2024
@ymand ymand requested a review from Xazax-hun January 30, 2024 16:45
@llvmbot
Copy link
Member

llvmbot commented Jan 30, 2024

@llvm/pr-subscribers-clang-analysis

Author: Yitzhak Mandelbaum (ymand)

Changes

The CFG orders the blocks of loop bodies before those of loop successors
(both numerically, and in the successor order of the loop condition
block). So, RPO necessarily reverses that order, placing the loop successor
before the loop body. For many analyses, particularly those that converge
to a fixpoint, this results in potentially significant extra work, because loop
successors will necessarily need to be reconsidered once the algorithm has
reached a fixpoint on the loop body.

This definition of CFG graph traits reverses the order of children, so that loop
bodies will come first in an RPO. Then, the algorithm can fully evaluate the
loop and only then consider successor blocks.


Full diff: https://github.com/llvm/llvm-project/pull/80030.diff

3 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/PostOrderCFGView.h (+35-1)
  • (modified) clang/unittests/Analysis/CFGTest.cpp (+1-9)
  • (modified) clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp (+7-8)
diff --git a/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h b/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
index 4356834adf76f..031c4bc15338e 100644
--- a/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
+++ b/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
@@ -70,7 +70,41 @@ class PostOrderCFGView : public ManagedAnalysis {
   };
 
 private:
-  using po_iterator = llvm::po_iterator<const CFG *, CFGBlockSet, true>;
+  // The CFG orders the blocks of loop bodies before those of loop successors
+  // (both numerically, and in the successor order of the loop condition
+  // block). So, RPO necessarily reverses that order, placing the loop successor
+  // *before* the loop body. For many analyses, particularly those that converge
+  // to a fixpoint, this results in potentially significant extra work as loop
+  // successors will necessarily need to be reconsidered once the algorithm has
+  // reached a fixpoint on the loop body.
+  //
+  // This definition of CFG graph traits reverses the order of children, so that
+  // loop bodies will come first in an RPO.
+  struct CFGLoopBodyFirstTraits {
+    using NodeRef = const ::clang::CFGBlock *;
+    using ChildIteratorType = ::clang::CFGBlock::const_succ_reverse_iterator;
+
+    static ChildIteratorType child_begin(NodeRef N) { return N->succ_rbegin(); }
+    static ChildIteratorType child_end(NodeRef N) { return N->succ_rend(); }
+
+    using nodes_iterator = ::clang::CFG::const_iterator;
+
+    static NodeRef getEntryNode(const ::clang::CFG *F) {
+      return &F->getEntry();
+    }
+
+    static nodes_iterator nodes_begin(const ::clang::CFG *F) {
+      return F->nodes_begin();
+    }
+
+    static nodes_iterator nodes_end(const ::clang::CFG *F) {
+      return F->nodes_end();
+    }
+
+    static unsigned size(const ::clang::CFG *F) { return F->size(); }
+  };
+  using po_iterator =
+      llvm::po_iterator<const CFG *, CFGBlockSet, true, CFGLoopBodyFirstTraits>;
   std::vector<const CFGBlock *> Blocks;
 
   using BlockOrderTy = llvm::DenseMap<const CFGBlock *, unsigned>;
diff --git a/clang/unittests/Analysis/CFGTest.cpp b/clang/unittests/Analysis/CFGTest.cpp
index 9fa034a0e472e..2b27da0081425 100644
--- a/clang/unittests/Analysis/CFGTest.cpp
+++ b/clang/unittests/Analysis/CFGTest.cpp
@@ -276,14 +276,6 @@ TEST(CFG, Worklists) {
                            ForwardNodes.begin()));
   }
 
-  // RPO: 876321054
-  // WTO: 876534210
-  // So, we construct the WTO order accordingly from the reference order.
-  std::vector<const CFGBlock *> WTOOrder = {
-      ReferenceOrder[0], ReferenceOrder[1], ReferenceOrder[2],
-      ReferenceOrder[7], ReferenceOrder[3], ReferenceOrder[8],
-      ReferenceOrder[4], ReferenceOrder[5], ReferenceOrder[6]};
-
   {
     using ::testing::ElementsAreArray;
     std::optional<WeakTopologicalOrdering> WTO = getIntervalWTO(*CFG);
@@ -297,7 +289,7 @@ TEST(CFG, Worklists) {
     while (const CFGBlock *B = WTOWorklist.dequeue())
       WTONodes.push_back(B);
 
-    EXPECT_THAT(WTONodes, ElementsAreArray(WTOOrder));
+    EXPECT_THAT(WTONodes, ElementsAreArray(*WTO));
   }
 
   std::reverse(ReferenceOrder.begin(), ReferenceOrder.end());
diff --git a/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp b/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
index c5594aa3fe9d1..57920c49a7d3d 100644
--- a/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
@@ -5,7 +5,6 @@
 #include "clang/Analysis/FlowSensitive/DataflowLattice.h"
 #include "llvm/Testing/Support/Error.h"
 #include "gtest/gtest.h"
-#include <optional>
 
 namespace clang::dataflow::test {
 namespace {
@@ -90,7 +89,7 @@ class TestLogger : public Logger {
 AnalysisInputs<TestAnalysis> makeInputs() {
   const char *Code = R"cpp(
 int target(bool b, int p, int q) {
-  return b ? p : q;    
+  return b ? p : q;
 }
 )cpp";
   static const std::vector<std::string> Args = {
@@ -125,17 +124,17 @@ transfer()
 recordState(Elements=2, Branches=0, Joins=0)
 recordState(Elements=2, Branches=0, Joins=0)
 
-enterBlock(3, false)
-transferBranch(0)
+enterBlock(2, false)
+transferBranch(1)
 recordState(Elements=2, Branches=1, Joins=0)
-enterElement(q)
+enterElement(p)
 transfer()
 recordState(Elements=3, Branches=1, Joins=0)
 
-enterBlock(2, false)
-transferBranch(1)
+enterBlock(3, false)
+transferBranch(0)
 recordState(Elements=2, Branches=1, Joins=0)
-enterElement(p)
+enterElement(q)
 transfer()
 recordState(Elements=3, Branches=1, Joins=0)
 

@llvmbot
Copy link
Member

llvmbot commented Jan 30, 2024

@llvm/pr-subscribers-clang

Author: Yitzhak Mandelbaum (ymand)

Changes

The CFG orders the blocks of loop bodies before those of loop successors
(both numerically, and in the successor order of the loop condition
block). So, RPO necessarily reverses that order, placing the loop successor
before the loop body. For many analyses, particularly those that converge
to a fixpoint, this results in potentially significant extra work, because loop
successors will necessarily need to be reconsidered once the algorithm has
reached a fixpoint on the loop body.

This definition of CFG graph traits reverses the order of children, so that loop
bodies will come first in an RPO. Then, the algorithm can fully evaluate the
loop and only then consider successor blocks.


Full diff: https://github.com/llvm/llvm-project/pull/80030.diff

3 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/PostOrderCFGView.h (+35-1)
  • (modified) clang/unittests/Analysis/CFGTest.cpp (+1-9)
  • (modified) clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp (+7-8)
diff --git a/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h b/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
index 4356834adf76f..031c4bc15338e 100644
--- a/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
+++ b/clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
@@ -70,7 +70,41 @@ class PostOrderCFGView : public ManagedAnalysis {
   };
 
 private:
-  using po_iterator = llvm::po_iterator<const CFG *, CFGBlockSet, true>;
+  // The CFG orders the blocks of loop bodies before those of loop successors
+  // (both numerically, and in the successor order of the loop condition
+  // block). So, RPO necessarily reverses that order, placing the loop successor
+  // *before* the loop body. For many analyses, particularly those that converge
+  // to a fixpoint, this results in potentially significant extra work as loop
+  // successors will necessarily need to be reconsidered once the algorithm has
+  // reached a fixpoint on the loop body.
+  //
+  // This definition of CFG graph traits reverses the order of children, so that
+  // loop bodies will come first in an RPO.
+  struct CFGLoopBodyFirstTraits {
+    using NodeRef = const ::clang::CFGBlock *;
+    using ChildIteratorType = ::clang::CFGBlock::const_succ_reverse_iterator;
+
+    static ChildIteratorType child_begin(NodeRef N) { return N->succ_rbegin(); }
+    static ChildIteratorType child_end(NodeRef N) { return N->succ_rend(); }
+
+    using nodes_iterator = ::clang::CFG::const_iterator;
+
+    static NodeRef getEntryNode(const ::clang::CFG *F) {
+      return &F->getEntry();
+    }
+
+    static nodes_iterator nodes_begin(const ::clang::CFG *F) {
+      return F->nodes_begin();
+    }
+
+    static nodes_iterator nodes_end(const ::clang::CFG *F) {
+      return F->nodes_end();
+    }
+
+    static unsigned size(const ::clang::CFG *F) { return F->size(); }
+  };
+  using po_iterator =
+      llvm::po_iterator<const CFG *, CFGBlockSet, true, CFGLoopBodyFirstTraits>;
   std::vector<const CFGBlock *> Blocks;
 
   using BlockOrderTy = llvm::DenseMap<const CFGBlock *, unsigned>;
diff --git a/clang/unittests/Analysis/CFGTest.cpp b/clang/unittests/Analysis/CFGTest.cpp
index 9fa034a0e472e..2b27da0081425 100644
--- a/clang/unittests/Analysis/CFGTest.cpp
+++ b/clang/unittests/Analysis/CFGTest.cpp
@@ -276,14 +276,6 @@ TEST(CFG, Worklists) {
                            ForwardNodes.begin()));
   }
 
-  // RPO: 876321054
-  // WTO: 876534210
-  // So, we construct the WTO order accordingly from the reference order.
-  std::vector<const CFGBlock *> WTOOrder = {
-      ReferenceOrder[0], ReferenceOrder[1], ReferenceOrder[2],
-      ReferenceOrder[7], ReferenceOrder[3], ReferenceOrder[8],
-      ReferenceOrder[4], ReferenceOrder[5], ReferenceOrder[6]};
-
   {
     using ::testing::ElementsAreArray;
     std::optional<WeakTopologicalOrdering> WTO = getIntervalWTO(*CFG);
@@ -297,7 +289,7 @@ TEST(CFG, Worklists) {
     while (const CFGBlock *B = WTOWorklist.dequeue())
       WTONodes.push_back(B);
 
-    EXPECT_THAT(WTONodes, ElementsAreArray(WTOOrder));
+    EXPECT_THAT(WTONodes, ElementsAreArray(*WTO));
   }
 
   std::reverse(ReferenceOrder.begin(), ReferenceOrder.end());
diff --git a/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp b/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
index c5594aa3fe9d1..57920c49a7d3d 100644
--- a/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
@@ -5,7 +5,6 @@
 #include "clang/Analysis/FlowSensitive/DataflowLattice.h"
 #include "llvm/Testing/Support/Error.h"
 #include "gtest/gtest.h"
-#include <optional>
 
 namespace clang::dataflow::test {
 namespace {
@@ -90,7 +89,7 @@ class TestLogger : public Logger {
 AnalysisInputs<TestAnalysis> makeInputs() {
   const char *Code = R"cpp(
 int target(bool b, int p, int q) {
-  return b ? p : q;    
+  return b ? p : q;
 }
 )cpp";
   static const std::vector<std::string> Args = {
@@ -125,17 +124,17 @@ transfer()
 recordState(Elements=2, Branches=0, Joins=0)
 recordState(Elements=2, Branches=0, Joins=0)
 
-enterBlock(3, false)
-transferBranch(0)
+enterBlock(2, false)
+transferBranch(1)
 recordState(Elements=2, Branches=1, Joins=0)
-enterElement(q)
+enterElement(p)
 transfer()
 recordState(Elements=3, Branches=1, Joins=0)
 
-enterBlock(2, false)
-transferBranch(1)
+enterBlock(3, false)
+transferBranch(0)
 recordState(Elements=2, Branches=1, Joins=0)
-enterElement(p)
+enterElement(q)
 transfer()
 recordState(Elements=3, Branches=1, Joins=0)
 

@ymand
Copy link
Collaborator Author

ymand commented Jan 30, 2024

Gabor, please let me know if you think anyone else should review this change, given that there are assorted clients of the PostCFG view. We think this is a benefit for anyone using a post-order view in a worklist algorithm, but have a hard time anticipating all the potential uses.

@Xazax-hun
Copy link
Collaborator

Xazax-hun commented Jan 30, 2024

I agree that this is a change other clients would also benefit from. I am not that worried about breaking other users, RPO is a partial ordering, and now we are picking a different (but valid) order. Clients depending on this detail are depending on an implementation detail and it would probably be Hyrum's law in action. I do not have an example where the old behavior would be desirable, so I think it might be better to change/fix the clients in that case. If we are proven wrong, we can always introduce a non-type template argument as a customization point to revert back to the old behavior.

The CFG orders the blocks of loop bodies before those of loop successors
(both numerically, and in the successor order of the loop condition
block). So, RPO necessarily reverses that order, placing the loop successor
*before* the loop body. For many analyses, particularly those that converge
to a fixpoint, this results in potentially significant extra work, because loop
successors will necessarily need to be reconsidered once the algorithm has
reached a fixpoint on the loop body.

This definition of CFG graph traits reverses the order of children, so that loop
bodies will come first in an RPO. Then, the algorithm can fully evaluate the
loop and only then consider successor blocks.
@ymand ymand merged commit 2c5a0d3 into llvm:main Jan 30, 2024
@ymand ymand deleted the fix-rpo branch January 30, 2024 21:09
copybara-service bot pushed a commit to google/crubit that referenced this pull request Jan 31, 2024
This expands to three nested for loops and led to "max blocks visited" errors
when ymand@ initially tried to change the order in which block successors are
visited in the dataflow framework
([#80030](llvm/llvm-project#80030)).

We now no longer see these "max blocks visited" errors with this test because
of
[other](3dfd35e)
[fixes](llvm/llvm-project#80033) that have landed in the
meantime. Nevertheless, it seems worthwhile to put this test in place to prevent
regressions.

PiperOrigin-RevId: 603020024
Change-Id: I7a8428a77a2f4252d012eec30e42ccb0c1b96bc5
copybara-service bot pushed a commit to google/crubit that referenced this pull request Jan 31, 2024
Our intent is to signal "perform the default behavior", i.e. keep the
`MergedVal` that has been produced by the framework. This would be done by
returning true (which is also what the [default
implementation](https://github.com/llvm/llvm-project/blob/d5c9d402f07e7448cd46870a59d981f582682548/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h#L105)
does).

However, we were returning false, which means that the framework should drop
the `MergedVal` and remove the corresponding entry from `LocToVal`.

This was initially causing "max blocks visited" errors on the
`TriplyNestedForLoopSingleIteration` test when ymand@ tried to
change the order in which the dataflow framework visits block successors
([#80030](llvm/llvm-project#80030)).

We now no longer see these errors because of
[other](3dfd35e)
[fixes](llvm/llvm-project#80033) that have landed in the
meantime. Nevertheless, it seems worthwhile to fix `merge()`, as this reduces
the number of block visits required to achieve convergence. On the
`TriplyNestedForLoopSingleIteration` test, we require 66 block visits before
this patch, and 53 block visits after.

The issue was that when performing the join at the loop head, our custom merge
logic was causing the merged value for the variable `int c` to be discarded from
`LocToVal`. This in turn was blocking convergence because widening
[considers](https://github.com/llvm/llvm-project/blob/d5c9d402f07e7448cd46870a59d981f582682548/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L626)
two environments to be different if the sizes of their `LocToVal` maps differ.

As an aside, widening is inconsistent in this respect with the convergence check
that we perform on blocks that don't get widened.
[`Environment::equivalentTo()`](https://github.com/llvm/llvm-project/blob/d5c9d402f07e7448cd46870a59d981f582682548/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L570),
which performs this check, uses `compareKeyToValueMaps()` to compare the
`LocToVal` maps of the two environments, and `compareKeyToValueMaps()` only
compares values for locations that are in both `LocToVal` maps and ignores
locations that only occur in one of the maps. We should probably fix this
inconsistency.

PiperOrigin-RevId: 603025629
Change-Id: Id02993e576023c44a335c8991020f7e233a75320
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants