Skip to content

[clang][dataflow] Add an early-out to flowConditionImplies() / flowConditionAllows(). #77453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

martinboehme
Copy link
Contributor

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case where F is true.

This is a performance win on the benchmarks for the Crubit nullability checker:

name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.

…wConditionAllows()`.

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case where `F` is true.

This is a performance win on the benchmarks for the Crubit nullability checker:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)
```

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Jan 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 9, 2024

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case where F is true.

This is a performance win on the benchmarks for the Crubit nullability checker:

name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.


Full diff: https://github.com/llvm/llvm-project/pull/77453.diff

2 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/Formula.h (+4)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp (+6)
diff --git a/clang/include/clang/Analysis/FlowSensitive/Formula.h b/clang/include/clang/Analysis/FlowSensitive/Formula.h
index 982e400c1deff1..0e6352403a832f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Formula.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Formula.h
@@ -75,6 +75,10 @@ class alignas(const Formula *) Formula {
     return static_cast<bool>(Value);
   }
 
+  bool isLiteral(bool b) const {
+    return kind() == Literal && static_cast<bool>(Value) == b;
+  }
+
   ArrayRef<const Formula *> operands() const {
     return ArrayRef(reinterpret_cast<Formula *const *>(this + 1),
                     numOperands(kind()));
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index fa114979c8e326..500fbb39955d20 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -174,6 +174,9 @@ Solver::Result DataflowAnalysisContext::querySolver(
 
 bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
                                                    const Formula &F) {
+  if (F.isLiteral(true))
+    return true;
+
   // Returns true if and only if truth assignment of the flow condition implies
   // that `F` is also true. We prove whether or not this property holds by
   // reducing the problem to satisfiability checking. In other words, we attempt
@@ -188,6 +191,9 @@ bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
 
 bool DataflowAnalysisContext::flowConditionAllows(Atom Token,
                                                   const Formula &F) {
+  if (F.isLiteral(true))
+    return true;
+
   llvm::SetVector<const Formula *> Constraints;
   Constraints.insert(&arena().makeAtomRef(Token));
   Constraints.insert(&F);

@llvmbot
Copy link
Member

llvmbot commented Jan 9, 2024

@llvm/pr-subscribers-clang-analysis

Author: None (martinboehme)

Changes

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case where F is true.

This is a performance win on the benchmarks for the Crubit nullability checker:

name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.


Full diff: https://github.com/llvm/llvm-project/pull/77453.diff

2 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/Formula.h (+4)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp (+6)
diff --git a/clang/include/clang/Analysis/FlowSensitive/Formula.h b/clang/include/clang/Analysis/FlowSensitive/Formula.h
index 982e400c1deff1..0e6352403a832f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Formula.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Formula.h
@@ -75,6 +75,10 @@ class alignas(const Formula *) Formula {
     return static_cast<bool>(Value);
   }
 
+  bool isLiteral(bool b) const {
+    return kind() == Literal && static_cast<bool>(Value) == b;
+  }
+
   ArrayRef<const Formula *> operands() const {
     return ArrayRef(reinterpret_cast<Formula *const *>(this + 1),
                     numOperands(kind()));
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index fa114979c8e326..500fbb39955d20 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -174,6 +174,9 @@ Solver::Result DataflowAnalysisContext::querySolver(
 
 bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
                                                    const Formula &F) {
+  if (F.isLiteral(true))
+    return true;
+
   // Returns true if and only if truth assignment of the flow condition implies
   // that `F` is also true. We prove whether or not this property holds by
   // reducing the problem to satisfiability checking. In other words, we attempt
@@ -188,6 +191,9 @@ bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
 
 bool DataflowAnalysisContext::flowConditionAllows(Atom Token,
                                                   const Formula &F) {
+  if (F.isLiteral(true))
+    return true;
+
   llvm::SetVector<const Formula *> Constraints;
   Constraints.insert(&arena().makeAtomRef(Token));
   Constraints.insert(&F);

@martinboehme martinboehme merged commit 03a0bfa into llvm:main Jan 9, 2024
@martinboehme
Copy link
Contributor Author

This breaks some internal integration tests. Reverting while we investigate.

martinboehme added a commit that referenced this pull request Jan 10, 2024
…` / `flowConditionAllows()`. (#77453)"

This reverts commit 03a0bfa.
martinboehme added a commit that referenced this pull request Jan 10, 2024
@martinboehme
Copy link
Contributor Author

Closing the loop: My logic for flowConditionAllows() was wrong. If the flow condition is contradictory, flowConditionAllows(true) == false, so the early-out was wrong.

@martinboehme martinboehme deleted the piper_export_cl_596889570 branch January 15, 2024 15:40
martinboehme added a commit to martinboehme/llvm-project that referenced this pull request Jan 15, 2024
…wConditionAllows()`.

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case of `flowConditionImplies(true)` or
`flowConditionAllows(false)`.

This is an update / reland of my previous reverted
[llvm#77453](llvm#77453). That PR contained a
logic bug -- the early-out for `flowConditionAllows()` was wrong because my
intuition about the logic was wrong. (In particular, note that
`flowConditionImplies(F)` does not imply `flowConditionAllows(F)`, even though
this may run counter to intuition.)

I've now done what I should have done on the first iteration and added more
tests. These pass both with and without my early-outs.

This patch is a performance win on the benchmarks for the Crubit nullability
checker, except for one slight regression on a relatively short benchmark:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     68.5µs ± 7%  67.6µs ± 4%    ~     (p=0.159 n=18+19)
BM_PointerAnalysisIntLoop          173µs ± 3%   162µs ± 4%  -6.40%  (p=0.000 n=19+20)
BM_PointerAnalysisPointerLoop      307µs ± 2%   312µs ± 4%  +1.56%  (p=0.013 n=18+20)
BM_PointerAnalysisBranch           199µs ± 4%   181µs ± 4%  -8.81%  (p=0.000 n=20+20)
BM_PointerAnalysisLoopAndBranch    503µs ± 3%   508µs ± 2%    ~     (p=0.081 n=18+19)
BM_PointerAnalysisTwoLoops         304µs ± 4%   286µs ± 2%  -6.04%  (p=0.000 n=19+20)
BM_PointerAnalysisJoinFilePath    4.78ms ± 3%  4.54ms ± 4%  -4.97%  (p=0.000 n=20+20)
BM_PointerAnalysisCallInLoop      3.05ms ± 3%  2.90ms ± 4%  -5.05%  (p=0.000 n=19+20)
```

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.60 s of user
time without this patch and 11.40 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.
martinboehme added a commit that referenced this pull request Jan 16, 2024
…wConditionAllows()`. (#78172)

This saves having to assemble the set of constraints and run the SAT
solver in
the trivial case of `flowConditionImplies(true)` or
`flowConditionAllows(false)`.

This is an update / reland of my previous reverted
[#77453](#77453). That PR
contained a
logic bug -- the early-out for `flowConditionAllows()` was wrong because
my
intuition about the logic was wrong. (In particular, note that
`flowConditionImplies(F)` does not imply `flowConditionAllows(F)`, even
though
this may run counter to intuition.)

I've now done what I should have done on the first iteration and added
more
tests. These pass both with and without my early-outs.

This patch is a performance win on the benchmarks for the Crubit
nullability
checker, except for one slight regression on a relatively short
benchmark:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     68.5µs ± 7%  67.6µs ± 4%    ~     (p=0.159 n=18+19)
BM_PointerAnalysisIntLoop          173µs ± 3%   162µs ± 4%  -6.40%  (p=0.000 n=19+20)
BM_PointerAnalysisPointerLoop      307µs ± 2%   312µs ± 4%  +1.56%  (p=0.013 n=18+20)
BM_PointerAnalysisBranch           199µs ± 4%   181µs ± 4%  -8.81%  (p=0.000 n=20+20)
BM_PointerAnalysisLoopAndBranch    503µs ± 3%   508µs ± 2%    ~     (p=0.081 n=18+19)
BM_PointerAnalysisTwoLoops         304µs ± 4%   286µs ± 2%  -6.04%  (p=0.000 n=19+20)
BM_PointerAnalysisJoinFilePath    4.78ms ± 3%  4.54ms ± 4%  -4.97%  (p=0.000 n=20+20)
BM_PointerAnalysisCallInLoop      3.05ms ± 3%  2.90ms ± 4%  -5.05%  (p=0.000 n=19+20)
```

When running clang-tidy on real-world code, the results are less clear.
In
three runs, averaged, on an arbitrarily chosen input file, I get 11.60 s
of user
time without this patch and 11.40 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in
benchmarks, so I
think it is worth making.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…wConditionAllows()`. (llvm#77453)

This saves having to assemble the set of constraints and run the SAT
solver in
the trivial case where `F` is true.

This is a performance win on the benchmarks for the Crubit nullability
checker:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)
```

When running clang-tidy on real-world code, the results are less clear.
In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s
of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in
benchmarks, so I
think it is worth making.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…wConditionAllows()`. (llvm#78172)

This saves having to assemble the set of constraints and run the SAT
solver in
the trivial case of `flowConditionImplies(true)` or
`flowConditionAllows(false)`.

This is an update / reland of my previous reverted
[llvm#77453](llvm#77453). That PR
contained a
logic bug -- the early-out for `flowConditionAllows()` was wrong because
my
intuition about the logic was wrong. (In particular, note that
`flowConditionImplies(F)` does not imply `flowConditionAllows(F)`, even
though
this may run counter to intuition.)

I've now done what I should have done on the first iteration and added
more
tests. These pass both with and without my early-outs.

This patch is a performance win on the benchmarks for the Crubit
nullability
checker, except for one slight regression on a relatively short
benchmark:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     68.5µs ± 7%  67.6µs ± 4%    ~     (p=0.159 n=18+19)
BM_PointerAnalysisIntLoop          173µs ± 3%   162µs ± 4%  -6.40%  (p=0.000 n=19+20)
BM_PointerAnalysisPointerLoop      307µs ± 2%   312µs ± 4%  +1.56%  (p=0.013 n=18+20)
BM_PointerAnalysisBranch           199µs ± 4%   181µs ± 4%  -8.81%  (p=0.000 n=20+20)
BM_PointerAnalysisLoopAndBranch    503µs ± 3%   508µs ± 2%    ~     (p=0.081 n=18+19)
BM_PointerAnalysisTwoLoops         304µs ± 4%   286µs ± 2%  -6.04%  (p=0.000 n=19+20)
BM_PointerAnalysisJoinFilePath    4.78ms ± 3%  4.54ms ± 4%  -4.97%  (p=0.000 n=20+20)
BM_PointerAnalysisCallInLoop      3.05ms ± 3%  2.90ms ± 4%  -5.05%  (p=0.000 n=19+20)
```

When running clang-tidy on real-world code, the results are less clear.
In
three runs, averaged, on an arbitrarily chosen input file, I get 11.60 s
of user
time without this patch and 11.40 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in
benchmarks, so I
think it is worth making.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants