Skip to content

[clang][dataflow] Model conditional operator correctly. #89213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

martinboehme
Copy link
Contributor

No description provided.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Apr 18, 2024
@martinboehme martinboehme requested review from ymand and Xazax-hun April 18, 2024 10:53
@llvmbot
Copy link
Member

llvmbot commented Apr 18, 2024

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/89213.diff

7 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (+15)
  • (modified) clang/include/clang/Analysis/FlowSensitive/Transfer.h (+2-1)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+24-22)
  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+23-15)
  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+2-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.h (+2-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+62-4)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 4277792219c0af..7ad866ba3f62ff 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -244,6 +244,21 @@ class Environment {
                           Environment::ValueModel &Model,
                           ExprJoinBehavior ExprBehavior);
 
+  /// Returns a value that approximates both `Val1` and `Val2`, or null if no
+  /// such value can be produced.
+  ///
+  /// `Env1` and `Env2` can be used to query child values and path condition
+  /// implications of `Val1` and `Val2` respectively. The joined value will be
+  /// produced in `JoinedEnv`.
+  ///
+  /// Requirements:
+  ///
+  ///  `Val1` and `Val2` must model values of type `Type`.
+  static Value *joinValues(QualType Ty, Value *Val1, const Environment &Env1,
+                           Value *Val2, const Environment &Env2,
+                           Environment &JoinedEnv,
+                           Environment::ValueModel &Model);
+
   /// Widens the environment point-wise, using `PrevEnv` as needed to inform the
   /// approximation.
   ///
diff --git a/clang/include/clang/Analysis/FlowSensitive/Transfer.h b/clang/include/clang/Analysis/FlowSensitive/Transfer.h
index ed148250d8eb29..940025e02100f9 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Transfer.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Transfer.h
@@ -53,7 +53,8 @@ class StmtToEnvMap {
 /// Requirements:
 ///
 ///  `S` must not be `ParenExpr` or `ExprWithCleanups`.
-void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env);
+void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env,
+              Environment::ValueModel &Model);
 
 } // namespace dataflow
 } // namespace clang
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 3f1600d9ac5d87..18fd6476dedf4b 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -255,13 +255,8 @@ joinLocToVal(const llvm::MapVector<const StorageLocation *, Value *> &LocToVal,
       continue;
     assert(It->second != nullptr);
 
-    if (areEquivalentValues(*Val, *It->second)) {
-      Result.insert({Loc, Val});
-      continue;
-    }
-
-    if (Value *JoinedVal = joinDistinctValues(
-            Loc->getType(), *Val, Env1, *It->second, Env2, JoinedEnv, Model)) {
+    if (Value *JoinedVal = Environment::joinValues(
+            Loc->getType(), Val, Env1, It->second, Env2, JoinedEnv, Model)) {
       Result.insert({Loc, JoinedVal});
     }
   }
@@ -788,27 +783,16 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
   JoinedEnv.LocForRecordReturnVal = EnvA.LocForRecordReturnVal;
   JoinedEnv.ThisPointeeLoc = EnvA.ThisPointeeLoc;
 
-  if (EnvA.ReturnVal == nullptr || EnvB.ReturnVal == nullptr) {
-    // `ReturnVal` might not always get set -- for example if we have a return
-    // statement of the form `return some_other_func()` and we decide not to
-    // analyze `some_other_func()`.
-    // In this case, we can't say anything about the joined return value -- we
-    // don't simply want to propagate the return value that we do have, because
-    // it might not be the correct one.
-    // This occurs for example in the test `ContextSensitiveMutualRecursion`.
+  if (EnvA.CallStack.empty()) {
     JoinedEnv.ReturnVal = nullptr;
-  } else if (areEquivalentValues(*EnvA.ReturnVal, *EnvB.ReturnVal)) {
-    JoinedEnv.ReturnVal = EnvA.ReturnVal;
   } else {
-    assert(!EnvA.CallStack.empty());
     // FIXME: Make `CallStack` a vector of `FunctionDecl` so we don't need this
     // cast.
     auto *Func = dyn_cast<FunctionDecl>(EnvA.CallStack.back());
     assert(Func != nullptr);
-    if (Value *JoinedVal =
-            joinDistinctValues(Func->getReturnType(), *EnvA.ReturnVal, EnvA,
-                               *EnvB.ReturnVal, EnvB, JoinedEnv, Model))
-      JoinedEnv.ReturnVal = JoinedVal;
+    JoinedEnv.ReturnVal =
+        joinValues(Func->getReturnType(), EnvA.ReturnVal, EnvA, EnvB.ReturnVal,
+                   EnvB, JoinedEnv, Model);
   }
 
   if (EnvA.ReturnLoc == EnvB.ReturnLoc)
@@ -834,6 +818,24 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
   return JoinedEnv;
 }
 
+Value *Environment::joinValues(QualType Ty, Value *Val1,
+                               const Environment &Env1, Value *Val2,
+                               const Environment &Env2, Environment &JoinedEnv,
+                               Environment::ValueModel &Model) {
+  if (Val1 == nullptr || Val2 == nullptr)
+    // We can't say anything about the joined value -- even if one of the values
+    // is non-null, we don't want to simply propagate it, because it would be
+    // too specific: Because the other value is null, that means we have no
+    // information at all about the value (i.e. the value is unconstrained).
+    return nullptr;
+
+  if (areEquivalentValues(*Val1, *Val2))
+    // Arbitrarily return one of the two values.
+    return Val1;
+
+  return joinDistinctValues(Ty, *Val1, Env1, *Val2, Env2, JoinedEnv, Model);
+}
+
 StorageLocation &Environment::createStorageLocation(QualType Type) {
   return DACtx->createStorageLocation(Type);
 }
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index 1e034771014eaa..37862e8695cccc 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -122,8 +122,9 @@ namespace {
 
 class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
 public:
-  TransferVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env)
-      : StmtToEnv(StmtToEnv), Env(Env) {}
+  TransferVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env,
+                  Environment::ValueModel &Model)
+      : StmtToEnv(StmtToEnv), Env(Env), Model(Model) {}
 
   void VisitBinaryOperator(const BinaryOperator *S) {
     const Expr *LHS = S->getLHS();
@@ -657,17 +658,22 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
   }
 
   void VisitConditionalOperator(const ConditionalOperator *S) {
-    // FIXME: Revisit this once flow conditions are added to the framework. For
-    // `a = b ? c : d` we can add `b => a == c && !b => a == d` to the flow
-    // condition.
-    // When we do this, we will need to retrieve the values of the operands from
-    // the environments for the basic blocks they are computed in, in a similar
-    // way to how this is done for short-circuited logical operators in
-    // `getLogicOperatorSubExprValue()`.
-    if (S->isGLValue())
-      Env.setStorageLocation(*S, Env.createObject(S->getType()));
-    else if (!S->getType()->isRecordType()) {
-      if (Value *Val = Env.createValue(S->getType()))
+    const Environment *TrueEnv = StmtToEnv.getEnvironment(*S->getTrueExpr());
+    const Environment *FalseEnv = StmtToEnv.getEnvironment(*S->getFalseExpr());
+
+    if (TrueEnv == nullptr || FalseEnv == nullptr)
+      return;
+
+    if (S->isGLValue()) {
+      StorageLocation *TrueLoc = TrueEnv->getStorageLocation(*S->getTrueExpr());
+      StorageLocation *FalseLoc =
+          FalseEnv->getStorageLocation(*S->getFalseExpr());
+      if (TrueLoc == FalseLoc && TrueLoc != nullptr)
+        Env.setStorageLocation(*S, *TrueLoc);
+    } else if (!S->getType()->isRecordType()) {
+      if (Value *Val = Environment::joinValues(
+              S->getType(), TrueEnv->getValue(*S->getTrueExpr()), *TrueEnv,
+              FalseEnv->getValue(*S->getFalseExpr()), *FalseEnv, Env, Model))
         Env.setValue(*S, *Val);
     }
   }
@@ -830,12 +836,14 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
 
   const StmtToEnvMap &StmtToEnv;
   Environment &Env;
+  Environment::ValueModel &Model;
 };
 
 } // namespace
 
-void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env) {
-  TransferVisitor(StmtToEnv, Env).Visit(&S);
+void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env,
+              Environment::ValueModel &Model) {
+  TransferVisitor(StmtToEnv, Env, Model).Visit(&S);
 }
 
 } // namespace dataflow
diff --git a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
index 1b73c5d6830161..5a21a85b90ee9b 100644
--- a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
+++ b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
@@ -316,7 +316,7 @@ builtinTransferStatement(unsigned CurBlockID, const CFGStmt &Elt,
   const Stmt *S = Elt.getStmt();
   assert(S != nullptr);
   transfer(StmtToEnvMap(AC.ACFG, AC.BlockStates, CurBlockID, InputState), *S,
-           InputState.Env);
+           InputState.Env, AC.Analysis);
 }
 
 /// Built-in transfer function for `CFGInitializer`.
@@ -452,7 +452,7 @@ transferCFGBlock(const CFGBlock &Block, AnalysisContext &AC,
       // terminator condition, but not as a `CFGElement`. The condition of an if
       // statement is one such example.
       transfer(StmtToEnvMap(AC.ACFG, AC.BlockStates, Block.getBlockID(), State),
-               *TerminatorCond, State.Env);
+               *TerminatorCond, State.Env, AC.Analysis);
 
     // If the transfer function didn't produce a value, create an atom so that
     // we have *some* value for the condition expression. This ensures that
diff --git a/clang/unittests/Analysis/FlowSensitive/TestingSupport.h b/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
index e3c7ff685f5724..3b0e05ed72220e 100644
--- a/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
+++ b/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
@@ -456,7 +456,7 @@ const IndirectFieldDecl *findIndirectFieldDecl(ASTContext &ASTCtx,
 /// Requirements:
 ///
 ///   `Name` must be unique in `ASTCtx`.
-template <class LocT>
+template <class LocT = StorageLocation>
 LocT &getLocForDecl(ASTContext &ASTCtx, const Environment &Env,
                     llvm::StringRef Name) {
   const ValueDecl *VD = findValueDecl(ASTCtx, Name);
@@ -470,7 +470,7 @@ LocT &getLocForDecl(ASTContext &ASTCtx, const Environment &Env,
 /// Requirements:
 ///
 ///   `Name` must be unique in `ASTCtx`.
-template <class ValueT>
+template <class ValueT = Value>
 ValueT &getValueForDecl(ASTContext &ASTCtx, const Environment &Env,
                         llvm::StringRef Name) {
   const ValueDecl *VD = findValueDecl(ASTCtx, Name);
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 97ec32126c1dc4..cff8d1e6324640 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -5243,6 +5243,67 @@ TEST(TransferTest, BinaryOperatorComma) {
       });
 }
 
+TEST(TransferTest, ConditionalOperatorValue) {
+  std::string Code = R"(
+    void target(bool Cond, bool B1, bool B2) {
+      bool JoinSame = Cond ? B1 : B1;
+      bool JoinDifferent = Cond ? B1 : B2;
+      // [[p]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        Environment Env = getEnvironmentAtAnnotation(Results, "p").fork();
+
+        auto &B1 = getValueForDecl<BoolValue>(ASTCtx, Env, "B1");
+        auto &B2 = getValueForDecl<BoolValue>(ASTCtx, Env, "B2");
+        auto &JoinSame = getValueForDecl<BoolValue>(ASTCtx, Env, "JoinSame");
+        auto &JoinDifferent =
+            getValueForDecl<BoolValue>(ASTCtx, Env, "JoinDifferent");
+
+        EXPECT_EQ(&JoinSame, &B1);
+
+        const Formula &JoinDifferentEqB1 =
+            Env.arena().makeEquals(JoinDifferent.formula(), B1.formula());
+        EXPECT_TRUE(Env.allows(JoinDifferentEqB1));
+        EXPECT_FALSE(Env.proves(JoinDifferentEqB1));
+
+        const Formula &JoinDifferentEqB2 =
+            Env.arena().makeEquals(JoinDifferent.formula(), B2.formula());
+        EXPECT_TRUE(Env.allows(JoinDifferentEqB2));
+        EXPECT_FALSE(Env.proves(JoinDifferentEqB1));
+      });
+}
+
+TEST(TransferTest, ConditionalOperatorLocation) {
+  std::string Code = R"(
+    void target(bool Cond, int I1, int I2) {
+      int &JoinSame = Cond ? I1 : I1;
+      int &JoinDifferent = Cond ? I1 : I2;
+      // [[p]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        Environment Env = getEnvironmentAtAnnotation(Results, "p").fork();
+
+        StorageLocation &I1 = getLocForDecl(ASTCtx, Env, "I1");
+        StorageLocation &I2 = getLocForDecl(ASTCtx, Env, "I2");
+        StorageLocation &JoinSame = getLocForDecl(ASTCtx, Env, "JoinSame");
+        StorageLocation &JoinDifferent =
+            getLocForDecl(ASTCtx, Env, "JoinDifferent");
+
+        EXPECT_EQ(&JoinSame, &I1);
+
+        EXPECT_NE(&JoinDifferent, &I1);
+        EXPECT_NE(&JoinDifferent, &I2);
+      });
+}
+
 TEST(TransferTest, IfStmtBranchExtendsFlowCondition) {
   std::string Code = R"(
     void target(bool Foo) {
@@ -5492,10 +5553,7 @@ TEST(TransferTest, ContextSensitiveReturnReferenceWithConditionalOperator) {
         ASSERT_THAT(Loc, NotNull());
         EXPECT_THAT(Env.getValue(*Loc), NotNull());
 
-        // TODO: We would really like to make this stronger assertion, but that
-        // doesn't work because we don't propagate values correctly through
-        // the conditional operator yet.
-        // ASSERT_THAT(Loc, Eq(SLoc));
+        EXPECT_THAT(Loc, Eq(SLoc));
       },
       {BuiltinOptions{ContextSensitiveOptions{}}});
 }

@llvmbot
Copy link
Member

llvmbot commented Apr 18, 2024

@llvm/pr-subscribers-clang-analysis

Author: None (martinboehme)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/89213.diff

7 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (+15)
  • (modified) clang/include/clang/Analysis/FlowSensitive/Transfer.h (+2-1)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+24-22)
  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+23-15)
  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+2-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.h (+2-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+62-4)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 4277792219c0af..7ad866ba3f62ff 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -244,6 +244,21 @@ class Environment {
                           Environment::ValueModel &Model,
                           ExprJoinBehavior ExprBehavior);
 
+  /// Returns a value that approximates both `Val1` and `Val2`, or null if no
+  /// such value can be produced.
+  ///
+  /// `Env1` and `Env2` can be used to query child values and path condition
+  /// implications of `Val1` and `Val2` respectively. The joined value will be
+  /// produced in `JoinedEnv`.
+  ///
+  /// Requirements:
+  ///
+  ///  `Val1` and `Val2` must model values of type `Type`.
+  static Value *joinValues(QualType Ty, Value *Val1, const Environment &Env1,
+                           Value *Val2, const Environment &Env2,
+                           Environment &JoinedEnv,
+                           Environment::ValueModel &Model);
+
   /// Widens the environment point-wise, using `PrevEnv` as needed to inform the
   /// approximation.
   ///
diff --git a/clang/include/clang/Analysis/FlowSensitive/Transfer.h b/clang/include/clang/Analysis/FlowSensitive/Transfer.h
index ed148250d8eb29..940025e02100f9 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Transfer.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Transfer.h
@@ -53,7 +53,8 @@ class StmtToEnvMap {
 /// Requirements:
 ///
 ///  `S` must not be `ParenExpr` or `ExprWithCleanups`.
-void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env);
+void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env,
+              Environment::ValueModel &Model);
 
 } // namespace dataflow
 } // namespace clang
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 3f1600d9ac5d87..18fd6476dedf4b 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -255,13 +255,8 @@ joinLocToVal(const llvm::MapVector<const StorageLocation *, Value *> &LocToVal,
       continue;
     assert(It->second != nullptr);
 
-    if (areEquivalentValues(*Val, *It->second)) {
-      Result.insert({Loc, Val});
-      continue;
-    }
-
-    if (Value *JoinedVal = joinDistinctValues(
-            Loc->getType(), *Val, Env1, *It->second, Env2, JoinedEnv, Model)) {
+    if (Value *JoinedVal = Environment::joinValues(
+            Loc->getType(), Val, Env1, It->second, Env2, JoinedEnv, Model)) {
       Result.insert({Loc, JoinedVal});
     }
   }
@@ -788,27 +783,16 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
   JoinedEnv.LocForRecordReturnVal = EnvA.LocForRecordReturnVal;
   JoinedEnv.ThisPointeeLoc = EnvA.ThisPointeeLoc;
 
-  if (EnvA.ReturnVal == nullptr || EnvB.ReturnVal == nullptr) {
-    // `ReturnVal` might not always get set -- for example if we have a return
-    // statement of the form `return some_other_func()` and we decide not to
-    // analyze `some_other_func()`.
-    // In this case, we can't say anything about the joined return value -- we
-    // don't simply want to propagate the return value that we do have, because
-    // it might not be the correct one.
-    // This occurs for example in the test `ContextSensitiveMutualRecursion`.
+  if (EnvA.CallStack.empty()) {
     JoinedEnv.ReturnVal = nullptr;
-  } else if (areEquivalentValues(*EnvA.ReturnVal, *EnvB.ReturnVal)) {
-    JoinedEnv.ReturnVal = EnvA.ReturnVal;
   } else {
-    assert(!EnvA.CallStack.empty());
     // FIXME: Make `CallStack` a vector of `FunctionDecl` so we don't need this
     // cast.
     auto *Func = dyn_cast<FunctionDecl>(EnvA.CallStack.back());
     assert(Func != nullptr);
-    if (Value *JoinedVal =
-            joinDistinctValues(Func->getReturnType(), *EnvA.ReturnVal, EnvA,
-                               *EnvB.ReturnVal, EnvB, JoinedEnv, Model))
-      JoinedEnv.ReturnVal = JoinedVal;
+    JoinedEnv.ReturnVal =
+        joinValues(Func->getReturnType(), EnvA.ReturnVal, EnvA, EnvB.ReturnVal,
+                   EnvB, JoinedEnv, Model);
   }
 
   if (EnvA.ReturnLoc == EnvB.ReturnLoc)
@@ -834,6 +818,24 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
   return JoinedEnv;
 }
 
+Value *Environment::joinValues(QualType Ty, Value *Val1,
+                               const Environment &Env1, Value *Val2,
+                               const Environment &Env2, Environment &JoinedEnv,
+                               Environment::ValueModel &Model) {
+  if (Val1 == nullptr || Val2 == nullptr)
+    // We can't say anything about the joined value -- even if one of the values
+    // is non-null, we don't want to simply propagate it, because it would be
+    // too specific: Because the other value is null, that means we have no
+    // information at all about the value (i.e. the value is unconstrained).
+    return nullptr;
+
+  if (areEquivalentValues(*Val1, *Val2))
+    // Arbitrarily return one of the two values.
+    return Val1;
+
+  return joinDistinctValues(Ty, *Val1, Env1, *Val2, Env2, JoinedEnv, Model);
+}
+
 StorageLocation &Environment::createStorageLocation(QualType Type) {
   return DACtx->createStorageLocation(Type);
 }
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index 1e034771014eaa..37862e8695cccc 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -122,8 +122,9 @@ namespace {
 
 class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
 public:
-  TransferVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env)
-      : StmtToEnv(StmtToEnv), Env(Env) {}
+  TransferVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env,
+                  Environment::ValueModel &Model)
+      : StmtToEnv(StmtToEnv), Env(Env), Model(Model) {}
 
   void VisitBinaryOperator(const BinaryOperator *S) {
     const Expr *LHS = S->getLHS();
@@ -657,17 +658,22 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
   }
 
   void VisitConditionalOperator(const ConditionalOperator *S) {
-    // FIXME: Revisit this once flow conditions are added to the framework. For
-    // `a = b ? c : d` we can add `b => a == c && !b => a == d` to the flow
-    // condition.
-    // When we do this, we will need to retrieve the values of the operands from
-    // the environments for the basic blocks they are computed in, in a similar
-    // way to how this is done for short-circuited logical operators in
-    // `getLogicOperatorSubExprValue()`.
-    if (S->isGLValue())
-      Env.setStorageLocation(*S, Env.createObject(S->getType()));
-    else if (!S->getType()->isRecordType()) {
-      if (Value *Val = Env.createValue(S->getType()))
+    const Environment *TrueEnv = StmtToEnv.getEnvironment(*S->getTrueExpr());
+    const Environment *FalseEnv = StmtToEnv.getEnvironment(*S->getFalseExpr());
+
+    if (TrueEnv == nullptr || FalseEnv == nullptr)
+      return;
+
+    if (S->isGLValue()) {
+      StorageLocation *TrueLoc = TrueEnv->getStorageLocation(*S->getTrueExpr());
+      StorageLocation *FalseLoc =
+          FalseEnv->getStorageLocation(*S->getFalseExpr());
+      if (TrueLoc == FalseLoc && TrueLoc != nullptr)
+        Env.setStorageLocation(*S, *TrueLoc);
+    } else if (!S->getType()->isRecordType()) {
+      if (Value *Val = Environment::joinValues(
+              S->getType(), TrueEnv->getValue(*S->getTrueExpr()), *TrueEnv,
+              FalseEnv->getValue(*S->getFalseExpr()), *FalseEnv, Env, Model))
         Env.setValue(*S, *Val);
     }
   }
@@ -830,12 +836,14 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
 
   const StmtToEnvMap &StmtToEnv;
   Environment &Env;
+  Environment::ValueModel &Model;
 };
 
 } // namespace
 
-void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env) {
-  TransferVisitor(StmtToEnv, Env).Visit(&S);
+void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env,
+              Environment::ValueModel &Model) {
+  TransferVisitor(StmtToEnv, Env, Model).Visit(&S);
 }
 
 } // namespace dataflow
diff --git a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
index 1b73c5d6830161..5a21a85b90ee9b 100644
--- a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
+++ b/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
@@ -316,7 +316,7 @@ builtinTransferStatement(unsigned CurBlockID, const CFGStmt &Elt,
   const Stmt *S = Elt.getStmt();
   assert(S != nullptr);
   transfer(StmtToEnvMap(AC.ACFG, AC.BlockStates, CurBlockID, InputState), *S,
-           InputState.Env);
+           InputState.Env, AC.Analysis);
 }
 
 /// Built-in transfer function for `CFGInitializer`.
@@ -452,7 +452,7 @@ transferCFGBlock(const CFGBlock &Block, AnalysisContext &AC,
       // terminator condition, but not as a `CFGElement`. The condition of an if
       // statement is one such example.
       transfer(StmtToEnvMap(AC.ACFG, AC.BlockStates, Block.getBlockID(), State),
-               *TerminatorCond, State.Env);
+               *TerminatorCond, State.Env, AC.Analysis);
 
     // If the transfer function didn't produce a value, create an atom so that
     // we have *some* value for the condition expression. This ensures that
diff --git a/clang/unittests/Analysis/FlowSensitive/TestingSupport.h b/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
index e3c7ff685f5724..3b0e05ed72220e 100644
--- a/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
+++ b/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
@@ -456,7 +456,7 @@ const IndirectFieldDecl *findIndirectFieldDecl(ASTContext &ASTCtx,
 /// Requirements:
 ///
 ///   `Name` must be unique in `ASTCtx`.
-template <class LocT>
+template <class LocT = StorageLocation>
 LocT &getLocForDecl(ASTContext &ASTCtx, const Environment &Env,
                     llvm::StringRef Name) {
   const ValueDecl *VD = findValueDecl(ASTCtx, Name);
@@ -470,7 +470,7 @@ LocT &getLocForDecl(ASTContext &ASTCtx, const Environment &Env,
 /// Requirements:
 ///
 ///   `Name` must be unique in `ASTCtx`.
-template <class ValueT>
+template <class ValueT = Value>
 ValueT &getValueForDecl(ASTContext &ASTCtx, const Environment &Env,
                         llvm::StringRef Name) {
   const ValueDecl *VD = findValueDecl(ASTCtx, Name);
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 97ec32126c1dc4..cff8d1e6324640 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -5243,6 +5243,67 @@ TEST(TransferTest, BinaryOperatorComma) {
       });
 }
 
+TEST(TransferTest, ConditionalOperatorValue) {
+  std::string Code = R"(
+    void target(bool Cond, bool B1, bool B2) {
+      bool JoinSame = Cond ? B1 : B1;
+      bool JoinDifferent = Cond ? B1 : B2;
+      // [[p]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        Environment Env = getEnvironmentAtAnnotation(Results, "p").fork();
+
+        auto &B1 = getValueForDecl<BoolValue>(ASTCtx, Env, "B1");
+        auto &B2 = getValueForDecl<BoolValue>(ASTCtx, Env, "B2");
+        auto &JoinSame = getValueForDecl<BoolValue>(ASTCtx, Env, "JoinSame");
+        auto &JoinDifferent =
+            getValueForDecl<BoolValue>(ASTCtx, Env, "JoinDifferent");
+
+        EXPECT_EQ(&JoinSame, &B1);
+
+        const Formula &JoinDifferentEqB1 =
+            Env.arena().makeEquals(JoinDifferent.formula(), B1.formula());
+        EXPECT_TRUE(Env.allows(JoinDifferentEqB1));
+        EXPECT_FALSE(Env.proves(JoinDifferentEqB1));
+
+        const Formula &JoinDifferentEqB2 =
+            Env.arena().makeEquals(JoinDifferent.formula(), B2.formula());
+        EXPECT_TRUE(Env.allows(JoinDifferentEqB2));
+        EXPECT_FALSE(Env.proves(JoinDifferentEqB1));
+      });
+}
+
+TEST(TransferTest, ConditionalOperatorLocation) {
+  std::string Code = R"(
+    void target(bool Cond, int I1, int I2) {
+      int &JoinSame = Cond ? I1 : I1;
+      int &JoinDifferent = Cond ? I1 : I2;
+      // [[p]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        Environment Env = getEnvironmentAtAnnotation(Results, "p").fork();
+
+        StorageLocation &I1 = getLocForDecl(ASTCtx, Env, "I1");
+        StorageLocation &I2 = getLocForDecl(ASTCtx, Env, "I2");
+        StorageLocation &JoinSame = getLocForDecl(ASTCtx, Env, "JoinSame");
+        StorageLocation &JoinDifferent =
+            getLocForDecl(ASTCtx, Env, "JoinDifferent");
+
+        EXPECT_EQ(&JoinSame, &I1);
+
+        EXPECT_NE(&JoinDifferent, &I1);
+        EXPECT_NE(&JoinDifferent, &I2);
+      });
+}
+
 TEST(TransferTest, IfStmtBranchExtendsFlowCondition) {
   std::string Code = R"(
     void target(bool Foo) {
@@ -5492,10 +5553,7 @@ TEST(TransferTest, ContextSensitiveReturnReferenceWithConditionalOperator) {
         ASSERT_THAT(Loc, NotNull());
         EXPECT_THAT(Env.getValue(*Loc), NotNull());
 
-        // TODO: We would really like to make this stronger assertion, but that
-        // doesn't work because we don't propagate values correctly through
-        // the conditional operator yet.
-        // ASSERT_THAT(Loc, Eq(SLoc));
+        EXPECT_THAT(Loc, Eq(SLoc));
       },
       {BuiltinOptions{ContextSensitiveOptions{}}});
 }

if (TrueLoc == FalseLoc && TrueLoc != nullptr)
Env.setStorageLocation(*S, *TrueLoc);
} else if (!S->getType()->isRecordType()) {
if (Value *Val = Environment::joinValues(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth commenting why we need a join here. IIUC, we lack any concept of a "result expression" of a basic block. So, when the environments were joined in the normal basic-block processing algorithm, these two expressions would have been simply dropped (existing at respectively different locations in LocToVal). Hence, we need to explicitly extract them and join at this point.

Copy link
Contributor Author

@martinboehme martinboehme Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth commenting why we need a join here. IIUC, we lack any concept of a "result expression" of a basic block.

Just to make sure I understand: The idea of such a concept would be that each of the branches of the conditional operator sets its result expression, and then when we perform the join for those two branches, we join the values of the result expressions?

This would work for the case of the conditional operator -- but it doesn't work for && and ||, where we also have two basic blocks with "result expressions", but at the place where the branches join, we don't want to join the result expressions but rather combine them using logical "and" or "or".

So I'm not sure a concept like this would add much of value.

So, when the environments were joined in the normal basic-block processing algorithm, these two expressions would have been simply dropped (existing at respectively different locations in LocToVal).

(I think you mean ExprToVal?)

Yes, in the joined environment these values don't exist -- so we have to extract them from the environments for the two branches (which is exactly what we do for && and || too). We then combine the values using a join because that's the operation that happens to be appropriate for the conditional operator (whereas for the logical operators we combine them using a logical operation).

I've added a short comment that hopefully makes this a bit clearer -- WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I also think join feels a bit out of place here. I wonder if this would belong to computeBlockInputState.

Copy link
Contributor Author

@martinboehme martinboehme Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your reaction; generally, we want to perform joins in computeBlockInputState(). But the conditional operator is a special case.

Consider: In computeBlockInputState() (which calls through to Environment::join()), we join values that are associated with the same expression or the same storage location, and then we associate the joined value with that same expression or storage location in the joined environment.

For the conditional operator, we want to join values that are associated with different expressions (the two branches of the conditional operator), and then we associate the joined value with a third expression (the conditional operator itself). This join is what it means to perform transfer on the conditional operator.

Here's a simple example (godbolt) that hopefully clarifies this:

int f(bool b, int i, int j) {
  return b ? i : j;
}

Here's the CFG:

 [B5 (ENTRY)]
   Succs (1): B4

 [B1]
   1: [B4.2] ? [B2.1] : [B3.1]
   2: [B1.1] (ImplicitCastExpr, LValueToRValue, int)
   3: return [B1.2];
   Preds (2): B2 B3
   Succs (1): B0

 [B2]
   1: i
   Preds (1): B4
   Succs (1): B1

 [B3]
   1: j
   Preds (1): B4
   Succs (1): B1

 [B4]
   1: b
   2: [B4.1] (ImplicitCastExpr, LValueToRValue, _Bool)
   T: [B4.2] ? ... : ...
   Preds (1): B5
   Succs (2): B2 B3

 [B0 (EXIT)]
   Preds (1): B1

The expressions whose values we are joining are i ([B2.1]) and j ([B3.1]). The joined value is associated with the conditional operator ([B1.1]).

What would it look like if we wanted to do this join within computeBlockInputState()?

  • We would have to put code that is specific to ConditionalOperator in computeBlockInputState(). This would be incongruous, as computeBlockInputState() is otherwise completely general -- it doesn't contain any code that's specific to a particular statement kind.
  • We would be associating the joined value with the expression [B1.1] in the input state of [B1], i.e. before we have started performing transfer on [B1]. This seems wrong: [B1.1] is an expression in [B1], and we should set its value when we transfer [B1], not before. (Put differently: If we put the logic for this in computeBlockInputState(), what would there be left for TransferVisitor::VisitConditionalOperator() to do?)

I hope this makes sense. If not, maybe it would be easiest to do a quick VC to discuss?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, now this makes perfect sense. I think this is a sign that our CFGs are not the best abstraction for dataflow analysis in their current form, and we might be able to come up with a better representation for ternaries that would not have this problem (but that representation would probably be a departure from the AST).

That being said, I think this is not too bad to explain. I am happy with the PR now. I wonder if it is worth to extend the comment why the join here is inherently part of the transfer in case it is possible to do it in a couple of concise sentences. If that is not the case, I am ok with merging as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, I think this is not too bad to explain. I am happy with the PR now. I wonder if it is worth to extend the comment why the join here is inherently part of the transfer in case it is possible to do it in a couple of concise sentences. If that is not the case, I am ok with merging as is.

I've added my attempt at this, though it ended up being more than two sentences. I'll merge this as-is, as it's just a comment change, but If you think this explanation is too much, happy to remove it in a followup PR.

const Environment *TrueEnv = StmtToEnv.getEnvironment(*S->getTrueExpr());
const Environment *FalseEnv = StmtToEnv.getEnvironment(*S->getFalseExpr());

if (TrueEnv == nullptr || FalseEnv == nullptr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When could this happen? Wouldn't our traversal order guarantee that we always visit predecessors first, so always have an environment available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This is just a defence against crashes in production if we unexpectedly don't find an environment; but obviously this means the bailout should be paired with an assertion, which I have now added.

if (TrueLoc == FalseLoc && TrueLoc != nullptr)
Env.setStorageLocation(*S, *TrueLoc);
} else if (!S->getType()->isRecordType()) {
if (Value *Val = Environment::joinValues(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I also think join feels a bit out of place here. I wonder if this would belong to computeBlockInputState.

if (TrueLoc == FalseLoc && TrueLoc != nullptr)
Env.setStorageLocation(*S, *TrueLoc);
} else if (!S->getType()->isRecordType()) {
if (Value *Val = Environment::joinValues(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, now this makes perfect sense. I think this is a sign that our CFGs are not the best abstraction for dataflow analysis in their current form, and we might be able to come up with a better representation for ternaries that would not have this problem (but that representation would probably be a departure from the AST).

That being said, I think this is not too bad to explain. I am happy with the PR now. I wonder if it is worth to extend the comment why the join here is inherently part of the transfer in case it is possible to do it in a couple of concise sentences. If that is not the case, I am ok with merging as is.

@martinboehme martinboehme force-pushed the piper_export_cl_625978585 branch from f26df9a to 1a6a463 Compare April 22, 2024 07:10
@martinboehme martinboehme merged commit abb958f into llvm:main Apr 22, 2024
@martinboehme
Copy link
Contributor Author

This is causing buildbot failures. Reverting.

@martinboehme
Copy link
Contributor Author

Needed to revert this because it turns out that, embarrassingly, I was locally running my tests in Release mode, i.e. with assert() disabled. I'm working on a PR to re-land this that fixes the assertion failures that were occurring.

martinboehme added a commit that referenced this pull request Apr 23, 2024
…h fixes (#89596)

I reverted #89213 beause it was
causing buildbots to fail with assertion failures.

Embarrassingly, it turns out I had been running tests locally in
`Release` mode, i.e. with `assert()` compiled away.

This PR re-lands #89213 with fixes for the failing assertions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants