[X86,SimplifyCFG] Allow more PHIs when sinking common code if the target supports CMOV #110420


Draft · wants to merge 1 commit into base: main

Conversation

phoebewang
Contributor

No description provided.

@llvmbot
Member

llvmbot commented Sep 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Phoebe Wang (phoebewang)

Changes

Patch is 24.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110420.diff

8 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+15-4)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+4-1)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+3-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+4-2)
  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+6-3)
  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+4-1)
  • (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+16-6)
  • (modified) llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll (+56-80)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 89a85bc8a90864..787b471cc328ff 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1114,9 +1114,17 @@ class TargetTransformInfo {
   /// \return the number of registers in the target-provided register class.
   unsigned getNumberOfRegisters(unsigned ClassID) const;
 
+  enum class MoveType {
+    NoMem = 1,
+    MemLD = 2,
+    MemST = 4,
+    All = 7,
+  };
+
   /// \return true if the target supports load/store that enables fault
   /// suppression of memory operands when the source condition is false.
-  bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const;
+  bool hasConditionalMoveForType(Type *Ty = nullptr,
+                                 MoveType MT = MoveType::NoMem) const;
 
   /// \return the target-provided register class ID for the provided type,
   /// accounting for type promotion and other type-legalization techniques that
@@ -1978,7 +1986,9 @@ class TargetTransformInfo::Concept {
   virtual bool preferToKeepConstantsAttached(const Instruction &Inst,
                                              const Function &Fn) const = 0;
   virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;
-  virtual bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const = 0;
+  virtual bool
+  hasConditionalMoveForType(Type *Ty = nullptr,
+                            MoveType MT = MoveType::NoMem) const = 0;
   virtual unsigned getRegisterClassForType(bool Vector,
                                            Type *Ty = nullptr) const = 0;
   virtual const char *getRegisterClassName(unsigned ClassID) const = 0;
@@ -2571,8 +2581,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   unsigned getNumberOfRegisters(unsigned ClassID) const override {
     return Impl.getNumberOfRegisters(ClassID);
   }
-  bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const override {
-    return Impl.hasConditionalLoadStoreForType(Ty);
+  bool hasConditionalMoveForType(Type *Ty = nullptr,
+                                 MoveType MT = MoveType::NoMem) const override {
+    return Impl.hasConditionalMoveForType(Ty, MT);
   }
   unsigned getRegisterClassForType(bool Vector,
                                    Type *Ty = nullptr) const override {
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index eca8818cc25e62..cf1449b3a7e141 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -470,7 +470,10 @@ class TargetTransformInfoImplBase {
   }
 
   unsigned getNumberOfRegisters(unsigned ClassID) const { return 8; }
-  bool hasConditionalLoadStoreForType(Type *Ty) const { return false; }
+  bool hasConditionalMoveForType(Type *Ty,
+                                 TargetTransformInfo::MoveType MT) const {
+    return false;
+  }
 
   unsigned getRegisterClassForType(bool Vector, Type *Ty = nullptr) const {
     return Vector ? 1 : 0;
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index b5195f764cbd1c..231731b2e7446c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -727,8 +727,9 @@ unsigned TargetTransformInfo::getNumberOfRegisters(unsigned ClassID) const {
   return TTIImpl->getNumberOfRegisters(ClassID);
 }
 
-bool TargetTransformInfo::hasConditionalLoadStoreForType(Type *Ty) const {
-  return TTIImpl->hasConditionalLoadStoreForType(Ty);
+bool TargetTransformInfo::hasConditionalMoveForType(Type *Ty,
+                                                    MoveType MT) const {
+  return TTIImpl->hasConditionalMoveForType(Ty, MT);
 }
 
 unsigned TargetTransformInfo::getRegisterClassForType(bool Vector,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 25213f587116d5..5c35142c951d68 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -4806,7 +4806,8 @@ void SelectionDAGBuilder::visitMaskedStore(const CallInst &I,
       TLI.getTargetMachine().getTargetTransformInfo(*I.getFunction());
   SDValue StoreNode =
       !IsCompressing &&
-              TTI.hasConditionalLoadStoreForType(I.getArgOperand(0)->getType())
+              TTI.hasConditionalMoveForType(I.getArgOperand(0)->getType(),
+                                            TargetTransformInfo::MoveType::All)
           ? TLI.visitMaskedStore(DAG, sdl, getMemoryRoot(), MMO, Ptr, Src0,
                                  Mask)
           : DAG.getMaskedStore(getMemoryRoot(), sdl, Src0, Ptr, Offset, Mask,
@@ -4992,7 +4993,8 @@ void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
   SDValue Load;
   SDValue Res;
   if (!IsExpanding &&
-      TTI.hasConditionalLoadStoreForType(Src0Operand->getType()))
+      TTI.hasConditionalMoveForType(Src0Operand->getType(),
+                                    TargetTransformInfo::MoveType::All))
     Res = TLI.visitMaskedLoad(DAG, sdl, InChain, MMO, Load, Ptr, Src0, Mask);
   else
     Res = Load =
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 46bc73c5e928e0..6c213bd8eb6968 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -176,8 +176,11 @@ unsigned X86TTIImpl::getNumberOfRegisters(unsigned ClassID) const {
   return 8;
 }
 
-bool X86TTIImpl::hasConditionalLoadStoreForType(Type *Ty) const {
-  if (!ST->hasCF())
+bool X86TTIImpl::hasConditionalMoveForType(
+    Type *Ty, TargetTransformInfo::MoveType MT) const {
+  if (!ST->canUseCMOV())
+    return false;
+  if (MT != TargetTransformInfo::MoveType::NoMem && !ST->hasCF())
     return false;
   if (!Ty)
     return true;
@@ -6064,7 +6067,7 @@ bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy, Align Alignment) {
 
   // The backend can't handle a single element vector w/o CFCMOV.
   if (isa<VectorType>(DataTy) && cast<FixedVectorType>(DataTy)->getNumElements() == 1)
-    return ST->hasCF() && hasConditionalLoadStoreForType(ScalarTy);
+    return ST->hasCF() && hasConditionalMoveForType(ScalarTy);
 
   if (!ST->hasAVX())
     return false;
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index c16461b157e07f..4515f29bd2a6e5 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -132,7 +132,10 @@ class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {
   /// @{
 
   unsigned getNumberOfRegisters(unsigned ClassID) const;
-  bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const;
+  bool
+  hasConditionalMoveForType(Type *Ty = nullptr,
+                            TargetTransformInfo::MoveType MT =
+                                TargetTransformInfo::MoveType::NoMem) const;
   TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const;
   unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
   unsigned getMaxInterleaveFactor(ElementCount VF);
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 1f2c9389c008bd..800ea07c619298 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -129,6 +129,10 @@ static cl::opt<unsigned> HoistLoadsStoresWithCondFaultingThreshold(
              "to speculatively execute to eliminate conditional branch "
              "(default = 6)"));
 
+static cl::opt<unsigned> ProfitableToSinkInstructionThreshold(
+    "profitable-to-sink-instruction-threshold", cl::Hidden, cl::init(6),
+    cl::desc("Control the maximal PHI instructions"));
+
 static cl::opt<unsigned>
     HoistCommonSkipLimit("simplifycfg-hoist-common-skip-limit", cl::Hidden,
                          cl::init(20),
@@ -1742,7 +1746,8 @@ static bool isSafeCheapLoadStore(const Instruction *I,
   // llvm.masked.load/store use i32 for alignment while load/store use i64.
   // That's why we have the alignment limitation.
   // FIXME: Update the prototype of the intrinsics?
-  return TTI.hasConditionalLoadStoreForType(getLoadStoreType(I)) &&
+  return TTI.hasConditionalMoveForType(getLoadStoreType(I),
+                                       TargetTransformInfo::MoveType::All) &&
          getLoadStoreAlignment(I) < Value::MaximumAlignment;
 }
 
@@ -2386,8 +2391,8 @@ namespace {
 
 /// Check whether BB's predecessors end with unconditional branches. If it is
 /// true, sink any common code from the predecessors to BB.
-static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
-                                           DomTreeUpdater *DTU) {
+static bool sinkCommonCodeFromPredecessors(BasicBlock *BB, DomTreeUpdater *DTU,
+                                           const TargetTransformInfo &TTI) {
   // We support two situations:
   //   (1) all incoming arcs are unconditional
   //   (2) there are non-unconditional incoming arcs
@@ -2492,12 +2497,16 @@ static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
     // sink?
     auto ProfitableToSinkInstruction = [&](LockstepReverseIterator &LRI) {
       unsigned NumPHIInsts = 0;
+      unsigned NumCmovInsts = 0;
       for (Use &U : (*LRI)[0]->operands()) {
         auto It = PHIOperands.find(&U);
         if (It != PHIOperands.end() && !all_of(It->second, [&](Value *V) {
               return InstructionsToSink.contains(V);
             })) {
-          ++NumPHIInsts;
+          if (TTI.hasConditionalMoveForType(U->getType()))
+            ++NumCmovInsts;
+          else
+            ++NumPHIInsts;
           // Do not separate a load/store from the gep producing the address.
           // The gep can likely be folded into the load/store as an addressing
           // mode. Additionally, a load of a gep is easier to analyze than a
@@ -2511,7 +2520,8 @@ static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
         }
       }
       LLVM_DEBUG(dbgs() << "SINK: #phi insts: " << NumPHIInsts << "\n");
-      return NumPHIInsts <= 1;
+      return NumPHIInsts <= 1 &&
+             NumCmovInsts < ProfitableToSinkInstructionThreshold;
     };
 
     // We've determined that we are going to sink last ScanIdx instructions,
@@ -8119,7 +8129,7 @@ bool SimplifyCFGOpt::simplifyOnce(BasicBlock *BB) {
     return true;
 
   if (SinkCommon && Options.SinkCommonInsts)
-    if (sinkCommonCodeFromPredecessors(BB, DTU) ||
+    if (sinkCommonCodeFromPredecessors(BB, DTU, TTI) ||
         mergeCompatibleInvokes(BB, DTU)) {
      // sinkCommonCodeFromPredecessors() does not automatically CSE PHI's,
      // so we may now have duplicate PHI's.
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll b/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll
index 170f8d1592c2fe..912ba0d8c8f3a3 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll
@@ -220,17 +220,15 @@ define i32 @test8(i1 zeroext %flag, i32 %x, ptr %y) {
 ; CHECK-LABEL: @test8(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[Z:%.*]] = load volatile i32, ptr [[Y:%.*]], align 4
-; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
-; CHECK:       if.then:
-; CHECK-NEXT:    [[A:%.*]] = add i32 [[Z]], 5
-; CHECK-NEXT:    br label [[IF_END:%.*]]
+; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_END:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[W:%.*]] = load volatile i32, ptr [[Y]], align 4
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[W]], 7
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[B_SINK:%.*]] = phi i32 [ [[B]], [[IF_ELSE]] ], [ [[A]], [[IF_THEN]] ]
-; CHECK-NEXT:    store volatile i32 [[B_SINK]], ptr [[Y]], align 4
+; CHECK-NEXT:    [[DOTSINK:%.*]] = phi i32 [ 7, [[IF_ELSE]] ], [ 5, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[W_SINK:%.*]] = phi i32 [ [[W]], [[IF_ELSE]] ], [ [[Z]], [[ENTRY]] ]
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[W_SINK]], [[DOTSINK]]
+; CHECK-NEXT:    store volatile i32 [[B]], ptr [[Y]], align 4
 ; CHECK-NEXT:    ret i32 1
 ;
 entry:
@@ -262,15 +260,15 @@ define i32 @test9(i1 zeroext %flag, i32 %x, ptr %y, ptr %p) {
 ; CHECK-NEXT:    store i32 7, ptr [[P:%.*]], align 4
 ; CHECK-NEXT:    [[Z:%.*]] = load volatile i32, ptr [[Y:%.*]], align 4
 ; CHECK-NEXT:    store i32 6, ptr [[P]], align 4
-; CHECK-NEXT:    [[A:%.*]] = add i32 [[Z]], 5
 ; CHECK-NEXT:    br label [[IF_END:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[W:%.*]] = load volatile i32, ptr [[Y]], align 4
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[W]], 7
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[B_SINK:%.*]] = phi i32 [ [[B]], [[IF_ELSE]] ], [ [[A]], [[IF_THEN]] ]
-; CHECK-NEXT:    store volatile i32 [[B_SINK]], ptr [[Y]], align 4
+; CHECK-NEXT:    [[DOTSINK:%.*]] = phi i32 [ 7, [[IF_ELSE]] ], [ 5, [[IF_THEN]] ]
+; CHECK-NEXT:    [[W_SINK:%.*]] = phi i32 [ [[W]], [[IF_ELSE]] ], [ [[Z]], [[IF_THEN]] ]
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[W_SINK]], [[DOTSINK]]
+; CHECK-NEXT:    store volatile i32 [[B]], ptr [[Y]], align 4
 ; CHECK-NEXT:    ret i32 1
 ;
 entry:
@@ -570,16 +568,16 @@ define zeroext i1 @test_crash(i1 zeroext %flag, ptr %i4, ptr %m, ptr %n) {
 ; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.then:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[I4:%.*]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = add i32 [[TMP1]], -1
 ; CHECK-NEXT:    br label [[IF_END:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[M:%.*]], align 4
 ; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[N:%.*]], align 4
-; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[TMP3]], [[TMP4]]
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[TMP5_SINK:%.*]] = phi i32 [ [[TMP5]], [[IF_ELSE]] ], [ [[TMP2]], [[IF_THEN]] ]
-; CHECK-NEXT:    store i32 [[TMP5_SINK]], ptr [[I4]], align 4
+; CHECK-NEXT:    [[TMP4_SINK:%.*]] = phi i32 [ [[TMP4]], [[IF_ELSE]] ], [ -1, [[IF_THEN]] ]
+; CHECK-NEXT:    [[TMP3_SINK:%.*]] = phi i32 [ [[TMP3]], [[IF_ELSE]] ], [ [[TMP1]], [[IF_THEN]] ]
+; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[TMP3_SINK]], [[TMP4_SINK]]
+; CHECK-NEXT:    store i32 [[TMP5]], ptr [[I4]], align 4
 ; CHECK-NEXT:    ret i1 true
 ;
 entry:
@@ -1475,22 +1473,14 @@ declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
 
 define void @creating_too_many_phis(i1 %cond, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h) {
 ; CHECK-LABEL: @creating_too_many_phis(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
-; CHECK:       bb0:
-; CHECK-NEXT:    [[V0:%.*]] = add i32 [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[V1:%.*]] = add i32 [[V0]], [[C:%.*]]
-; CHECK-NEXT:    [[V2:%.*]] = add i32 [[D:%.*]], [[E:%.*]]
-; CHECK-NEXT:    [[R3:%.*]] = add i32 [[V1]], [[V2]]
-; CHECK-NEXT:    br label [[END:%.*]]
-; CHECK:       bb1:
-; CHECK-NEXT:    [[V4:%.*]] = add i32 [[A]], [[B]]
-; CHECK-NEXT:    [[V5:%.*]] = add i32 [[V4]], [[C]]
-; CHECK-NEXT:    [[V6:%.*]] = add i32 [[G:%.*]], [[H:%.*]]
+; CHECK-NEXT:  end:
+; CHECK-NEXT:    [[E_H:%.*]] = select i1 [[COND:%.*]], i32 [[E:%.*]], i32 [[H:%.*]]
+; CHECK-NEXT:    [[D_G:%.*]] = select i1 [[COND]], i32 [[D:%.*]], i32 [[G:%.*]]
+; CHECK-NEXT:    [[V4:%.*]] = add i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[V5:%.*]] = add i32 [[V4]], [[C:%.*]]
+; CHECK-NEXT:    [[V6:%.*]] = add i32 [[D_G]], [[E_H]]
 ; CHECK-NEXT:    [[R7:%.*]] = add i32 [[V5]], [[V6]]
-; CHECK-NEXT:    br label [[END]]
-; CHECK:       end:
-; CHECK-NEXT:    [[R7_SINK:%.*]] = phi i32 [ [[R7]], [[BB1]] ], [ [[R3]], [[BB0]] ]
-; CHECK-NEXT:    call void @use32(i32 [[R7_SINK]])
+; CHECK-NEXT:    call void @use32(i32 [[R7]])
 ; CHECK-NEXT:    ret void
 ;
   br i1 %cond, label %bb0, label %bb1
@@ -1806,17 +1796,17 @@ define i64 @multi_use_in_block_inconsistent(i1 %cond, ptr %p, i64 %a, i64 %b) {
 ; CHECK-NEXT:    call void @dummy()
 ; CHECK-NEXT:    [[GEP1_A:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[A:%.*]]
 ; CHECK-NEXT:    [[V_A:%.*]] = load i64, ptr [[GEP1_A]], align 8
-; CHECK-NEXT:    [[GEP2_A:%.*]] = getelementptr i8, ptr [[GEP1_A]], i64 [[V_A]]
 ; CHECK-NEXT:    br label [[JOIN:%.*]]
 ; CHECK:       else:
 ; CHECK-NEXT:    [[GEP1_B:%.*]] = getelementptr i8, ptr [[P]], i64 [[A]]
 ; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[P]], align 8
-; CHECK-NEXT:    [[GEP2_B:%.*]] = getelementptr i8, ptr [[GEP1_B]], i64 [[V_B]]
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
+; CHECK-NEXT:    [[V_B_SINK:%.*]] = phi i64 [ [[V_B]], [[ELSE]] ], [ [[V_A]], [[IF]] ]
+; CHECK-NEXT:    [[GEP1_B_SINK:%.*]] = phi ptr [ [[GEP1_B]], [[ELSE]] ], [ [[GEP1_A]], [[IF]] ]
 ; CHECK-NEXT:    [[PHI1:%.*]] = phi i64 [ [[V_A]], [[IF]] ], [ [[V_B]], [[ELSE]] ]
-; CHECK-NEXT:    [[PHI2:%.*]] = phi ptr [ [[GEP2_A]], [[IF]] ], [ [[GEP2_B]], [[ELSE]] ]
-; CHECK-NEXT:    call void @use.ptr(ptr [[PHI2]])
+; CHECK-NEXT:    [[GEP2_B:%.*]] = getelementptr i8, ptr [[GEP1_B_SINK]], i64 [[V_B_SINK]]
+; CHECK-NEXT:    call void @use.ptr(ptr [[GEP2_B]])
 ; CHECK-NEXT:    ret i64 [[PHI1]]
 ;
   br i1 %cond, label %if, label %else
@@ -1873,19 +1863,16 @@ join:
 
 define i64 @load_with_non_sunk_gep_both(i1 %cond, ptr %p.a, ptr %p.b, i64 %a, i64 %b) {
 ; CHECK-LABEL: @load_with_non_sunk_gep_both(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[JOIN:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @dummy()
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[P_A:%.*]], i64 [[A:%.*]]
-; CHECK-NEXT:    [[V_A:%.*]] = load i64, ptr [[GEP_A]], align 8
-; CHECK-NEXT:    br label [[JOIN:%.*]]
-; CHECK:       else:
-; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B:%.*]], i64 [[B:%.*]]
-; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[GEP_B]], align 8
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
-; CHECK-NEXT:    [[V:%.*]] = phi i64 [ [[V_A]], [[IF]] ], [ [[V_B]], [[ELSE]] ]
-; CHECK-NEXT:    ret i64 [[V]]
+; CHECK-NEXT:    [[B_SINK:%.*]] = phi i64 [ [[A:%.*]], [[IF]] ], [ [[B:%.*]], [[TMP0:%.*]] ]
+; CHECK-NEXT:    [[P_B_SINK:%.*]] = phi ptr [ [[P_A:%.*]], [[IF]] ], [ [[P_B:%.*]], [[TMP0]] ]
+; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B_SINK]], i64 [[B_SINK]]
+; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[GEP_B]], align 8
+; CHECK-NEXT:    ret i64 [[V_B]]
 ;
   br i1 %cond, label %if, label %else
 
@@ -1971,17 +1958,15 @@ join:
 
 define void @store_with_non_sunk_gep(i1 %cond, ptr %p.a, ptr %p.b, i64 %a, i64 %b) {
 ; CHECK-LABEL: @store_with_non_sunk_gep(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[JOIN:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @dummy()
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[P_A:%.*]], i64 [[A:%.*]]
-; CHECK-NEXT:    store i64 0, ptr [[GEP_A]], align 8
-; CHECK-NEXT:    br label [[JOIN:%.*]]
-; CHECK:       else:
-; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B:%.*]], i64 [[B:%.*]]
-; CHECK-NEXT:    store i64 0, ptr [[GEP_B]], align 8
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
+; CHECK-NEXT:    [[B_SINK:%.*]] = phi i64 [ [[A:%.*]], [[IF]] ], [ [[B:%.*]], [[TMP0:%.*]] ]
+; CHECK-NEXT:    [[P_B_SINK:%.*]] = phi ptr [ [[P_A:%.*]], [[IF]] ], [ [[P_B:%.*]], [[TMP0]] ]
+; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B_SINK]], i64 [[B_SINK]]
+; CHECK-NEXT:    store i64 0, ptr [[GEP_B]], align 8
 ; CHECK-NEXT:    ret void
 ;
   br i1 %cond, label %if, label %else
@@ -2003,17 +1988,15 @@ join:
 
 define void @store_with_non_sunk_gep_as_value(i1 %cond, ptr %p, ptr %p.a, ptr %p.b, i64 %a, i64 %b) {
 ; CHECK-LABEL: @store_with_non_sunk_gep_as_value(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[JOIN:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @dummy()
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[P_A:%.*]], i64 [[A:%.*]]
-; CHECK-NEXT:    br label [[JOIN:%.*]]
-; CHECK:       else:
-; CHECK-NEXT:   ...
[truncated]

@llvmbot
Member

llvmbot commented Sep 29, 2024

@llvm/pr-subscribers-llvm-analysis

 ; CHECK-LABEL: @test8(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[Z:%.*]] = load volatile i32, ptr [[Y:%.*]], align 4
-; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
-; CHECK:       if.then:
-; CHECK-NEXT:    [[A:%.*]] = add i32 [[Z]], 5
-; CHECK-NEXT:    br label [[IF_END:%.*]]
+; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_END:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[W:%.*]] = load volatile i32, ptr [[Y]], align 4
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[W]], 7
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[B_SINK:%.*]] = phi i32 [ [[B]], [[IF_ELSE]] ], [ [[A]], [[IF_THEN]] ]
-; CHECK-NEXT:    store volatile i32 [[B_SINK]], ptr [[Y]], align 4
+; CHECK-NEXT:    [[DOTSINK:%.*]] = phi i32 [ 7, [[IF_ELSE]] ], [ 5, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[W_SINK:%.*]] = phi i32 [ [[W]], [[IF_ELSE]] ], [ [[Z]], [[ENTRY]] ]
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[W_SINK]], [[DOTSINK]]
+; CHECK-NEXT:    store volatile i32 [[B]], ptr [[Y]], align 4
 ; CHECK-NEXT:    ret i32 1
 ;
 entry:
@@ -262,15 +260,15 @@ define i32 @test9(i1 zeroext %flag, i32 %x, ptr %y, ptr %p) {
 ; CHECK-NEXT:    store i32 7, ptr [[P:%.*]], align 4
 ; CHECK-NEXT:    [[Z:%.*]] = load volatile i32, ptr [[Y:%.*]], align 4
 ; CHECK-NEXT:    store i32 6, ptr [[P]], align 4
-; CHECK-NEXT:    [[A:%.*]] = add i32 [[Z]], 5
 ; CHECK-NEXT:    br label [[IF_END:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[W:%.*]] = load volatile i32, ptr [[Y]], align 4
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[W]], 7
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[B_SINK:%.*]] = phi i32 [ [[B]], [[IF_ELSE]] ], [ [[A]], [[IF_THEN]] ]
-; CHECK-NEXT:    store volatile i32 [[B_SINK]], ptr [[Y]], align 4
+; CHECK-NEXT:    [[DOTSINK:%.*]] = phi i32 [ 7, [[IF_ELSE]] ], [ 5, [[IF_THEN]] ]
+; CHECK-NEXT:    [[W_SINK:%.*]] = phi i32 [ [[W]], [[IF_ELSE]] ], [ [[Z]], [[IF_THEN]] ]
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[W_SINK]], [[DOTSINK]]
+; CHECK-NEXT:    store volatile i32 [[B]], ptr [[Y]], align 4
 ; CHECK-NEXT:    ret i32 1
 ;
 entry:
@@ -570,16 +568,16 @@ define zeroext i1 @test_crash(i1 zeroext %flag, ptr %i4, ptr %m, ptr %n) {
 ; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.then:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[I4:%.*]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = add i32 [[TMP1]], -1
 ; CHECK-NEXT:    br label [[IF_END:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[M:%.*]], align 4
 ; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[N:%.*]], align 4
-; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[TMP3]], [[TMP4]]
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[TMP5_SINK:%.*]] = phi i32 [ [[TMP5]], [[IF_ELSE]] ], [ [[TMP2]], [[IF_THEN]] ]
-; CHECK-NEXT:    store i32 [[TMP5_SINK]], ptr [[I4]], align 4
+; CHECK-NEXT:    [[TMP4_SINK:%.*]] = phi i32 [ [[TMP4]], [[IF_ELSE]] ], [ -1, [[IF_THEN]] ]
+; CHECK-NEXT:    [[TMP3_SINK:%.*]] = phi i32 [ [[TMP3]], [[IF_ELSE]] ], [ [[TMP1]], [[IF_THEN]] ]
+; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[TMP3_SINK]], [[TMP4_SINK]]
+; CHECK-NEXT:    store i32 [[TMP5]], ptr [[I4]], align 4
 ; CHECK-NEXT:    ret i1 true
 ;
 entry:
@@ -1475,22 +1473,14 @@ declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
 
 define void @creating_too_many_phis(i1 %cond, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h) {
 ; CHECK-LABEL: @creating_too_many_phis(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
-; CHECK:       bb0:
-; CHECK-NEXT:    [[V0:%.*]] = add i32 [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[V1:%.*]] = add i32 [[V0]], [[C:%.*]]
-; CHECK-NEXT:    [[V2:%.*]] = add i32 [[D:%.*]], [[E:%.*]]
-; CHECK-NEXT:    [[R3:%.*]] = add i32 [[V1]], [[V2]]
-; CHECK-NEXT:    br label [[END:%.*]]
-; CHECK:       bb1:
-; CHECK-NEXT:    [[V4:%.*]] = add i32 [[A]], [[B]]
-; CHECK-NEXT:    [[V5:%.*]] = add i32 [[V4]], [[C]]
-; CHECK-NEXT:    [[V6:%.*]] = add i32 [[G:%.*]], [[H:%.*]]
+; CHECK-NEXT:  end:
+; CHECK-NEXT:    [[E_H:%.*]] = select i1 [[COND:%.*]], i32 [[E:%.*]], i32 [[H:%.*]]
+; CHECK-NEXT:    [[D_G:%.*]] = select i1 [[COND]], i32 [[D:%.*]], i32 [[G:%.*]]
+; CHECK-NEXT:    [[V4:%.*]] = add i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[V5:%.*]] = add i32 [[V4]], [[C:%.*]]
+; CHECK-NEXT:    [[V6:%.*]] = add i32 [[D_G]], [[E_H]]
 ; CHECK-NEXT:    [[R7:%.*]] = add i32 [[V5]], [[V6]]
-; CHECK-NEXT:    br label [[END]]
-; CHECK:       end:
-; CHECK-NEXT:    [[R7_SINK:%.*]] = phi i32 [ [[R7]], [[BB1]] ], [ [[R3]], [[BB0]] ]
-; CHECK-NEXT:    call void @use32(i32 [[R7_SINK]])
+; CHECK-NEXT:    call void @use32(i32 [[R7]])
 ; CHECK-NEXT:    ret void
 ;
   br i1 %cond, label %bb0, label %bb1
@@ -1806,17 +1796,17 @@ define i64 @multi_use_in_block_inconsistent(i1 %cond, ptr %p, i64 %a, i64 %b) {
 ; CHECK-NEXT:    call void @dummy()
 ; CHECK-NEXT:    [[GEP1_A:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[A:%.*]]
 ; CHECK-NEXT:    [[V_A:%.*]] = load i64, ptr [[GEP1_A]], align 8
-; CHECK-NEXT:    [[GEP2_A:%.*]] = getelementptr i8, ptr [[GEP1_A]], i64 [[V_A]]
 ; CHECK-NEXT:    br label [[JOIN:%.*]]
 ; CHECK:       else:
 ; CHECK-NEXT:    [[GEP1_B:%.*]] = getelementptr i8, ptr [[P]], i64 [[A]]
 ; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[P]], align 8
-; CHECK-NEXT:    [[GEP2_B:%.*]] = getelementptr i8, ptr [[GEP1_B]], i64 [[V_B]]
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
+; CHECK-NEXT:    [[V_B_SINK:%.*]] = phi i64 [ [[V_B]], [[ELSE]] ], [ [[V_A]], [[IF]] ]
+; CHECK-NEXT:    [[GEP1_B_SINK:%.*]] = phi ptr [ [[GEP1_B]], [[ELSE]] ], [ [[GEP1_A]], [[IF]] ]
 ; CHECK-NEXT:    [[PHI1:%.*]] = phi i64 [ [[V_A]], [[IF]] ], [ [[V_B]], [[ELSE]] ]
-; CHECK-NEXT:    [[PHI2:%.*]] = phi ptr [ [[GEP2_A]], [[IF]] ], [ [[GEP2_B]], [[ELSE]] ]
-; CHECK-NEXT:    call void @use.ptr(ptr [[PHI2]])
+; CHECK-NEXT:    [[GEP2_B:%.*]] = getelementptr i8, ptr [[GEP1_B_SINK]], i64 [[V_B_SINK]]
+; CHECK-NEXT:    call void @use.ptr(ptr [[GEP2_B]])
 ; CHECK-NEXT:    ret i64 [[PHI1]]
 ;
   br i1 %cond, label %if, label %else
@@ -1873,19 +1863,16 @@ join:
 
 define i64 @load_with_non_sunk_gep_both(i1 %cond, ptr %p.a, ptr %p.b, i64 %a, i64 %b) {
 ; CHECK-LABEL: @load_with_non_sunk_gep_both(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[JOIN:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @dummy()
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[P_A:%.*]], i64 [[A:%.*]]
-; CHECK-NEXT:    [[V_A:%.*]] = load i64, ptr [[GEP_A]], align 8
-; CHECK-NEXT:    br label [[JOIN:%.*]]
-; CHECK:       else:
-; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B:%.*]], i64 [[B:%.*]]
-; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[GEP_B]], align 8
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
-; CHECK-NEXT:    [[V:%.*]] = phi i64 [ [[V_A]], [[IF]] ], [ [[V_B]], [[ELSE]] ]
-; CHECK-NEXT:    ret i64 [[V]]
+; CHECK-NEXT:    [[B_SINK:%.*]] = phi i64 [ [[A:%.*]], [[IF]] ], [ [[B:%.*]], [[TMP0:%.*]] ]
+; CHECK-NEXT:    [[P_B_SINK:%.*]] = phi ptr [ [[P_A:%.*]], [[IF]] ], [ [[P_B:%.*]], [[TMP0]] ]
+; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B_SINK]], i64 [[B_SINK]]
+; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[GEP_B]], align 8
+; CHECK-NEXT:    ret i64 [[V_B]]
 ;
   br i1 %cond, label %if, label %else
 
@@ -1971,17 +1958,15 @@ join:
 
 define void @store_with_non_sunk_gep(i1 %cond, ptr %p.a, ptr %p.b, i64 %a, i64 %b) {
 ; CHECK-LABEL: @store_with_non_sunk_gep(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[JOIN:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @dummy()
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[P_A:%.*]], i64 [[A:%.*]]
-; CHECK-NEXT:    store i64 0, ptr [[GEP_A]], align 8
-; CHECK-NEXT:    br label [[JOIN:%.*]]
-; CHECK:       else:
-; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B:%.*]], i64 [[B:%.*]]
-; CHECK-NEXT:    store i64 0, ptr [[GEP_B]], align 8
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
+; CHECK-NEXT:    [[B_SINK:%.*]] = phi i64 [ [[A:%.*]], [[IF]] ], [ [[B:%.*]], [[TMP0:%.*]] ]
+; CHECK-NEXT:    [[P_B_SINK:%.*]] = phi ptr [ [[P_A:%.*]], [[IF]] ], [ [[P_B:%.*]], [[TMP0]] ]
+; CHECK-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[P_B_SINK]], i64 [[B_SINK]]
+; CHECK-NEXT:    store i64 0, ptr [[GEP_B]], align 8
 ; CHECK-NEXT:    ret void
 ;
   br i1 %cond, label %if, label %else
@@ -2003,17 +1988,15 @@ join:
 
 define void @store_with_non_sunk_gep_as_value(i1 %cond, ptr %p, ptr %p.a, ptr %p.b, i64 %a, i64 %b) {
 ; CHECK-LABEL: @store_with_non_sunk_gep_as_value(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF:%.*]], label [[JOIN:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @dummy()
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[P_A:%.*]], i64 [[A:%.*]]
-; CHECK-NEXT:    br label [[JOIN:%.*]]
-; CHECK:       else:
-; CHECK-NEXT:   ...
[truncated]
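The heuristic changed above distinguishes PHIs that can lower to plain conditional moves (CMOV on x86) from those that would require real control flow: instead of capping sinking at a single PHI, it now tolerates up to the threshold's worth of CMOV-lowerable PHIs. The `creating_too_many_phis` test update illustrates the effect. As a rough, self-contained sketch of that transformation in C++ (illustrative functions, not the LLVM code itself), the branchy form below becomes a single shared `add` fed by two selects, each of which the x86 backend can emit as one CMOV:

```cpp
#include <cassert>

// Before sinking: each arm computes its own add chain, and the results
// are merged by a PHI at the join point (a real branch in the output).
int branchy(bool cond, int d, int e, int g, int h, int base) {
  int r;
  if (cond)
    r = base + (d + e); // bb0
  else
    r = base + (g + h); // bb1
  return r;             // phi [bb0, bb1]
}

// After sinking: the common add is shared, and only the differing
// operands are merged. Each ternary maps to a select in IR and a
// single CMOV on x86, so two extra PHIs are still cheaper than a
// branch on targets with conditional moves.
int sunk(bool cond, int d, int e, int g, int h, int base) {
  int d_g = cond ? d : g; // select -> cmov
  int e_h = cond ? e : h; // select -> cmov
  return base + (d_g + e_h);
}
```

Both forms are equivalent for all inputs; the point of the TTI hook is that this trade is only profitable when the target can actually turn the selects into branchless moves.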

@llvmbot

llvmbot commented Sep 29, 2024

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)


@llvmbot
Member

llvmbot commented Sep 29, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: Phoebe Wang (phoebewang)

Changes

Patch is 24.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110420.diff

8 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+15-4)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+4-1)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+3-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+4-2)
  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+6-3)
  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+4-1)
  • (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+16-6)
  • (modified) llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll (+56-80)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 89a85bc8a90864..787b471cc328ff 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1114,9 +1114,17 @@ class TargetTransformInfo {
   /// \return the number of registers in the target-provided register class.
   unsigned getNumberOfRegisters(unsigned ClassID) const;
 
+  enum class MoveType {
+    NoMem = 1,
+    MemLD = 2,
+    MemST = 4,
+    All = 7,
+  };
+
   /// \return true if the target supports load/store that enables fault
   /// suppression of memory operands when the source condition is false.
-  bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const;
+  bool hasConditionalMoveForType(Type *Ty = nullptr,
+                                 MoveType MT = MoveType::NoMem) const;
 
   /// \return the target-provided register class ID for the provided type,
   /// accounting for type promotion and other type-legalization techniques that
@@ -1978,7 +1986,9 @@ class TargetTransformInfo::Concept {
   virtual bool preferToKeepConstantsAttached(const Instruction &Inst,
                                              const Function &Fn) const = 0;
   virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;
-  virtual bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const = 0;
+  virtual bool
+  hasConditionalMoveForType(Type *Ty = nullptr,
+                            MoveType MT = MoveType::NoMem) const = 0;
   virtual unsigned getRegisterClassForType(bool Vector,
                                            Type *Ty = nullptr) const = 0;
   virtual const char *getRegisterClassName(unsigned ClassID) const = 0;
@@ -2571,8 +2581,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   unsigned getNumberOfRegisters(unsigned ClassID) const override {
     return Impl.getNumberOfRegisters(ClassID);
   }
-  bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const override {
-    return Impl.hasConditionalLoadStoreForType(Ty);
+  bool hasConditionalMoveForType(Type *Ty = nullptr,
+                                 MoveType MT = MoveType::NoMem) const override {
+    return Impl.hasConditionalMoveForType(Ty, MT);
   }
   unsigned getRegisterClassForType(bool Vector,
                                    Type *Ty = nullptr) const override {
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index eca8818cc25e62..cf1449b3a7e141 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -470,7 +470,10 @@ class TargetTransformInfoImplBase {
   }
 
   unsigned getNumberOfRegisters(unsigned ClassID) const { return 8; }
-  bool hasConditionalLoadStoreForType(Type *Ty) const { return false; }
+  bool hasConditionalMoveForType(Type *Ty,
+                                 TargetTransformInfo::MoveType MT) const {
+    return false;
+  }
 
   unsigned getRegisterClassForType(bool Vector, Type *Ty = nullptr) const {
     return Vector ? 1 : 0;
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index b5195f764cbd1c..231731b2e7446c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -727,8 +727,9 @@ unsigned TargetTransformInfo::getNumberOfRegisters(unsigned ClassID) const {
   return TTIImpl->getNumberOfRegisters(ClassID);
 }
 
-bool TargetTransformInfo::hasConditionalLoadStoreForType(Type *Ty) const {
-  return TTIImpl->hasConditionalLoadStoreForType(Ty);
+bool TargetTransformInfo::hasConditionalMoveForType(Type *Ty,
+                                                    MoveType MT) const {
+  return TTIImpl->hasConditionalMoveForType(Ty, MT);
 }
 
 unsigned TargetTransformInfo::getRegisterClassForType(bool Vector,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 25213f587116d5..5c35142c951d68 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -4806,7 +4806,8 @@ void SelectionDAGBuilder::visitMaskedStore(const CallInst &I,
       TLI.getTargetMachine().getTargetTransformInfo(*I.getFunction());
   SDValue StoreNode =
       !IsCompressing &&
-              TTI.hasConditionalLoadStoreForType(I.getArgOperand(0)->getType())
+              TTI.hasConditionalMoveForType(I.getArgOperand(0)->getType(),
+                                            TargetTransformInfo::MoveType::All)
           ? TLI.visitMaskedStore(DAG, sdl, getMemoryRoot(), MMO, Ptr, Src0,
                                  Mask)
           : DAG.getMaskedStore(getMemoryRoot(), sdl, Src0, Ptr, Offset, Mask,
@@ -4992,7 +4993,8 @@ void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
   SDValue Load;
   SDValue Res;
   if (!IsExpanding &&
-      TTI.hasConditionalLoadStoreForType(Src0Operand->getType()))
+      TTI.hasConditionalMoveForType(Src0Operand->getType(),
+                                    TargetTransformInfo::MoveType::All))
     Res = TLI.visitMaskedLoad(DAG, sdl, InChain, MMO, Load, Ptr, Src0, Mask);
   else
     Res = Load =
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 46bc73c5e928e0..6c213bd8eb6968 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -176,8 +176,11 @@ unsigned X86TTIImpl::getNumberOfRegisters(unsigned ClassID) const {
   return 8;
 }
 
-bool X86TTIImpl::hasConditionalLoadStoreForType(Type *Ty) const {
-  if (!ST->hasCF())
+bool X86TTIImpl::hasConditionalMoveForType(
+    Type *Ty, TargetTransformInfo::MoveType MT) const {
+  if (!ST->canUseCMOV())
+    return false;
+  if (MT != TargetTransformInfo::MoveType::NoMem && !ST->hasCF())
     return false;
   if (!Ty)
     return true;
@@ -6064,7 +6067,7 @@ bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy, Align Alignment) {
 
   // The backend can't handle a single element vector w/o CFCMOV.
   if (isa<VectorType>(DataTy) && cast<FixedVectorType>(DataTy)->getNumElements() == 1)
-    return ST->hasCF() && hasConditionalLoadStoreForType(ScalarTy);
+    return ST->hasCF() && hasConditionalMoveForType(ScalarTy);
 
   if (!ST->hasAVX())
     return false;
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index c16461b157e07f..4515f29bd2a6e5 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -132,7 +132,10 @@ class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {
   /// @{
 
   unsigned getNumberOfRegisters(unsigned ClassID) const;
-  bool hasConditionalLoadStoreForType(Type *Ty = nullptr) const;
+  bool
+  hasConditionalMoveForType(Type *Ty = nullptr,
+                            TargetTransformInfo::MoveType MT =
+                                TargetTransformInfo::MoveType::NoMem) const;
   TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const;
   unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
   unsigned getMaxInterleaveFactor(ElementCount VF);
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 1f2c9389c008bd..800ea07c619298 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -129,6 +129,10 @@ static cl::opt<unsigned> HoistLoadsStoresWithCondFaultingThreshold(
              "to speculatively execute to eliminate conditional branch "
              "(default = 6)"));
 
+static cl::opt<unsigned> ProfitableToSinkInstructionThreshold(
+    "profitable-to-sink-instruction-threshold", cl::Hidden, cl::init(6),
+    cl::desc("Control the maximal PHI instructions"));
+
 static cl::opt<unsigned>
     HoistCommonSkipLimit("simplifycfg-hoist-common-skip-limit", cl::Hidden,
                          cl::init(20),
@@ -1742,7 +1746,8 @@ static bool isSafeCheapLoadStore(const Instruction *I,
   // llvm.masked.load/store use i32 for alignment while load/store use i64.
   // That's why we have the alignment limitation.
   // FIXME: Update the prototype of the intrinsics?
-  return TTI.hasConditionalLoadStoreForType(getLoadStoreType(I)) &&
+  return TTI.hasConditionalMoveForType(getLoadStoreType(I),
+                                       TargetTransformInfo::MoveType::All) &&
          getLoadStoreAlignment(I) < Value::MaximumAlignment;
 }
 
@@ -2386,8 +2391,8 @@ namespace {
 
 /// Check whether BB's predecessors end with unconditional branches. If it is
 /// true, sink any common code from the predecessors to BB.
-static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
-                                           DomTreeUpdater *DTU) {
+static bool sinkCommonCodeFromPredecessors(BasicBlock *BB, DomTreeUpdater *DTU,
+                                           const TargetTransformInfo &TTI) {
   // We support two situations:
   //   (1) all incoming arcs are unconditional
   //   (2) there are non-unconditional incoming arcs
@@ -2492,12 +2497,16 @@ static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
     // sink?
     auto ProfitableToSinkInstruction = [&](LockstepReverseIterator &LRI) {
       unsigned NumPHIInsts = 0;
+      unsigned NumCmovInsts = 0;
       for (Use &U : (*LRI)[0]->operands()) {
         auto It = PHIOperands.find(&U);
         if (It != PHIOperands.end() && !all_of(It->second, [&](Value *V) {
               return InstructionsToSink.contains(V);
             })) {
-          ++NumPHIInsts;
+          if (TTI.hasConditionalMoveForType(U->getType()))
+            ++NumCmovInsts;
+          else
+            ++NumPHIInsts;
           // Do not separate a load/store from the gep producing the address.
           // The gep can likely be folded into the load/store as an addressing
           // mode. Additionally, a load of a gep is easier to analyze than a
@@ -2511,7 +2520,8 @@ static bool sinkCommonCodeFromPredecessors(BasicBlock *BB,
         }
       }
       LLVM_DEBUG(dbgs() << "SINK: #phi insts: " << NumPHIInsts << "\n");
-      return NumPHIInsts <= 1;
+      return NumPHIInsts <= 1 &&
+             NumCmovInsts < ProfitableToSinkInstructionThreshold;
     };
 
     // We've determined that we are going to sink last ScanIdx instructions,
@@ -8119,7 +8129,7 @@ bool SimplifyCFGOpt::simplifyOnce(BasicBlock *BB) {
     return true;
 
   if (SinkCommon && Options.SinkCommonInsts)
-    if (sinkCommonCodeFromPredecessors(BB, DTU) ||
+    if (sinkCommonCodeFromPredecessors(BB, DTU, TTI) ||
         mergeCompatibleInvokes(BB, DTU)) {
       // sinkCommonCodeFromPredecessors() does not automatically CSE PHI's,
       // so we may now how duplicate PHI's.
diff --git a/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll b/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll
index 170f8d1592c2fe..912ba0d8c8f3a3 100644
--- a/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll
+++ b/llvm/test/Transforms/SimplifyCFG/X86/sink-common-code.ll
@@ -220,17 +220,15 @@ define i32 @test8(i1 zeroext %flag, i32 %x, ptr %y) {
 ; CHECK-LABEL: @test8(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[Z:%.*]] = load volatile i32, ptr [[Y:%.*]], align 4
-; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
-; CHECK:       if.then:
-; CHECK-NEXT:    [[A:%.*]] = add i32 [[Z]], 5
-; CHECK-NEXT:    br label [[IF_END:%.*]]
+; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_END:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[W:%.*]] = load volatile i32, ptr [[Y]], align 4
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[W]], 7
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[B_SINK:%.*]] = phi i32 [ [[B]], [[IF_ELSE]] ], [ [[A]], [[IF_THEN]] ]
-; CHECK-NEXT:    store volatile i32 [[B_SINK]], ptr [[Y]], align 4
+; CHECK-NEXT:    [[DOTSINK:%.*]] = phi i32 [ 7, [[IF_ELSE]] ], [ 5, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[W_SINK:%.*]] = phi i32 [ [[W]], [[IF_ELSE]] ], [ [[Z]], [[ENTRY]] ]
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[W_SINK]], [[DOTSINK]]
+; CHECK-NEXT:    store volatile i32 [[B]], ptr [[Y]], align 4
 ; CHECK-NEXT:    ret i32 1
 ;
 entry:
@@ -262,15 +260,15 @@ define i32 @test9(i1 zeroext %flag, i32 %x, ptr %y, ptr %p) {
 ; CHECK-NEXT:    store i32 7, ptr [[P:%.*]], align 4
 ; CHECK-NEXT:    [[Z:%.*]] = load volatile i32, ptr [[Y:%.*]], align 4
 ; CHECK-NEXT:    store i32 6, ptr [[P]], align 4
-; CHECK-NEXT:    [[A:%.*]] = add i32 [[Z]], 5
 ; CHECK-NEXT:    br label [[IF_END:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[W:%.*]] = load volatile i32, ptr [[Y]], align 4
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[W]], 7
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[B_SINK:%.*]] = phi i32 [ [[B]], [[IF_ELSE]] ], [ [[A]], [[IF_THEN]] ]
-; CHECK-NEXT:    store volatile i32 [[B_SINK]], ptr [[Y]], align 4
+; CHECK-NEXT:    [[DOTSINK:%.*]] = phi i32 [ 7, [[IF_ELSE]] ], [ 5, [[IF_THEN]] ]
+; CHECK-NEXT:    [[W_SINK:%.*]] = phi i32 [ [[W]], [[IF_ELSE]] ], [ [[Z]], [[IF_THEN]] ]
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[W_SINK]], [[DOTSINK]]
+; CHECK-NEXT:    store volatile i32 [[B]], ptr [[Y]], align 4
 ; CHECK-NEXT:    ret i32 1
 ;
 entry:
@@ -570,16 +568,16 @@ define zeroext i1 @test_crash(i1 zeroext %flag, ptr %i4, ptr %m, ptr %n) {
 ; CHECK-NEXT:    br i1 [[FLAG:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.then:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[I4:%.*]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = add i32 [[TMP1]], -1
 ; CHECK-NEXT:    br label [[IF_END:%.*]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[M:%.*]], align 4
 ; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[N:%.*]], align 4
-; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[TMP3]], [[TMP4]]
 ; CHECK-NEXT:    br label [[IF_END]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[TMP5_SINK:%.*]] = phi i32 [ [[TMP5]], [[IF_ELSE]] ], [ [[TMP2]], [[IF_THEN]] ]
-; CHECK-NEXT:    store i32 [[TMP5_SINK]], ptr [[I4]], align 4
+; CHECK-NEXT:    [[TMP4_SINK:%.*]] = phi i32 [ [[TMP4]], [[IF_ELSE]] ], [ -1, [[IF_THEN]] ]
+; CHECK-NEXT:    [[TMP3_SINK:%.*]] = phi i32 [ [[TMP3]], [[IF_ELSE]] ], [ [[TMP1]], [[IF_THEN]] ]
+; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[TMP3_SINK]], [[TMP4_SINK]]
+; CHECK-NEXT:    store i32 [[TMP5]], ptr [[I4]], align 4
 ; CHECK-NEXT:    ret i1 true
 ;
 entry:
@@ -1475,22 +1473,14 @@ declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
 
 define void @creating_too_many_phis(i1 %cond, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h) {
 ; CHECK-LABEL: @creating_too_many_phis(
-; CHECK-NEXT:    br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
-; CHECK:       bb0:
-; CHECK-NEXT:    [[V0:%.*]] = add i32 [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[V1:%.*]] = add i32 [[V0]], [[C:%.*]]
-; CHECK-NEXT:    [[V2:%.*]] = add i32 [[D:%.*]], [[E:%.*]]
-; CHECK-NEXT:    [[R3:%.*]] = add i32 [[V1]], [[V2]]
-; CHECK-NEXT:    br label [[END:%.*]]
-; CHECK:       bb1:
-; CHECK-NEXT:    [[V4:%.*]] = add i32 [[A]], [[B]]
-; CHECK-NEXT:    [[V5:%.*]] = add i32 [[V4]], [[C]]
-; CHECK-NEXT:    [[V6:%.*]] = add i32 [[G:%.*]], [[H:%.*]]
+; CHECK-NEXT:  end:
+; CHECK-NEXT:    [[E_H:%.*]] = select i1 [[COND:%.*]], i32 [[E:%.*]], i32 [[H:%.*]]
+; CHECK-NEXT:    [[D_G:%.*]] = select i1 [[COND]], i32 [[D:%.*]], i32 [[G:%.*]]
+; CHECK-NEXT:    [[V4:%.*]] = add i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[V5:%.*]] = add i32 [[V4]], [[C:%.*]]
+; CHECK-NEXT:    [[V6:%.*]] = add i32 [[D_G]], [[E_H]]
 ; CHECK-NEXT:    [[R7:%.*]] = add i32 [[V5]], [[V6]]
-; CHECK-NEXT:    br label [[END]]
-; CHECK:       end:
-; CHECK-NEXT:    [[R7_SINK:%.*]] = phi i32 [ [[R7]], [[BB1]] ], [ [[R3]], [[BB0]] ]
-; CHECK-NEXT:    call void @use32(i32 [[R7_SINK]])
+; CHECK-NEXT:    call void @use32(i32 [[R7]])
 ; CHECK-NEXT:    ret void
 ;
   br i1 %cond, label %bb0, label %bb1
@@ -1806,17 +1796,17 @@ define i64 @multi_use_in_block_inconsistent(i1 %cond, ptr %p, i64 %a, i64 %b) {
 ; CHECK-NEXT:    call void @dummy()
 ; CHECK-NEXT:    [[GEP1_A:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[A:%.*]]
 ; CHECK-NEXT:    [[V_A:%.*]] = load i64, ptr [[GEP1_A]], align 8
-; CHECK-NEXT:    [[GEP2_A:%.*]] = getelementptr i8, ptr [[GEP1_A]], i64 [[V_A]]
 ; CHECK-NEXT:    br label [[JOIN:%.*]]
 ; CHECK:       else:
 ; CHECK-NEXT:    [[GEP1_B:%.*]] = getelementptr i8, ptr [[P]], i64 [[A]]
 ; CHECK-NEXT:    [[V_B:%.*]] = load i64, ptr [[P]], align 8
-; CHECK-NEXT:    [[GEP2_B:%.*]] = getelementptr i8, ptr [[GEP1_B]], i64 [[V_B]]
 ; CHECK-NEXT:    br label [[JOIN]]
 ; CHECK:       join:
+; CHECK-NEXT:    [[V_B_SINK:%.*]] = phi i64 [ [[V_B]], [[ELSE]] ], [ [[V_A]], [[IF]] ]
+; CHECK-NEXT:    [[GEP1_B_SINK:%.*]] = phi ptr [ [[GEP1_B]], [[ELSE]] ], [ [[GEP1_A]], [[IF]] ]
 ; CHECK-NEXT:    [[PHI1:%.*]] = phi i64 [ [[V_A]], [[IF]] ], [ [[V_B]], [[ELSE]] ]
-; CHECK-NEXT:    [[PHI2:%.*]] = phi ptr [ [[GEP2_A]], [[IF]] ], [ [[GEP2_B]], [[ELSE]] ]
-; CHECK-NEXT:    call void @use.ptr(ptr [[PHI2]])
+; CHECK-NEXT:    [[GEP2_B:%.*]] = getelementptr i8, ptr [[GEP1_B_SINK]], i64 [[V_B_SINK]]
+; CHECK-NEXT:    call void @use.ptr(ptr [[GEP2_B]])
 ; CHECK-NEXT:    ret i64 [[PHI1]]
 ;
   br i1 %cond, label %if, label %else
[truncated]

Contributor

@nikic nikic left a comment


This change doesn't make sense to me, on multiple levels:

  • The limit you are adjusting here is a limit on the number of phi nodes per sunk instruction, not the total number of phis. You are saying here that on targets with cmov, you consider it profitable to create 6 phi nodes to sink one instruction. Surely that is not what you intend.
  • Whether the target has cmov only becomes relevant if all instructions in a block can be sunk, the branch eliminated and all phi nodes converted to select instructions. For "normal" sinking, presence of cmov is not relevant at all, as the phi nodes will be lowered as movs in the incoming basic blocks.

Based on past experience, I can tell you that it's very easy to cause regressions by allowing more sinking in SimplifyCFG. You need to tread carefully when adjusting these heuristics.

@phoebewang phoebewang marked this pull request as draft October 8, 2024 09:20
@phoebewang
Contributor Author

@nikic Thanks for the comments! I agree with you and don't think this is the right way to approach the CMOV optimization. I don't have a better idea for now. Let me turn it into a draft and revisit later.

Labels
backend:X86 · llvm:analysis (value tracking, cost tables, constant folding) · llvm:SelectionDAG · llvm:transforms