Skip to content

Add a pass to convert jump tables to switches. #77709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

alexander-shaposhnikov
Copy link
Collaborator

@alexander-shaposhnikov alexander-shaposhnikov commented Jan 11, 2024

Add a pass to convert jump tables to switches.
This replaces an indirect call with a switch + direct calls and enables the compiler to apply other optimizations.
Such IR patterns come e.g. from C/C++ codebases that use arrays of function pointers.
E.g. this optimization kicks in for

// Tested with clang -std=c++20  -stdlib=libc++ -fmerge-all-constants -O3 -S
#include <variant>
void f(std::variant<int *, long *, char *> v) {
    std::visit([](auto &&p) { delete p; }, v);
}

For Chrome, this results in a binary size improvement of ~0.25MB - 1.6MB (measured on Linux) depending on the build configuration / implementation of "variant" (but the optimization kicks in for the code unrelated to std::visit / std::variant too).

For several internal binaries, the size decrease was approximately 0.9%, depending on how frequently the optimized patterns occur in the codebase.

The impact on the compilation time is relatively small (see https://llvm-compile-time-tracker.com/?config=Overview&stat=instructions%3Au&remote=alexander-shaposhnikov](https://www.google.com/url?sa=D&q=https%3A%2F%2Fllvm-compile-time-tracker.com%2F%3Fconfig%3DOverview%26stat%3Dinstructions%253Au%26remote%3Dalexander-shaposhnikov
perf/convert_jt_do_not_submit_4). The pass always finishes after a single scan of instructions.

Rationale for a separate pass: the existing passes (e.g. CallSiteSplitting) implement different transformations with minimal overlap with the proposed one, and the new pass can be further extended in the future.

@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alexander Shaposhnikov (alexander-shaposhnikov)

Changes

Add a pass to convert jump tables to switches.
This replaces an indirect call with a switch + direct calls and enables the compiler to apply other optimizations.
Such IR patterns come e.g. from C/C++ codebases that use arrays of function pointers.
E.g. this optimization kicks in for

// Tested with clang -std=c++20  -stdlib=libc++ -fmerge-all-constants -O3 -S
#include &lt;variant&gt;
void f(std::variant&lt;int *, long *, char *&gt; v) {
    std::visit([](auto &amp;&amp;p) { delete p; }, v);
}

For Chrome, this results in a binary size improvement of 1.6 MB (measured on Linux) for some build configurations. The improvement may be smaller for other configurations.

For several internal binaries, the size decrease was approximately 0.9%, depending on how frequently the optimized patterns occur in the codebase.

The impact on the compilation time is small (see https://llvm-compile-time-tracker.com/?config=Overview&amp;stat=instructions%3Au&amp;remote=alexander-shaposhnikov](https://www.google.com/url?sa=D&amp;q=https%3A%2F%2Fllvm-compile-time-tracker.com%2F%3Fconfig%3DOverview%26stat%3Dinstructions%253Au%26remote%3Dalexander-shaposhnikov
perf/convert_jt_do_not_submit_4). The pass always finishes after a single scan of instructions.

Rationale for a separate pass: the existing passes (e.g. CallSiteSplitting) implement different transformations with minimal overlap with the proposed one, and the new pass can be further extended in the future.


Patch is 24.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77709.diff

16 Files Affected:

  • (added) llvm/include/llvm/Transforms/Scalar/JumpTableToSwitch.h (+24)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+4)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/Scalar/CMakeLists.txt (+1)
  • (added) llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp (+206)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
  • (added) llvm/test/Transforms/JumpTableToSwitch/basic.ll (+94)
  • (added) llvm/test/Transforms/JumpTableToSwitch/skip.ll (+70)
  • (added) llvm/test/Transforms/JumpTableToSwitch/struct.ll (+42)
diff --git a/llvm/include/llvm/Transforms/Scalar/JumpTableToSwitch.h b/llvm/include/llvm/Transforms/Scalar/JumpTableToSwitch.h
new file mode 100644
index 00000000000000..61786227d7a335
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Scalar/JumpTableToSwitch.h
@@ -0,0 +1,24 @@
+//===- JumpTableToSwitch.h - ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_SCALAR_JUMP_TABLE_TO_SWITCH_H
+#define LLVM_TRANSFORMS_SCALAR_JUMP_TABLE_TO_SWITCH_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Function;
+
+struct JumpTableToSwitchPass : PassInfoMixin<JumpTableToSwitchPass> {
+  /// Run the pass over the function.
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+};
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_SCALAR_JUMP_TABLE_TO_SWITCH_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 649451edc0e2c6..e1da44316c8666 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -193,6 +193,7 @@
 #include "llvm/Transforms/Scalar/InferAlignment.h"
 #include "llvm/Transforms/Scalar/InstSimplifyPass.h"
 #include "llvm/Transforms/Scalar/JumpThreading.h"
+#include "llvm/Transforms/Scalar/JumpTableToSwitch.h"
 #include "llvm/Transforms/Scalar/LICM.h"
 #include "llvm/Transforms/Scalar/LoopAccessAnalysisPrinter.h"
 #include "llvm/Transforms/Scalar/LoopBoundSplit.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 5c6c391049a7b2..40fdef3e0f0724 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -92,6 +92,7 @@
 #include "llvm/Transforms/Scalar/InferAlignment.h"
 #include "llvm/Transforms/Scalar/InstSimplifyPass.h"
 #include "llvm/Transforms/Scalar/JumpThreading.h"
+#include "llvm/Transforms/Scalar/JumpTableToSwitch.h"
 #include "llvm/Transforms/Scalar/LICM.h"
 #include "llvm/Transforms/Scalar/LoopDeletion.h"
 #include "llvm/Transforms/Scalar/LoopDistribute.h"
@@ -557,6 +558,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
 
   // Optimize based on known information about branches, and cleanup afterward.
   FPM.addPass(JumpThreadingPass());
+  FPM.addPass(JumpTableToSwitchPass());
   FPM.addPass(CorrelatedValuePropagationPass());
 
   FPM.addPass(
@@ -695,6 +697,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
     FPM.addPass(DFAJumpThreadingPass());
 
   FPM.addPass(JumpThreadingPass());
+
   FPM.addPass(CorrelatedValuePropagationPass());
 
   // Finally, do an expensive DCE pass to catch all the dead code exposed by
@@ -1926,6 +1929,7 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
 
   invokePeepholeEPCallbacks(MainFPM, Level);
   MainFPM.addPass(JumpThreadingPass());
+
   MPM.addPass(createModuleToFunctionPassAdaptor(std::move(MainFPM),
                                                 PTO.EagerlyInvalidateAnalyses));
 
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index eaa7c3fc89241b..7417d639a9ecd7 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -330,6 +330,7 @@ FUNCTION_PASS("interleaved-load-combine", InterleavedLoadCombinePass(TM))
 FUNCTION_PASS("invalidate<all>", InvalidateAllAnalysesPass())
 FUNCTION_PASS("irce", IRCEPass())
 FUNCTION_PASS("jump-threading", JumpThreadingPass())
+FUNCTION_PASS("jump-table-to-switch", JumpTableToSwitchPass());
 FUNCTION_PASS("kcfi", KCFIPass())
 FUNCTION_PASS("lcssa", LCSSAPass())
 FUNCTION_PASS("libcalls-shrinkwrap", LibCallsShrinkWrapPass())
diff --git a/llvm/lib/Transforms/Scalar/CMakeLists.txt b/llvm/lib/Transforms/Scalar/CMakeLists.txt
index 2dd27037a17de7..7b49c8010c874a 100644
--- a/llvm/lib/Transforms/Scalar/CMakeLists.txt
+++ b/llvm/lib/Transforms/Scalar/CMakeLists.txt
@@ -25,6 +25,7 @@ add_llvm_component_library(LLVMScalarOpts
   InferAlignment.cpp
   InstSimplifyPass.cpp
   JumpThreading.cpp
+  JumpTableToSwitch.cpp
   LICM.cpp
   LoopAccessAnalysisPrinter.cpp
   LoopBoundSplit.cpp
diff --git a/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp b/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
new file mode 100644
index 00000000000000..60b451ff9d3db2
--- /dev/null
+++ b/llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
@@ -0,0 +1,206 @@
+//===- JumpTableToSwitch.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Scalar/JumpTableToSwitch.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/PostDominators.h"
+#include "llvm/Analysis/TargetLibraryInfo.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/Analysis/ValueTracking.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+#include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+using namespace llvm;
+using namespace PatternMatch;
+
+static cl::opt<unsigned>
+    JumpTableSizeThreshold("jump-table-to-switch-size-threshold", cl::Hidden,
+                           cl::desc("Only split jump tables with size less or "
+                                    "equal than JumpTableSizeThreshold."),
+                           cl::init(10));
+
+#define DEBUG_TYPE "jump-table-to-switch"
+
+static Constant *getElementWithGivenTypeAtOffset(Constant *C, const Type *Ty,
+                                                 uint64_t Offset, Module &M) {
+  if (Offset == 0 && C->getType() == Ty)
+    return C;
+  if (auto *CS = dyn_cast<ConstantStruct>(C)) {
+    const DataLayout &DL = M.getDataLayout();
+    const StructLayout *SL = DL.getStructLayout(CS->getType());
+    if (Offset >= SL->getSizeInBytes())
+      return nullptr;
+    const unsigned Op = SL->getElementContainingOffset(Offset);
+    const uint64_t AdjustedOffset = Offset - SL->getElementOffset(Op);
+    Constant *Element = cast<Constant>(CS->getOperand(Op));
+    return getElementWithGivenTypeAtOffset(Element, Ty, AdjustedOffset, M);
+  }
+  // TODO: add support for arrays.
+  return nullptr;
+}
+
+namespace {
+struct JumpTableTy {
+  Value *Index;
+  SmallVector<Function *, 5> Funcs;
+};
+} // anonymous namespace
+
+static std::optional<JumpTableTy> parseJumpTable(GetElementPtrInst *GEP) {
+  if (!GEP || !GEP->isInBounds())
+    return std::nullopt;
+  ArrayType *ArrayTy = dyn_cast<ArrayType>(GEP->getSourceElementType());
+  if (!ArrayTy || ArrayTy->getArrayNumElements() > JumpTableSizeThreshold)
+    return std::nullopt;
+  Constant *Ptr = dyn_cast<Constant>(GEP->getPointerOperand());
+  if (!Ptr)
+    return std::nullopt;
+
+  Function &F = *GEP->getParent()->getParent();
+  const DataLayout &DL = F.getParent()->getDataLayout();
+  const unsigned BitWidth =
+      DL.getIndexSizeInBits(GEP->getPointerAddressSpace());
+  MapVector<Value *, APInt> VariableOffsets;
+  APInt ConstantOffset(BitWidth, 0);
+  if (!GEP->collectOffset(DL, BitWidth, VariableOffsets, ConstantOffset))
+    return std::nullopt;
+  if (VariableOffsets.empty() || VariableOffsets.size() > 1)
+    return std::nullopt;
+
+  Module &M = *F.getParent();
+  unsigned Offset = ConstantOffset.getZExtValue();
+  // TODO: support more general patterns
+  // (see also TODO in getElementWithGivenTypeAtOffset).
+  if (Offset != 0)
+    return std::nullopt;
+  if (!Ptr->getNumOperands())
+    return std::nullopt;
+  Constant *ConstArray = getElementWithGivenTypeAtOffset(
+      cast<Constant>(Ptr->getOperand(0)), ArrayTy, Offset, M);
+  if (!ConstArray)
+    return std::nullopt;
+
+  JumpTableTy JumpTable;
+  JumpTable.Index = VariableOffsets.front().first;
+
+  const uint64_t N = ArrayTy->getArrayNumElements();
+  JumpTable.Funcs.assign(N, nullptr);
+  for (uint64_t Index = 0; Index < N; ++Index) {
+    auto *Func =
+        dyn_cast_or_null<Function>(ConstArray->getAggregateElement(Index));
+    if (!Func || Func->isDeclaration())
+      return std::nullopt;
+    JumpTable.Funcs[Index] = Func;
+  }
+  return JumpTable;
+}
+
+static BasicBlock *split(CallBase *CB, const JumpTableTy &JT,
+                         DomTreeUpdater *DTU) {
+  const bool IsVoid = CB->getType() == Type::getVoidTy(CB->getContext());
+
+  SmallVector<DominatorTree::UpdateType, 8> DTUpdates;
+  BasicBlock *BB = CB->getParent();
+  BasicBlock *Tail =
+      SplitBlock(BB, CB, DTU, nullptr, nullptr, BB->getName() + Twine(".tail"));
+  DTUpdates.push_back({DominatorTree::Delete, BB, Tail});
+  BB->getTerminator()->eraseFromParent();
+
+  Function &F = *BB->getParent();
+  BasicBlock *BBUnreachable = BasicBlock::Create(
+      F.getContext(), "default.switch.case.unreachable", &F, Tail);
+  IRBuilder<> BuilderUnreachable(BBUnreachable);
+  BuilderUnreachable.CreateUnreachable();
+
+  IRBuilder<> Builder(BB);
+  SwitchInst *Switch = Builder.CreateSwitch(JT.Index, BBUnreachable);
+  DTUpdates.push_back({DominatorTree::Insert, BB, BBUnreachable});
+
+  IRBuilder<> BuilderTail(CB);
+  PHINode *PHI =
+      IsVoid ? nullptr : BuilderTail.CreatePHI(CB->getType(), JT.Funcs.size());
+
+  SmallVector<BasicBlock *, 10> Blocks;
+  for (auto [Index, Func] : llvm::enumerate(JT.Funcs)) {
+    BasicBlock *B = BasicBlock::Create(Func->getContext(),
+                                       "call." + Twine(Index), &F, Tail);
+    DTUpdates.push_back({DominatorTree::Insert, BB, B});
+    DTUpdates.push_back({DominatorTree::Insert, B, Tail});
+
+    CallBase *Call = cast<CallBase>(CB->clone());
+    Call->setCalledFunction(Func);
+    Call->insertInto(B, B->end());
+    Switch->addCase(
+        cast<ConstantInt>(ConstantInt::get(JT.Index->getType(), Index)), B);
+    BranchInst::Create(Tail, B);
+    if (PHI)
+      PHI->addIncoming(Call, B);
+  }
+  if (DTU)
+    DTU->applyUpdates(DTUpdates);
+  if (PHI)
+    CB->replaceAllUsesWith(PHI);
+  CB->eraseFromParent();
+  return Tail;
+}
+
+PreservedAnalyses JumpTableToSwitchPass::run(Function &F,
+                                             FunctionAnalysisManager &AM) {
+  DominatorTree *DT = AM.getCachedResult<DominatorTreeAnalysis>(F);
+  PostDominatorTree *PDT = AM.getCachedResult<PostDominatorTreeAnalysis>(F);
+  std::unique_ptr<DomTreeUpdater> DTU;
+  bool Changed = false;
+  for (BasicBlock &BB : make_early_inc_range(F)) {
+    BasicBlock *CurrentBB = &BB;
+    while (CurrentBB) {
+      BasicBlock *SplittedOutTail = nullptr;
+      for (Instruction &I : make_early_inc_range(*CurrentBB)) {
+        CallBase *CB = dyn_cast<CallBase>(&I);
+        if (!CB || isa<IntrinsicInst>(CB) || CB->getCalledFunction() ||
+            isa<InvokeInst>(CB) || CB->isMustTailCall())
+          continue;
+
+        Value *V;
+        if (!match(CB->getCalledOperand(), m_Load(m_Value(V))))
+          continue;
+        auto *GEP = dyn_cast<GetElementPtrInst>(V);
+        if (!GEP)
+          continue;
+
+        std::optional<JumpTableTy> JumpTable = parseJumpTable(GEP);
+        if (!JumpTable)
+          continue;
+        if ((DT || PDT) && !DTU)
+          DTU = std::make_unique<DomTreeUpdater>(
+              DT, PDT, DomTreeUpdater::UpdateStrategy::Lazy);
+        SplittedOutTail = split(CB, *JumpTable, DTU.get());
+        Changed = true;
+        break;
+      }
+      CurrentBB = SplittedOutTail ? SplittedOutTail : nullptr;
+    }
+  }
+
+  if (!Changed)
+    return PreservedAnalyses::all();
+
+  PreservedAnalyses PA;
+  if (DT)
+    PA.preserve<DominatorTreeAnalysis>();
+  if (PDT)
+    PA.preserve<PostDominatorTreeAnalysis>();
+  return PA;
+}
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index ecdb5a5e010d92..fb4254363c3eaa 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -149,6 +149,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 064362eabbf839..4f54becf5871f5 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -88,6 +88,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 19a44867e434ac..8b869b1871a815 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -76,6 +76,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index ac80a31d8fd4bc..da5d8466711017 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -84,6 +84,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index 6486639e07b49c..809b8abb206040 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -119,6 +119,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index 09f9f0f48baddb..438c4ba364be3a 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -116,6 +116,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 47bdbfd2d357d4..1b7ce7bbf9680f 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -88,6 +88,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: SpeculativeExecutionPass
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
+; CHECK-O23SZ-NEXT: Running pass: JumpTableToSwitchPass
 ; CHECK-O23SZ-NEXT: Running pass: CorrelatedValuePropagationPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: LazyValueAnalysis
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
diff --git a/llvm/test/Transforms/JumpTableToSwitch/basic.ll b/llvm/test/Transforms/JumpTableToSwitch/basic.ll
new file mode 100644
index 00000000000000..3feb760909d08a
--- /dev/null
+++ b/llvm/test/Transforms/JumpTableToSwitch/basic.ll
@@ -0,0 +1,94 @@
+; RUN: opt < %s -passes=jump-table-to-switch -verify-dom-info -S | FileCheck %s
+; RUN: opt < %s -passes=jump-table-to-switch -jump-table-to-switch-size-threshold=0 -verify-dom-info -S | FileCheck %s --check-prefix=THRESHOLD-0
+
+@func_array = global [2 x ptr] [ptr @func0, ptr @func1]
+
+define i32 @func0() {
+  ret i32 1
+}
+
+define i32 @func1() {
+  ret i32 2
+}
+
+define i32 @function_with_jump_table(i32 %index) {
+  %gep = getelementptr inbounds [2 x ptr], ptr @func_array, i32 0, i32 %index
+  %func_ptr = load ptr, ptr %gep
+  %result = call i32 %func_ptr()
+  ret i32 %result
+}
+
+; CHECK-LABEL: define i32 @function_with_jump_table
+; CHECK:       [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @func_array, i32 0, i32 %index
+; CHECK:       switch i32 %index, label %[[DEFAULT_LABEL:.*]] [
+; CHECK-NEXT:    i32 0, label %[[CALL_0_LABEL:.*]]
+; CHECK-NEXT:    i32 1, label %[[CALL_1_LABEL:.*]]
+; CHECK-NEXT:  ]
+
+; CHECK:       [[DEFAULT_LABEL]]
+; CHECK-NEXT:    unreachable
+
+; CHECK:       [[CALL_0_LABEL]]:
+; CHECK-NEXT:    %1 = call i32 @func0()
+; CHECK-NEXT:    br label %[[TAIL_LABEL:.*]]
+
+; CHECK:       [[CALL_1_LABEL]]:
+; CHECK-NEXT:    %2 = call i32 @func1()
+; CHECK-NEXT:    br label %[[TAIL_LABEL]]
+
+; CHECK:       [[TAIL_LABEL]]:
+; CHECK-NEXT:    %3 = phi i32 [ %1, %[[CALL_0_LABEL]] ], [ %2, %[[CALL_1_LABEL]] ]
+; CHECK-NEXT:    ret i32 %3
+
+; THRESHOLD-0-LABEL:  define i32 @function_with_jump_table(i32 %index) {
+; THRESHOLD-0:          %gep = getelementptr inbounds [2 x ptr], ptr @func_array, i32 0, i32 %index
+; THRESHOLD-0-NEXT:     %func_ptr = load ptr, ptr %gep
+; THRESHOLD-0-NEXT:     %result = call i32 %func_ptr()
+; THRESHOLD-0-NEXT:     ret i32 %result
+; THRESHOLD-0-NEXT:   }
+
+define void @void_func0() {
+  ret void
+}
+
+define void @void_func1() {
+  ret void
+}
+
+@void_func_array = global [2 x ptr] [ptr @void_func0, ptr @void_func1]
+
+define void @void_function_with_jump_table(i32 %index) {
+  %gep = getelementptr inbounds [2 x ptr], ptr @void_func_array, i32 0, i32 %index
+  %func_ptr = load ptr, ptr %gep
+  call void %func_ptr()
+  ret void
+}
+
+; CHECK-LABEL: define void @void_function_with_jump_table
+; CHECK:       [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @void_func_array, i32 0, i32 %index
+; CHECK:       switch i32 %index, label %[[VOID_DEFAULT_LABEL:.*]] [
+; CHECK-NEXT:    i32 0, label %[[VOID_CALL_0_LABEL:.*]]
+; CHECK-NEXT:    i32 1, label %[[VOID_CALL_1_LABEL:.*]]
+; CHECK-NEXT:  ]
+
+; CHECK:       [[VOID_DEFAULT_LABEL]]
+; CHECK-NEXT:    unreachable
+
+; CHECK:       [[VOID_CALL_0_LABEL]]:
+; CHECK-NEXT:    call void @void_func0()
+; CHECK-NEXT:    br label %[[VOID_TAIL_LABEL:.*]]
+
+; CHECK:       [[VOID_CALL_1_LABEL]]:
+; CHECK-NEXT:    call void @void_func1()
+; CHECK-NEXT:    br label %[[VOID_TAIL_LABEL]]
+
+; CHECK:       [[VOID_TAIL_LABEL]]:
+; CHECK-NEXT:    ret void
+
+; THRESHOLD-0-LABEL:  define void @void_function_with_jump_table(i32 %index) {
+; THRESHOLD-0:          %gep = getelementptr inbounds [2 ...
[truncated]

Copy link

github-actions bot commented Jan 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch from a936548 to cfb6758 Compare January 11, 2024 00:59
@dtcxzyw dtcxzyw requested a review from nikic January 11, 2024 08:00
@david-xl
Copy link
Contributor

For the small example you have, I noticed that using libstdc++ produces clean code. Is the root cause of the size increase with std::variant in libc++'s implementation?

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 11, 2024

For the small example you have, I noticed that using libstdc++ produces clean code. Is the root cause of the size increase with std::variant in libc++'s implementation?

I believe there are other patterns which will benefit from this pass.

Example: https://github.com/zeux/pugixml/blob/96aebcb8ae2062ad020186e32fd301db7c3a0c88/src/pugixml.cpp#L2745-L2761

@switch.table._ZN4pugi4impl10xml_parser10parse_treeEPcPNS_15xml_node_structEjc.1 = private unnamed_addr constant [8 x ptr] [ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseES2_S2_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseES2_NS0_8opt_trueEE5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseENS0_8opt_trueES2_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseENS0_8opt_trueES3_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueENS0_9opt_falseES3_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueENS0_9opt_falseES2_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueES2_NS0_9opt_falseEE5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueES2_S2_E5parseEPc], align 8

%1 = zext nneg i32 %or.i to i64
%switch.gep705 = getelementptr inbounds [8 x ptr], ptr @switch.table._ZN4pugi4impl10xml_parser10parse_treeEPcPNS_15xml_node_structEjc.1, i64 0, i64 %1
%switch.load706 = load ptr, ptr %switch.gep705, align 8
%call455 = call noundef ptr %switch.load706(ptr noundef nonnull %spec.select)

@david-xl
Copy link
Contributor

If the patch is intended as a size reduction optimization, it should be guarded under Os/Oz. Otherwise more performance data is needed. In particular performance with FDO (FDO does similar transformation but more selectively with value profile data.

@alexander-shaposhnikov
Copy link
Collaborator Author

@david-xl - correct, to the best of my knowledge libc++ tried to change the implementation in the past but then moved back.
I thought about revisiting it again, however, since this is not specific to std::variant rewriting other libraries is unfeasible.

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch 3 times, most recently from 12896fd to d684cc4 Compare January 18, 2024 11:28
@nikic
Copy link
Contributor

nikic commented Jan 18, 2024

For the small example you have, I noticed that using libstdc++ produces clean code. Is the root cause of the size increase with std::variant in libc++'s implementation?

I believe there are other patterns which will benefit from this pass.

Example: https://github.com/zeux/pugixml/blob/96aebcb8ae2062ad020186e32fd301db7c3a0c88/src/pugixml.cpp#L2745-L2761

@switch.table._ZN4pugi4impl10xml_parser10parse_treeEPcPNS_15xml_node_structEjc.1 = private unnamed_addr constant [8 x ptr] [ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseES2_S2_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseES2_NS0_8opt_trueEE5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseENS0_8opt_trueES2_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_9opt_falseENS0_8opt_trueES3_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueENS0_9opt_falseES3_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueENS0_9opt_falseES2_E5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueES2_NS0_9opt_falseEE5parseEPc, ptr @_ZN4pugi4impl19strconv_pcdata_implINS0_8opt_trueES2_S2_E5parseEPc], align 8

%1 = zext nneg i32 %or.i to i64
%switch.gep705 = getelementptr inbounds [8 x ptr], ptr @switch.table._ZN4pugi4impl10xml_parser10parse_treeEPcPNS_15xml_node_structEjc.1, i64 0, i64 %1
%switch.load706 = load ptr, ptr %switch.gep705, align 8
%call455 = call noundef ptr %switch.load706(ptr noundef nonnull %spec.select)

Isn't this an example where we do the inverse transform? We start from a switch and SimplifyCFG converts it into a vtable?

@alexander-shaposhnikov
Copy link
Collaborator Author

Regarding FDO - we've done more performance measurements (for builds with FDO), in several cases, the top-level metrics (e.g., throughput) have slightly improved (stat-sig). The concrete ranges depend on the particular service and workload; the observed ones were around ~0.2% to ~1%.

@davidben
Copy link
Contributor

Regarding libc++'s std::variant as compared to libstdc++, the context for that is in #62648.

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch from d684cc4 to e031383 Compare January 21, 2024 10:44
@alexander-shaposhnikov
Copy link
Collaborator Author

Addressed comments, expanded test coverage.

@alexander-shaposhnikov
Copy link
Collaborator Author

gentle ping @david-xl , @nikic

@david-xl
Copy link
Contributor

david-xl commented Feb 6, 2024

Thanks. My high level concern with this patch is that profitability is quite unclear. There are two general cases:

  1. We convert the jump table, but the resulting calls do not get inlined. The end result is that we go from a load and indirect branch to a switch, which will probably get lowered in some less efficient way (e.g. a jump table to a direct call). See https://github.com/dtcxzyw/llvm-opt-benchmark/pull/194/files#diff-eec1b5b777c9e8fcf7801b8c80004f1055ae1c04951b49a38fa52a06bd15f760 for an example where this happens. Right now, I'm not sure if we are able to undo this transform at all -- I initially thought that normal switch to jump table conversion will handle it, but thinking about it again now, I don't think it supports this pattern involving calls.
  2. We convert the jump table and the calls do get inlined. This may or may not be beneficial. The case you probably have in mind here is the case where the calls are actually small functions and inlining them results in overall simplification. However, it can also happen that they are large functions that will not receive any substantial benefits from inlining. In that case, inlining is them into one large function with a switch is probably worse than the existing jump table.

Overall, I'm unsure whether this is a good idea, at least when done unconditionally.

Those are all valid concerns. It seems it needs some cost model for profitability analysis (also depend on it is optimize for size or speed). Things to consider when optimize for speed:

  1. with profile feedback, this optimization should probably be turned off:
    a) if there is a dominating case, the existing indirect call promotion will peel the case out and make it direct calls
    b) if the jump targets are equally likely, convert them into switch will likely hurt performance as Niki mentioned: when switch is not properly lowered, it does not help code layout, nor branch predictor.

  2. without profile feedback, it is probably better to check function body availability and consult Inliner to see if callee is profitable to be inlined (this API can be useful for other clients).

For now, if looks fine to me to go in if

a) have a flag that is off by default
b) the pass is on with Os/Oz but without profile (or cold function)
c) check target function definition and they need to be smaller than threshold.

nikic@, WDTY?

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch 4 times, most recently from ae5fd1b to 6530281 Compare February 7, 2024 02:38
@alexander-shaposhnikov
Copy link
Collaborator Author

alexander-shaposhnikov commented Feb 7, 2024

Fwiw, I've logged more data (from quite a large build - the optimization kicked in ~10K times), in 80% of the cases the max size of the functions in the jump table is less than 25 instructions, and in 90% - less than 37 instructions.
Added an option to apply this transformation only if all the functions are small (i.e. less than the threshold).

UPD.
Added a TODO regarding a better cost model for this transformation. Doing this for small functions appears to be profitable (both for speed and size (based on the measurements)), we probably leave some opportunities on the table, but this seems to be a quite conservative approach.

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch 2 times, most recently from 19b7d6a to 7474a6b Compare February 7, 2024 05:55
@alinas
Copy link
Contributor

alinas commented Feb 7, 2024

Could you add a cl::opt to disable the pass entirely and use that in the pipeline? First iteration should be to commit it with flag disabled and enable it in a follow up PR/merge.

Before turning on by default, consider adding checks using PSI/BPI and also check how/if it affects your testing.
But with the function size limit you added and if the "flag flip" is a separate change, I think this can go in and then iterate over the profitability separately.

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch from 7474a6b to 3a455b3 Compare February 7, 2024 21:19
@alexander-shaposhnikov
Copy link
Collaborator Author

Added -enable-jump-table-to-switch, now the new pass is disabled by default.

@nikic
Copy link
Contributor

nikic commented Feb 8, 2024

@dtcxzyw Could you please rerun tests?

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Feb 8, 2024
@dtcxzyw
Copy link
Member

dtcxzyw commented Feb 8, 2024

@dtcxzyw Could you please rerun tests?

Done.

@alexander-shaposhnikov alexander-shaposhnikov force-pushed the convert_jump_table_to_switch branch from 3a455b3 to ff11d0e Compare February 8, 2024 19:32
@alexander-shaposhnikov
Copy link
Collaborator Author

@nikic - many thanks for the review

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexander-shaposhnikov alexander-shaposhnikov merged commit d26b43f into llvm:main Feb 10, 2024
@nikic
Copy link
Contributor

nikic commented Feb 21, 2024

Reminder that the flag still needs flipping.

@nico
Copy link
Contributor

nico commented Mar 7, 2025

#83229 flips the flag, but it's held up because of build time issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants