Skip to content

[mlir][inliner] Add doClone and canHandleMultipleBlocks callbacks to Inliner #131226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 5, 2025

Conversation

junfengd-nv
Copy link
Contributor

@junfengd-nv junfengd-nv commented Mar 13, 2025

Current inliner disables inlining when the caller is in a region with single block trait, while the callee function contains multiple blocks. the SingleBlock trait is used in operations such as do/while loop, for example fir.do_loop, fir.iterate_while and fir.if (refer to https://flang.llvm.org/docs/FIRLangRef.html). Typically, calls within loops are good candidates for inlining. However, functions with multiple blocks are also common. for example, any function with "if () then return" will result in multiple blocks in MLIR.

My change gives the flexibility of a customized inliner to handle such cases.
doClone: clones instructions and other information from the callee function into the caller function. .
canHandleMultipleBlocks: checks if functions with multiple blocks can be inlined into a region with the SingleBlock trait.
With the two callback functions that I added, the derived inliner can return true with function "canHandleMultipleBlocks" to avoid stopping during the inline decision process. And then add their extra transformations in the function "doClone".

The default behavior of the inliner remains unchanged.

…the Inliner

These callback functions enhance the flexibility to customize inliner behavior.
. doClone: clones instructions and other information from the callee function into the caller function.
. canHandleMultipleBlocks: checks if functions with multiple blocks can be inlined into a region with the SingleBlock trait.

The default behavior of the inliner remains unchanged.
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Mar 13, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 13, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-core

Author: None (junfengd-nv)

Changes

These callback functions enhance the flexibility to customize inliner behavior. .

doClone: clones instructions and other information from the callee function into the caller function. .
canHandleMultipleBlocks: checks if functions with multiple blocks can be inlined into a region with the SingleBlock trait.

The default behavior of the inliner remains unchanged.


Full diff: https://github.com/llvm/llvm-project/pull/131226.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Transforms/Inliner.h (+28-2)
  • (modified) mlir/lib/Transforms/InlinerPass.cpp (+5-1)
  • (modified) mlir/lib/Transforms/Utils/Inliner.cpp (+38-13)
  • (modified) mlir/lib/Transforms/Utils/InliningUtils.cpp (+4-9)
diff --git a/mlir/include/mlir/Transforms/Inliner.h b/mlir/include/mlir/Transforms/Inliner.h
index ec77319d6ac88..86fa6ce6a0c77 100644
--- a/mlir/include/mlir/Transforms/Inliner.h
+++ b/mlir/include/mlir/Transforms/Inliner.h
@@ -90,18 +90,40 @@ class Inliner {
   /// this hook's interface might need to be extended in future.
   using ProfitabilityCallbackTy = std::function<bool(const ResolvedCall &)>;
 
+  /// Type of the callback that determines if the inliner can inline a function
+  /// containing multiple blocks into a region that requires a single block. By
+  /// default, it is not allowed.
+  /// If this function return true, the static member function doClone()
+  /// should perform the actual transformation with its support.
+  using canHandleMultipleBlocksCbTy = std::function<bool()>;
+
+  using CloneCallbackTy =
+      std::function<void(OpBuilder &builder, Region *src, Block *inlineBlock,
+                         Block *postInsertBlock, IRMapping &mapper,
+                         bool shouldCloneInlinedRegion)>;
+
   Inliner(Operation *op, CallGraph &cg, Pass &pass, AnalysisManager am,
           RunPipelineHelperTy runPipelineHelper, const InlinerConfig &config,
-          ProfitabilityCallbackTy isProfitableToInline)
+          ProfitabilityCallbackTy isProfitableToInline,
+          canHandleMultipleBlocksCbTy canHandleMultipleBlocks)
       : op(op), cg(cg), pass(pass), am(am),
         runPipelineHelper(std::move(runPipelineHelper)), config(config),
-        isProfitableToInline(std::move(isProfitableToInline)) {}
+        isProfitableToInline(std::move(isProfitableToInline)),
+        canHandleMultipleBlocks(std::move(canHandleMultipleBlocks)) {}
   Inliner(Inliner &) = delete;
   void operator=(const Inliner &) = delete;
 
   /// Perform inlining on a OpTrait::SymbolTable operation.
   LogicalResult doInlining();
 
+  /// This function provides a callback mechanism to clone the instructions and
+  /// other information from the callee function into the caller function.
+  static CloneCallbackTy &doClone();
+
+  /// Set the clone callback function.
+  /// The provided function "func" will be invoked by Inliner::doClone().
+  void setCloneCallback(CloneCallbackTy func) { doClone() = func; }
+
 private:
   /// An OpTrait::SymbolTable operation to run the inlining on.
   Operation *op;
@@ -119,10 +141,14 @@ class Inliner {
   /// Returns true, if it is profitable to inline the callable operation
   /// at the call site.
   ProfitabilityCallbackTy isProfitableToInline;
+  /// Return true, if functions with multiple blocks can be inlined
+  /// into a region with the SingleBlock trait.
+  canHandleMultipleBlocksCbTy canHandleMultipleBlocks;
 
   /// Forward declaration of the class providing the actual implementation.
   class Impl;
 };
+
 } // namespace mlir
 
 #endif // MLIR_TRANSFORMS_INLINER_H
diff --git a/mlir/lib/Transforms/InlinerPass.cpp b/mlir/lib/Transforms/InlinerPass.cpp
index 703e517d45374..fc831a5ebf4fb 100644
--- a/mlir/lib/Transforms/InlinerPass.cpp
+++ b/mlir/lib/Transforms/InlinerPass.cpp
@@ -142,9 +142,13 @@ void InlinerPass::runOnOperation() {
     return isProfitableToInline(call, inliningThreshold);
   };
 
+  // By default, prevent inlining a functon containing multiple blocks into a
+  // region that requires a single block.
+  auto canHandleMultipleBlocksCb = [=]() { return false; };
+
   // Get an instance of the inliner.
   Inliner inliner(op, cg, *this, getAnalysisManager(), runPipelineHelper,
-                  config, profitabilityCb);
+                  config, profitabilityCb, canHandleMultipleBlocksCb);
 
   // Run the inlining.
   if (failed(inliner.doInlining()))
diff --git a/mlir/lib/Transforms/Utils/Inliner.cpp b/mlir/lib/Transforms/Utils/Inliner.cpp
index 756f5e379e7dd..f34fd089a8d98 100644
--- a/mlir/lib/Transforms/Utils/Inliner.cpp
+++ b/mlir/lib/Transforms/Utils/Inliner.cpp
@@ -344,6 +344,28 @@ static void collectCallOps(iterator_range<Region::iterator> blocks,
     }
   }
 }
+//===----------------------------------------------------------------------===//
+// Inliner
+//===----------------------------------------------------------------------===//
+// Initialize doClone function with the default implementation
+Inliner::CloneCallbackTy &Inliner::doClone() {
+  static Inliner::CloneCallbackTy doWork =
+      [](OpBuilder &builder, Region *src, Block *inlineBlock,
+         Block *postInsertBlock, IRMapping &mapper,
+         bool shouldCloneInlinedRegion) {
+        // Check to see if the region is being cloned, or moved inline. In
+        // either case, move the new blocks after the 'insertBlock' to improve
+        // IR readability.
+        Region *insertRegion = inlineBlock->getParent();
+        if (shouldCloneInlinedRegion)
+          src->cloneInto(insertRegion, postInsertBlock->getIterator(), mapper);
+        else
+          insertRegion->getBlocks().splice(postInsertBlock->getIterator(),
+                                           src->getBlocks(), src->begin(),
+                                           src->end());
+      };
+  return doWork;
+}
 
 //===----------------------------------------------------------------------===//
 // InlinerInterfaceImpl
@@ -729,19 +751,22 @@ bool Inliner::Impl::shouldInline(ResolvedCall &resolvedCall) {
 
   // Don't allow inlining if the callee has multiple blocks (unstructured
   // control flow) but we cannot be sure that the caller region supports that.
-  bool calleeHasMultipleBlocks =
-      llvm::hasNItemsOrMore(*callableRegion, /*N=*/2);
-  // If both parent ops have the same type, it is safe to inline. Otherwise,
-  // decide based on whether the op has the SingleBlock trait or not.
-  // Note: This check does currently not account for SizedRegion/MaxSizedRegion.
-  auto callerRegionSupportsMultipleBlocks = [&]() {
-    return callableRegion->getParentOp()->getName() ==
-               resolvedCall.call->getParentOp()->getName() ||
-           !resolvedCall.call->getParentOp()
-                ->mightHaveTrait<OpTrait::SingleBlock>();
-  };
-  if (calleeHasMultipleBlocks && !callerRegionSupportsMultipleBlocks())
-    return false;
+  if (!inliner.canHandleMultipleBlocks()) {
+    bool calleeHasMultipleBlocks =
+        llvm::hasNItemsOrMore(*callableRegion, /*N=*/2);
+    // If both parent ops have the same type, it is safe to inline. Otherwise,
+    // decide based on whether the op has the SingleBlock trait or not.
+    // Note: This check does currently not account for
+    // SizedRegion/MaxSizedRegion.
+    auto callerRegionSupportsMultipleBlocks = [&]() {
+      return callableRegion->getParentOp()->getName() ==
+                 resolvedCall.call->getParentOp()->getName() ||
+             !resolvedCall.call->getParentOp()
+                  ->mightHaveTrait<OpTrait::SingleBlock>();
+    };
+    if (calleeHasMultipleBlocks && !callerRegionSupportsMultipleBlocks())
+      return false;
+  }
 
   if (!inliner.isProfitableToInline(resolvedCall))
     return false;
diff --git a/mlir/lib/Transforms/Utils/InliningUtils.cpp b/mlir/lib/Transforms/Utils/InliningUtils.cpp
index 0cae63c58ca7b..4531dd17ee3b2 100644
--- a/mlir/lib/Transforms/Utils/InliningUtils.cpp
+++ b/mlir/lib/Transforms/Utils/InliningUtils.cpp
@@ -11,6 +11,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "mlir/Transforms/InliningUtils.h"
+#include "mlir/Transforms/Inliner.h"
 
 #include "mlir/IR/Builders.h"
 #include "mlir/IR/IRMapping.h"
@@ -275,16 +276,10 @@ inlineRegionImpl(InlinerInterface &interface, Region *src, Block *inlineBlock,
   if (call && callable)
     handleArgumentImpl(interface, builder, call, callable, mapper);
 
-  // Check to see if the region is being cloned, or moved inline. In either
-  // case, move the new blocks after the 'insertBlock' to improve IR
-  // readability.
+  // Clone the callee's source into the caller.
   Block *postInsertBlock = inlineBlock->splitBlock(inlinePoint);
-  if (shouldCloneInlinedRegion)
-    src->cloneInto(insertRegion, postInsertBlock->getIterator(), mapper);
-  else
-    insertRegion->getBlocks().splice(postInsertBlock->getIterator(),
-                                     src->getBlocks(), src->begin(),
-                                     src->end());
+  Inliner::doClone()(builder, src, inlineBlock, postInsertBlock, mapper,
+                     shouldCloneInlinedRegion);
 
   // Get the range of newly inserted blocks.
   auto newBlocks = llvm::make_range(std::next(inlineBlock->getIterator()),

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. Letting people more familiar with this code reviewing/approving.

It would be great to be able to add a test showing the motivation of this interface by having a Clone version that can generate an scf.execute_region, but I understand it is not desired to make the InlinerPass dependent on the SCF dialect for the sake of this test.

@joker-eph
Copy link
Collaborator

I don't quite get the motivation for these and the description isn't clear on this: can you elaborate on why are these useful / desirable?

@junfengd-nv
Copy link
Contributor Author

Current inliner disables inlining when the caller is in a region with single block trait, while the callee function contains multiple blocks. the SingleBlock trait is used in operations such as do/while loop, for example fir.do_loop, fir.iterate_while and fir.if (refer to https://flang.llvm.org/docs/FIRLangRef.html). Typically, calls within loops are good candidates for inlining. However, functions with multiple blocks are also common. for example, any function with "if () then return" will result in multiple blocks in MLIR.
My change gives the flexibility of a customized inliner to handle such cases. One possible solution is wrapping the callee body in a scf.execute_region. Using the existing test (from mlir/test/Transforms/inlining.mlir) as an example

func.func @func_with_multiple_blocks(%arg0 : i32) {
  cf.br ^bb1(%arg0 : i32)
^bb1(%x : i32):
  "test.foo" (%x) : (i32) -> () loc("bar")
  return
}

// CHECK-LABEL: func @func_with_multiple_blocks_callee1
func.func @func_with_multiple_blocks_callee1(%arg0 : i32) {
  "test.dummy_op"() ({
    // Call cannot be inlined because "test.dummy" may not support unstructured
    // control flow in its body.
    call @func_with_multiple_blocks(%arg0) : (i32) -> ()
    "test.terminator"() : () -> ()
  }) : () -> ()
  return
}

The body of the callee will be wrapped into a scf.execute_region operation and then inlined as follows:

    "test.dummy_op"() ({
      scf.execute_region {
        cf.br ^bb1(%arg0 : i32)
      ^bb1(%0: i32):  // pred: ^bb0
        "test.foo"(%0) : (i32) -> ()
        scf.yield
      }
      "test.terminator"() : () -> ()
    }) : () -> ()

With the two callback functions that I added, the derived inliner can return true with function "canHandleMultipleBlocks" to avoid stopping during the inline decision process. And then add their extra transformations in the function "doClone".

@joker-eph
Copy link
Collaborator

Can you please:

  1. Add this information to the PR description, this is very clear, thanks.
  2. Add tests for this in the test dialect?

@junfengd-nv
Copy link
Contributor Author

junfengd-nv commented Mar 18, 2025

  1. Add this information to the PR description, this is very clear, thanks.

Updated

  1. Add tests for this in the test dialect?

The default behavior of MLIR inliner doesn't change, so I have not changed and added any tests. Do you want an example of demonstrating the creation of a customized inliner with wrapping in scf.execute_region? or just tests showing doClone/canHandleMultipleBlocks could be overridden? could you please pointing me an existing example if it exists? Thanks

@joker-eph
Copy link
Collaborator

joker-eph commented Mar 18, 2025

. Do you want an example of demonstrating the creation of a customized inliner with wrapping in scf.execute_region?

I think that would be a good test to have!

Should be doable by implementing this in the test directory right?

@junfengd-nv
Copy link
Contributor Author

@joker-eph @jeanPerier I have added a test under the test dialect to demonstrate the concept of customizing the inliner to wrap the callee into an SCF execute region. The SCF dialect has been added as a dependency solely for that test pass, and not for the default inliner pass.

Could you please review the changes again? Thank you very much.

Copy link

github-actions bot commented Apr 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@joker-eph joker-eph merged commit aeec945 into llvm:main Apr 5, 2025
11 checks passed
@@ -253,43 +254,51 @@ class InlinerInterface
/// provided, will be used to update the inlined operations' location
/// information. 'shouldCloneInlinedRegion' corresponds to whether the source
/// region should be cloned into the 'inlinePoint' or spliced directly.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the documentation the all the API changes should be updated, can you do it in a follow-up PR please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I will.

@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 5, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla running on linaro-g3-04 while building mlir at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/17/builds/7060

Here is the relevant piece of the build log for the reference
Step 7 (ninja check 1) failure: stage 1 checked (failure)
...
PASS: DataFlowSanitizer-aarch64 :: origin_ldst.c (25212 of 97666)
PASS: Flang :: Driver/plugin-invalid-name.f90 (25213 of 97666)
PASS: Flang :: Driver/print-resource-dir.F90 (25214 of 97666)
PASS: Flang :: Driver/print-effective-triple.f90 (25215 of 97666)
PASS: Flang :: Driver/override-triple.ll (25216 of 97666)
PASS: Flang :: Driver/parse-error.ll (25217 of 97666)
PASS: Flang :: Driver/predefined-macros-x86.f90 (25218 of 97666)
PASS: Flang :: Driver/color-diagnostics.f90 (25219 of 97666)
PASS: Flang :: Driver/bbc-openmp-version-macro.f90 (25220 of 97666)
UNRESOLVED: Flang :: Driver/slp-vectorize.ll (25221 of 97666)
******************** TEST 'Flang :: Driver/slp-vectorize.ll' FAILED ********************
Test has no 'RUN:' line
********************
PASS: Flang :: Driver/print-target-triple.f90 (25222 of 97666)
PASS: Flang :: Driver/missing-arg.f90 (25223 of 97666)
PASS: Flang :: Driver/parse-fir-error.ll (25224 of 97666)
PASS: Flang :: Driver/predefined-macros-compiler-version.F90 (25225 of 97666)
PASS: Flang :: Driver/mlir-pass-pipeline.f90 (25226 of 97666)
PASS: Flang :: Driver/phases.f90 (25227 of 97666)
PASS: Flang :: Driver/linker-flags.f90 (25228 of 97666)
PASS: Flang :: Driver/include-header.f90 (25229 of 97666)
PASS: Flang :: Driver/mlink-builtin-bc.f90 (25230 of 97666)
PASS: Flang :: Driver/pthread.f90 (25231 of 97666)
PASS: Flang :: Driver/fd-lines-as.f90 (25232 of 97666)
PASS: Flang :: Driver/std2018-wrong.f90 (25233 of 97666)
PASS: Flang :: Driver/supported-suffices/f03-suffix.f03 (25234 of 97666)
PASS: Flang :: Driver/print-pipeline-passes.f90 (25235 of 97666)
PASS: Flang :: Driver/supported-suffices/f08-suffix.f08 (25236 of 97666)
PASS: Flang :: Driver/scanning-error.f95 (25237 of 97666)
PASS: Clangd Unit Tests :: ./ClangdTests/69/81 (25238 of 97666)
PASS: Flang :: Driver/lto-bc.f90 (25239 of 97666)
PASS: Flang :: Driver/tco-code-gen-llvm.fir (25240 of 97666)
PASS: Flang :: Driver/config-file.f90 (25241 of 97666)
PASS: Flang :: Driver/target-gpu-features.f90 (25242 of 97666)
PASS: Flang :: Driver/pp-fixed-form.f90 (25243 of 97666)
PASS: Flang :: Driver/parse-ir-error.f95 (25244 of 97666)
PASS: Flang :: Driver/target.f90 (25245 of 97666)
PASS: Flang :: Driver/pass-plugin-not-found.f90 (25246 of 97666)
PASS: Flang :: Driver/no-duplicate-main.f90 (25247 of 97666)
PASS: Flang :: Driver/q-unused-arguments.f90 (25248 of 97666)
PASS: Flang :: Driver/fixed-line-length.f90 (25249 of 97666)
PASS: Flang :: Driver/unparse-with-modules.f90 (25250 of 97666)
PASS: Flang :: Driver/unsupported-vscale-max-min.f90 (25251 of 97666)
PASS: Clangd Unit Tests :: ./ClangdTests/75/81 (25252 of 97666)
PASS: Flang :: Driver/optimization-remark-backend.f90 (25253 of 97666)
PASS: Flang :: Driver/target-machine-error.f90 (25254 of 97666)
PASS: Flang :: Driver/mllvm.f90 (25255 of 97666)
PASS: Flang :: Driver/optimization-remark-invalid.f90 (25256 of 97666)
PASS: Flang :: Driver/lto-flags.f90 (25257 of 97666)

kuhar added a commit to iree-org/iree that referenced this pull request Apr 8, 2025
The major changes come from the following PRs:
* llvm/llvm-project#131226
* llvm/llvm-project#134264
* llvm/llvm-project#134169
* llvm/llvm-project#134517

---------

Signed-off-by: MaheshRavishankar <[email protected]>
Signed-off-by: Jakub Kuderski <[email protected]>
Co-authored-by: MaheshRavishankar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:core MLIR Core Infrastructure mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants