Skip to content

[mlir] Python: Parse ModuleOp from file path #125736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 5, 2025
Merged

Conversation

nikalra
Copy link
Contributor

@nikalra nikalra commented Feb 4, 2025

For extremely large models, it may be inefficient to load the model into memory in Python prior to passing it to the MLIR C APIs for deserialization. This change adds an API to parse a ModuleOp directly from a file path.

For extremely large models, it may be inefficient to load the model into memory in Python prior to passing it to the MLIR C APIs for deserialization. This change adds an API to parse a ModuleOp directly from a file path.
@llvmbot
Copy link
Member

llvmbot commented Feb 4, 2025

@llvm/pr-subscribers-mlir

Author: Nikhil Kalra (nikalra)

Changes

For extremely large models, it may be inefficient to load the model into memory in Python prior to passing it to the MLIR C APIs for deserialization. This change adds an API to parse a ModuleOp directly from a file path.


Full diff: https://github.com/llvm/llvm-project/pull/125736.diff

5 Files Affected:

  • (modified) mlir/include/mlir-c/IR.h (+4)
  • (modified) mlir/include/mlir/Bindings/Python/Nanobind.h (+1)
  • (modified) mlir/lib/Bindings/Python/IRCore.cpp (+15-1)
  • (modified) mlir/lib/CAPI/IR/IR.cpp (+10)
  • (modified) mlir/python/mlir/_mlir_libs/_mlir/ir.pyi (+2-1)
diff --git a/mlir/include/mlir-c/IR.h b/mlir/include/mlir-c/IR.h
index 7d2fd89e8560fc..14ccae650606af 100644
--- a/mlir/include/mlir-c/IR.h
+++ b/mlir/include/mlir-c/IR.h
@@ -309,6 +309,10 @@ MLIR_CAPI_EXPORTED MlirModule mlirModuleCreateEmpty(MlirLocation location);
 MLIR_CAPI_EXPORTED MlirModule mlirModuleCreateParse(MlirContext context,
                                                     MlirStringRef module);
 
+/// Parses a module from file and transfers ownership to the caller.
+MLIR_CAPI_EXPORTED MlirModule
+mlirModuleCreateParseFromFile(MlirContext context, MlirStringRef fileName);
+
 /// Gets the context that a module was created with.
 MLIR_CAPI_EXPORTED MlirContext mlirModuleGetContext(MlirModule module);
 
diff --git a/mlir/include/mlir/Bindings/Python/Nanobind.h b/mlir/include/mlir/Bindings/Python/Nanobind.h
index ca942c83d3e2fa..bc8bddf4caf7e7 100644
--- a/mlir/include/mlir/Bindings/Python/Nanobind.h
+++ b/mlir/include/mlir/Bindings/Python/Nanobind.h
@@ -23,6 +23,7 @@
 #endif
 #include <nanobind/nanobind.h>
 #include <nanobind/ndarray.h>
+#include <nanobind/stl/filesystem.h>
 #include <nanobind/stl/function.h>
 #include <nanobind/stl/optional.h>
 #include <nanobind/stl/pair.h>
diff --git a/mlir/lib/Bindings/Python/IRCore.cpp b/mlir/lib/Bindings/Python/IRCore.cpp
index 8e351cb22eb948..b772c9a583a6b5 100644
--- a/mlir/lib/Bindings/Python/IRCore.cpp
+++ b/mlir/lib/Bindings/Python/IRCore.cpp
@@ -6,6 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include <filesystem>
 #include <optional>
 #include <utility>
 
@@ -299,7 +300,7 @@ struct PyAttrBuilderMap {
     return *builder;
   }
   static void dunderSetItemNamed(const std::string &attributeKind,
-                                nb::callable func, bool replace) {
+                                 nb::callable func, bool replace) {
     PyGlobals::get().registerAttributeBuilder(attributeKind, std::move(func),
                                               replace);
   }
@@ -3047,6 +3048,19 @@ void mlir::python::populateIRCore(nb::module_ &m) {
           },
           nb::arg("asm"), nb::arg("context").none() = nb::none(),
           kModuleParseDocstring)
+      .def_static(
+          "parse",
+          [](const std::filesystem::path &path,
+             DefaultingPyMlirContext context) {
+            PyMlirContext::ErrorCapture errors(context->getRef());
+            MlirModule module = mlirModuleCreateParseFromFile(
+                context->get(), toMlirStringRef(path.string()));
+            if (mlirModuleIsNull(module))
+              throw MLIRError("Unable to parse module assembly", errors.take());
+            return PyModule::forModule(module).releaseObject();
+          },
+          nb::arg("asm"), nb::arg("context").none() = nb::none(),
+          kModuleParseDocstring)
       .def_static(
           "create",
           [](DefaultingPyLocation loc) {
diff --git a/mlir/lib/CAPI/IR/IR.cpp b/mlir/lib/CAPI/IR/IR.cpp
index f27af0ca9a2c78..999e8cbda1295a 100644
--- a/mlir/lib/CAPI/IR/IR.cpp
+++ b/mlir/lib/CAPI/IR/IR.cpp
@@ -22,6 +22,7 @@
 #include "mlir/IR/Location.h"
 #include "mlir/IR/Operation.h"
 #include "mlir/IR/OperationSupport.h"
+#include "mlir/IR/OwningOpRef.h"
 #include "mlir/IR/Types.h"
 #include "mlir/IR/Value.h"
 #include "mlir/IR/Verifier.h"
@@ -328,6 +329,15 @@ MlirModule mlirModuleCreateParse(MlirContext context, MlirStringRef module) {
   return MlirModule{owning.release().getOperation()};
 }
 
+MlirModule mlirModuleCreateParseFromFile(MlirContext context,
+                                         MlirStringRef fileName) {
+  OwningOpRef<ModuleOp> owning =
+      parseSourceFile<ModuleOp>(unwrap(fileName), unwrap(context));
+  if (!owning)
+    return MlirModule{nullptr};
+  return MlirModule{owning.release().getOperation()};
+}
+
 MlirContext mlirModuleGetContext(MlirModule module) {
   return wrap(unwrap(module).getContext());
 }
diff --git a/mlir/python/mlir/_mlir_libs/_mlir/ir.pyi b/mlir/python/mlir/_mlir_libs/_mlir/ir.pyi
index fb7efb8cd28a5e..096b87b3624436 100644
--- a/mlir/python/mlir/_mlir_libs/_mlir/ir.pyi
+++ b/mlir/python/mlir/_mlir_libs/_mlir/ir.pyi
@@ -46,6 +46,7 @@ import abc
 import collections
 from collections.abc import Callable, Sequence
 import io
+from pathlib import Path
 from typing import Any, ClassVar, TypeVar, overload
 
 __all__ = [
@@ -2123,7 +2124,7 @@ class Module:
         Creates an empty module
         """
     @staticmethod
-    def parse(asm: str | bytes, context: Context | None = None) -> Module:
+    def parse(asm: str | bytes | Path, context: Context | None = None) -> Module:
         """
         Parses a module's assembly format from a string.
 

@makslevental
Copy link
Contributor

Seems reasonable - can you add a test though?

@nikalra
Copy link
Contributor Author

nikalra commented Feb 5, 2025

Seems reasonable - can you add a test though?

done!

Copy link
Contributor

@makslevental makslevental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool LGTM. Let me know if you need me to merge (not sure if you have commit privileges).

@nikalra nikalra merged commit 4e14b8a into llvm:main Feb 5, 2025
8 checks passed
@nikalra nikalra deleted the bc-parse-file branch February 5, 2025 19:48
@nikalra
Copy link
Contributor Author

nikalra commented Feb 5, 2025

Looks like this change broke the mlir-nvidia-gcc7 build because gcc7 doesn't support filesystem.

@joker-eph -- It looks like you're listed as the owner of this job. Is there any chance that builder can be upgraded to gcc8 for C++17 support? If not, I can put up a PR to guard the API with #if __has_include(<fileystem>).

@@ -6,6 +6,7 @@
//
//===----------------------------------------------------------------------===//

#include <filesystem>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this broke the bot, see here: https://lab.llvm.org/buildbot/#/builders/116/builds/9857

@joker-eph
Copy link
Collaborator

joker-eph commented Feb 10, 2025

Looks like this change broke the mlir-nvidia-gcc7 build because gcc7 doesn't support filesystem.

Please revert (I just did) or fix immediately when a bot is broken: it is critical to keep them green otherwise other changes can silently add breakages and it's hard to track these new failures.

Is there any chance that builder can be upgraded to gcc8 for C++17 support?

This bot is meant to test that LLVM can build with gcc7, which is officially still supported, so that would make the point of this bot moot.

Any reason you're not using llvm::filesystem API instead?

joker-eph added a commit that referenced this pull request Feb 10, 2025
joker-eph added a commit that referenced this pull request Feb 10, 2025
Reverts #125736

The gcc7 Bot is broken at the moment.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 10, 2025
@nikalra
Copy link
Contributor Author

nikalra commented Feb 10, 2025

Any reason you're not using llvm::filesystem API instead?

std::filesystem::path is bridged to pathlib.Path in Python so we'd get a cleaner API surface (versus using String).

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:python MLIR Python bindings mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants