
[mlir][python] remove eager loading of dialect module (for type and value casting) #72338


Merged
2 commits merged into llvm:main from fix_builtin_dialect on Nov 21, 2023

Conversation

@makslevental (Contributor) commented Nov 15, 2023

So it turns out there are funny people (CIRCT ahem...) that depend on MLIRPythonSources.Core sources but choose not to build/depend on any of the upstream dialects, in particular builtin. Unfortunately this breaks all of the type casters and such after #70831 because all of the fancy lookup... functions will early exit if loadDialectModule fails. So let's do these funny people a favor and automatically include the builtin dialect for them with MLIRPythonSources.Core.

cc @mikeurbach

EDIT: okay cutting the gordian knot: dialect module loading prior to type caster or value caster use is "best effort" - i.e., we ignore the return value for only those two lookups (lookupDialectClass and lookupOperationClass stay strict). It was probably overly cautious anyway - the worst that happens if you don't early exit after a failed (absent) dialect module load is the type caster lookup also fails (i.e., not really an error, just a return to the default behavior).

EDIT2: I propose we remove eager dialect module loading from lookupTypeCaster and lookupValueCaster entirely. I don't remember who it was that asked for that kind of safety but IMHO if you want type casting to your own dialect's types, it's not unreasonable to ask you to first register those type casters - whether that be by importing a python module that performs the registration using the decorator syntax, or importing the C extension module that implements mlir_type_subclass. I don't think an explicit import if you want to avail yourself of type casting to a type in your dialect is an excessive complexity/boilerplate burden. And it's failsafe - if the type/value caster isn't registered then nbd - the corresponding map (typeCasterMap, valueCasterMap) will not hit and default types (ir.Type) will be returned. But note: builtin types and attributes will be cast successfully (because casters for builtin are registered in the pybind code itself). The current PR commit implements this proposal.
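To make the failsafe behavior concrete, here is a minimal pure-Python sketch of the proposed lookup. `typeCasterMap` mirrors the C++ member of the same name, but `register_type_caster` and `lookup_type_caster` as written here are illustrative stand-ins for the mechanism, not the actual bindings API:

```python
# Illustrative model of the proposed caster lookup: no eager dialect-module
# import, just a dictionary probe with a safe default.
typeCasterMap = {}  # type id -> caster callable, filled by registration


def register_type_caster(type_id):
    """Decorator-style registration, as described in the proposal above."""
    def wrap(caster):
        typeCasterMap[type_id] = caster
        return caster
    return wrap


def lookup_type_caster(type_id, default=lambda t: f"ir.Type({t})"):
    # A miss is not an error: fall back to the default (ir.Type) behavior.
    return typeCasterMap.get(type_id, default)


@register_type_caster("transform.op")
def _cast_op_type(t):
    return f"OperationType({t})"


print(lookup_type_caster("transform.op")("!transform.op"))    # registered hit
print(lookup_type_caster("mydialect.foo")("!mydialect.foo"))  # safe default
```

A miss simply falls through to the default, which is exactly why skipping the eager dialect-module import is safe: the worst case is the generic `ir.Type` you would have gotten anyway.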

@makslevental makslevental changed the title [mlir][python] automatically bundle builtin dialect with core [mlir][python] automatically bundle builtin dialect with core sources Nov 15, 2023
@llvmbot llvmbot added mlir:python MLIR Python bindings mlir labels Nov 15, 2023
@llvmbot (Member) commented Nov 15, 2023

@llvm/pr-subscribers-mlir

Author: Maksim Levental (makslevental)

Changes

So it turns out there are funny people (CIRCT ahem...) that depend on MLIRPythonSources.Core sources but choose not to build/depend on any of the upstream dialects. Unfortunately this breaks all of the type casters and such after #70831 because all of the fancy lookup... functions will early exit if loadDialectModule fails. So let's do these funny people a favor and automatically include the builtin dialect for them with MLIRPythonSources.Core.


Full diff: https://github.com/llvm/llvm-project/pull/72338.diff

1 file affected:

  • (modified) mlir/python/CMakeLists.txt (+11-8)
diff --git a/mlir/python/CMakeLists.txt b/mlir/python/CMakeLists.txt
index 971ad2dd214a15f..6be84ec311e23ca 100644
--- a/mlir/python/CMakeLists.txt
+++ b/mlir/python/CMakeLists.txt
@@ -44,6 +44,17 @@ declare_mlir_python_sources(MLIRPythonCAPI.HeaderSources
   SOURCES_GLOB "mlir-c/*.h"
 )
 
+# The builtin dialect is special (e.g., type casters and such expect it to be loaded
+# in order to work) so we add it to Core instead of Dialects so that anyone depending on
+# Core will get it automatically.
+declare_mlir_dialect_python_bindings(
+  ADD_TO_PARENT MLIRPythonSources.Core
+  ROOT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/mlir"
+  TD_FILE dialects/BuiltinOps.td
+  SOURCES
+    dialects/builtin.py
+  DIALECT_NAME builtin)
+
 ################################################################################
 # Dialect bindings
 ################################################################################
@@ -84,14 +95,6 @@ declare_mlir_dialect_python_bindings(
     "../../include/mlir/Dialect/Bufferization/IR/BufferizationEnums.td"
 )
 
-declare_mlir_dialect_python_bindings(
-  ADD_TO_PARENT MLIRPythonSources.Dialects
-  ROOT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/mlir"
-  TD_FILE dialects/BuiltinOps.td
-  SOURCES
-    dialects/builtin.py
-  DIALECT_NAME builtin)
-
 declare_mlir_dialect_python_bindings(
   ADD_TO_PARENT MLIRPythonSources.Dialects
   ROOT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/mlir"

@stellaraccident (Contributor) left a comment

Why can't they just depend on the built-in dialect? That's what iree does: https://github.com/openxla/iree/blob/98e31728b7765664a0a87e5e9eeec76936548ff3/compiler/bindings/python/CMakeLists.txt#L142

(It is also picky about only depending on dialects it cares about and enumerates them explicitly.)

@mikeurbach (Contributor)

I think this makes sense, but I'm curious what others think. When we initially made the switch in CIRCT to the recommended approach where downstream users of the Python bindings "vendor" MLIR in their own package, we only ever wanted to depend on MLIRPythonSources.Core. I seem to recall at the time we made the switch, we didn't need to do anything else for the builtin dialect to be available.

These days, with loadDialectModule expecting to be able to load the builtin dialect module, it seems reasonable to have the builtin dialect generated and available for that function to find. I will point out that something else I tried was just patching loadDialectModule to immediately return true for the builtin dialect, and this works. IMHO that would be fine too.

@mikeurbach (Contributor)

@stellaraccident that is what we just added, so if that's the recommended thing to do, that's fine.

I guess I was just a little surprised, because in general in MLIR I feel like you don't usually have to ask for the builtin dialect explicitly, and it seems with some of the recentish Python changes you now do.

@stellaraccident (Contributor)

Yeah, I'm not sure exactly what regressed here and why... Reading some code

@stellaraccident (Contributor)

> Why can't they just depend on the built-in dialect? That's what iree does: https://github.com/openxla/iree/blob/98e31728b7765664a0a87e5e9eeec76936548ff3/compiler/bindings/python/CMakeLists.txt#L142
>
> (It is also picky about only depending on dialects it cares about and enumerates them explicitly.)
>
> Yeah, I'm not sure exactly what regressed here and why... Reading some code

I don't think that early exit is right. It is perfectly fine to use the python API without generated dialect helper modules. Trying to load the helper module may make sense but not sure why it is required.

@makslevental (Contributor, Author)

> Why can't they just depend on the built-in dialect? That's what iree does:

They could, but it's not really optional anymore (at least insofar as getting access to the type casters on builtin types goes). Although I believe nothing actually breaks (you just don't get type and attribute casting).

> I will point out that something else I tried was just patching loadDialectModule to immediately return true for the builtin dialect, and this works. IMHO that would be fine too.

This was intentionally done away with - the previous caching system had negative caches which effectively accomplished the same thing (a miss was "registered" as a py::none()) but then there needed to be checks and extra complexity in all of the lookups.

@makslevental (Contributor, Author) commented Nov 15, 2023

> I don't think that early exit is right. It is perfectly fine to use the python API without generated dialect helper modules. Trying to load the helper module may make sense but not sure why it is required.

The reasoning for the exit (for type and attribute casters) was that if your dialect isn't loaded then your types aren't registered either. Indeed there isn't anything that fails if you don't load the helper module except you shouldn't be able to get type casting.

@stellaraccident (Contributor)

It looks like that caching PR has added a behavior that was never intended: not having dialect helper modules should be perfectly legal.

@makslevental (Contributor, Author)

> It looks like that caching PR has added a behavior that was never intended: not having dialect helper modules should be perfectly legal.

It's important to emphasize - it is still legal. No breakages occur except for in the functionality that depends on loaded dialects (such as type and attribute casting).

@stellaraccident (Contributor)

Ok. I vote for leaving it as is. It might just be my itchiness about crossing different things in the same name grouping. I don't feel strongly about it.

@makslevental (Contributor, Author) commented Nov 15, 2023

The alternative is to drop those dialect module loads (prior to type/attribute casting) which will also surprise people because now something like this

t = Type.parse('!transform.op<"foo.bar">', Context())
# CHECK: !transform.op<"foo.bar">
print(t)
# CHECK: OperationType(!transform.op<"foo.bar">)
print(repr(t))

without a prior from mlir.dialects import transform won't work. It's the same failure mode - a necessary module (either C extension or dialect) hasn't been loaded.

@makslevental (Contributor, Author) commented Nov 15, 2023

> Ok. I vote for leaving it as is. It might just be my itchiness about crossing different things in the same name grouping. I don't feel strongly about it.

I buy that but builtin is special isn't it?

Another alternative is

diff --git a/mlir/lib/Bindings/Python/Globals.h b/mlir/lib/Bindings/Python/Globals.h
index a022067f5c7e..b16bf4fd3c46 100644
--- a/mlir/lib/Bindings/Python/Globals.h
+++ b/mlir/lib/Bindings/Python/Globals.h
@@ -124,7 +124,7 @@ private:
   llvm::DenseMap<MlirTypeID, pybind11::object> valueCasterMap;
   /// Set of dialect namespaces that we have attempted to import implementation
   /// modules for.
-  llvm::StringSet<> loadedDialectModules;
+  llvm::StringSet<> loadedDialectModules{"builtin"};
 };
 
 } // namespace python

which solves the problem (the check will pass - no early exit) and everything will work, except the OpViews generated for builtin (ModuleOp, UnrealizedConversionCastOp) won't be registered. Not a good solution I think...
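The problem with that seeded-set alternative can be modeled in a few lines of plain Python (all names below are hypothetical stand-ins for the C++ members, not the real bindings code):

```python
# Why seeding loadedDialectModules{"builtin"} is a hack: the set records
# "already attempted", so load_dialect_module() skips the import entirely,
# and whatever that import would have registered as a side effect
# (the generated OpViews) never runs for builtin.

loaded_dialect_modules = {"builtin"}  # pre-seeded, as in the diff above
registered_op_views = {}              # filled as a side effect of importing


def load_dialect_module(namespace):
    if namespace in loaded_dialect_modules:
        return True  # looks loaded, but nothing was actually imported
    loaded_dialect_modules.add(namespace)
    # real code would import mlir.dialects.<namespace> here
    registered_op_views[namespace] = ["...generated OpViews..."]
    return True


load_dialect_module("builtin")
print("builtin" in registered_op_views)  # False: ModuleOp etc. never registered
load_dialect_module("func")
print("func" in registered_op_views)     # True: the import side effect ran
```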

@makslevental (Contributor, Author)

And of course, a third (fourth?) option is just to document that you need to add MLIRPythonSources.Dialects.builtin to your DECLARED_SOURCES under add_mlir_python_modules as @mikeurbach did. Mediocre solution?

@stellaraccident (Contributor)

I don't have a strong opinion. Do what you think is right. I'm typically working on more systematic uses of the API anyway, and personally, I find most of the sugar to get in the way when used in that way. I'm not sure I'm current on more of the "fully loaded" use of the API and defer to those who are.

@mikeurbach (Contributor)

> And of course, a third (fourth?) option is just to document that you need to add MLIRPythonSources.Dialects.builtin to your DECLARED_SOURCES under add_mlir_python_modules as @mikeurbach did.

This is fine by me. I want the automatic casting for builtin types and attributes to the concrete subclass; that's a Pareto improvement IMHO for any op whose Python bindings return builtin types or attributes. It feels a little weird that you have to depend on the builtin dialect's generated Python file for this to work, since the register_dialect call in there doesn't do anything; the builtin dialect is already registered. That's why I previously suggested just letting loadDialectModule immediately return true for the builtin dialect. But I don't have a very strong opinion here; if the change we just made in CIRCT is what is recommended, that's fine by me.

@makslevental makslevental changed the title [mlir][python] automatically bundle builtin dialect with core sources [mlir][python] loading dialect module is best effort only Nov 15, 2023
@makslevental makslevental changed the title [mlir][python] loading dialect module is best effort only [mlir][python] loading dialect module (for type and value casting) is best effort only Nov 15, 2023
@makslevental makslevental changed the title [mlir][python] loading dialect module (for type and value casting) is best effort only [mlir][python] make loading dialect module (for type and value casting) best effort only Nov 15, 2023
@rkayaith (Member) left a comment

The current approach seems fine to me, but since the negative caching was removed this setup will cause the module imports to be attempted on every downcast. I wonder if it's noticeable?
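The concern about repeated import attempts can be illustrated with a small pure-Python loader (hypothetical names; the real loader lives in the C++ bindings): without a negative cache, every downcast retries the failing import, while caching the failure makes all subsequent lookups free.

```python
import importlib


class DialectModuleLoader:
    """Caches both successful and failed import attempts (negative caching)."""

    def __init__(self):
        self._cache = {}   # namespace -> module, or None for a known failure
        self.attempts = 0  # instrumentation for this example

    def load(self, namespace):
        if namespace in self._cache:  # positive OR negative hit
            return self._cache[namespace]
        self.attempts += 1
        try:
            mod = importlib.import_module(f"mlir.dialects.{namespace}")
        except ImportError:
            mod = None                # negative cache entry: remember the miss
        self._cache[namespace] = mod
        return mod


loader = DialectModuleLoader()
for _ in range(1000):                 # e.g. one lookup per downcast
    loader.load("no_such_dialect")
print(loader.attempts)                # 1: the failing import ran only once
```

Dropping the negative cache (as the refactor did) means the `except ImportError` path, which involves a full filesystem-backed import attempt, runs on every single downcast of an unregistered dialect's type.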

Comment on lines +135 to +136
// Try to load dialect module.
(void)loadDialectModule(unwrap(mlirDialectGetNamespace(dialect)));
(Member)

I can't really think of a nice way to test this, so can you at least leave a comment here about why we continue on instead of returning.

(Contributor)

No one has noticed it, but the Python API overhead for all of the fancy stuff is rising on my hit list. There is a wide swath of use cases that need no frills, and I worry about the overhead of this kind of thing. For some cases, I've already wanted a big switch that just turns off all OpViews, for example.

@makslevental (Contributor, Author)

> The current approach seems fine to me, but since the negative caching was removed this setup will cause the module imports to be attempted on every downcast. I wonder if it's noticeable?

Something special needs to be done for builtin, and just hard-coding "builtin" somewhere seems like a hack (because the dialect module will appear to be loaded but the OpViews won't be registered). The only other thing I can think of is just to generate the _builtin_ops_gen.py once, take that code and stick it in e.g. ir.py. The likelihood that builtin grows any more ops is small, right?

@stellaraccident (Contributor)

> The current approach seems fine to me, but since the negative caching was removed this setup will cause the module imports to be attempted on every downcast. I wonder if it's noticeable?
>
> Something special needs to be done for builtin, and just hard-coding "builtin" somewhere seems like a hack (because the dialect module will appear to be loaded but the OpViews won't be registered). The only other thing I can think of is just to generate the _builtin_ops_gen.py once, take that code and stick it in e.g. ir.py. The likelihood that builtin grows any more ops is small, right?

Let's not do that...

@rkayaith (Member)

> The current approach seems fine to me, but since the negative caching was removed this setup will cause the module imports to be attempted on every downcast. I wonder if it's noticeable?
>
> Something special needs to be done for builtin, and just hard-coding "builtin" somewhere seems like a hack (because the dialect module will appear to be loaded but the OpViews won't be registered). The only other thing I can think of is just to generate the _builtin_ops_gen.py once, take that code and stick it in e.g. ir.py. The likelihood that builtin grows any more ops is small, right?

IMO adding back the negative caching would be preferable over that.

@makslevental (Contributor, Author) commented Nov 15, 2023

I propose we remove eager dialect module loading from lookupTypeCaster and lookupValueCaster entirely.

I don't remember who it was that asked for that kind of safety but IMHO if you want type casting to your own dialect's types, it's not unreasonable to ask you to first register those type casters - whether that be by importing a python module that performs the registration using the decorator syntax, or importing the C extension module that implements mlir_type_subclass. I don't think an explicit import if you want to avail yourself of type casting to a type in your dialect is an excessive complexity/boilerplate burden.

And it's failsafe - if the type/value caster isn't registered then nbd - the corresponding map (typeCasterMap, valueCasterMap) will not hit and default types (ir.Type) will be returned. But note: builtin types and attributes will be cast successfully (because casters for builtin are registered in the pybind code itself). The current PR commit implements this proposal.

@makslevental makslevental changed the title [mlir][python] make loading dialect module (for type and value casting) best effort only [mlir][python] remove eager loading dialect module (for type and value casting) Nov 15, 2023
@makslevental makslevental changed the title [mlir][python] remove eager loading dialect module (for type and value casting) [mlir][python] remove eager loading of dialect module (for type and value casting) Nov 15, 2023
@mikeurbach (Contributor)

The latest version seems to look like a good solution to me.

@stellaraccident (Contributor)

> The latest version seems to look like a good solution to me.

I haven't been following the extensible type casting work closely, but what you say makes sense to me.

@rkayaith (Member)

> I don't remember who it was that asked for that kind of safety but IMHO if you want type casting to your own dialect's types, it's not unreasonable to ask you to first register those type casters - whether that be by importing a python module that performs the registration using the decorator syntax, or importing the C extension module that implements mlir_type_subclass. I don't think an explicit import if you want to avail yourself of type casting to a type in your dialect is an excessive complexity/boilerplate burden.

Doesn't this apply to OpViews as well? If we're removing the auto-import stuff for type/attribute downcasting, why not go all the way and remove it everywhere? I still don't think having to explicitly do side-effecting imports is great UX for end users who don't really care how all the downcasting works behind the scenes (I'm not sure how they'd even know that the import is the thing they have to do to enable downcasting), but I won't die on that hill; I do think the behaviour should at least be consistent between all these downcasting features.
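The asymmetry being pointed out can be shown with a toy model (hypothetical stand-ins, not the real bindings code): operation-class lookup still triggers a side-effecting dialect-module import, while type/value-caster lookup after this PR only probes its registration map.

```python
# Contrast of the two lookup styles after this PR:
# - lookup_operation_class: still auto-imports the dialect module (strict)
# - lookup_type_caster: only probes the map, no import attempt (best effort)

loaded = set()
op_class_map = {}     # "dialect.op" -> OpView class
type_caster_map = {}  # type id -> caster callable


def load_dialect_module(namespace):
    loaded.add(namespace)  # real code imports mlir.dialects.<namespace>
    op_class_map[f"{namespace}.op"] = object  # import registers OpViews


def lookup_operation_class(name):
    load_dialect_module(name.split(".")[0])   # eager, side-effecting import
    return op_class_map.get(name)


def lookup_type_caster(type_id):
    return type_caster_map.get(type_id)       # no import attempt at all


print(lookup_operation_class("transform.op") is not None)  # True
print(lookup_type_caster("transform-type-id"))             # None
```

This is exactly the inconsistency raised above: an unregistered OpView is fetched for you via import, but an unregistered type caster is not.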

@makslevental makslevental merged commit 26dc765 into llvm:main Nov 21, 2023
@makslevental makslevental deleted the fix_builtin_dialect branch November 21, 2023 01:54
mikeurbach added a commit to llvm/circt that referenced this pull request Jul 20, 2024
This specific dependency no longer works after an upstream change:
llvm/llvm-project#97167. Furthermore, we never
actually wanted this dependency, since we don't actually do anything
with the builtin dialect ops. We only used this to convince the MLIR
bindings to do the automatic downcasting for builtin dialect types and
attributes, which required the builtin dialect to be registered (even
though it always is) until this change went in:
llvm/llvm-project#72338. Now that these
changes are behind us, we can remove our workaround and avoid the
issues that ensue by depending on a target no one usually depends on.
Labels
mlir:python MLIR Python bindings mlir
5 participants