[MLIR] Fix thread safety of the deleter in PyDenseResourceElementsAttribute #124832
Conversation
In general, `PyDenseResourceElementsAttribute` can get deleted at any time and on any thread where, unlike the `getFromBuffer` call, the Python interpreter may not be initialized and the GIL may not be held. This PR fixes segfaults caused by `PyBuffer_Release` when the GIL is not held by the thread calling the deleter.
@llvm/pr-subscribers-mlir Author: Fabian Tschopp (naibaf7). Changes: as described above. Full diff: https://github.com/llvm/llvm-project/pull/124832.diff (1 file affected)
```diff
diff --git a/mlir/lib/Bindings/Python/IRAttributes.cpp b/mlir/lib/Bindings/Python/IRAttributes.cpp
index 7bc21a31c3c84c..142b6eca11c438 100644
--- a/mlir/lib/Bindings/Python/IRAttributes.cpp
+++ b/mlir/lib/Bindings/Python/IRAttributes.cpp
@@ -1468,7 +1468,10 @@ class PyDenseResourceElementsAttribute
     // The userData is a Py_buffer* that the deleter owns.
     auto deleter = [](void *userData, const void *data, size_t size,
                       size_t align) {
+      if (!Py_IsInitialized())
+        Py_Initialize();
       Py_buffer *ownedView = static_cast<Py_buffer *>(userData);
+      nb::gil_scoped_acquire gil;
       PyBuffer_Release(ownedView);
       delete ownedView;
     };
```
Revert "[MLIR] Fix thread safety of the deleter in PyDenseResourceElementsAttribute (llvm#124832)". This reverts commit 28507ac.
@naibaf7 can you give a bit more context on when this happens? Do you have a reproducer for this? I'm looking into why this change causes problems on our end. I'll post more info later.
Use a `torch.export.ExportedProgram` to generate the initial MLIR module. This requires us to create an `ExportedProgram` from the initial `GraphModule`.

Benefits:
- We can use torch-mlir's official entrypoint.
- This handles in-place ops for us.
- We can run decompositions and **keep** location data.
  - This location data will stick around throughout the compile process.

Issues:
- `aten.clamp` is decomposed by torch-mlir to `maximum(minimum(input, max), min)`. `ttnn.maximum` requires that the operand which needs to be broadcast is on the RHS. Currently, in tt-mlir the `PartiallyBroadcastable` op trait only enforces that the broadcast operand is on the LHS.
  - tt-torch issue: #431
  - tt-mlir issue: tenstorrent/tt-mlir#2458
- Graph parameters are inlined as constants in the graph. To have the `FxImporter` treat them as graph inputs, we need to edit the `ExportedProgram`'s `ExportedGraphSignature` and force all "parameter" types to "user inputs".
  - This is a hack, as the `ExportedGraphSignature` is meant to be a private member of `ExportedProgram`.
  - Ideally we could configure the `FxImporter` to _not_ inline the parameters if we pass a flag of some sort. Perhaps a future contribution to torch-mlir.

Other info:
- We need to upgrade to PyTorch 2.6.0, as it contains crucial changes that allow us to use custom decompositions (necessary to support interpolation).
- `AdaptiveAvgPool2d` is lowered to `AvgPool2d` and eventually to `stablehlo.reduce_window`, **even in the case where the op is equivalent to a global average**. Since we do not have support for lowering a sum-pool in `StablehloToTTIRPatterns.cpp` (sum because the division happens afterward), I've temporarily added a custom decomposition of `aten.avg_pool2d` which converts it to a mean over the spatial dimensions when the `avg_pool2d` is equivalent to one.
- `aten.split` is no longer lowered to a series of `narrow` ops. Instead it is now lowered to a series of `as_strided` ops.
  - `narrow` is lowered to `slice`, which can be lowered to `stablehlo.slice`. `as_strided` cannot be lowered from Torch Backend IR to StableHLO. I've temporarily added back the old decomposition from PyTorch 2.5.0, which uses `narrow`, as a custom decomposition.
  - I've made a PR which adds a lowering of `AtenAsStridedOp` to `stablehlo::SliceOp` in our fork of torch-mlir: tenstorrent/llvm-torch-mlir#4
- The tracer which generates the `GraphModule` passed to `backend` does not account for control flow. I believe in PyTorch 2.5.0 a graph break would be triggered during `.generate` methods in `transformers` LLMs. It is not triggered anymore, so `.generate` will run until the max length is reached.
  - **This means that the entire generation becomes one program.**
  - Once the first EOS token is generated, the rest of the length is filled with padding. We cannot compare the golden output to the result from the `GraphModule`, as the output shapes are different.
  - Since the outputs of `.generate` graphs are integers, PCC/atol verification is not very useful, but it does return `True` when the outputs are _identical_.
  - The tokenizer can decode the outputs and strip padding.
  - I've added a flag to `ModelTester` that informs the `ModelTester` it is testing a `.generate` call. It will decode the output tokens and we compare the resulting strings.
  - PyTorch has an experimental `torch.cond` which they seem to intend to use to trace data-dependent control flow. There's a note in the `transformers` source that says they intend to use it when it is no longer experimental.
- When the graph is compiled, the user inputs are placed **at the end** of the arguments passed to the program rather than at the front. That is, graph constants first, then inputs.
- I needed to implement an `FxImporter` hook for importing literals to the graph. By default it makes all non-scalars `DenseElementsResourceAttr`s; however, this causes the process to hang upon cleanup whether the test fails or not. So the hook just uses `DenseElementsAttr` for all literals.
  - Someone has mentioned this problem in an IREE issue as well: iree-org/iree#20102
  - They've traced it down to this PR in LLVM that adds a GIL acquire when destroying the `DenseElementsResourceAttr`: llvm/llvm-project#124832 (see the mitigation sketch after this list).
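The cleanup hang reported in the last two bullets plausibly comes from the deleter trying to acquire the GIL while the interpreter is finalizing. Below is a hedged sketch of one possible binding-side mitigation; it is not from the patch in this PR or from IREE, and `safeDeleter` plus the finalization check are assumptions (`Py_IsFinalizing` is public only since CPython 3.13; older versions expose `_Py_IsFinalizing`):

```cpp
#include <Python.h>
#include <nanobind/nanobind.h>

namespace nb = nanobind;

// Hypothetical mitigation, not the upstream fix: bail out of Python
// C-API work once the interpreter is gone or shutting down, leaking
// the Py_buffer rather than blocking forever on the GIL at exit.
auto safeDeleter = [](void *userData, const void *data, size_t size,
                      size_t align) {
  Py_buffer *ownedView = static_cast<Py_buffer *>(userData);
#if PY_VERSION_HEX >= 0x030D0000
  bool finalizing = Py_IsFinalizing();   // public API since CPython 3.13
#else
  bool finalizing = _Py_IsFinalizing();  // private spelling before 3.13
#endif
  if (!Py_IsInitialized() || finalizing)
    return;  // deliberate leak: no safe way to call PyBuffer_Release now
  nb::gil_scoped_acquire gil;
  PyBuffer_Release(ownedView);
  delete ownedView;
};
```

Leaking at shutdown is a common trade-off in bindings: the OS reclaims the memory anyway, and it avoids touching a half-torn-down interpreter.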
@naibaf7 @lattner @weiweichen I now see this PR referenced in another project. Could you share how to reproduce the error you saw?
@chrsmcgrr This change can cause issues if the GIL is, for example, held by a thread other than the one releasing the resource. In that case it may deadlock, but that would be correct behaviour, and the other thread should release the GIL. Without this change, however, it is not thread safe and may fail with segfaults.
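To make that scenario concrete, here is a minimal sketch of the deadlock shape being described; all names (`destroyContextWithResources`, `waitForTeardown`) are invented for illustration. The fix lives on the caller's side, matching the comment that the other thread "should release the GIL":

```cpp
#include <nanobind/nanobind.h>
#include <thread>

namespace nb = nanobind;

// Hypothetical helper: destroys an MLIR context whose resource blobs
// run the Py_buffer deleter (which now acquires the GIL) on the
// worker thread.
void destroyContextWithResources();

void waitForTeardown() {
  std::thread worker(destroyContextWithResources);
  // Deadlock if we join while holding the GIL: the worker's deleter
  // sits in nb::gil_scoped_acquire waiting for us, and we wait for it.
  // Releasing the GIL for the duration of the blocking join breaks
  // the cycle.
  nb::gil_scoped_release release;
  worker.join();
}
```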
…8b4b (#54922)
- Previously it was assumed that `nb::gil_scoped_release release;` could not be used for compiling torch models, due to the handling of `PyDenseResourceElementsAttribute` in constant folding. However, there is no guarantee that the thread releasing the PyDenseResource will be holding the GIL (this was addressed in llvm/llvm-project#124832). This PR avoids GIL deadlocks by releasing the GIL in both code paths.
- Unblocks Python 3.12 support.
- Update lldb20.0.0git to lldb21.0.0git.
- Add `funcRange.front().GetBaseAddress()` to the `Function` constructor call.
- Additional dependencies from LLVM/MLIR are necessary since they are no longer included transitively.
- llvm/llvm-project#123488 causes new dependencies on UBDialect.

MAX_GRAPH_API_ORIG_REV_ID: cab252a7205f18c0a1320bc4e0f1689f2731932a