-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MLIR] Ensure deterministic parallel verification #134963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MLIR] Ensure deterministic parallel verification #134963
Conversation
@llvm/pr-subscribers-mlir-core Author: Nachi G (nacgarg) Changes
This PR uses Other potential fixes include:
Full diff: https://github.com/llvm/llvm-project/pull/134963.diff 1 Files Affected:
diff --git a/mlir/lib/IR/Verifier.cpp b/mlir/lib/IR/Verifier.cpp
index 90ff8ef3b497f..20f259391c9d4 100644
--- a/mlir/lib/IR/Verifier.cpp
+++ b/mlir/lib/IR/Verifier.cpp
@@ -226,10 +226,15 @@ LogicalResult OperationVerifier::verifyOnExit(Operation &op) {
o.hasTrait<OpTrait::IsIsolatedFromAbove>())
opsWithIsolatedRegions.push_back(&o);
}
- if (failed(failableParallelForEach(
- op.getContext(), opsWithIsolatedRegions,
- [&](Operation *o) { return verifyOpAndDominance(*o); })))
+
+ std::atomic<bool> opFailedVerify = false;
+ parallelForEach(op.getContext(), opsWithIsolatedRegions, [&](Operation *o) {
+ if (failed(verifyOpAndDominance(*o)))
+ opFailedVerify.store(true, std::memory_order_relaxed);
+ });
+ if (opFailedVerify.load(std::memory_order_relaxed))
return failure();
+
OperationName opName = op.getName();
std::optional<RegisteredOperationName> registeredInfo =
opName.getRegisteredInfo();
|
@llvm/pr-subscribers-mlir Author: Nachi G (nacgarg) Changes
This PR uses Other potential fixes include:
Full diff: https://github.com/llvm/llvm-project/pull/134963.diff 1 Files Affected:
diff --git a/mlir/lib/IR/Verifier.cpp b/mlir/lib/IR/Verifier.cpp
index 90ff8ef3b497f..20f259391c9d4 100644
--- a/mlir/lib/IR/Verifier.cpp
+++ b/mlir/lib/IR/Verifier.cpp
@@ -226,10 +226,15 @@ LogicalResult OperationVerifier::verifyOnExit(Operation &op) {
o.hasTrait<OpTrait::IsIsolatedFromAbove>())
opsWithIsolatedRegions.push_back(&o);
}
- if (failed(failableParallelForEach(
- op.getContext(), opsWithIsolatedRegions,
- [&](Operation *o) { return verifyOpAndDominance(*o); })))
+
+ std::atomic<bool> opFailedVerify = false;
+ parallelForEach(op.getContext(), opsWithIsolatedRegions, [&](Operation *o) {
+ if (failed(verifyOpAndDominance(*o)))
+ opFailedVerify.store(true, std::memory_order_relaxed);
+ });
+ if (opFailedVerify.load(std::memory_order_relaxed))
return failure();
+
OperationName opName = op.getName();
std::optional<RegisteredOperationName> registeredInfo =
opName.getRegisteredInfo();
|
`failableParallelForEach` will non-deterministically early terminate upon failure, leading to inconsistent and potentially missing diagnostics. This PR uses `parallelForEach` to ensure all operations are verified and all diagnostics are handled, while tracking the failure state separately. Other potential fixes include: - Making `failableParallelForEach` have deterministic early-exit behavior (or have an option for it) - I didn't want to change more than what was required (and potentially incur perf hits for unrelated code), but if this is a better fix I'm happy to submit a patch. - I think all diagnostics that can be detected from verification failures should be reported, so I don't even think this would be correct behavior anyway - Adding an option for `failableParallelForEach` to still execute on every element on the range while still returning `LogicalResult`
8e7e6b5
to
fdd386a
Compare
fdd386a
to
f6edf25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I've hit the non-deterministic verifier diagnostics several times. Thanks for fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, seems obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG, Thanks for fixing.
if (failed(verifyOpAndDominance(*o))) | ||
opFailedVerify.store(true, std::memory_order_relaxed); | ||
}); | ||
if (opFailedVerify.load(std::memory_order_relaxed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the memory_order_relaxed
for this kind of granularity here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it really matters. Technically we only need atomicity here, so by specifying memory_order_relaxed
we don't impose any additional ordering constraints (like the default memory_order_seq_cst
would).
`failableParallelForEach` will non-deterministically early terminate upon failure, leading to inconsistent and potentially missing diagnostics. This PR uses `parallelForEach` to ensure all operations are verified and all diagnostics are handled, while tracking the failure state separately. Other potential fixes include: - Making `failableParallelForEach` have deterministic early-exit behavior (or have an option for it) - I didn't want to change more than what was required (and potentially incur perf hits for unrelated code), but if this is a better fix I'm happy to submit a patch. - I think all diagnostics that can be detected from verification failures should be reported, so I don't even think this would be correct behavior anyway - Adding an option for `failableParallelForEach` to still execute on every element on the range while still returning `LogicalResult`
`failableParallelForEach` will non-deterministically early terminate upon failure, leading to inconsistent and potentially missing diagnostics. This PR uses `parallelForEach` to ensure all operations are verified and all diagnostics are handled, while tracking the failure state separately. Other potential fixes include: - Making `failableParallelForEach` have deterministic early-exit behavior (or have an option for it) - I didn't want to change more than what was required (and potentially incur perf hits for unrelated code), but if this is a better fix I'm happy to submit a patch. - I think all diagnostics that can be detected from verification failures should be reported, so I don't even think this would be correct behavior anyway - Adding an option for `failableParallelForEach` to still execute on every element on the range while still returning `LogicalResult`
failableParallelForEach
will non-deterministically early terminate upon failure, leading to inconsistent and potentially missing diagnostics.This PR uses
parallelForEach
to ensure all operations are verified and all diagnostics are handled, while tracking the failure state separately.Other potential fixes include:
Making
failableParallelForEach
have deterministic early-exit behavior (or have an option for it)Adding an option for
failableParallelForEach
to still execute on every element on the range while still returningLogicalResult