[mlir][tensor] Remove assertion in ExpandShapeOp::build #91361

Merged · 1 commit merged into llvm:main on May 7, 2024

Conversation

@bjacob (Contributor) commented May 7, 2024

Unblocking a downstream integrate where an expected-to-fail test was expecting this to be a runtime verifier error, not a compiler crash: llvm/torch-mlir#3279.

@llvmbot (Member) commented May 7, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-tensor

Author: Benoit Jacob (bjacob)

Full diff: https://github.com/llvm/llvm-project/pull/91361.diff

1 file affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+6-4)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 4c65045084dc..7a13f7a7d135 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -1676,10 +1676,12 @@ void ExpandShapeOp::build(OpBuilder &builder, OperationState &result,
   auto tensorResultTy = cast<RankedTensorType>(resultType);
   FailureOr<SmallVector<OpFoldResult>> outputShape = inferOutputShape(
       builder, result.location, tensorResultTy, reassociation, inputShape);
-  // Failure of this assertion usually indicates presence of multiple
-  // dynamic dimensions in the same reassociation group.
-  assert(succeeded(outputShape) && "unable to infer output shape");
-  build(builder, result, tensorResultTy, src, reassociation, *outputShape);
+  SmallVector<OpFoldResult> outputShapeOrEmpty;
+  if (succeeded(outputShape)) {
+    outputShapeOrEmpty = *outputShape;
+  }
+  build(builder, result, tensorResultTy, src, reassociation,
+        outputShapeOrEmpty);
 }
 
 SmallVector<AffineMap, 4> CollapseShapeOp::getReassociationMaps() {

@hanhanW changed the title from "Remove assertion in ExpandShapeOp::build" to "[mlir][tensor] Remove assertion in ExpandShapeOp::build" on May 7, 2024
@jpienaar (Member) left a comment

Would it be possible to add a test?

@bjacob (Contributor, Author) commented May 7, 2024

As a follow-up, please :-) I need to unblock the downstreams.

Also, the test would ideally check for a helpful error message, which runs into a problem I had while writing this PR: I don't know whether it's safe to call emitOpError before the end of build(). I didn't want to take any chances while unblocking the integrates.

So +1 to a follow-up that figures out how to generate a helpful error and tests for it.
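
For reference, a minimal sketch of one way such a follow-up could report the failure from inside build(): the op does not exist yet at that point, so emitOpError is unavailable, but a diagnostic can be attached to the location already carried by OperationState. The names below match the snippet from TensorOps.cpp in the diff above; the error message text is hypothetical.

  // Hypothetical follow-up sketch, not part of this PR: report a diagnostic
  // at result.location instead of asserting. mlir::emitError only needs a
  // Location, so it is safe to call before the op has been created.
  FailureOr<SmallVector<OpFoldResult>> outputShape = inferOutputShape(
      builder, result.location, tensorResultTy, reassociation, inputShape);
  if (failed(outputShape)) {
    mlir::emitError(result.location)
        << "tensor.expand_shape: unable to infer output shape; reassociation "
           "groups with more than one dynamic dim need an explicit output_shape";
  }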

@hanhanW (Contributor) commented May 7, 2024

I think we can capture the invalid op in the op verifier. I'd suggest that we update the code below as well, so it also looks at expand_shape shape inference. Then we can add an invalid-op test to https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Tensor/invalid.mlir. What do you think?

static LogicalResult verifyTensorReshapeOp(TensorReshapeOp op,
                                           RankedTensorType expandedType,
                                           RankedTensorType collapsedType) {
  if (failed(
          verifyReshapeLikeTypes(op, expandedType, collapsedType, isExpansion)))
    return failure();

  auto maps = op.getReassociationMaps();
  RankedTensorType expectedType =
      CollapseShapeOp::inferCollapsedType(expandedType, maps);
  if (!isSameTypeWithoutEncoding(collapsedType, expectedType))
    return op.emitOpError("expected collapsed type to be ")
           << expectedType << ", but got " << collapsedType;
  return success();
}

LogicalResult ExpandShapeOp::verify() {
  auto srcType = getSrcType();
  auto resultType = getResultType();

  if ((int64_t)getStaticOutputShape().size() != resultType.getRank())
    return emitOpError("expected number of static shape dims to be equal to "
                       "the output rank (")
           << resultType.getRank() << ") but found "
           << getStaticOutputShape().size() << " inputs instead";

  if ((int64_t)getOutputShape().size() !=
      llvm::count(getStaticOutputShape(), ShapedType::kDynamic))
    return emitOpError("mismatch in dynamic dims in output_shape and "
                       "static_output_shape: static_output_shape has ")
           << llvm::count(getStaticOutputShape(), ShapedType::kDynamic)
           << " dynamic dims while output_shape has " << getOutputShape().size()
           << " values";

  return verifyTensorReshapeOp(*this, resultType, srcType);
}

@bjacob (Contributor, Author) commented May 7, 2024

Oh... OK, I can give that a try.

@bjacob (Contributor, Author) commented May 7, 2024

Actually, there is another problem: this condition in the C++ builder code is impossible to exercise from MLIR source in the tensor dialect, as omitting output_shape is a parsing error. It can only be exercised from C++. The bad caller here is a C++ caller, in TosaToTensor.cpp. So if we added an MLIR test, it would have to be in the TOSA dialect.

@hanhanW (Contributor) commented May 7, 2024

Sorry that my comment was not clear. It is true that it is hard to test C++ builder code. My point is that the build methods and the verify method should be aligned in some sense. It is okay for build to generate an invalid op, but the verifier should signal it. In the case we've seen in the torch-mlir repo, it is generating something like:

tensor.expand_shape %arg0 [[0, 1]] output_shape [%sz0, %sz1] : tensor<?xf32> into tensor<?x?xf32>

So I'd suggest adding the invalid op to invalid.mlir and signaling an error from the verifier. I'm concerned that an invalid op could be generated silently and never caught. With the test, at least we inform users that such an invalid op was generated.

@Shukla-Gaurav (Contributor) commented May 7, 2024

(Quoting @hanhanW's comment above.)

Actually, this IR is not invalid anymore (#90040). This indicates that we are calling the wrong build method (which is why the assertion fired). We need to raise some sort of signal in place of the assertion (maybe in a follow-up PR).
Also, the tosa.reshape lowering needs to be fixed (to call the right build method by passing output_shape) for such cases.
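
For illustration only, a rough sketch (not from this PR) of what such a caller-side fix could look like: pass the per-dimension output sizes explicitly so the ExpandShapeOp::build overload that takes an output shape is used and the inference guarded by the removed assertion never runs. loc, rewriter, resultType, src, sz0, and sz1 are hypothetical names assumed to be in scope in the lowering pattern.

  // Hypothetical caller-side sketch: expanding tensor<?xf32> into
  // tensor<?x?xf32> with one reassociation group and two dynamic sizes.
  SmallVector<ReassociationIndices> reassociation = {{0, 1}};
  // OpFoldResult accepts Values, so dynamic sizes can be passed directly.
  SmallVector<OpFoldResult> outputShape = {sz0, sz1};
  auto expanded = rewriter.create<tensor::ExpandShapeOp>(
      loc, resultType, src, reassociation, outputShape);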

@bjacob (Contributor, Author) commented May 7, 2024

Yeah, IR like this would go to the ExpandShapeOp::build overload that takes an outputShape.

To exercise the error condition we are talking about here, which is in the ExpandShapeOp::build overload NOT taking an outputShape, I think the only way to hit it is from C++. I don't see it being hit from any kind of tensor-dialect MLIR source, since there output_shape is mandatory in the syntax, and so it would of course call the C++ overload with the outputShape parameter.

@hanhanW (Contributor) left a comment

I see... thanks for filling me in on the context.

@bjacob merged commit 62bed56 into llvm:main on May 7, 2024