[OpenMP][MLIR] Add omp.canonical_loop operation, !omp.cli type, omp.new_cli operation #71712
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir-openmp

Author: Jan Leyonberg (jsjodin)

Changes

This work is a continuation of #65380.

This patch adds the omp.canonical_loop operation to represent canonical loops in OpenMP. This expands how loops are represented in the OMP dialect. It also allows simpler code generation by using the canonical loop codegen in the OpenMPIRBuilder. The !omp.cli type and the omp.new_cli operation are added to represent dependencies between canonical loops and future loop transformation operations.

Compared to #65380, instead of returning CLI values from the omp.canonical_loop operations and propagating them with omp.yield, we introduce a new operation, omp.new_cli, that creates a new CLI value which can be passed to an omp.canonical_loop op to associate the value with that loop. This eliminates the need for omp.yield, and since the CLI values are optional inputs to omp.canonical_loop, there is no need to use them unless loop transformations are used.

Full diff: https://github.com/llvm/llvm-project/pull/71712.diff

3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 99ac5cfb7b9e922..e544be3e6564b03 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -405,6 +405,170 @@ def SingleOp : OpenMP_Op<"single", [AttrSizedOperandSegments]> {
let hasVerifier = 1;
}
+//===---------------------------------------------------------------------===//
+// OpenMP Canonical Loop Info Type
+//===---------------------------------------------------------------------===//
+
+def CanonicalLoopInfoType : OpenMP_Type<"CanonicalLoopInfo", "cli"> {
+ let summary = "Type for representing a reference to a canonical loop";
+ let description = [{
+ A variable of type CanonicalLoopInfo refers to an OpenMP-compatible
+ canonical loop in the same function. Variables of this type are not
+ available at runtime and therefore cannot be used by the program itself,
+ i.e. an opaque type. It is similar to the transform dialect's
+ `!transform.interface` type, but instead of implementing an interface
+ for each transformation, the OpenMP dialect itself defines possible
+ operations on this type.
+
+ A CanonicalLoopInfo variable can be
+
+ 1. passed to omp.canonical_loop to associate the loop with that variable,
+ 2. passed to omp operations that take a CanonicalLoopInfo argument,
+ such as `omp.unroll`.
+
+ A CanonicalLoopInfo variable cannot be
+
+ 1. returned from a function,
+ 2. passed to operations that are not specifically designed to take a
+ CanonicalLoopInfo, including AnyType.
+
+ A CanonicalLoopInfo variable directly corresponds to an object of
+ OpenMPIRBuilder's CanonicalLoopInfo struct when lowering to LLVM-IR.
+ }];
+}
+
+//===---------------------------------------------------------------------===//
+// OpenMP Canonical Loop Info Operation
+//===---------------------------------------------------------------------===//
+
+def NewCliOp : OpenMP_Op<"new_cli"> {
+ let summary = "Create a new Canonical Loop Info value.";
+ let description = [{
+ Create a new CLI that can be passed as an argument to a CanonicalLoopOp
+ and to loop transformation operations to handle dependencies between
+ loop transformation operations.
+ }];
+ let results = (outs CanonicalLoopInfoType:$result);
+ let assemblyFormat = [{
+ attr-dict `:` type($result)
+ }];
+}
+
+
+//===---------------------------------------------------------------------===//
+// OpenMP Canonical Loop Operation
+//===---------------------------------------------------------------------===//
+def CanonicalLoopOp : OpenMP_Op<"canonical_loop", [SingleBlockImplicitTerminator<"omp::YieldOp">]> {
+ let summary = "OpenMP Canonical Loop Operation";
+ let description = [{
+ All loops that conform to OpenMP's definition of a canonical loop can be
+ simplified to a CanonicalLoopOp. In particular, there are no loop-carried
+ variables and the number of iterations it will execute is known before the
+ operation. This allows e.g. to determine the number of threads and chunks
+ the iteration space is split into before executing any iteration. More
+ restrictions may apply in cases such as (collapsed) loop nests, doacross
+ loops, etc.
+
+ The induction variable is always of the same type as the tripcount argument.
+ Since it can never be negative, tripcount is always interpreted as an
+ unsigned integer. It is the caller's responsibility to ensure the tripcount
+ is not negative when its interpretation is signed, i.e.
+ `%tripcount = max(0,%tripcount)`.
+
+ In contrast to other loop operations such as `scf.for`, the number of
+ iterations is determined by only a single variable, the trip-count. The
+ induction variable value is the logical iteration number of that iteration,
+ which OpenMP defines to be between 0 and the trip-count (exclusive).
+ Loop representations that have lower-bound, upper-bound, and step-size
+ operands require passes to do more work than necessary, including handling
+ special cases such as upper-bound smaller than lower-bound, upper-bound
+ equal to the integer type's maximal value, negative step size, etc. This
+ complexity is better handled only once by the front-end, which can apply
+ its semantics for such cases while still being able to represent any kind
+ of loop, which is kind of the point of a mid-end intermediate
+ representation. User-defined
+ types such as random-access iterators in C++ could not directly be
+ represented anyway.
+
+ An optional argument that can be passed to an omp.canonical_loop
+ is a CanonicalLoopInfo variable that can be used to refer to the canonical
+ loop to apply transformations -- such as tiling, unrolling, or
+ work-sharing -- to the loop, similar to the transform dialect but
+ with OpenMP-specific semantics.
+
+ A CanonicalLoopOp can be lowered to LLVM-IR using OpenMPIRBuilder's
+ createCanonicalLoop method.
+
+ #### Examples
+
+ Translation from lower-bound, upper-bound, step-size to trip-count.
+ ```c
+ for (int i = 3; i < 42; i+=2) {
+ B[i] = A[i];
+ }
+ ```
+
+ ```mlir
+ %lb = arith.constant 3 : i32
+ %ub = arith.constant 42 : i32
+ %step = arith.constant 2 : i32
+ %range = arith.subi %ub, %lb : i32
+ %tc = arith.ceildivsi %range, %step : i32
+ omp.canonical_loop %iv : i32 in [0, %tc) {
+ %offset = arith.muli %iv, %step : i32
+ %i = arith.addi %offset, %lb : i32
+ %a = load %arrA[%i] : memref<?xf32>
+ store %a, %arrB[%i] : memref<?xf32>
+ }
+ ```
+
+ Nested canonical loop with transformation.
+ ```mlir
+ %outer = omp.new_cli : !omp.cli
+ %inner = omp.new_cli : !omp.cli
+ omp.canonical_loop %iv1 : i32 in [0, %tripcount), %outer : !omp.cli {
+ omp.canonical_loop %iv2 : i32 in [0, %tc), %inner : !omp.cli {
+ %a = load %arrA[%iv1, %iv2] : memref<?x?xf32>
+ store %a, %arrB[%iv1, %iv2] : memref<?x?xf32>
+ }
+ }
+ omp.tile(%outer, %inner : !omp.cli, !omp.cli)
+ ```
+
+ Nested canonical loop with other constructs. The `omp.distribute`
+ operation has not been added yet, so this is suggested use with other
+ constructs.
+ ```mlir
+ omp.target {
+ omp.teams {
+ omp.distribute {
+ %outer = omp.new_cli : !omp.cli
+ %inner = omp.new_cli : !omp.cli
+ omp.canonical_loop %iv1 : i32 in [0, %tripcount), %outer : !omp.cli {
+ omp.canonical_loop %iv2 : i32 in [0, %tc), %inner : !omp.cli {
+ %a = load %arrA[%iv1, %iv2] : memref<?x?xf32>
+ store %a, %arrB[%iv1, %iv2] : memref<?x?xf32>
+ }
+ }
+ omp.collapse(%outer, %inner)
+ }
+ }
+ }
+ ```
+
+ }];
+ let hasCustomAssemblyFormat = 1;
+ let hasVerifier = 1;
+
+ let arguments = (ins IntLikeType:$tripCount,
+ Optional<CanonicalLoopInfoType>:$cli);
+
+ let regions = (region AnyRegion:$region);
+
+ let extraClassDeclaration = [{
+ ::mlir::Value getInductionVar();
+ }];
+}
+
//===----------------------------------------------------------------------===//
// 2.9.2 Workshare Loop Construct
//===----------------------------------------------------------------------===//
@@ -619,7 +783,7 @@ def SimdLoopOp : OpenMP_Op<"simdloop", [AttrSizedOperandSegments,
def YieldOp : OpenMP_Op<"yield",
[Pure, ReturnLike, Terminator,
ParentOneOf<["WsLoopOp", "ReductionDeclareOp",
- "AtomicUpdateOp", "SimdLoopOp"]>]> {
+ "AtomicUpdateOp", "SimdLoopOp", "CanonicalLoopOp"]>]> {
let summary = "loop yield and termination operation";
let description = [{
"omp.yield" yields SSA values from the OpenMP dialect op region and
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 480af0e1307c158..6faff0d531e90cb 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1551,6 +1551,88 @@ LogicalResult DataBoundsOp::verify() {
return success();
}
+//===----------------------------------------------------------------------===//
+// CanonicalLoopOp
+//===----------------------------------------------------------------------===//
+
+Value mlir::omp::CanonicalLoopOp::getInductionVar() {
+ return getRegion().getArgument(0);
+}
+
+void mlir::omp::CanonicalLoopOp::print(OpAsmPrinter &p) {
+ p << " " << getInductionVar() << " : " << getInductionVar().getType()
+ << " in [0, " << getTripCount() << ")";
+ if (getCli()) {
+ p << ", " << getCli() << " : " << getCli().getType();
+ }
+ p << " ";
+
+ // omp.yield is implicit if no arguments passed to it.
+ p.printRegion(getRegion(), /*printEntryBlockArgs=*/false,
+ /*printBlockTerminators=*/false);
+
+ p.printOptionalAttrDict((*this)->getAttrs());
+}
+
+mlir::ParseResult
+mlir::omp::CanonicalLoopOp::parse(::mlir::OpAsmParser &parser,
+ ::mlir::OperationState &result) {
+ Builder &builder = parser.getBuilder();
+
+ // We derive the type of tripCount from inductionVariable. Unfortunately we
+ // cannot do the other way around because MLIR requires the type of tripCount
+ // to be known when calling resolveOperand.
+ OpAsmParser::Argument inductionVariable;
+ if (parser.parseArgument(inductionVariable, /*allowType*/ true) ||
+ parser.parseKeyword("in") || parser.parseLSquare())
+ return failure();
+
+ int zero = -1;
+ SMLoc zeroLoc = parser.getCurrentLocation();
+ if (parser.parseInteger(zero))
+ return failure();
+ if (zero != 0) {
+ parser.emitError(zeroLoc, "Logical iteration space starts with zero");
+ return failure();
+ }
+
+ OpAsmParser::UnresolvedOperand tripcount;
+ if (parser.parseComma() || parser.parseOperand(tripcount) ||
+ parser.parseRParen() ||
+ parser.resolveOperand(tripcount, inductionVariable.type, result.operands))
+ return failure();
+
+ OpAsmParser::UnresolvedOperand cli;
+ Type type;
+ if (succeeded(parser.parseOptionalComma()))
+ if (parser.parseOperand(cli) || parser.parseColonType(type) ||
+ parser.resolveOperand(cli, type, result.operands))
+ return failure();
+
+ // Parse the loop body.
+ Region *region = result.addRegion();
+ if (parser.parseRegion(*region, {inductionVariable}))
+ return failure();
+ CanonicalLoopOp::ensureTerminator(*region, builder, result.location);
+
+ // Parse the optional attribute list.
+ if (parser.parseOptionalAttrDict(result.attributes))
+ return failure();
+
+ return mlir::success();
+}
+
+LogicalResult CanonicalLoopOp::verify() {
+ Value indVar = getInductionVar();
+ Value tripCount = getTripCount();
+
+ if (indVar.getType() != tripCount.getType())
+ return emitOpError(
+ "Region argument must be the same type as the trip count");
+
+ return success();
+}
+
#define GET_ATTRDEF_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.cpp.inc"
diff --git a/mlir/test/Dialect/OpenMP/cli.mlir b/mlir/test/Dialect/OpenMP/cli.mlir
new file mode 100644
index 000000000000000..a397b442fd61d96
--- /dev/null
+++ b/mlir/test/Dialect/OpenMP/cli.mlir
@@ -0,0 +1,56 @@
+// RUN: mlir-opt %s | mlir-opt | FileCheck %s
+
+// CHECK-LABEL: @omp_canonloop_raw
+// CHECK-SAME: (%[[tc:.*]]: i32)
+func.func @omp_canonloop_raw(%tc : i32) -> () {
+ // CHECK: omp.canonical_loop %{{.*}} : i32 in [0, %[[tc]]) {
+ "omp.canonical_loop" (%tc) ({
+ ^bb0(%iv: i32):
+ omp.yield
+ }) : (i32) -> ()
+ return
+}
+
+// CHECK-LABEL: @omp_nested_canonloop_raw
+// CHECK-SAME: (%[[tc_outer:.*]]: i32, %[[tc_inner:.*]]: i32)
+func.func @omp_nested_canonloop_raw(%tc_outer : i32, %tc_inner : i32) -> () {
+ // CHECK: %[[outer_cli:.*]] = omp.new_cli : !omp.cli
+ %outer = "omp.new_cli" () : () -> (!omp.cli)
+ // CHECK: %[[inner_cli:.*]] = omp.new_cli : !omp.cli
+ %inner = "omp.new_cli" () : () -> (!omp.cli)
+ // CHECK: omp.canonical_loop %{{.*}} : i32 in [0, %[[tc_outer]]), %[[outer_cli]] : !omp.cli {
+ "omp.canonical_loop" (%tc_outer, %outer) ({
+ ^bb_outer(%iv_outer: i32):
+ // CHECK: omp.canonical_loop %{{.*}} : i32 in [0, %[[tc_inner]]), %[[inner_cli]] : !omp.cli {
+ "omp.canonical_loop" (%tc_inner, %inner) ({
+ ^bb_inner(%iv_inner: i32):
+ omp.yield
+ }) : (i32, !omp.cli) -> ()
+ omp.yield
+ }) : (i32, !omp.cli) -> ()
+ return
+}
+
+// CHECK-LABEL: @omp_canonloop_pretty
+// CHECK-SAME: (%[[tc:.*]]: i32)
+func.func @omp_canonloop_pretty(%tc : i32) -> () {
+ // CHECK: omp.canonical_loop %[[iv:.*]] : i32 in [0, %[[tc]]) {
+ omp.canonical_loop %iv : i32 in [0, %tc) {
+ // CHECK-NEXT: %{{.*}} = llvm.add %[[iv]], %[[iv]] : i32
+ %newval = llvm.add %iv, %iv: i32
+ }
+ return
+}
+
+// CHECK-LABEL: @omp_canonloop_nested_pretty
+func.func @omp_canonloop_nested_pretty(%tc : i32) -> () {
+ // CHECK: %[[cli:.*]] = omp.new_cli : !omp.cli
+ %cli = omp.new_cli : !omp.cli
+ // CHECK: omp.canonical_loop %{{.*}} : i32 in [0, %{{.*}}), %[[cli]] : !omp.cli {
+ omp.canonical_loop %iv1 : i32 in [0, %tc), %cli : !omp.cli {
+ // CHECK: omp.canonical_loop %{{.*}} : i32 in [0, %{{.*}}) {
+ omp.canonical_loop %iv2 : i32 in [0, %tc) {}
+ }
+ return
+}
+
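For reference, the lower-bound/upper-bound/step-size to trip-count translation described in the op documentation can be sketched in plain C++. This is only an illustration under stated assumptions: `tripCount` and `userIndex` are hypothetical helper names that do not appear in the patch, and only positive step sizes are handled. For the `for (int i = 3; i < 42; i+=2)` example the division has to round up, giving 20 iterations (3, 5, ..., 41); a truncating division would give 19.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): number of iterations of
//   for (i = lb; i < ub; i += step)
// for a positive step. The result is what a front-end would pass as the
// trip count of the canonical loop.
uint32_t tripCount(int32_t lb, int32_t ub, int32_t step) {
  assert(step > 0 && "this sketch only handles positive steps");
  if (ub <= lb)
    return 0; // empty loop: clamp instead of producing a negative count
  // Compute in 64 bits so that ub - lb cannot overflow, then round up.
  int64_t range = static_cast<int64_t>(ub) - static_cast<int64_t>(lb);
  return static_cast<uint32_t>((range + step - 1) / step);
}

// Hypothetical helper: maps a logical iteration number in [0, tripCount)
// back to the user-visible induction value, as done in the loop body.
int32_t userIndex(int32_t lb, int32_t step, uint32_t iv) {
  return static_cast<int32_t>(lb + static_cast<int64_t>(iv) * step);
}
```

Clamping the empty-loop case to zero in the helper corresponds to the `max(0, %tripcount)` responsibility that the op description assigns to the caller.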
…ew_cli operation This patch adds the omp.canonical_loop to represent canonical loops in OpenMP. The operation is important in order to represent loops that can be modified using loop transformation directives in OpenMP. It also allows simpler code generation by using utilities in the OpenMPIRBuilder. In addition, the !omp.cli type and omp.new_cli operation are added to be able to represent dependencies between canonical loops and future loop transformation operations.
3f877fb to e894603
@ftynse It would be great if you could have a look at this patch. It is a bit long, but it follows from some of the earlier discussion in https://discourse.llvm.org/t/rfc-mlir-openmp-loop-transformation-tile-and-unroll-directive-operation-support-for-omp-dialect/65301/14 and subsequent proposals by @Meinersbur and follow-up work by @jsjodin. Basically, we are looking for feedback from people experienced with loop-related constructs in MLIR. Essentially, this patch introduces canonical loops, a canonical loop type, and a new operation to generate canonical loop ids. The canonical loop ids are operands of canonical loops, and this is what distinguishes one canonical loop from another. The loop transformation operation (omp.tile below) operates on these canonical loop ids. This way was preferred over
Thanks in advance.
Co-authored-by: Kiran Chandramohan <[email protected]>
The only concern I have is that the dependence between the loop transformation operation and the canonical loop is expressed only indirectly, through the CLI. MLIR transformation passes might therefore conclude that there is no direct dependence between the canonical loop and the transformation operation, which could cause issues.
1. be returned from a function,
2. passed to operations that are not specifically designed to take a
CanonicalLoopInfo, including AnyType.
Excluding `AnyType` might not be practically possible.
Can you think of an example where this can be an issue?
simplified to a CanonicalLoopOp. In particular, there are no loop-carried
variables and the number of iterations it will execute is known before the
Is it required to have no loop-carried variables?
`scf.for` models loop-carried variables using (1) an additional operation parameter, (2) an argument for the first block, and (3) an additional argument to yield. Its semantics are sequential, i.e. the yielded value is the block argument for the next iteration. These semantics make any multiprocessing impossible and are hence rather unusual for an OpenMP loop.
For the cases where it is needed, the value can be carried across iterations using an alloca, i.e. mem2reg would just not promote the variable to a register. This also preserves the semantics it would have when reordering the execution of iterations, including race conditions, which are hard to reason about with promoted registers.
The exception is `reduction` (and maybe `lastprivate`), where the register itself is privatized (hence can be promoted) and the parallel semantics are defined by OpenMP. For `wsloop` and `simd` this is done by an argument to `omp.yield`.
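To illustrate the distinction drawn above, here is a small standalone C++ sketch, not taken from the patch. `carriedSum` and `reducedSum` are hypothetical names: the first mimics the sequential, `scf.for`-style carry where each iteration consumes the value produced by the previous one, and the second mimics a reduction where each chunk works on a private copy that is combined afterwards, so the chunks can run in any order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential carry: every iteration depends on the value yielded by the
// previous iteration, so the iterations cannot be reordered.
int carriedSum(const std::vector<int> &data) {
  int acc = 0;
  for (int v : data)
    acc = acc + v; // "yield" acc to the next iteration
  return acc;
}

// Reduction-style: each chunk accumulates into its own private copy that
// starts at the identity value; the copies are combined at the end.
int reducedSum(const std::vector<int> &data, std::size_t numChunks) {
  assert(numChunks > 0);
  std::vector<int> partial(numChunks, 0);
  std::size_t chunk = (data.size() + numChunks - 1) / numChunks;
  // Visit the chunks in reverse to emphasise that their order is irrelevant.
  for (std::size_t c = numChunks; c-- > 0;) {
    std::size_t begin = c * chunk;
    std::size_t end = std::min(begin + chunk, data.size());
    for (std::size_t i = begin; i < end; ++i)
      partial[c] += data[i];
  }
  return std::accumulate(partial.begin(), partial.end(), 0);
}
```

Both functions return the same sum for integer addition, but only the second form leaves the iterations free to be distributed across threads.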
mlir::ParseResult
mlir::omp::CanonicalLoopOp::parse(::mlir::OpAsmParser &parser,
::mlir::OperationState &result) {
Builder &builder = parser.getBuilder();
Nit: unused variable.
The way we thought about it was that omp.new_cli is like an alloca, and the loop operations are like loads and stores. Maybe we need to model this as a resource and add side effects to the ops?
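As a rough illustration of that analogy, the following toy C++ model treats a CLI as a slot that is created empty, written by the loop, and read by a transformation. None of these names exist in MLIR or the OpenMPIRBuilder; `CliTable`, `newCli`, `associate`, and `lookup` are hypothetical, and rejecting a second association for the same handle is an assumption of this sketch, not something the patch verifies.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy model only: a table of CLI slots, one per handle.
struct CliTable {
  // Each slot stays empty until a loop is associated with the handle.
  std::vector<std::optional<std::string>> slots;

  // Analogue of omp.new_cli: like an alloca, creates an empty slot.
  std::size_t newCli() {
    slots.emplace_back();
    return slots.size() - 1;
  }

  // Analogue of omp.canonical_loop taking a CLI operand: a "store".
  // Returns false if the handle already refers to a loop (assumed rule).
  bool associate(std::size_t cli, const std::string &loopName) {
    if (slots.at(cli).has_value())
      return false;
    slots[cli] = loopName;
    return true;
  }

  // Analogue of a loop transformation op reading the CLI: a "load".
  std::optional<std::string> lookup(std::size_t cli) const {
    return slots.at(cli);
  }
};
```

In this model a `lookup` issued before the matching `associate` sees an empty slot, which is the kind of ordering constraint that side effects on a shared resource would make visible to passes.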
Possibly. I have asked for some help with review (https://discord.com/channels/636084430946959380/642426447167881246/1174731728494010388).
I'd agree with this characterization. There is an analogy to
Each Personally, I think that the "value-semantics" row better fits the intended semantics (e.g. exactly one definition), but would come at the cost of having to pass it out of scopes in nested loops. I talked with @jsjodin about this at the last DevMtg, he prefers the reference-semantics, so that's what we are going with.