Skip to content

[MLIR][OpenMP] Add omp.loop_nest operation #87083

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 12, 2024

Conversation

skatrak
Copy link
Member

@skatrak skatrak commented Mar 29, 2024

This patch introduces an operation intended to hold loop information associated to the omp.distribute, omp.simdloop, omp.taskloop and omp.wsloop operations. This is a stopgap solution to unblock work on transitioning these operations to becoming wrappers, as discussed in this RFC.

Long-term, this operation will likely be replaced by omp.canonical_loop, which is being designed to address missing support for loop transformations, etc.

This patch introduces an operation intended to hold loop information associated
to the `omp.distribute`, `omp.simdloop`, `omp.taskloop` and `omp.wsloop`
operations. This is a stopgap solution to unblock work on transitioning these
operations to becoming wrappers, as discussed in
[this RFC](https://discourse.llvm.org/t/rfc-representing-combined-composite-constructs-in-the-openmp-dialect/76986).

Long-term, this operation will likely be replaced by `omp.canonical_loop`,
which is being designed to address missing support for loop transformations,
etc.
@llvmbot
Copy link
Member

llvmbot commented Mar 29, 2024

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-flang-openmp

Author: Sergio Afonso (skatrak)

Changes

This patch introduces an operation intended to hold loop information associated to the omp.distribute, omp.simdloop, omp.taskloop and omp.wsloop operations. This is a stopgap solution to unblock work on transitioning these operations to becoming wrappers, as discussed in this RFC.

Long-term, this operation will likely be replaced by omp.canonical_loop, which is being designed to address missing support for loop transformations, etc.


Full diff: https://github.com/llvm/llvm-project/pull/87083.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+64-1)
  • (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+71)
  • (modified) mlir/test/Dialect/OpenMP/invalid.mlir (+37)
  • (modified) mlir/test/Dialect/OpenMP/ops.mlir (+79)
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index f33942b3c7c02d..ffd00948915153 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -511,6 +511,69 @@ def SingleOp : OpenMP_Op<"single", [AttrSizedOperandSegments]> {
   let hasVerifier = 1;
 }
 
+//===----------------------------------------------------------------------===//
+// Loop Nest
+//===----------------------------------------------------------------------===//
+
+def LoopNestOp : OpenMP_Op<"loop_nest", [SameVariadicOperandSize,
+                        AllTypesMatch<["lowerBound", "upperBound", "step"]>,
+                        ParentOneOf<["DistributeOp", "SimdLoopOp", "TaskloopOp",
+                                     "WsloopOp"]>,
+                        RecursiveMemoryEffects]> {
+  let summary = "rectangular loop nest";
+  let description = [{
+    This operation represents a collapsed rectangular loop nest. For each
+    rectangular loop of the nest represented by an instance of this operation,
+    lower and upper bounds, as well as a step variable, must be defined.
+
+    The lower and upper bounds specify a half-open range: the range includes the
+    lower bound but does not include the upper bound. If the `inclusive`
+    attribute is specified then the upper bound is also included.
+
+    The body region can contain any number of blocks. The region is terminated
+    by an `omp.yield` instruction without operands. The induction variables,
+    represented as entry block arguments to the loop nest operation's single
+    region, match the types of the `lowerBound`, `upperBound` and `step`
+    arguments.
+
+    ```mlir
+    omp.loop_nest (%i1, %i2) : i32 = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
+      %a = load %arrA[%i1, %i2] : memref<?x?xf32>
+      %b = load %arrB[%i1, %i2] : memref<?x?xf32>
+      %sum = arith.addf %a, %b : f32
+      store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
+      omp.yield
+    }
+    ```
+
+    This is a temporary simplified definition of a loop based on existing OpenMP
+    loop operations intended to serve as a stopgap solution until the long-term
+    representation of canonical loops is defined. Specifically, this operation
+    is intended to serve as a unique source for loop information during the
+    transition to making `omp.distribute`, `omp.simdloop`, `omp.taskloop` and
+    `omp.wsloop` wrapper operations. It is not intended to help with the
+    addition of support for loop transformations.
+  }];
+
+  let arguments = (ins Variadic<IntLikeType>:$lowerBound,
+                       Variadic<IntLikeType>:$upperBound,
+                       Variadic<IntLikeType>:$step,
+                       UnitAttr:$inclusive);
+
+  let regions = (region AnyRegion:$region);
+
+  let extraClassDeclaration = [{
+    /// Returns the number of loops in the loop nest.
+    unsigned getNumLoops() { return getLowerBound().size(); }
+
+    /// Returns the induction variables of the loop nest.
+    ArrayRef<BlockArgument> getIVs() { return getRegion().getArguments(); }
+  }];
+
+  let hasCustomAssemblyFormat = 1;
+  let hasVerifier = 1;
+}
+
 //===----------------------------------------------------------------------===//
 // 2.9.2 Workshare Loop Construct
 //===----------------------------------------------------------------------===//
@@ -724,7 +787,7 @@ def SimdLoopOp : OpenMP_Op<"simdloop", [AttrSizedOperandSegments,
 
 def YieldOp : OpenMP_Op<"yield",
     [Pure, ReturnLike, Terminator,
-     ParentOneOf<["WsloopOp", "DeclareReductionOp",
+     ParentOneOf<["LoopNestOp", "WsloopOp", "DeclareReductionOp",
      "AtomicUpdateOp", "SimdLoopOp", "PrivateClauseOp"]>]> {
   let summary = "loop yield and termination operation";
   let description = [{
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index bf5875071e0dc4..796df1d13e6564 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1662,6 +1662,77 @@ LogicalResult TaskloopOp::verify() {
   return success();
 }
 
+//===----------------------------------------------------------------------===//
+// LoopNestOp
+//===----------------------------------------------------------------------===//
+
+ParseResult LoopNestOp::parse(OpAsmParser &parser, OperationState &result) {
+  // Parse an opening `(` followed by induction variables followed by `)`
+  SmallVector<OpAsmParser::Argument> ivs;
+  SmallVector<OpAsmParser::UnresolvedOperand> lbs, ubs;
+  Type loopVarType;
+  if (parser.parseArgumentList(ivs, OpAsmParser::Delimiter::Paren) ||
+      parser.parseColonType(loopVarType) ||
+      // Parse loop bounds.
+      parser.parseEqual() ||
+      parser.parseOperandList(lbs, ivs.size(), OpAsmParser::Delimiter::Paren) ||
+      parser.parseKeyword("to") ||
+      parser.parseOperandList(ubs, ivs.size(), OpAsmParser::Delimiter::Paren))
+    return failure();
+
+  for (auto &iv : ivs)
+    iv.type = loopVarType;
+
+  // Parse "inclusive" flag.
+  if (succeeded(parser.parseOptionalKeyword("inclusive")))
+    result.addAttribute("inclusive",
+                        UnitAttr::get(parser.getBuilder().getContext()));
+
+  // Parse step values.
+  SmallVector<OpAsmParser::UnresolvedOperand> steps;
+  if (parser.parseKeyword("step") ||
+      parser.parseOperandList(steps, ivs.size(), OpAsmParser::Delimiter::Paren))
+    return failure();
+
+  // Parse the body.
+  Region *region = result.addRegion();
+  if (parser.parseRegion(*region, ivs))
+    return failure();
+
+  // Resolve operands.
+  if (parser.resolveOperands(lbs, loopVarType, result.operands) ||
+      parser.resolveOperands(ubs, loopVarType, result.operands) ||
+      parser.resolveOperands(steps, loopVarType, result.operands))
+    return failure();
+
+  // Parse the optional attribute list.
+  return parser.parseOptionalAttrDict(result.attributes);
+}
+
+void LoopNestOp::print(OpAsmPrinter &p) {
+  Region &region = getRegion();
+  auto args = region.getArguments();
+  p << " (" << args << ") : " << args[0].getType() << " = (" << getLowerBound()
+    << ") to (" << getUpperBound() << ") ";
+  if (getInclusive())
+    p << "inclusive ";
+  p << "step (" << getStep() << ") ";
+  p.printRegion(region, /*printEntryBlockArgs=*/false);
+}
+
+LogicalResult LoopNestOp::verify() {
+  if (getLowerBound().size() != getIVs().size())
+    return emitOpError() << "number of range arguments and IVs do not match";
+
+  for (auto [lb, iv] : llvm::zip_equal(getLowerBound(), getIVs())) {
+    if (lb.getType() != iv.getType())
+      return emitOpError()
+             << "range argument type does not match corresponding IV type";
+  }
+
+  return success();
+}
+
 //===----------------------------------------------------------------------===//
 // WsloopOp
 //===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index a00383cf44057c..760ebb14d94121 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -87,6 +87,43 @@ func.func @proc_bind_once() {
 
 // -----
 
+func.func @invalid_parent(%lb : index, %ub : index, %step : index) {
+  // expected-error@+1 {{op expects parent op to be one of 'omp.distribute, omp.simdloop, omp.taskloop, omp.wsloop'}}
+  omp.loop_nest (%iv) : index = (%lb) to (%ub) step (%step) {
+    omp.yield
+  }
+}
+
+// -----
+
+func.func @type_mismatch(%lb : index, %ub : index, %step : index) {
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // expected-error@+1 {{range argument type does not match corresponding IV type}}
+    "omp.loop_nest" (%lb, %ub, %step) ({
+    ^bb0(%iv2: i32):
+      omp.yield
+    }) : (index, index, index) -> ()
+    omp.yield
+  }
+}
+
+// -----
+
+func.func @iv_number_mismatch(%lb : index, %ub : index, %step : index) {
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // expected-error@+1 {{number of range arguments and IVs do not match}}
+    "omp.loop_nest" (%lb, %ub, %step) ({
+    ^bb0(%iv1 : index, %iv2 : index):
+      omp.yield
+    }) : (index, index, index) -> ()
+    omp.yield
+  }
+}
+
+// -----
+
 func.func @inclusive_not_a_clause(%lb : index, %ub : index, %step : index) {
   // expected-error @below {{expected 'for'}}
   omp.wsloop nowait inclusive
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index 30ce77423005ac..8d9acab67e0358 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -133,6 +133,85 @@ func.func @omp_parallel_pretty(%data_var : memref<i32>, %if_cond : i1, %num_thre
   return
 }
 
+// CHECK-LABEL: omp_loop_nest
+func.func @omp_loop_nest(%lb : index, %ub : index, %step : index) -> () {
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // CHECK: omp.loop_nest
+    // CHECK-SAME: (%{{.*}}) : index =
+    // CHECK-SAME: (%{{.*}}) to (%{{.*}}) step (%{{.*}})
+    "omp.loop_nest" (%lb, %ub, %step) ({
+    ^bb0(%iv2: index):
+      omp.yield
+    }) : (index, index, index) -> ()
+    omp.yield
+  }
+
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // CHECK: omp.loop_nest
+    // CHECK-SAME: (%{{.*}}) : index =
+    // CHECK-SAME: (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}})
+    "omp.loop_nest" (%lb, %ub, %step) ({
+    ^bb0(%iv2: index):
+      omp.yield
+    }) {inclusive} : (index, index, index) -> ()
+    omp.yield
+  }
+
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // CHECK: omp.loop_nest
+    // CHECK-SAME: (%{{.*}}, %{{.*}}) : index =
+    // CHECK-SAME: (%{{.*}}, %{{.*}}) to (%{{.*}}, %{{.*}}) step (%{{.*}}, %{{.*}})
+    "omp.loop_nest" (%lb, %lb, %ub, %ub, %step, %step) ({
+    ^bb0(%iv2: index, %iv3: index):
+      omp.yield
+    }) : (index, index, index, index, index, index) -> ()
+    omp.yield
+  }
+
+  return
+}
+
+// CHECK-LABEL: omp_loop_nest_pretty
+func.func @omp_loop_nest_pretty(%lb : index, %ub : index, %step : index) -> () {
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // CHECK: omp.loop_nest
+    // CHECK-SAME: (%{{.*}}) : index =
+    // CHECK-SAME: (%{{.*}}) to (%{{.*}}) step (%{{.*}})
+    omp.loop_nest (%iv2) : index = (%lb) to (%ub) step (%step) {
+      omp.yield
+    }
+    omp.yield
+  }
+
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // CHECK: omp.loop_nest
+    // CHECK-SAME: (%{{.*}}) : index =
+    // CHECK-SAME: (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}})
+    omp.loop_nest (%iv2) : index = (%lb) to (%ub) inclusive step (%step) {
+      omp.yield
+    }
+    omp.yield
+  }
+
+  // TODO Remove induction variables from omp.wsloop.
+  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+    // CHECK: omp.loop_nest
+    // CHECK-SAME: (%{{.*}}) : index =
+    // CHECK-SAME: (%{{.*}}, %{{.*}}) to (%{{.*}}, %{{.*}}) step (%{{.*}}, %{{.*}})
+    omp.loop_nest (%iv2, %iv3) : index = (%lb, %lb) to (%ub, %ub) step (%step, %step) {
+      omp.yield
+    }
+    omp.yield
+  }
+
+  return
+}
+
 // CHECK-LABEL: omp_wsloop
 func.func @omp_wsloop(%lb : index, %ub : index, %step : index, %data_var : memref<i32>, %linear_var : i32, %chunk_var : i32) -> () {
 

@skatrak skatrak requested a review from kparzysz April 1, 2024 12:18
Comment on lines +138 to +139
// TODO Remove induction variables from omp.wsloop.
omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[remark] I fear there will be inconcistencies if potentially both omp.loop_nest and omp.wsloop both define potentially inconsistent loop bounds and intermediate passes not considering the wrapper passes. Since we already discussed this, only a remark here.

Copy link
Member Author

@skatrak skatrak Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is true. However, the transition of omp.wsloop to be a wrapper rather than defining the whole loop will be part of a separate patch. So, after this patch alone, it's not expected to use omp.loop_nest inside of omp.wsloop. These tests just needed to add a "future-wrapper" parent operation so that the new op could be tested.

Copy link
Contributor

@mjklemm mjklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kiranchandramohan
Copy link
Contributor

What is the mechanism we now have to avoid code appearing between loop_nest and the wrapper operation?

@skatrak
Copy link
Member Author

skatrak commented Apr 10, 2024

What is the mechanism we now have to avoid code appearing between loop_nest and the wrapper operation?

The LoopWrapperInterface (in the follow-up PR: #87232) defines an isWrapper() method that checks that the operation region only contains a single block with a single wrapper or omp.loop_nest and a terminator. Then, the wrapper op's verifier can be updated to make sure this is the case (e.g. #87239, #87253, #87365).

One operation where this doesn't always need to be checked would be omp.parallel, which can be a wrapper op when representing loop constructs such as distribute parallel do, but it can also work as a regular op when it is used standalone or as part of a combined (not composite) construct (e.g. parallel do).

@kiranchandramohan
Copy link
Contributor

What is the mechanism we now have to avoid code appearing between loop_nest and the wrapper operation?

The LoopWrapperInterface (in the follow-up PR: #87232) defines an isWrapper() method that checks that the operation region only contains a single block with a single wrapper or omp.loop_nest and a terminator. Then, the wrapper op's verifier can be updated to make sure this is the case (e.g. #87239, #87253, #87365).

Would the code motion passes still end up moving the operation and then subsequently the verifier errors out? Have you tried out a code sinking or hoisting pass to see what is the behaviour?

For omp.wsloop where will collapse be represented? In the loop_nest operation?

@skatrak
Copy link
Member Author

skatrak commented Apr 10, 2024

What is the mechanism we now have to avoid code appearing between loop_nest and the wrapper operation?

The LoopWrapperInterface (in the follow-up PR: #87232) defines an isWrapper() method that checks that the operation region only contains a single block with a single wrapper or omp.loop_nest and a terminator. Then, the wrapper op's verifier can be updated to make sure this is the case (e.g. #87239, #87253, #87365).

Would the code motion passes still end up moving the operation and then subsequently the verifier errors out? Have you tried out a code sinking or hoisting pass to see what is the behaviour?

I'm currently working on making omp.wsloop a wrapper, so I was able to make a small test to check this. I've noticed that CSE works as expected and doesn't try to insert anything inside of the wrapper, and the inlining pass seems to skip trying to inline anything inside of the loop nest region. Maybe you can suggest better ways to test this, but this is what I've tried so far:

// cat test.mlir
func.func private @bar(i32) -> ()
func.func @foo(%arg0 : i32) -> () {
  func.call @bar(%arg0) : (i32) -> ()
  return
}
func.func @test_cse(%x : i32) -> () {
  %c1_0 = arith.constant 1 : i32
  %x_plus_1_0 = arith.addi %x, %c1_0 : i32
  func.call @foo(%x_plus_1_0) : (i32) -> ()
  %c2_0 = arith.constant 2 : i32
  %x_plus_2_0 = arith.addi %x, %c2_0 : i32
  omp.wsloop {
    omp.loop_nest (%i) : i32 = (%x) to (%x_plus_2_0) step (%c1_0) {
      %c2_1 = arith.constant 2 : i32
      %x_plus_2_1 = arith.addi %x, %c2_1 : i32
      func.call @foo(%x_plus_2_1) : (i32) -> ()
      omp.yield
    }
    omp.terminator
  }
  return
}

// mlir-opt test.mlir -inline -cse -o -
func.func private @bar(i32)
func.func @foo(%arg0: i32) {
  call @bar(%arg0) : (i32) -> ()
  return
}
func.func @test_cse(%arg0: i32) {
  %c2_i32 = arith.constant 2 : i32
  %c1_i32 = arith.constant 1 : i32
  %0 = arith.addi %arg0, %c1_i32 : i32
  call @bar(%0) : (i32) -> ()
  %1 = arith.addi %arg0, %c2_i32 : i32
  omp.wsloop {
    omp.loop_nest (%arg1) : i32 = (%arg0) to (%1) step (%c1_i32) {
      func.call @foo(%1) : (i32) -> ()
      omp.yield
    }
    omp.terminator
  }
  return
}

For omp.wsloop where will collapse be represented? In the loop_nest operation?

Yes, so it will be possible to represent collapse for any loop construct, composite or otherwise (support would also need MLIR to LLVM IR translation implemented for that case). The way it's represented is the same as it's currently done for omp.wsloop, by having multiple IVs and range/step variables.

@kiranchandramohan
Copy link
Contributor

I have seen canonicalize transformation pull constants outside of OpenMP constructs.

func.func @constant_sink_test(%x : !llvm.ptr) {
  omp.parallel {
    %c1 = arith.constant 10 : i32
    llvm.store %c1, %x : i32, !llvm.ptr
    omp.terminator
  }
  return
}

./bin/mlir-opt --canonicalize t1.mlir

module {
  func.func @constant_sink_test(%arg0: !llvm.ptr) {
    %c10_i32 = arith.constant 10 : i32
    omp.parallel {
      llvm.store %c10_i32, %arg0 : i32, !llvm.ptr
      omp.terminator
    }
    return
  }
}

@skatrak
Copy link
Member Author

skatrak commented Apr 11, 2024

Yes, that's the constant hoisting part of the canonicalization pass. As per the documentation, it will hoist constants "into the entry block of the first parent barrier region". Which is an IsolatedFromAbove operation or one that implements the DialectFoldInterface and returns true in shouldMaterializeInto(). Wrappers are not intended to do either of these things and I can confirm that constants for such cases are hoisted straight through them:

func.func @constant_sink_test(%x : !llvm.ptr, %y : index) {
  omp.wsloop {
    omp.loop_nest (%i) : index = (%y) to (%y) step (%y) {
      %c1 = arith.constant 10 : i32
      llvm.store %c1, %x : i32, !llvm.ptr
      omp.yield
    }
    omp.terminator
  }
  return
}

mlir-opt --canonicalize test.mlir

func.func @constant_sink_test(%arg0: !llvm.ptr, %arg1: index) {
  %c10_i32 = arith.constant 10 : i32
  omp.wsloop {
    omp.loop_nest (%arg2) : index = (%arg1) to (%arg1) step (%arg1) {
      llvm.store %c10_i32, %arg0 : i32, !llvm.ptr
      omp.yield
    }
    omp.terminator
  }
  return
}

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@skatrak skatrak merged commit 0e63672 into main Apr 12, 2024
@skatrak skatrak deleted the users/skatrak/spr/loop-nest-01-mlir branch April 12, 2024 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants