[mlir][OpenMP][flang] make private variable allocation implicit in omp.private #124019

tblah · 2025-01-22T21:46:32Z

The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084

For example, a privatizer for an integer used to look like

  omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
  ^bb0(%arg0: !fir.ref<i32>):
    %0 = ... allocate proper memory for the private clone ...
    omp.yield(%0 : !fir.ref<i32>)
  }

After this change, allocation become implicit in the operation:

  omp.private {type = private} @x.privatizer : i32

For more complex types that require initialization after allocation, an init region can be used:

  omp.private {type = private} @x.privatizer : !some.type init {
  ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
    // initialize %arg1, using %arg0 as a mold for allocations
    omp.yield(%arg1 : !some.pointer<!some.type>)
  } dealloc {
    ^bb0(%arg0: !some.pointer<!some.type>):
    ... deallocate memory allocated by the init region ...
    omp.yield
  }

This patch lays the groundwork for delayed task execution but is not enough on its own.

After this patch all gfortran tests which previously passed still pass. There
are the following changes to the Fujitsu test suite:

0380_0009 and 0435_0009 are fixed
0688_0041 now fails at runtime. This patch is testing firstprivate variables with tasks. Previously we got lucky with the undefined behavior and won the race. After these changes we no longer get lucky. This patch lays the groundwork for a proper fix for this issue.

In flang the lowering re-uses the existing lowering used for reduction init and dealloc regions.

In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO but
the generated code still didn't work so I added a new TODO in
DataSharingProcessor.

Please read mlir changes first and then flang changes.

This has to be merged as one massive patch to avoid testsuite regressions,
but for ease of review I have broken it up into a series of (mostly) short patches.
In these patches I have only updated unit tests as I go along. They will only all
pass at the end of the series.

I have tried to make the changes to init/dealloc region generation self-contained
so that it is easier to see what changed when reviewing. For some commits I have
added extra commentary in the commit message for that commit.

By the end of this series the init/dealloc region generation is a bit hard to follow so I will refactor
it in a follow-up (hopefull NFC) patch.

In flang lowering I box up all arrays and pass the boxes by reference so
that the existing code for reduction init and dealloc regions can be
shared.

The TODOs for pointers, derived types, characters etc will be resolved
in later patches in this same series (to be squashed into this one). I
separated it to make it easier to review.

Assumed rank was already broken before this patch.
I can't find any mention of "assumed rank" in the openmp standard so
I guess it is not prohibited.

Other than the omp.private operation definition changes, the test changes
are mostly down to slightly different codegen from re-using the reduction
init region. That code is already well tested so I didn't want to change it.

@luporl please could you review my re-implementation of your recent fix in 4d52dde and especially the changes to the logic for the inclusion of barriers.

tblah · 2025-01-23T11:26:06Z

I'm not sure why the bot didn't run on this.

@llvm/pr-subscribers-flang-openmp

ergawy

Thanks for the great work @tblah! I partially reviewed since this is a huge PR (understandably). I will come back and continue reviewing the rest later.

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

flang/lib/Lower/OpenMP/DataSharingProcessor.cpp

flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp

ergawy · 2025-01-23T11:29:50Z

flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp

-    assert(declOp &&
-           "Expected defining Op of privatized var to be hlfir.declare");
+    assert(definingOp &&
+           "Privatizing a block argument without any hlfir.declare");


If we don't restrict ourselves to DeclareOp, do we still need that assertion? If so, the error message needs to be generalized.

MLIR values can come from two places:

the result of an operation

a block argument

We can't assume that getting the defining operation produces a non-null result because the value might be a block argument.

For example,

func.func @func(%arg0 : !type0, %arg1 : !type1) { %0 = hlfir.declare %arg0 omp.target private(%0, %arg1) }

Here we can get a defining operation for %0 because it is the hlfir.declare for that function argument (which is the normal way flang would lower a function argument). %arg1 has no defining operation because it is a block argument.

MLIR values can come from two places:

That's exactly my point. What prevents us from working with block args here? Why do we need to assume it is defined by an op?

I am not against that, we can keep the assertion. But beyond the fact that we need to call getBase below, we only care about var and a Value and not about its defining op.

I was nervous to make any functional change to the target stuff because I don't know how to test it. The previous implementation also wouldn't have worked for block arguments.

I can fix this if somebody at AMD is willing to test omp target for me?

We can remove the assertion and see if any of our smoke tests triggers any problems (I will monitor that). Then we can add handle whatever is missing and add a regression test.

Thank you! 44bbbe8

flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp

ergawy

LGTM! Thanks Tom! However, I have to admit, the fir dialect type system is still a "semi-"blackbox to me, so someone more familiar with it needs to carefully review changes in PrivateReductionUtils.cpp.

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

ergawy · 2025-01-24T08:43:38Z

flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp

-    assert(declOp &&
-           "Expected defining Op of privatized var to be hlfir.declare");
+    assert(definingOp &&
+           "Privatizing a block argument without any hlfir.declare");


MLIR values can come from two places:

That's exactly my point. What prevents us from working with block args here? Why do we need to assume it is defined by an op?

I am not against that, we can keep the assertion. But beyond the fact that we need to call getBase below, we only care about var and a Value and not about its defining op.

luporl

Thanks for all the effort you have put into this @tblah. I'm looking forward to privatization support in delayed tasks.

I like your approach for detecting the need of barriers, based on the uses of moldArg, it's clever.

I have a few comments, but no major concerns with this PR.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp

mlir/test/Target/LLVMIR/openmp-private.mlir

flang/lib/Lower/OpenMP/DataSharingProcessor.cpp

flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp

luporl · 2025-01-24T19:18:04Z

flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp

+    mlir::Value newBox =
+        builder.create<fir::EmboxOp>(loc, boxedTy, allocatedPrivVarArg);
+    mlir::Value moldBox = builder.create<fir::EmboxOp>(loc, boxedTy, moldArg);
+    initializeIfDerivedTypeBox(
+        builder, loc, newBox, moldBox, needsInitialization,
+        /*isFirstPrivate=*/kind == DeclOperationKind::FirstPrivate);


As a future optimization, you could move part of initializeIfDerivedTypeBox()'s body to a separate function. This new function would detect if any initialization is needed and could be used here to eliminate the need to always embox newBox and moldBox.

BTW, emboxing moldArg for derived types will always create an use of it, right? This causes barriers to be inserted for the privatization of any derived types.

Agreed. It is even worse than just derived types, we load moldArg to fetch the length parameters whether or not there really are any. That load (and probably these embox operations) get removed by DCE but not before the barriers are created.

flang/test/Lower/OpenMP/derived-type-allocatable.f90

The intention of this work is to give MLIR->LLVMIR conversion freedom to control how the private variable is allocated so that it can be allocated on the stack in ordinary cases or as part of a structure used to give closure context for tasks which might outlive the current stack frame. See RFC: https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084 In flang, before this patch we hit a TODO with the same wording when generating the copy region for firstprivate polymorphic variables. After this patch the box-like fir.class is passed by reference into the copy region, leading to a different path that didn't hit that old TODO but the generated code still didn't work so I added a new TODO in DataSharingProcessor. --- Please read mlir changes first and then flang changes. In flang lowering I box up all arrays and pass the boxes by reference so that the existing code for reduction init and dealloc regions can be shared. The TODOs for pointers, derived types, characters etc will be resolved in later patches in this same series (to be squashed into this one). I separated it to make it easier to review. Assumed rank was already broken before this patch. I can't find any mention of "assumed rank" in the openmp standard so I guess it is not prohibited. Other than the omp.private operation definition changes, the test changes are mostly down to slightly different codegen from re-using the reduction init region. That code is already well tested so I didn't want to change it.

Arrays are boxed and those boxed arrays are passed by reference. This can cause problems if we try to perform eager privatization (e.g. lastprivate) on a variable already privatized using delayed privatization. Work around this by loading the reference to the box.

The way boxchars are handled isn't ideal, but !fir.ref<!fir.boxchar<>> seems to violate a lot of assumptions in the wider ecosystem of (hl)fir helpers making it difficult to generate a copy region. I suspect !fir.ref<!fir.boxchar<>> is not supposed to work (it looks like a mutable character box, which isn't possible because a boxchar should be an SSA value). Fixing this would be a big change beyond the scope of this already too large PR.

The copy region codegen hasn't changed as a result of this patch series. However I think there is a bug in the copy region generated in equivalence.f90. This patch series is already too big so I won't change the copy region here.

This only affects POINTERs (which should not be initialized) and NULL allocatables so the actual contents of the shape doesn't matter. This is just so the embox operation is converted to LLVMIR correctly.

There are three cases handled: - Boxed (maybe allocatable or pointer) scalar derived types - Boxed (maybe allocatable or pointer) arrays of derived types - Unboxed scalar derived types Currently I support both boxed and unboxed derived types because unboxed derived types aren't hlfir::Entities so I worry there could be cases where they are in fact boxed. The changes to parallel-private-clause.f90 are because of the arrays becoming boxed. I re-organised the test a bit because the CHECK-DAGs were matching completely the wrong lines outside of the parallel region.

For reference see both #120295 and #121808 This changes the barrier logic. See discussion here: https://github.com/llvm/llvm-project/pull/120295/files#r1910663473

This will deallocate in exactly the same cases as the old implementation before this series. I expect some cases are missed but I want to avoid functional changes wherever possible in this (already too big) patch series.

I made a mistake when rewriting the privatizer, which led to the test reading incorrectly.

tblah · 2025-01-30T13:51:04Z

@luporl ping for review

luporl

LGTM, thanks.

If I understood correctly, with this change the generated code will have more barriers than before, although some of those might be fixing existing race conditions.
Is my understanding correct, that the goal is to prioritize correctness for now and to remove unnecessary barriers in the future?

tblah · 2025-01-30T17:27:30Z

If I understood correctly, with this change the generated code will have more barriers than before, although some of those might be fixing existing race conditions. Is my understanding correct, that the goal is to prioritize correctness for now and to remove unnecessary barriers in the future?

Yes that is my intention

Thanks for the reivew

…it in omp.private (llvm#124019)" Breaks 535.weather , hangs on ringo This reverts commit aeaafce.

…p.private (llvm#124019) The intention of this work is to give MLIR->LLVMIR conversion freedom to control how the private variable is allocated so that it can be allocated on the stack in ordinary cases or as part of a structure used to give closure context for tasks which might outlive the current stack frame. See RFC: https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084 For example, a privatizer for an integer used to look like ```mlir omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc { ^bb0(%arg0: !fir.ref<i32>): %0 = ... allocate proper memory for the private clone ... omp.yield(%0 : !fir.ref<i32>) } ``` After this change, allocation become implicit in the operation: ```mlir omp.private {type = private} @x.privatizer : i32 ``` For more complex types that require initialization after allocation, an init region can be used: ``` mlir omp.private {type = private} @x.privatizer : !some.type init { ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>): // initialize %arg1, using %arg0 as a mold for allocations omp.yield(%arg1 : !some.pointer<!some.type>) } dealloc { ^bb0(%arg0: !some.pointer<!some.type>): ... deallocate memory allocated by the init region ... omp.yield } ``` This patch lays the groundwork for delayed task execution but is not enough on its own. After this patch all gfortran tests which previously passed still pass. There are the following changes to the Fujitsu test suite: - 0380_0009 and 0435_0009 are fixed - 0688_0041 now fails at runtime. This patch is testing firstprivate variables with tasks. Previously we got lucky with the undefined behavior and won the race. After these changes we no longer get lucky. This patch lays the groundwork for a proper fix for this issue. In flang the lowering re-uses the existing lowering used for reduction init and dealloc regions. In flang, before this patch we hit a TODO with the same wording when generating the copy region for firstprivate polymorphic variables. After this patch the box-like fir.class is passed by reference into the copy region, leading to a different path that didn't hit that old TODO but the generated code still didn't work so I added a new TODO in DataSharingProcessor.

skatrak · 2025-02-13T14:16:16Z

flang/lib/Lower/OpenMP/DataSharingProcessor.cpp

@@ -184,7 +218,8 @@ bool DataSharingProcessor::needBarrier() {
  // Emit implicit barrier for linear clause. Maybe on somewhere else.
  for (const semantics::Symbol *sym : allPrivatizedSymbols) {
    if (sym->test(semantics::Symbol::Flag::OmpLastPrivate) &&
-        (sym->test(semantics::Symbol::Flag::OmpFirstPrivate) || callsInitClone))
+        (sym->test(semantics::Symbol::Flag::OmpFirstPrivate) ||
+         mightHaveReadHostSym))


Sorry @tblah for the late comment to this PR. I am looking into an issue that came up recently, and I believe it's related to this check here and your extensions to initialize that flag also during delayed privatization. So, I wanted to ask about it to try to understand it better, and see what the best solution is.

As far as I can tell, the mightHaveReadHostSym flag is set globally. That is, whenever a symbol is known to either initialize a clone or to have a privatizer with an init mold argument that is used, this will be true. Then, a barrier will be inserted e.g. if it is true and there are any lastprivate symbols. Is that the intended behavior?

My concern is that it doesn't look at whether the lastprivate symbol is the same one that caused the flag to be set. This is something that is triggered in the following example:

subroutine simd() integer :: i, j integer, dimension ( 3 ) :: x !$omp simd collapse(2) private(x) do i = 1, 10 do j = 1, 10 call foo(x) end do end do !$omp end simd end subroutine simd

The i variable is implicitly marked as lastprivate (not sure where that's done, but I'm assuming it's simd-specific behavior) and there is a private clause for a non-primitive type, so this results in the insertion of a barrier. However, replacing simd for do no longer causes this issue because i is no longer lastprivate then. I was just wondering if it's on purpose that we're adding a barrier due to different symbols meeting each side of the condition.

I guess that would have been triggered previously as well, it's just that this change made it also happen for delayed privatization, which then impacts composite construct lowering and ends up triggering an MLIR verifier error for target SPMD due to the added barrier.

Maybe @ergawy or @luporl can also chime in if you know whether this is the expected behavior. I have a small patch updating mightHaveReadHostSym so that a barrier would not be inserted in this case, which I can share if this isn't expected behavior.

AFAIU, a barrier should not be needed in this case, since it is not possible to have concurrent read/write of the same memory region, as i is only written to in the last iteration and, as it seems, x is only read during privatization (this could probably be optimized, since x's shape is known at compile time).

I hadn't noticed it before, but having a single mightHaveReadHostSym variable for all privatized symbols is a problem indeed. A possible solution would be to have a per-symbol mightHaveReadHostSym.

Thank you for the quick response. I just created #127074 to address the second part of the issue. Though it looks like, from what you're saying, we might want to treat induction variables differently as well?

Thanks for the quick fix.
I only wanted to point out that, in this case, x wouldn't even need to be read at all.
Actually, this optimization is implemented in #125901.

Do you think there is any advantage in treating induction variables differently?

Ah, thank you for the explanation, that makes sense. I would try to have as few edge cases as possible, so if we can get away with treating every variable the same way, I'd prefer it.

DO loop bounds have to be scalars, so it seems to me that the induction variable couldn't trigger this via the mightHaveReadHostSym=true path. Also, semantics triggers an error about passing a loop induction variable into a firstprivate clause, so it seems like we should be fine without adding an edge case for this.

Sorry for my slow response @skatrak, I was on holiday. Thanks for fixing it, and thanks @luporl for chiming in.

tblah requested review from ergawy, skatrak, kiranchandramohan, pawosm-arm, luporl and kparzysz January 22, 2025 21:46

tblah changed the base branch from main to users/tblah/delayed-task-exec-0 January 22, 2025 21:48

tblah requested a review from kaviya2510 January 23, 2025 11:25

ergawy reviewed Jan 23, 2025

View reviewed changes

ergawy approved these changes Jan 24, 2025

View reviewed changes

Base automatically changed from users/tblah/delayed-task-exec-0 to main January 24, 2025 14:48

luporl reviewed Jan 24, 2025

View reviewed changes

tblah added 17 commits January 30, 2025 12:34

Fix omp target maps for privatized symbols

3385b3e

Support pointers

50c6e3a

The copy region codegen hasn't changed as a result of this patch series. However I think there is a bug in the copy region generated in equivalence.f90. This patch series is already too big so I won't change the copy region here.

Fix crash lowering fir::EmboxOp whithout shape

15e59a7

This only affects POINTERs (which should not be initialized) and NULL allocatables so the actual contents of the shape doesn't matter. This is just so the embox operation is converted to LLVMIR correctly.

Implement lupori's derive type initialization patch

0d8b42d

For reference see both #120295 and #121808 This changes the barrier logic. See discussion here: https://github.com/llvm/llvm-project/pull/120295/files#r1910663473

Deallocate derived types

eebf59c

This will deallocate in exactly the same cases as the old implementation before this series. I expect some cases are missed but I want to avoid functional changes wherever possible in this (already too big) patch series.

Support arrays of polymorphic types

6953f3f

Fix compiler crash for unnesecarry len params

81a6b26

Readability improvments

61c0d3b

Fix MLIR test

729ae4c

I made a mistake when rewriting the privatizer, which led to the test reading incorrectly.

More readability improvments

3d6141d

Allow block arguments in MapsForPrivatizedSymbols

e67683e

Add negative tests for barriers

9c22293

Fix test broken after rebase

9284bb3

tblah force-pushed the users/tblah/delayed-task-exec-1 branch from ce45103 to 9284bb3 Compare January 30, 2025 13:50

luporl approved these changes Jan 30, 2025

View reviewed changes

tblah merged commit aeaafce into main Jan 31, 2025
8 checks passed

tblah deleted the users/tblah/delayed-task-exec-1 branch January 31, 2025 09:35

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 3, 2025

Revert "[mlir][OpenMP][flang] make private variable allocation implic…

a6f7be1

…it in omp.private (llvm#124019)" Breaks 535.weather , hangs on ringo This reverts commit aeaafce.

skatrak reviewed Feb 13, 2025

View reviewed changes

mjklemm mentioned this pull request Mar 6, 2025

[flang][OpenMP] "Call parameter type does not match function signature!" for private variables #130159

Closed

[mlir][OpenMP][flang] make private variable allocation implicit in omp.private #124019

[mlir][OpenMP][flang] make private variable allocation implicit in omp.private #124019

Conversation

tblah commented Jan 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tblah commented Jan 23, 2025

Uh oh!

ergawy left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tblah Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ergawy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luporl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tblah commented Jan 30, 2025

Uh oh!

luporl left a comment

Choose a reason for hiding this comment

Uh oh!

tblah commented Jan 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

skatrak Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tblah commented Jan 22, 2025 •

edited

Loading

ergawy left a comment •

edited

Loading

tblah Jan 23, 2025 •

edited

Loading

tblah commented Jan 30, 2025 •

edited

Loading

skatrak Feb 13, 2025 •

edited

Loading