Skip to content

[flang][OpenMP] Translate OpenMP scopes when compiling for target device #130078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 19, 2025

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Mar 6, 2025

If a target directive is nested in a host OpenMP directive (e.g.
parallel, task, or a worksharing loop), flang currently crashes if the
target directive-related MLIR ops (e.g. omp.map.bounds and
omp.map.info depends on SSA values defined inside the parent host
OpenMP directives/ops.

This PR tries to solve this problem by treating these parent OpenMP ops
as "SSA scopes". Whenever we are translating for the device, instead of
completely translating host ops, we just tranlate their MLIR ops as pure
SSA values.

@llvmbot
Copy link
Member

llvmbot commented Mar 6, 2025

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-mlir

Author: Kareem Ergawy (ergawy)

Changes

Todo:

  • Add tests
  • Handle all ops with block arguments.

With these changes, we can handle inputs like:

subroutine test_sections_target_with_allocatable
implicit none
integer,parameter :: N=100, rt=8
real(kind=rt), allocatable, dimension(:) :: a
integer :: i

allocate(a(1:N))

!$omp parallel
  !$omp parallel
    !$omp sections
      !$omp section
        !$omp target
          a(1) = 1.0_rt
        !$omp end target 
    !$omp end sections
  !$omp end parallel
!$omp end parallel


!$omp parallel
  !$omp parallel
    !$omp do
    do i=1,10
      !$omp target
        a(i) = 1.0_rt
      !$omp end target 
    end do
    !$omp end do
  !$omp end parallel
!$omp end parallel

end subroutine

Full diff: https://github.com/llvm/llvm-project/pull/130078.diff

1 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+53-9)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 32c7c501d03c3..6af8856618f27 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -542,6 +542,13 @@ static llvm::omp::ProcBindKind getProcBindKind(omp::ClauseProcBindKind kind) {
   llvm_unreachable("Unknown ClauseProcBindKind kind");
 }
 
+static void forwardArgs(LLVM::ModuleTranslation &moduleTranslation,
+                        llvm::ArrayRef<BlockArgument> blockArgs,
+                        OperandRange operands) {
+  for (auto [arg, var] : llvm::zip_equal(blockArgs, operands))
+    moduleTranslation.mapValue(arg, moduleTranslation.lookupValue(var));
+}
+
 /// Helper function to map block arguments defined by ignored loop wrappers to
 /// LLVM values and prevent any uses of those from triggering null pointer
 /// dereferences.
@@ -554,18 +561,12 @@ convertIgnoredWrapper(omp::LoopWrapperInterface opInst,
   // Map block arguments directly to the LLVM value associated to the
   // corresponding operand. This is semantically equivalent to this wrapper not
   // being present.
-  auto forwardArgs =
-      [&moduleTranslation](llvm::ArrayRef<BlockArgument> blockArgs,
-                           OperandRange operands) {
-        for (auto [arg, var] : llvm::zip_equal(blockArgs, operands))
-          moduleTranslation.mapValue(arg, moduleTranslation.lookupValue(var));
-      };
-
   return llvm::TypeSwitch<Operation *, LogicalResult>(opInst)
       .Case([&](omp::SimdOp op) {
         auto blockArgIface = cast<omp::BlockArgOpenMPOpInterface>(*op);
-        forwardArgs(blockArgIface.getPrivateBlockArgs(), op.getPrivateVars());
-        forwardArgs(blockArgIface.getReductionBlockArgs(),
+        forwardArgs(moduleTranslation, blockArgIface.getPrivateBlockArgs(),
+                    op.getPrivateVars());
+        forwardArgs(moduleTranslation, blockArgIface.getReductionBlockArgs(),
                     op.getReductionVars());
         op.emitWarning() << "simd information on composite construct discarded";
         return success();
@@ -5255,6 +5256,49 @@ convertTargetOpsInNest(Operation *op, llvm::IRBuilderBase &builder,
               return WalkResult::interrupt();
             return WalkResult::skip();
           }
+
+          // Not-target ops might nest target-related ops, therefore, we
+          // translate them as non-OpenMP scopes. Translating them is needed by
+          // nested target-related ops since they might LLVM values defined in
+          // their parent non-target ops.
+          if (isa<omp::OpenMPDialect>(oper->getDialect()) &&
+              oper->getParentOfType<LLVM::LLVMFuncOp>() &&
+              !oper->getRegions().empty()) {
+
+            // TODO Handle other ops with entry block args.
+            if (auto wsloop = dyn_cast<omp::WsloopOp>(oper)) {
+              auto blockArgIface = cast<omp::BlockArgOpenMPOpInterface>(oper);
+              if (blockArgIface) {
+                forwardArgs(moduleTranslation,
+                            blockArgIface.getPrivateBlockArgs(),
+                            wsloop.getPrivateVars());
+                forwardArgs(moduleTranslation,
+                            blockArgIface.getReductionBlockArgs(),
+                            wsloop.getReductionVars());
+              }
+            }
+
+            if (auto loopNest = dyn_cast<omp::LoopNestOp>(oper)) {
+              for (auto iv : loopNest.getIVs()) {
+                // Create fake allocas just to maintain IR validity.
+                moduleTranslation.mapValue(
+                    iv, builder.CreateAlloca(
+                            moduleTranslation.convertType(iv.getType())));
+              }
+            }
+
+            for (Region &region : oper->getRegions()) {
+              auto result = convertOmpOpRegions(
+                  region, oper->getName().getStringRef().str() + ".fake.region",
+                  builder, moduleTranslation);
+              if (result.takeError())
+                return WalkResult::interrupt();
+              builder.SetInsertPoint(result.get(), result.get()->end());
+            }
+
+            return WalkResult::skip();
+          }
+
           return WalkResult::advance();
         }).wasInterrupted();
   return failure(interrupted);

@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from 43a0362 to b886d11 Compare March 6, 2025 11:42
Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this work Kareem. I think this approach should work, but we need to keep an eye out in case there are other edge cases that break this, since we're translating inapplicable operations that we discard later.

Long term, I think we should try to go the opposite direction and keep as few host operations as we can get away with (possibly just omp.map.info, omp.map.bounds, global definitions and recipe ops, like omp.declare_reduction, omp.private, etc). The root issue appears to be that we're gathering non-static map information during compilation for both host and device that only actually exists in the host and is only used there, and this wasn't crashing the compiler because top-level operations inside of an llvm.func are translated to LLVM IR. I think we should probably work on avoiding doing that and to stop keeping and translating these top-level operations at all, but something for the near future IMO.

One other nit: We would probably want to rename convertTargetOpsInNest to something like convertHostOpsForDevice, since that's what it would be doing now.

oper->getParentOfType<LLVM::LLVMFuncOp>() &&
!oper->getRegions().empty()) {

// TODO Handle other ops with entry block args.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an unfortunate limitation of the current implementation of the BlockArgOpenMPOpInterface that we can't also get the matching operation arguments associated to the block arguments. I think something like what's currently done for numXyzBlockArgs() methods should work. Then, here instead of checking for each type, we could do something like:

if (auto blockArgIface = dyn_cast<BlockArgOpenMPOpInterface>(oper)) {
  for (auto [arg, var] : llvm::zip_equal(llvm::concat(blockArgIface.getPrivateBlockArgs(), ...), blockArgIface.getPrivateVars(), ...))
    moduleTranslation.mapValue(arg, moduleTranslation.lookupValue(var));
}

I'll work on that, since it's been in my TODO list for some time, but no need to block this work waiting for it. We can simplify it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the interface updates. They are awesome. Used them in the PR.

@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from b886d11 to b55bae3 Compare March 7, 2025 08:23
@ergawy ergawy changed the title [WIP][flang][OpenMP] Translate OpenMP scopes when compiling for target device [flang][OpenMP] Translate OpenMP scopes when compiling for target device Mar 7, 2025
@ergawy ergawy marked this pull request as ready for review March 7, 2025 08:25
Copy link

github-actions bot commented Mar 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from b55bae3 to b377e75 Compare March 7, 2025 08:27
@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from b377e75 to bffc2aa Compare March 7, 2025 08:34
Comment on lines 5313 to 5343
iv, builder.CreateAlloca(
moduleTranslation.convertType(iv.getType())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this somehow guaranteed to be inserting the allocas in the right spot or should we set the insertion point before this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (indirectly) called from ModuleTranslation::convertOperation which dispatches convertOperation to all op interfaces. Looking at the implemetation, having a valid insertion point is a pre-condition assumed by all implementations of LLVMTranslationDialectInterface including for example: LLVMDialectLLVMIRTranslationInterface::convertOperation; see https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.cpp#L297.

To be safe, I added an assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant that in other places we need to call findAllocaInsertPoint before adding new allocas. That will either return the entry block of the function or an alternative spot, if an OpenMPAllocaStackFrame was introduced by a parent operation.

So, the current insertion point there will always be valid, as you say, but maybe it wouldn't be the right place for allocas specifically. I'm not that familiar with this, so I may be wrong. At the end of the day operations created here are later removed from the LLVM module, so maybe this doesn't even matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I'm thinking for both this case and OpenMP ops that return values is whether it makes sense to use undef to represent them temporarily until they are removed or something else, rather than alloca.

Copy link
Member Author

@ergawy ergawy Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think undef is a better idea than using a fake alloca, less confusing. Did that. edit: used poison since undef is deprecated.

})
.Case([&](omp::TaskOp taskOp) {
forwardPrivateArgs(taskOp, moduleTranslation);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to figure out which ops we need to look at here is to search for OpenMP_InReductionClause, OpenMP_PrivateClause, OpenMP_ReductionClause, OpenMP_TaskReductionClause, OpenMP_UseDeviceAddrClause, OpenMP_UseDevicePtrClause in OpenMPOps.td. I think we need to have cases for all of them before merging this, to avoid crashing the compiler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was going to say if there is a foolproof way of making sure we have an exhaustive and complete list of ops that can have omp.target ops nested in them and will need to forward their args, it'd be very useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should cover all ops now, after using Sergio's latest updates to BlockArgOpenMPOpInterface.

})
.Case([&](omp::TaskOp taskOp) {
forwardPrivateArgs(taskOp, moduleTranslation);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was going to say if there is a foolproof way of making sure we have an exhaustive and complete list of ops that can have omp.target ops nested in them and will need to forward their args, it'd be very useful.


for (Region &region : oper->getRegions()) {
auto result = convertOmpOpRegions(
region, oper->getName().getStringRef().str() + ".fake.region",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this named .fake.region? As far as my understanding of the problem and the solution you have implemented goes, this is a region that has been created by this pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fake in the sense that it is not a truthful translation of the OpenMP construct being converted. We just need this to prepare the kernel invocation args. Added a comment to clarify.

@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch 2 times, most recently from bf94aeb to bdd5e15 Compare March 12, 2025 05:25
@ergawy ergawy changed the base branch from main to users/skatrak/omp-blockarg-iface-uses March 12, 2025 05:25
@skatrak skatrak force-pushed the users/skatrak/omp-blockarg-iface-uses branch from 270790b to cc7cb76 Compare March 12, 2025 11:53
// in their parent non-target ops.
if (isa<omp::OpenMPDialect>(oper->getDialect()) &&
oper->getParentOfType<LLVM::LLVMFuncOp>() &&
!oper->getRegions().empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what should happen to OpenMP ops that don't have regions. If they return a value, it seems like that value could end up impacting what's passed into an omp.map.info as an argument. Maybe we should map their results to something as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example where this might happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of something like this (since omp.threadprivate seems to be the only non map-related OpenMP op that returns some value at the moment):

module test
  implicit none
  integer :: n
  
  !$omp threadprivate(n)
  
  contains
  subroutine foo(x)
    integer, intent(inout) :: x(10)

    !$omp target map(tofrom: x(1:n))
    call bar(x)
    !$omp end target
  end subroutine
end module

Lowering it to MLIR for the device results in the following sequence of operations:

%3 = omp.threadprivate ...
%4 = fir.declare %3
%13 = omp.map.info var_ptr(%4 : !fir.ref<i32>, i32) ...
omp.target map_entries(%13 -> %arg2 ...) {
  ...
}

Copy link
Member Author

@ergawy ergawy Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example. However, this is not an issue since when translating the surrounding region, the LLVM value corresponding to the original n value will be resolved:

omp.parallel.fake.region:
  %2 = load i32, ptr @_QFtestEn, align 4
  %3 = sext i32 %2 to i64
  %4 = sub i64 %3, 1
  %5 = sub i64 %4, 0
  %6 = add i64 %5, 1
  %7 = mul i64 1, %6
  %8 = mul i64 %7, 4
  br label %omp.region.cont

(Just to clarify, I surrounded the target op by parallel to make sure the issue could be reproduced.)
So the translation happens without issues.

I think we should keep things simple and not add the extra translation for non-region ops since there does not seem to be need for that, at least at the moment. I think it is not much more effort, the main reason is to simplify things even by a little bit. Let me know if you disagree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that the example above isn't causing crashes, since %3 is not mapped to any LLVM values and it contributes to the initialization of %4, which is passed to an omp.map.info.

Actually, if that had caused a crash it looks like any non-OpenMP dialect operation found during the op->walk() in convertTargetOpsInNest would have triggered it in the same way, since their results are not mapped to anything there either. I'm not clear how this is working, but apparently it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will dig deeper in this and try to provide a more detailed explanation.

Comment on lines 5313 to 5343
iv, builder.CreateAlloca(
moduleTranslation.convertType(iv.getType())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant that in other places we need to call findAllocaInsertPoint before adding new allocas. That will either return the entry block of the function or an alternative spot, if an OpenMPAllocaStackFrame was introduced by a parent operation.

So, the current insertion point there will always be valid, as you say, but maybe it wouldn't be the right place for allocas specifically. I'm not that familiar with this, so I may be wrong. At the end of the day operations created here are later removed from the LLVM module, so maybe this doesn't even matter.

Comment on lines 5313 to 5343
iv, builder.CreateAlloca(
moduleTranslation.convertType(iv.getType())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I'm thinking for both this case and OpenMP ops that return values is whether it makes sense to use undef to represent them temporarily until they are removed or something else, rather than alloca.

@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from 5ef645d to 59d8dfc Compare March 12, 2025 15:28
Copy link

github-actions bot commented Mar 12, 2025

✅ With the latest revision this PR passed the undef deprecator.

@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from 59d8dfc to 3480c75 Compare March 12, 2025 15:36
Base automatically changed from users/skatrak/omp-blockarg-iface-uses to main March 13, 2025 14:48
ergawy added 4 commits March 14, 2025 00:05
If a `target` directive is nested in a host OpenMP directive (e.g.
parallel, task, or a worksharing loop), flang currently crashes if the
target directive-related MLIR ops (e.g. `omp.map.bounds` and
`omp.map.info` depends on SSA values defined inside the parent host
OpenMP directives/ops.

This PR tries to solve this problem by treating these parent OpenMP ops
as "SSA scopes". Whenever we are translating for the device, instead of
completely translating host ops, we just tranlate their MLIR ops as pure
SSA values.
@ergawy ergawy force-pushed the users/ergawy/try_to_fix_nested_target_conversion_2 branch from 3480c75 to 714bac2 Compare March 14, 2025 05:05
@ergawy
Copy link
Member Author

ergawy commented Mar 17, 2025

Ping! Please take a look when you have time.

@ergawy ergawy merged commit e737b84 into main Mar 19, 2025
11 checks passed
@ergawy ergawy deleted the users/ergawy/try_to_fix_nested_target_conversion_2 branch March 19, 2025 07:26
@ronlieb
Copy link
Contributor

ronlieb commented Mar 20, 2025

Hi Kareem,
downstream we are seeing failures in ovo suite, and a few sollve tests

aomp/bin$ ./run_sollve.sh test_team_default_shared.F90

lvm-project/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:464: llvm::Expe
ctedllvm::BasicBlock* convertOmpOpRegions(mlir::Region&, llvm::StringRef, llvm::IRBuilderBase&, mlir::LLVM::ModuleTranslation&, llvm:
:SmallVectorImplllvm::PHINode**): Assertion `continuationBlockPHIs && "expected continuation block PHIs if converted regions yield va
lues"' failed.

@ergawy
Copy link
Member Author

ergawy commented Mar 20, 2025

Hi Kareem, downstream we are seeing failures in ovo suite, and a few sollve tests

aomp/bin$ ./run_sollve.sh test_team_default_shared.F90

lvm-project/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:464: llvm::Expe ctedllvm::BasicBlock* convertOmpOpRegions(mlir::Region&, llvm::StringRef, llvm::IRBuilderBase&, mlir::LLVM::ModuleTranslation&, llvm: :SmallVectorImplllvm::PHINode**): Assertion `continuationBlockPHIs && "expected continuation block PHIs if converted regions yield va lues"' failed.

Taking a look ...

ergawy added a commit that referenced this pull request Mar 20, 2025
…for target device

Fixes a bug introduced by #130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry block arguments to their operands, if any. For the current support in the OpenMP dialect, the table below lists all ops that have arguments (SSA operands and/or attributes) and not target-related. Of all these ops, we need to only process `omp.atomic.update` since it is the only op that has SSA operands & an attached region. Therefore, the region's entry block arguments must be mapped to the op's operands in case they are referenced inside the region.

=============================================================================
| op                | operands? | region(s)? | parent is func? | processed? |
=============================================================================
| atomic.read       | yes       | no         | yes             | no         |
| atomic.write      | yes       | no         | yes             | no         |
| atomic.update     | yes       | yes        | yes             | yes        |
| critical          | no        | no         | yes             | no         |
| declare_mapper    | no        | yes        | no              | no         |
| declare_reduction | no        | yes        | no              | no         |
| flush             | yes       | no         | yes             | no         |
| private           | no        | yes        | yes             | no         |
| threadprivate     | yes       | no         | yes             | no         |
| yield             | yes       | no         | yes             | no         |
=============================================================================
ergawy added a commit that referenced this pull request Mar 20, 2025
…for target device

Fixes a bug introduced by #130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry block arguments to their operands, if any. For the current support in the OpenMP dialect, the table below lists all ops that have arguments (SSA operands and/or attributes) and not target-related. Of all these ops, we need to only process `omp.atomic.update` since it is the only op that has SSA operands & an attached region. Therefore, the region's entry block arguments must be mapped to the op's operands in case they are referenced inside the region.

=============================================================================
| op                | operands? | region(s)? | parent is func? | processed? |
=============================================================================
| atomic.read       | yes       | no         | yes             | no         |
| atomic.write      | yes       | no         | yes             | no         |
| atomic.update     | yes       | yes        | yes             | yes        |
| critical          | no        | no         | yes             | no         |
| declare_mapper    | no        | yes        | no              | no         |
| declare_reduction | no        | yes        | no              | no         |
| flush             | yes       | no         | yes             | no         |
| private           | no        | yes        | yes             | no         |
| threadprivate     | yes       | no         | yes             | no         |
| yield             | yes       | no         | yes             | no         |
=============================================================================
dpalermo pushed a commit that referenced this pull request Mar 20, 2025
…for target device (#132165)

Fixes a bug introduced by
#130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and not target-related. Of all these
ops, we need to only process `omp.atomic.update` since it is the only op
that has SSA operands & an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region.


| op | operands? | region(s)? | parent is func? | processed? |

|--------------|-------------|------------|------------------|-------------|
| atomic.read | yes | no | yes | no |
| atomic.write | yes | no | yes | no |
| atomic.update | yes | yes | yes | yes |
| critical | no | no | yes | no |
| declare_mapper | no | yes | no | no |
| declare_reduction | no | yes | no | no |
| flush | yes | no | yes | no |
| private | no | yes | yes | no |
| threadprivate | yes | no | yes | no |
| yield | yes | no | yes | no |
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 20, 2025
…ing scopes for target device (#132165)

Fixes a bug introduced by
llvm/llvm-project#130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and not target-related. Of all these
ops, we need to only process `omp.atomic.update` since it is the only op
that has SSA operands & an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region.

| op | operands? | region(s)? | parent is func? | processed? |

|--------------|-------------|------------|------------------|-------------|
| atomic.read | yes | no | yes | no |
| atomic.write | yes | no | yes | no |
| atomic.update | yes | yes | yes | yes |
| critical | no | no | yes | no |
| declare_mapper | no | yes | no | no |
| declare_reduction | no | yes | no | no |
| flush | yes | no | yes | no |
| private | no | yes | yes | no |
| threadprivate | yes | no | yes | no |
| yield | yes | no | yes | no |
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 21, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 21, 2025
…ice (llvm#130078)

If a `target` directive is nested in a host OpenMP directive (e.g.
parallel, task, or a worksharing loop), flang currently crashes if the
target directive-related MLIR ops (e.g. `omp.map.bounds` and
`omp.map.info` depends on SSA values defined inside the parent host
OpenMP directives/ops.

This PR tries to solve this problem by treating these parent OpenMP ops
as "SSA scopes". Whenever we are translating for the device, instead of
completely translating host ops, we just tranlate their MLIR ops as pure
SSA values.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 21, 2025
…for target device (llvm#132165)

Fixes a bug introduced by
llvm#130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and not target-related. Of all these
ops, we need to only process `omp.atomic.update` since it is the only op
that has SSA operands & an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region.


| op | operands? | region(s)? | parent is func? | processed? |

|--------------|-------------|------------|------------------|-------------|
| atomic.read | yes | no | yes | no |
| atomic.write | yes | no | yes | no |
| atomic.update | yes | yes | yes | yes |
| critical | no | no | yes | no |
| declare_mapper | no | yes | no | no |
| declare_reduction | no | yes | no | no |
| flush | yes | no | yes | no |
| private | no | yes | yes | no |
| threadprivate | yes | no | yes | no |
| yield | yes | no | yes | no |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants