Skip to content

[flang] support fir.alloca operations inside of omp reduction ops #84952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions flang/lib/Optimizer/Builder/FIRBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -208,6 +208,8 @@ mlir::Block *fir::FirOpBuilder::getAllocaBlock() {
.getParentOfType<mlir::omp::OutlineableOpenMPOpInterface>()) {
return ompOutlineableIface.getAllocaBlock();
}
if (mlir::isa<mlir::omp::ReductionDeclareOp>(getRegion().getParentOp()))
return &getRegion().front();
if (auto accRecipeIface =
getRegion().getParentOfType<mlir::acc::RecipeInterface>()) {
return accRecipeIface.getAllocaBlock(getRegion());
Expand Down
11 changes: 9 additions & 2 deletions flang/lib/Optimizer/CodeGen/CodeGen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -410,8 +410,15 @@ class FIROpConversion : public mlir::ConvertOpToLLVMPattern<FromOp> {
mlir::ConversionPatternRewriter &rewriter) const {
auto thisPt = rewriter.saveInsertionPoint();
mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp();
mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp);
rewriter.setInsertionPointToStart(insertBlock);
if (mlir::isa<mlir::omp::ReductionDeclareOp>(parentOp)) {
// ReductionDeclareOp has multiple child regions. We want to get the first
// block of whichever of those regions we are currently in
mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent();
rewriter.setInsertionPointToStart(&parentRegion->front());
} else {
mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp);
rewriter.setInsertionPointToStart(insertBlock);
}
Comment on lines +413 to +421
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not sufficient to modify getBlockForAllocaInsert to handle ReductionDeclareOp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function doesn't have access to the rewriter. I could have added it as an argument, but I felt that it reads a bit weird there because it isn't obvious out of context what the conditions on the rewriter are.

The awkwardness comes from the use of multiple regions in the reduction declare op. We need to make sure the alloca ends up in the same region that we are currently inserting into.

auto size = genI32Constant(loc, rewriter, 1);
unsigned allocaAs = getAllocaAddressSpace(rewriter);
unsigned programAs = getProgramAddressSpace(rewriter);
Expand Down
36 changes: 36 additions & 0 deletions flang/test/Fir/omp-reduction-embox-codegen.fir
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
// RUN: tco %s | FileCheck %s

// the fir.embox in the init region is turned into an alloca for the box. Test
// that CodeGen.cpp knows where to place an alloca when it is inside of an
// omp.reduction.declare

// regretably this has to be nonsense IR because we need the subsequent patches
// to process anything useful

omp.reduction.declare @test_reduction : !fir.ref<!fir.box<i32>> init {
^bb0(%arg0: !fir.ref<!fir.box<i32>>):
%0 = fir.alloca !fir.box<i32>
%1 = fir.alloca i32
%2 = fir.embox %1 : (!fir.ref<i32>) -> !fir.box<i32>

// use the embox for something so it isn't removed
fir.store %2 to %0 : !fir.ref<!fir.box<i32>>

omp.yield(%0 : !fir.ref<!fir.box<i32>>)
} combiner {
^bb0(%arg0: !fir.ref<!fir.box<i32>>, %arg1: !fir.ref<!fir.box<i32>>):
%0 = fir.undefined !fir.ref<!fir.box<i32>>
omp.yield(%0 : !fir.ref<!fir.box<i32>>)
}

func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
%4 = fir.alloca !fir.box<i32>
omp.parallel byref reduction(@test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
omp.terminator
}
return
}

// basically we are testing that there isn't a crash
// CHECK-LABEL: define void @_QQmain
// CHECK-NEXT: alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8