[mlir][OpenMP] Convert reduction alloc region to LLVMIR #102524

Merged (3 commits) on Aug 22, 2024
40 changes: 32 additions & 8 deletions mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -1528,21 +1528,32 @@ def DeclareReductionOp : OpenMP_Op<"declare_reduction", [IsolatedFromAbove,
Symbol]> {
let summary = "declares a reduction kind";
let description = [{
Declares an OpenMP reduction kind. This requires two mandatory and two
Declares an OpenMP reduction kind. This requires two mandatory and three
optional regions.

1. The initializer region specifies how to initialize the thread-local
1. The optional alloc region specifies how to allocate the thread-local
reduction value. This region should not contain control flow and all
IR should be suitable for inlining straight into an entry block. In
the common case this is expected to contain only allocas. It is
expected to `omp.yield` the allocated value on all control paths.
If allocation is conditional (e.g. only allocate if the mold is
allocated), this should be done in the initializer region and this
region not included. The alloc region is not used for by-value
reductions (where allocation is implicit).
2. The initializer region specifies how to initialize the thread-local
reduction value. This is usually the neutral element of the reduction.
For convenience, the region has an argument that contains the value
of the reduction accumulator at the start of the reduction. It is
expected to `omp.yield` the new value on all control flow paths.
2. The reduction region specifies how to combine two values into one, i.e.
of the reduction accumulator at the start of the reduction. If an alloc
region is specified, there is a second block argument containing the
address of the allocated memory. The initializer region is expected to
`omp.yield` the new value on all control flow paths.
3. The reduction region specifies how to combine two values into one, i.e.
the reduction operator. It accepts the two values as arguments and is
expected to `omp.yield` the combined value on all control flow paths.
3. The atomic reduction region is optional and specifies how two values
4. The atomic reduction region is optional and specifies how two values
can be combined atomically given local accumulator variables. It is
expected to store the combined value in the first accumulator variable.
4. The cleanup region is optional and specifies how to clean up any memory
5. The cleanup region is optional and specifies how to clean up any memory
allocated by the initializer region. The region has an argument that
contains the value of the thread-local reduction accumulator. This will
be executed after the reduction has completed.
@@ -1558,12 +1569,14 @@ def DeclareReductionOp : OpenMP_Op<"declare_reduction", [IsolatedFromAbove,
let arguments = (ins SymbolNameAttr:$sym_name,
TypeAttr:$type);

let regions = (region AnyRegion:$initializerRegion,
let regions = (region MaxSizedRegion<1>:$allocRegion,
AnyRegion:$initializerRegion,
AnyRegion:$reductionRegion,
AnyRegion:$atomicReductionRegion,
AnyRegion:$cleanupRegion);

let assemblyFormat = "$sym_name `:` $type attr-dict-with-keyword "
"custom<AllocReductionRegion>($allocRegion) "
"`init` $initializerRegion "
"`combiner` $reductionRegion "
"custom<AtomicReductionRegion>($atomicReductionRegion) "
@@ -1576,6 +1589,17 @@ def DeclareReductionOp : OpenMP_Op<"declare_reduction", [IsolatedFromAbove,

return cast<PointerLikeType>(getAtomicReductionRegion().front().getArgument(0).getType());
}

Value getInitializerMoldArg() {
return getInitializerRegion().front().getArgument(0);
}

Value getInitializerAllocArg() {
if (getAllocRegion().empty() ||
getInitializerRegion().front().getNumArguments() != 2)
return {nullptr};
return getInitializerRegion().front().getArgument(1);
}
}];
let hasRegionVerifier = 1;
}
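
The pairing between the alloc region and the initializer's block arguments, as described in the op documentation and checked in `verifyRegions`, can be sketched as a small standalone function. This is only a model in plain C++17 with no MLIR dependency: `checkInitializerArgCount` and its parameters are hypothetical names introduced for illustration, and the error strings are copied from the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the argument-count rule; not part of the MLIR API.
// Returns an error message when the combination is rejected, or
// std::nullopt when it is accepted.
std::optional<std::string> checkInitializerArgCount(bool hasAllocRegion,
                                                    unsigned numInitArgs) {
  if (numInitArgs == 1) {
    // One argument (the mold) is only valid without an alloc region.
    if (hasAllocRegion)
      return std::string("expects two arguments to the initializer region "
                         "when an allocation region is used");
    return std::nullopt;
  }
  if (numInitArgs == 2) {
    // Two arguments (mold + allocation) require an alloc region.
    if (!hasAllocRegion)
      return std::string("expects one argument to the initializer region "
                         "when no allocation region is used");
    return std::nullopt;
  }
  return std::string("expects one or two arguments to the initializer region");
}
```

Under this model, `checkInitializerArgCount(true, 2)` and `checkInitializerArgCount(false, 1)` are accepted, while every other combination yields one of the three diagnostics.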
72 changes: 55 additions & 17 deletions mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1883,46 +1883,84 @@ LogicalResult DistributeOp::verify() {
// DeclareReductionOp
//===----------------------------------------------------------------------===//

static ParseResult parseAtomicReductionRegion(OpAsmParser &parser,
Region &region) {
if (parser.parseOptionalKeyword("atomic"))
static ParseResult parseOptionalReductionRegion(OpAsmParser &parser,
Region &region,
StringRef keyword) {
if (parser.parseOptionalKeyword(keyword))
return success();
return parser.parseRegion(region);
}

static void printAtomicReductionRegion(OpAsmPrinter &printer,
DeclareReductionOp op, Region &region) {
static void printOptionalReductionRegion(OpAsmPrinter &printer, Region &region,
StringRef keyword) {
if (region.empty())
return;
printer << "atomic ";
printer << keyword << " ";
printer.printRegion(region);
}

static ParseResult parseAllocReductionRegion(OpAsmParser &parser,
Region &region) {
return parseOptionalReductionRegion(parser, region, "alloc");
}

static void printAllocReductionRegion(OpAsmPrinter &printer,
DeclareReductionOp op, Region &region) {
printOptionalReductionRegion(printer, region, "alloc");
}

static ParseResult parseAtomicReductionRegion(OpAsmParser &parser,
Region &region) {
return parseOptionalReductionRegion(parser, region, "atomic");
}

static void printAtomicReductionRegion(OpAsmPrinter &printer,
DeclareReductionOp op, Region &region) {
printOptionalReductionRegion(printer, region, "atomic");
}

static ParseResult parseCleanupReductionRegion(OpAsmParser &parser,
Region &region) {
if (parser.parseOptionalKeyword("cleanup"))
return success();
return parser.parseRegion(region);
return parseOptionalReductionRegion(parser, region, "cleanup");
}

static void printCleanupReductionRegion(OpAsmPrinter &printer,
DeclareReductionOp op, Region &region) {
if (region.empty())
return;
printer << "cleanup ";
printer.printRegion(region);
printOptionalReductionRegion(printer, region, "cleanup");
}

LogicalResult DeclareReductionOp::verifyRegions() {
if (!getAllocRegion().empty()) {
for (YieldOp yieldOp : getAllocRegion().getOps<YieldOp>()) {
if (yieldOp.getResults().size() != 1 ||
yieldOp.getResults().getTypes()[0] != getType())
return emitOpError() << "expects alloc region to yield a value "
"of the reduction type";
}
}

if (getInitializerRegion().empty())
return emitOpError() << "expects non-empty initializer region";
Block &initializerEntryBlock = getInitializerRegion().front();
if (initializerEntryBlock.getNumArguments() != 1 ||
initializerEntryBlock.getArgument(0).getType() != getType()) {
return emitOpError() << "expects initializer region with one argument "
"of the reduction type";

if (initializerEntryBlock.getNumArguments() == 1) {
if (!getAllocRegion().empty())
return emitOpError() << "expects two arguments to the initializer region "
"when an allocation region is used";
} else if (initializerEntryBlock.getNumArguments() == 2) {
if (getAllocRegion().empty())
return emitOpError() << "expects one argument to the initializer region "
"when no allocation region is used";
} else {
return emitOpError()
<< "expects one or two arguments to the initializer region";
}

for (mlir::Value arg : initializerEntryBlock.getArguments())
if (arg.getType() != getType())
return emitOpError() << "expects initializer region argument to match "
"the reduction type";

for (YieldOp yieldOp : getInitializerRegion().getOps<YieldOp>()) {
if (yieldOp.getResults().size() != 1 ||
yieldOp.getResults().getTypes()[0] != getType())
134 changes: 99 additions & 35 deletions mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -594,45 +594,85 @@ convertOmpOrderedRegion(Operation &opInst, llvm::IRBuilderBase &builder,

/// Allocate space for privatized reduction variables.
template <typename T>
static void allocByValReductionVars(
T loop, ArrayRef<BlockArgument> reductionArgs, llvm::IRBuilderBase &builder,
LLVM::ModuleTranslation &moduleTranslation,
llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
SmallVectorImpl<omp::DeclareReductionOp> &reductionDecls,
SmallVectorImpl<llvm::Value *> &privateReductionVariables,
DenseMap<Value, llvm::Value *> &reductionVariableMap,
llvm::ArrayRef<bool> isByRefs) {
static LogicalResult
allocReductionVars(T loop, ArrayRef<BlockArgument> reductionArgs,
llvm::IRBuilderBase &builder,
LLVM::ModuleTranslation &moduleTranslation,
llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
SmallVectorImpl<omp::DeclareReductionOp> &reductionDecls,
SmallVectorImpl<llvm::Value *> &privateReductionVariables,
DenseMap<Value, llvm::Value *> &reductionVariableMap,
llvm::ArrayRef<bool> isByRefs) {
llvm::IRBuilderBase::InsertPointGuard guard(builder);
builder.SetInsertPoint(allocaIP.getBlock()->getTerminator());

// delay creating stores until after all allocas
SmallVector<std::pair<llvm::Value *, llvm::Value *>> storesToCreate;
storesToCreate.reserve(loop.getNumReductionVars());

for (std::size_t i = 0; i < loop.getNumReductionVars(); ++i) {
if (isByRefs[i])
continue;
llvm::Value *var = builder.CreateAlloca(
moduleTranslation.convertType(reductionDecls[i].getType()));
moduleTranslation.mapValue(reductionArgs[i], var);
privateReductionVariables[i] = var;
reductionVariableMap.try_emplace(loop.getReductionVars()[i], var);
Region &allocRegion = reductionDecls[i].getAllocRegion();
if (isByRefs[i]) {
if (allocRegion.empty())

Review comment by @Leporacanthicus (Contributor), Aug 20, 2024:

What does allocRegion empty mean here? It's not been created? If so, where does the alloca go?

Maybe we should have the opposite assert to below?

Or a comment like the one below saying "remove when all users are done"?

Reply from the PR author:

The alloc region is optional. If it is omitted, the allocation can still be done in the initialization region as before. This could happen, for example, when no part of the allocation is on the stack (we don't want a call to malloc mixed into the middle of the allocas).

continue;

SmallVector<llvm::Value *, 1> phis;
if (failed(inlineConvertOmpRegions(allocRegion, "omp.reduction.alloc",
builder, moduleTranslation, &phis)))
return failure();
assert(phis.size() == 1 && "expected one allocation to be yielded");

builder.SetInsertPoint(allocaIP.getBlock()->getTerminator());

// Allocate reduction variable (which is a pointer to the real reduction
// variable allocated in the inlined region)
llvm::Value *var = builder.CreateAlloca(
moduleTranslation.convertType(reductionDecls[i].getType()));
storesToCreate.emplace_back(phis[0], var);

privateReductionVariables[i] = var;
moduleTranslation.mapValue(reductionArgs[i], phis[0]);
reductionVariableMap.try_emplace(loop.getReductionVars()[i], phis[0]);
} else {
assert(allocRegion.empty() &&
"allocation is implicit for by-val reduction");
llvm::Value *var = builder.CreateAlloca(
moduleTranslation.convertType(reductionDecls[i].getType()));
moduleTranslation.mapValue(reductionArgs[i], var);
privateReductionVariables[i] = var;
reductionVariableMap.try_emplace(loop.getReductionVars()[i], var);
}
}

// TODO: further delay this so it doesn't come in the entry block at all
for (auto [data, addr] : storesToCreate)
builder.CreateStore(data, addr);

return success();
}
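
The "delay creating stores until after all allocas" pattern used in `allocReductionVars` can be illustrated without LLVM. The following is a minimal sketch in plain C++17: strings stand in for instructions, and `emitEntryBlock` plus the `.data`/`.addr` naming are hypothetical, chosen only to show the ordering. The first `alloca` per variable stands in for whatever the inlined alloc region produces.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: builds the instruction order of an entry block for the
// given by-ref reduction variables. Not the OpenMPIRBuilder API.
std::vector<std::string> emitEntryBlock(const std::vector<std::string> &vars) {
  std::vector<std::string> block;

  // Remember (data, addr) pairs instead of storing immediately, so that no
  // store ends up between two allocas.
  std::vector<std::pair<std::string, std::string>> storesToCreate;
  storesToCreate.reserve(vars.size());

  for (const std::string &v : vars) {
    std::string data = v + ".data"; // value yielded by the alloc region
    std::string addr = v + ".addr"; // pointer slot for the reduction variable
    block.push_back("alloca " + data);
    block.push_back("alloca " + addr);
    storesToCreate.emplace_back(data, addr);
  }

  // All allocas are emitted; now append the deferred stores.
  for (const auto &[data, addr] : storesToCreate)
    block.push_back("store " + data + " -> " + addr);

  return block;
}
```

Keeping the allocas contiguous is the point of the deferral: for two variables the model emits four allocas followed by two stores, rather than interleaving them.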

/// Map input argument to all reduction initialization regions
/// Map input arguments to reduction initialization region
template <typename T>
static void
mapInitializationArg(T loop, LLVM::ModuleTranslation &moduleTranslation,
SmallVectorImpl<omp::DeclareReductionOp> &reductionDecls,
unsigned i) {
mapInitializationArgs(T loop, LLVM::ModuleTranslation &moduleTranslation,
SmallVectorImpl<omp::DeclareReductionOp> &reductionDecls,
DenseMap<Value, llvm::Value *> &reductionVariableMap,
unsigned i) {
// map input argument to the initialization region
mlir::omp::DeclareReductionOp &reduction = reductionDecls[i];
Region &initializerRegion = reduction.getInitializerRegion();
Block &entry = initializerRegion.front();
assert(entry.getNumArguments() == 1 &&
"the initialization region has one argument");

mlir::Value mlirSource = loop.getReductionVars()[i];
llvm::Value *llvmSource = moduleTranslation.lookupValue(mlirSource);
assert(llvmSource && "lookup reduction var");
moduleTranslation.mapValue(entry.getArgument(0), llvmSource);
moduleTranslation.mapValue(reduction.getInitializerMoldArg(), llvmSource);

if (entry.getNumArguments() > 1) {
llvm::Value *allocation =
reductionVariableMap.lookup(loop.getReductionVars()[i]);
moduleTranslation.mapValue(reduction.getInitializerAllocArg(), allocation);
}
}
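
How `mapInitializationArgs` binds the initializer's block arguments can be modelled with ordinary maps. This is a rough sketch in plain C++17 under stated assumptions: strings stand in for `mlir::Value` and `llvm::Value *`, and `mapInitializerArgs`, `Mapping`, and the parameter names are hypothetical. A missing entry in the reduction variable map yields an empty string here, mirroring the default value that `DenseMap::lookup` returns.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in: keys and values are names rather than IR values.
using Mapping = std::map<std::string, std::string>;

// Returns the bindings for the initializer region's block arguments.
// initArgs[0] is the mold argument; initArgs[1], if present, is the
// allocation argument.
Mapping mapInitializerArgs(const std::vector<std::string> &initArgs,
                           const std::string &reductionVar,
                           const Mapping &valueMapping,
                           const Mapping &reductionVariableMap) {
  Mapping result;

  // The mold argument is bound to the translated original reduction variable.
  result[initArgs.at(0)] = valueMapping.at(reductionVar);

  // With an alloc region there is a second argument, bound to the value that
  // the allocation step recorded for this reduction variable.
  if (initArgs.size() > 1) {
    auto it = reductionVariableMap.find(reductionVar);
    result[initArgs[1]] =
        it == reductionVariableMap.end() ? std::string() : it->second;
  }
  return result;
}
```

With a single initializer argument only the mold binding is produced, which corresponds to declarations that do not use an alloc region.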

/// Collect reduction info
@@ -779,18 +819,21 @@ static LogicalResult allocAndInitializeReductionVars(
if (op.getNumReductionVars() == 0)
return success();

allocByValReductionVars(op, reductionArgs, builder, moduleTranslation,
allocaIP, reductionDecls, privateReductionVariables,
reductionVariableMap, isByRef);
if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
allocaIP, reductionDecls,
privateReductionVariables, reductionVariableMap,
isByRef)))
return failure();

// Before the loop, store the initial values of reductions into reduction
// variables. Although this could be done after allocas, we don't want to mess
// up with the alloca insertion point.
for (unsigned i = 0; i < op.getNumReductionVars(); ++i) {
SmallVector<llvm::Value *> phis;
SmallVector<llvm::Value *, 1> phis;

// map block argument to initializer region
mapInitializationArg(op, moduleTranslation, reductionDecls, i);
mapInitializationArgs(op, moduleTranslation, reductionDecls,
reductionVariableMap, i);

if (failed(inlineConvertOmpRegions(reductionDecls[i].getInitializerRegion(),
"omp.reduction.neutral", builder,
@@ -799,6 +842,13 @@ static LogicalResult allocAndInitializeReductionVars(
assert(phis.size() == 1 && "expected one value to be yielded from the "
"reduction neutral element declaration region");
if (isByRef[i]) {
if (!reductionDecls[i].getAllocRegion().empty())
// done in allocReductionVars
continue;

// TODO: this path can be removed once all users of by-ref are updated to
// use an alloc region

// Allocate reduction variable (which is a pointer to the real reduction
// variable allocated in the inlined region)
llvm::Value *var = builder.CreateAlloca(
@@ -1319,9 +1369,15 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
opInst.getNumAllocateVars() + opInst.getNumAllocatorsVars(),
opInst.getNumReductionVars());

allocByValReductionVars(opInst, reductionArgs, builder, moduleTranslation,
allocaIP, reductionDecls, privateReductionVariables,
reductionVariableMap, isByRef);
allocaIP =
InsertPointTy(allocaIP.getBlock(),
allocaIP.getBlock()->getTerminator()->getIterator());

if (failed(allocReductionVars(opInst, reductionArgs, builder,
moduleTranslation, allocaIP, reductionDecls,
privateReductionVariables,
reductionVariableMap, isByRef)))
bodyGenStatus = failure();

// Initialize reduction vars
builder.restoreIP(allocaIP);
@@ -1332,8 +1388,12 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
SmallVector<llvm::Value *> byRefVars(opInst.getNumReductionVars());
for (unsigned i = 0; i < opInst.getNumReductionVars(); ++i) {
if (isByRef[i]) {
// Allocate reduction variable (which is a pointer to the real reduction
// variable allocated in the inlined region)
if (!reductionDecls[i].getAllocRegion().empty())
continue;

// TODO: remove after all users of by-ref are updated to use the alloc
// region: Allocate reduction variable (which is a pointer to the real
// reduction variable allocated in the inlined region)
byRefVars[i] = builder.CreateAlloca(
moduleTranslation.convertType(reductionDecls[i].getType()));
}
@@ -1345,7 +1405,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
SmallVector<llvm::Value *> phis;

// map the block argument
mapInitializationArg(opInst, moduleTranslation, reductionDecls, i);
mapInitializationArgs(opInst, moduleTranslation, reductionDecls,
reductionVariableMap, i);
if (failed(inlineConvertOmpRegions(
reductionDecls[i].getInitializerRegion(), "omp.reduction.neutral",
builder, moduleTranslation, &phis)))
@@ -1354,11 +1415,14 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
"expected one value to be yielded from the "
"reduction neutral element declaration region");

// mapInitializationArg finishes its block with a terminator. We need to
// insert before that terminator.
builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());

if (isByRef[i]) {
if (!reductionDecls[i].getAllocRegion().empty())
continue;

// TODO: remove after all users of by-ref are updated to use the alloc

// Store the result of the inlined region to the allocated reduction var
// ptr
builder.CreateStore(phis[0], byRefVars[i]);