-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][OpenMP][Lower] lower array subscripts for task depend #132994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The OpenMP standard says that all dependencies in the same set of inter-dependent tasks must be non-overlapping. This simplification means that the OpenMP only needs to keep track of the base addresses of dependency variables. This can be seen in kmp_taskdeps.cpp, which stores task dependency information in a hash table, using the base address as a key. This patch generates a rebox operation to slice boxed arrays, but only the box data address is used for the task dependency. The extra box is optimized away by LLVM at O3. Vector subscripts are TODO (I will address in my next patch). This also fixes a bug for ordinary subscripts when the symbol was mapped to a box: Fixes llvm#132647
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir Author: Tom Eccles (tblah) ChangesThe OpenMP standard says that all dependencies in the same set of inter-dependent tasks must be non-overlapping. This simplification means that the OpenMP only needs to keep track of the base addresses of dependency variables. This can be seen in kmp_taskdeps.cpp, which stores task dependency information in a hash table, using the base address as a key. This patch generates a rebox operation to slice boxed arrays, but only the box data address is used for the task dependency. The extra box is optimized away by LLVM at O3. Vector subscripts are TODO (I will address in my next patch). This also fixes a bug for ordinary subscripts when the symbol was mapped to a box: Fixes #132647 Full diff: https://github.com/llvm/llvm-project/pull/132994.diff 6 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index c66fd46767b86..2610d977fb908 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -14,6 +14,7 @@
#include "Clauses.h"
#include "Utils.h"
+#include "flang/Lower/ConvertExprToHLFIR.h"
#include "flang/Lower/PFTBuilder.h"
#include "flang/Parser/tools.h"
#include "flang/Semantics/tools.h"
@@ -808,7 +809,21 @@ bool ClauseProcessor::processCopyprivate(
return hasCopyPrivate;
}
-bool ClauseProcessor::processDepend(mlir::omp::DependClauseOps &result) const {
+template <typename T>
+static bool isVectorSubscript(const evaluate::Expr<T> &expr) {
+ if (std::optional<evaluate::DataRef> dataRef{evaluate::ExtractDataRef(expr)})
+ if (const auto *arrayRef = std::get_if<evaluate::ArrayRef>(&dataRef->u))
+ for (const evaluate::Subscript &subscript : arrayRef->subscript())
+ if (std::holds_alternative<evaluate::IndirectSubscriptIntegerExpr>(
+ subscript.u))
+ if (subscript.Rank() > 0)
+ return true;
+ return false;
+}
+
+bool ClauseProcessor::processDepend(lower::SymMap &symMap,
+ lower::StatementContext &stmtCtx,
+ mlir::omp::DependClauseOps &result) const {
auto process = [&](const omp::clause::Depend &clause,
const parser::CharBlock &) {
using Depend = omp::clause::Depend;
@@ -830,18 +845,38 @@ bool ClauseProcessor::processDepend(mlir::omp::DependClauseOps &result) const {
for (const omp::Object &object : objects) {
assert(object.ref() && "Expecting designator");
+ mlir::Value dependVar;
if (evaluate::ExtractSubstring(*object.ref())) {
TODO(converter.getCurrentLocation(),
"substring not supported for task depend");
} else if (evaluate::IsArrayElement(*object.ref())) {
- TODO(converter.getCurrentLocation(),
- "array sections not supported for task depend");
+ // Array Section
+ SomeExpr expr = *object.ref();
+ if (isVectorSubscript(expr))
+ TODO(converter.getCurrentLocation(),
+ "Vector subscripted array section for task dependency");
+
+ hlfir::EntityWithAttributes entity = convertExprToHLFIR(
+ converter.getCurrentLocation(), converter, expr, symMap, stmtCtx);
+ dependVar = entity.getBase();
+ } else {
+ semantics::Symbol *sym = object.sym();
+ dependVar = converter.getSymbolAddress(*sym);
}
- semantics::Symbol *sym = object.sym();
- const mlir::Value variable = converter.getSymbolAddress(*sym);
- result.dependVars.push_back(variable);
+ // The openmp dialect doesn't know what to do with boxes (and it would
+ // break layering to teach it about them). The dependency variable can be
+ // a box because it was an array section or because the original symbol
+ // was mapped to a box.
+ // Getting the address of the box data is okay because all the runtime
+ // ultimately cares about is the base address of the array.
+ if (fir::isa_box_type(dependVar.getType())) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ dependVar = builder.create<fir::BoxAddrOp>(
+ converter.getCurrentLocation(), dependVar);
+ }
+ result.dependVars.push_back(dependVar);
}
};
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index aa203689ab560..6b1f7a31c7aac 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -104,7 +104,8 @@ class ClauseProcessor {
bool processCopyin() const;
bool processCopyprivate(mlir::Location currentLocation,
mlir::omp::CopyprivateClauseOps &result) const;
- bool processDepend(mlir::omp::DependClauseOps &result) const;
+ bool processDepend(lower::SymMap &symMap, lower::StatementContext &stmtCtx,
+ mlir::omp::DependClauseOps &result) const;
bool
processEnter(llvm::SmallVectorImpl<DeclareTargetCapturePair> &result) const;
bool processIf(omp::clause::If::DirectiveNameModifier directiveName,
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index b8ef18860e9b4..5bf806c58e6e0 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1669,15 +1669,15 @@ static void genSingleClauses(lower::AbstractConverter &converter,
static void genTargetClauses(
lower::AbstractConverter &converter, semantics::SemanticsContext &semaCtx,
- lower::StatementContext &stmtCtx, lower::pft::Evaluation &eval,
- const List<Clause> &clauses, mlir::Location loc,
- mlir::omp::TargetOperands &clauseOps,
+ lower::SymMap &symTable, lower::StatementContext &stmtCtx,
+ lower::pft::Evaluation &eval, const List<Clause> &clauses,
+ mlir::Location loc, mlir::omp::TargetOperands &clauseOps,
llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceAddrSyms,
llvm::SmallVectorImpl<const semantics::Symbol *> &isDevicePtrSyms,
llvm::SmallVectorImpl<const semantics::Symbol *> &mapSyms) {
ClauseProcessor cp(converter, semaCtx, clauses);
cp.processBare(clauseOps);
- cp.processDepend(clauseOps);
+ cp.processDepend(symTable, stmtCtx, clauseOps);
cp.processDevice(stmtCtx, clauseOps);
cp.processHasDeviceAddr(stmtCtx, clauseOps, hasDeviceAddrSyms);
if (!hostEvalInfo.empty()) {
@@ -1728,11 +1728,12 @@ static void genTargetDataClauses(
static void genTargetEnterExitUpdateDataClauses(
lower::AbstractConverter &converter, semantics::SemanticsContext &semaCtx,
- lower::StatementContext &stmtCtx, const List<Clause> &clauses,
- mlir::Location loc, llvm::omp::Directive directive,
+ lower::SymMap &symTable, lower::StatementContext &stmtCtx,
+ const List<Clause> &clauses, mlir::Location loc,
+ llvm::omp::Directive directive,
mlir::omp::TargetEnterExitUpdateDataOperands &clauseOps) {
ClauseProcessor cp(converter, semaCtx, clauses);
- cp.processDepend(clauseOps);
+ cp.processDepend(symTable, stmtCtx, clauseOps);
cp.processDevice(stmtCtx, clauseOps);
cp.processIf(directive, clauseOps);
@@ -1746,12 +1747,13 @@ static void genTargetEnterExitUpdateDataClauses(
static void genTaskClauses(lower::AbstractConverter &converter,
semantics::SemanticsContext &semaCtx,
+ lower::SymMap &symTable,
lower::StatementContext &stmtCtx,
const List<Clause> &clauses, mlir::Location loc,
mlir::omp::TaskOperands &clauseOps) {
ClauseProcessor cp(converter, semaCtx, clauses);
cp.processAllocate(clauseOps);
- cp.processDepend(clauseOps);
+ cp.processDepend(symTable, stmtCtx, clauseOps);
cp.processFinal(stmtCtx, clauseOps);
cp.processIf(llvm::omp::Directive::OMPD_task, clauseOps);
cp.processMergeable(clauseOps);
@@ -2194,8 +2196,8 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
mlir::omp::TargetOperands clauseOps;
llvm::SmallVector<const semantics::Symbol *> mapSyms, isDevicePtrSyms,
hasDeviceAddrSyms;
- genTargetClauses(converter, semaCtx, stmtCtx, eval, item->clauses, loc,
- clauseOps, hasDeviceAddrSyms, isDevicePtrSyms, mapSyms);
+ genTargetClauses(converter, semaCtx, symTable, stmtCtx, eval, item->clauses,
+ loc, clauseOps, hasDeviceAddrSyms, isDevicePtrSyms, mapSyms);
DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
/*shouldCollectPreDeterminedSymbols=*/
@@ -2415,7 +2417,7 @@ static OpTy genTargetEnterExitUpdateDataOp(
}
mlir::omp::TargetEnterExitUpdateDataOperands clauseOps;
- genTargetEnterExitUpdateDataClauses(converter, semaCtx, stmtCtx,
+ genTargetEnterExitUpdateDataClauses(converter, semaCtx, symTable, stmtCtx,
item->clauses, loc, directive, clauseOps);
return firOpBuilder.create<OpTy>(loc, clauseOps);
@@ -2428,7 +2430,8 @@ genTaskOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
ConstructQueue::const_iterator item) {
lower::StatementContext stmtCtx;
mlir::omp::TaskOperands clauseOps;
- genTaskClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps);
+ genTaskClauses(converter, semaCtx, symTable, stmtCtx, item->clauses, loc,
+ clauseOps);
if (!enableDelayedPrivatization)
return genOpWithBody<mlir::omp::TaskOp>(
diff --git a/flang/test/Lower/OpenMP/Todo/depend-clause-vector-subscript-array-section.f90 b/flang/test/Lower/OpenMP/Todo/depend-clause-vector-subscript-array-section.f90
new file mode 100644
index 0000000000000..f3bd58c8c559a
--- /dev/null
+++ b/flang/test/Lower/OpenMP/Todo/depend-clause-vector-subscript-array-section.f90
@@ -0,0 +1,11 @@
+! RUN: %not_todo_cmd bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
+! RUN: %not_todo_cmd %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+! CHECK: Vector subscripted array section for task dependency
+subroutine vectorSubscriptArraySection(array, indices)
+ integer :: array(:)
+ integer :: indices(:)
+
+ !$omp task depend (in: array(indices))
+ !$omp end task
+end subroutine
diff --git a/flang/test/Lower/OpenMP/task-depend-array-section.f90 b/flang/test/Lower/OpenMP/task-depend-array-section.f90
new file mode 100644
index 0000000000000..b364a5e06a29c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/task-depend-array-section.f90
@@ -0,0 +1,51 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+subroutine knownShape(array)
+ integer :: array(10)
+
+ !$omp task depend(in: array(2:8))
+ !$omp end task
+end subroutine
+
+! CHECK-LABEL: func.func @_QPknownshape(
+! CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !fir.ref<!fir.array<10xi32>> {fir.bindc_name = "array"}) {
+! CHECK: %[[VAL_1:.*]] = fir.dummy_scope : !fir.dscope
+! CHECK: %[[VAL_2:.*]] = arith.constant 10 : index
+! CHECK: %[[VAL_3:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
+! CHECK: %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_3]]) dummy_scope %[[VAL_1]] {uniq_name = "_QFknownshapeEarray"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+! CHECK: %[[VAL_5:.*]] = arith.constant 2 : index
+! CHECK: %[[VAL_6:.*]] = arith.constant 8 : index
+! CHECK: %[[VAL_7:.*]] = arith.constant 1 : index
+! CHECK: %[[VAL_8:.*]] = arith.constant 7 : index
+! CHECK: %[[VAL_9:.*]] = fir.shape %[[VAL_8]] : (index) -> !fir.shape<1>
+! CHECK: %[[VAL_10:.*]] = hlfir.designate %[[VAL_4]]#0 (%[[VAL_5]]:%[[VAL_6]]:%[[VAL_7]]) shape %[[VAL_9]] : (!fir.ref<!fir.array<10xi32>>, index, index, index, !fir.shape<1>) -> !fir.ref<!fir.array<7xi32>>
+! CHECK: omp.task depend(taskdependin -> %[[VAL_10]] : !fir.ref<!fir.array<7xi32>>) {
+! CHECK: omp.terminator
+! CHECK: }
+! CHECK: return
+! CHECK: }
+
+
+subroutine assumedShape(array)
+ integer :: array(:)
+
+ !$omp task depend(in: array(2:8:2))
+ !$omp end task
+end subroutine
+
+! CHECK-LABEL: func.func @_QPassumedshape(
+! CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !fir.box<!fir.array<?xi32>> {fir.bindc_name = "array"}) {
+! CHECK: %[[VAL_1:.*]] = fir.dummy_scope : !fir.dscope
+! CHECK: %[[VAL_2:.*]]:2 = hlfir.declare %[[VAL_0]] dummy_scope %[[VAL_1]] {uniq_name = "_QFassumedshapeEarray"} : (!fir.box<!fir.array<?xi32>>, !fir.dscope) -> (!fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>>)
+! CHECK: %[[VAL_3:.*]] = arith.constant 2 : index
+! CHECK: %[[VAL_4:.*]] = arith.constant 8 : index
+! CHECK: %[[VAL_5:.*]] = arith.constant 2 : index
+! CHECK: %[[VAL_6:.*]] = arith.constant 4 : index
+! CHECK: %[[VAL_7:.*]] = fir.shape %[[VAL_6]] : (index) -> !fir.shape<1>
+! CHECK: %[[VAL_8:.*]] = hlfir.designate %[[VAL_2]]#0 (%[[VAL_3]]:%[[VAL_4]]:%[[VAL_5]]) shape %[[VAL_7]] : (!fir.box<!fir.array<?xi32>>, index, index, index, !fir.shape<1>) -> !fir.box<!fir.array<4xi32>>
+! CHECK: %[[VAL_9:.*]] = fir.box_addr %[[VAL_8]] : (!fir.box<!fir.array<4xi32>>) -> !fir.ref<!fir.array<4xi32>>
+! CHECK: omp.task depend(taskdependin -> %[[VAL_9]] : !fir.ref<!fir.array<4xi32>>) {
+! CHECK: omp.terminator
+! CHECK: }
+! CHECK: return
+! CHECK: }
diff --git a/flang/test/Lower/OpenMP/task.f90 b/flang/test/Lower/OpenMP/task.f90
index 393801997aebc..7e1e49dad9797 100644
--- a/flang/test/Lower/OpenMP/task.f90
+++ b/flang/test/Lower/OpenMP/task.f90
@@ -158,6 +158,14 @@ subroutine task_depend_multi_task()
!$omp end task
end subroutine task_depend_multi_task
+subroutine task_depend_box(array)
+ integer :: array(:)
+ !CHECK: %[[BOX_ADDR:.*]] = fir.box_addr %{{.*}} : (!fir.box<!fir.array<?xi32>>) -> !fir.ref<!fir.array<?xi32>>
+ !CHECK: omp.task depend(taskdependin -> %[[BOX_ADDR]] : !fir.ref<!fir.array<?xi32>>)
+ !$omp task depend(in: array)
+ !$omp end task
+end subroutine
+
!===============================================================================
! `private` clause
!===============================================================================
|
// was mapped to a box. | ||
// Getting the address of the box data is okay because all the runtime | ||
// ultimately cares about is the base address of the array. | ||
if (fir::isa_box_type(dependVar.getType())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about an allocatable array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard says that using an unallocated allocatable is undefined behavior (same for unassociated pointers). So I don't think this needs any special handling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying it is OK to use the address of the descriptor and not the real address of the array for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that would not be okay. As I understand it, this will use the address of the data pointed to by the descriptor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will probably need to unwrap the ref or dyn_cast_ptrEleTy
otherwise I think this code will only work for the box type (assumed shape?) and not fir.ref<fir.box
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh I see what you mean now. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in a891346
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please wait a day in case the other folks working in this area has comments.
Is there shared code with target dependencies? Would we need tests there?
Just as a side point. An alternative way to express dependencies is to use tokens like in async dialect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix.
@@ -2428,7 +2430,8 @@ genTaskOp(lower::AbstractConverter &converter, lower::SymMap &symTable, | |||
ConstructQueue::const_iterator item) { | |||
lower::StatementContext stmtCtx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on the next patch in this series I've noticed that this stmtCtx
seems not to be scoped correctly. If any cleanup code is added to it, it will emit that cleanup code when it is destroyed. Unfortunately, when this goes out of scope at the end of genTaskOp
the builder insertion point is inside of the task operation. It would make more sense to do the cleanup after the end of the task operation to avoid adding unnecessary shared variables (this might be only a theoretical bug because I haven't yet found an example where any cleanup code is still present by the time we reach FIR.)
I have a proof of concept fix which I will finish off on Monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand the problem here. This stmtCtx
is used in managing and cleaning up intermediates while generating expressions of certain clauses. At the time the cleanup code is emitted, the insertion point is inside the task's region. Is it after the operations in task's region are generated, or before that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this on the PR for this change. Hopefully the example will make it clearer: #133891 (comment)
The statement context is used for lowering clauses for openmp operations using generalised helpers from flang lowering. The statement context stores closures which generate code for cleaning up temporary values generated by the lowering helper. These closures are run when the statement construct is destroyed. Keeping the statement context local to the clause or operation being lowered without any special handling was not correct because any cleanup code would be generated at the insertion point when that statement context went out of scope (which would in general be inside of the newly created container operation). It would be better to generate the cleanup code after the newly created operation (clause processing is synchronous even for deferred tasks). Currently supported clauses are mostly populated with simple scalar values that require no cleanup. Even the simple array sections added by #132994 needed no cleanup because indexing the right values of the array did not create any temporaries. Supporting array sections with vector indexing will generate hlfir.destroy operations for cleanup. This patch fixes where those will be created. Those hlfir.destroy operations don't generate any FIR (or LLVM) code, but the issue still exists theoretically. I wasn't able to find any clauses which have any cleanup to use to test this PR. It is probably NFC for the current lowering. This will be tested in the PR adding vector subscripting of array sections.
…32994) The OpenMP standard says that all dependencies in the same set of inter-dependent tasks must be non-overlapping. This simplification means that the OpenMP only needs to keep track of the base addresses of dependency variables. This can be seen in kmp_taskdeps.cpp, which stores task dependency information in a hash table, using the base address as a key. This patch generates a rebox operation to slice boxed arrays, but only the box data address is used for the task dependency. The extra box is optimized away by LLVM at O3. Vector subscripts are TODO (I will address in my next patch). This also fixes a bug for ordinary subscripts when the symbol was mapped to a box: Fixes llvm#132647
…133891) The statement context is used for lowering clauses for openmp operations using generalised helpers from flang lowering. The statement context stores closures which generate code for cleaning up temporary values generated by the lowering helper. These closures are run when the statement construct is destroyed. Keeping the statement context local to the clause or operation being lowered without any special handling was not correct because any cleanup code would be generated at the insertion point when that statement context went out of scope (which would in general be inside of the newly created container operation). It would be better to generate the cleanup code after the newly created operation (clause processing is synchronous even for deferred tasks). Currently supported clauses are mostly populated with simple scalar values that require no cleanup. Even the simple array sections added by #132994 needed no cleanup because indexing the right values of the array did not create any temporaries. Supporting array sections with vector indexing will generate hlfir.destroy operations for cleanup. This patch fixes where those will be created. Those hlfir.destroy operations don't generate any FIR (or LLVM) code, but the issue still exists theoretically. I wasn't able to find any clauses which have any cleanup to use to test this PR. It is probably NFC for the current lowering. This will be tested in [the PR adding vector subscripting of array sections](#133892).
This was added in - llvm#132230 - llvm#132994 - llvm#133892
…rted (#139081) This was added in - llvm/llvm-project#132230 - llvm/llvm-project#132994 - llvm/llvm-project#133892
The OpenMP standard says that all dependencies in the same set of inter-dependent tasks must be non-overlapping. This simplification means that the OpenMP only needs to keep track of the base addresses of dependency variables. This can be seen in kmp_taskdeps.cpp, which stores task dependency information in a hash table, using the base address as a key.
This patch generates a rebox operation to slice boxed arrays, but only the box data address is used for the task dependency. The extra box is optimized away by LLVM at O3.
Vector subscripts are TODO (I will address in my next patch).
This also fixes a bug for ordinary subscripts when the symbol was mapped to a box:
Fixes #132647