-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang] Canonicalize redundant pointer converts. #121864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch adds a canonicalization pattern for optimizing redundant "pointer" fir.converts. Such converts prevent the StackArrays pass to recognize fir.freemem for the corresponding fir.allocmem, e.g.: ``` %69 = fir.allocmem !fir.array<2xi32> %71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} : (!fir.heap<!fir.array<2xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<2xi32>>, !fir.heap<!fir.array<2xi32>>) %95 = fir.convert %71#1 : (!fir.heap<!fir.array<2xi32>>) -> !fir.ref<!fir.array<2xi32>> %100 = fir.convert %95 : (!fir.ref<!fir.array<2xi32>>) -> !fir.heap<!fir.array<2xi32>> fir.freemem %100 : !fir.heap<!fir.array<2xi32>> ``` I found this in `tonto`, but the change does not affect performance at all. Anyway, it looks like a reasonable thing to do, and it makes easier to compare the performance profiles with other compilers'.
@llvm/pr-subscribers-flang-fir-hlfir Author: Slava Zakharin (vzakhari) ChangesThis patch adds a canonicalization pattern for optimizing redundant
I found this in Full diff: https://github.com/llvm/llvm-project/pull/121864.diff 5 Files Affected:
diff --git a/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td b/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td
index 1dbde5c1c73024..2414de496d45bc 100644
--- a/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td
+++ b/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td
@@ -57,6 +57,9 @@ def StrictSmallerWidthPred : Constraint<CPred<
"$0.getType().getIntOrFloatBitWidth() < "
"$1.getType().getIntOrFloatBitWidth()">>;
+def PointerCompatiblePred
+ : Constraint<CPred<"fir::ConvertOp::isPointerCompatible($0.getType())">>;
+
// floats or ints that undergo successive extensions or successive truncations.
def ConvertConvertOptPattern
: Pat<(fir_ConvertOp:$res (fir_ConvertOp:$irm $arg)),
@@ -112,4 +115,18 @@ def ForwardConstantConvertPattern
(createConstantOp $res, $attr),
[(IndexTypePred $res), (IntegerTypePred $cnt)]>;
+// Optimize redundant pointer conversions, e.g.:
+// %1 = fir.convert %0 :
+// (!fir.heap<!fir.array<2xf32>>) -> !fir.ref<!fir.array<2xf32>>
+// %2 = fir.convert %1 :
+// (!fir.ref<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>>
+// Will be optimized into:
+// %2 = fir.convert %0 :
+// (!fir.heap<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>>
+// which is redundant due to RedundantConvertOptPattern.
+def ChainedPointerConvertsPattern
+ : Pat<(fir_ConvertOp:$res(fir_ConvertOp:$irm $arg)), (fir_ConvertOp $arg),
+ [(PointerCompatiblePred $arg), (PointerCompatiblePred $irm),
+ (PointerCompatiblePred $res)]>;
+
#endif // FORTRAN_FIR_REWRITE_PATTERNS
diff --git a/flang/lib/Optimizer/Dialect/FIROps.cpp b/flang/lib/Optimizer/Dialect/FIROps.cpp
index cdcf9bda49a627..fa83aa380e489c 100644
--- a/flang/lib/Optimizer/Dialect/FIROps.cpp
+++ b/flang/lib/Optimizer/Dialect/FIROps.cpp
@@ -1313,7 +1313,8 @@ void fir::ConvertOp::getCanonicalizationPatterns(
results.insert<ConvertConvertOptPattern, ConvertAscendingIndexOptPattern,
ConvertDescendingIndexOptPattern, RedundantConvertOptPattern,
CombineConvertOptPattern, CombineConvertTruncOptPattern,
- ForwardConstantConvertPattern>(context);
+ ForwardConstantConvertPattern, ChainedPointerConvertsPattern>(
+ context);
}
mlir::OpFoldResult fir::ConvertOp::fold(FoldAdaptor adaptor) {
diff --git a/flang/test/Fir/convert-fold.fir b/flang/test/Fir/convert-fold.fir
index ebb6c8db7c891c..fb30e634ba5e6d 100644
--- a/flang/test/Fir/convert-fold.fir
+++ b/flang/test/Fir/convert-fold.fir
@@ -35,3 +35,12 @@ func.func @ctest() -> index {
// CHECK-NEXT: return %{{.*}} : index
return %2 : index
}
+
+// CHECK-LABEL: func.func @ptrtest(
+// CHECK-SAME: %[[VAL_0:.*]]: !fir.heap<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>> {
+func.func @ptrtest(%0 : !fir.heap<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>> {
+ %1 = fir.convert %0 : (!fir.heap<!fir.array<2xf32>>) -> !fir.ref<!fir.array<2xf32>>
+ %2 = fir.convert %1 : (!fir.ref<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>>
+// CHECK: return %[[VAL_0]] : !fir.heap<!fir.array<2xf32>>
+ return %2 : !fir.heap<!fir.array<2xf32>>
+}
diff --git a/flang/test/Lower/array-substring.f90 b/flang/test/Lower/array-substring.f90
index 02101039120e9f..7544fbb989627b 100644
--- a/flang/test/Lower/array-substring.f90
+++ b/flang/test/Lower/array-substring.f90
@@ -24,9 +24,8 @@
! CHECK: %[[VAL_16:.*]] = fir.array_coor %[[VAL_7]](%[[VAL_9]]) {{\[}}%[[VAL_10]]] %[[VAL_15]] : (!fir.ref<!fir.array<1x!fir.char<1,12>>>, !fir.shape<1>, !fir.slice<1>, index) -> !fir.ref<!fir.char<1,12>>
! CHECK: %[[VAL_17:.*]] = fir.convert %[[VAL_16]] : (!fir.ref<!fir.char<1,12>>) -> !fir.ref<!fir.array<12x!fir.char<1>>>
! CHECK: %[[VAL_18:.*]] = fir.coordinate_of %[[VAL_17]], %[[VAL_2]] : (!fir.ref<!fir.array<12x!fir.char<1>>>, index) -> !fir.ref<!fir.char<1>>
-! CHECK: %[[VAL_19:.*]] = fir.convert %[[VAL_18]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<!fir.char<1,?>>
! CHECK: %[[VAL_20:.*]] = fir.array_coor %[[VAL_11]](%[[VAL_9]]) %[[VAL_15]] : (!fir.ref<!fir.array<1x!fir.char<1,8>>>, !fir.shape<1>, index) -> !fir.ref<!fir.char<1,8>>
-! CHECK: %[[VAL_21:.*]] = fir.convert %[[VAL_19]] : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+! CHECK: %[[VAL_21:.*]] = fir.convert %[[VAL_18]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<i8>
! CHECK: %[[VAL_22:.*]] = fir.convert %[[VAL_20]] : (!fir.ref<!fir.char<1,8>>) -> !fir.ref<i8>
! CHECK: %[[VAL_23:.*]] = fir.convert %[[VAL_4]] : (index) -> i64
! CHECK: %[[VAL_24:.*]] = fir.call @_FortranACharacterCompareScalar1(%[[VAL_21]], %[[VAL_22]], %[[VAL_23]], %[[VAL_23]]) {{.*}}: (!fir.ref<i8>, !fir.ref<i8>, i64, i64) -> i32
diff --git a/flang/test/Lower/vector-subscript-io.f90 b/flang/test/Lower/vector-subscript-io.f90
index 372130fd099074..9a041af16c88cc 100644
--- a/flang/test/Lower/vector-subscript-io.f90
+++ b/flang/test/Lower/vector-subscript-io.f90
@@ -325,12 +325,11 @@ subroutine substring(x, y, i, j)
! CHECK: %[[VAL_230:.*]] = arith.subi %[[VAL_216]], %[[VAL_210]] : index
! CHECK: %[[VAL_231:.*]] = fir.convert %[[VAL_228]] : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<!fir.array<?x!fir.char<1>>>
! CHECK: %[[VAL_232:.*]] = fir.coordinate_of %[[VAL_231]], %[[VAL_230]] : (!fir.ref<!fir.array<?x!fir.char<1>>>, index) -> !fir.ref<!fir.char<1>>
-! CHECK: %[[VAL_233:.*]] = fir.convert %[[VAL_232]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<!fir.char<1,?>>
! CHECK: %[[VAL_234:.*]] = arith.subi %[[VAL_219]], %[[VAL_216]] : index
! CHECK: %[[VAL_235:.*]] = arith.addi %[[VAL_234]], %[[VAL_210]] : index
! CHECK: %[[VAL_236:.*]] = arith.cmpi slt, %[[VAL_235]], %[[VAL_209]] : index
! CHECK: %[[VAL_237:.*]] = arith.select %[[VAL_236]], %[[VAL_209]], %[[VAL_235]] : index
-! CHECK: %[[VAL_238:.*]] = fir.convert %[[VAL_233]] : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+! CHECK: %[[VAL_238:.*]] = fir.convert %[[VAL_232]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<i8>
! CHECK: %[[VAL_239:.*]] = fir.convert %[[VAL_237]] : (index) -> i64
! CHECK: %[[VAL_240:.*]] = fir.call @_FortranAioInputAscii(%[[VAL_213]], %[[VAL_238]], %[[VAL_239]]) {{.*}}: (!fir.ref<i8>, !fir.ref<i8>, i64) -> i1
! CHECK: %[[VAL_241:.*]] = arith.addi %[[VAL_221]], %[[VAL_210]] overflow<nsw> : index
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Makes sense to me to make the IR cleaner here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but maybe it is a broader problem that stack arrays doesn't understand fir.convert
because it could still get confused by non-redundant converts. I'll add that to my to-do list.
Thanks for the follow-up! I did not want to modify just the StackArrays pass. It seems to me that the conversions canonicalization potentially helps other passes, so I decided to go this route instead. |
@vzakhari After this PR is merged, https://github.com/fujitsu/compiler-test-suite/tree/2024-12/Fortran/0393/0393_0171.f90 and https://github.com/fujitsu/compiler-test-suite/blob/2024-12/Fortran/0393/0393_0174.f90 started to fail by segmentation fault on AArch64. I can see the error with
I confirmed the error occurs with this commit (2e637db) and the latest Could you check? If you cannot reproduce the error, please let me know. |
Thanks for reporting, @kawashima-fj! I saw the bug report, but could not reproduce on my side. I will try again today and will let you know if I still cannot. |
I tried differently built compilers today, but could not reproduce it. @kawashima-fj, can you please run it under gdb and let me know what the backtrace is? |
@vzakhari Assembly output by 0393_0171.s.txt I don't know why I run it on Ubuntu 24.04.1 LTS docker container in Rocky Linux 8.6 host on Grace (Neoverse V2). Please let me know you need more information. |
Thank you! I will take a closer look tomorrow. I would appreciate it if you can collect the assembly files with my commit reverted. No problem, if you do not have time for this. |
I'll do it after some meetings are over. You can obtan the result tomorrow in your timezone. |
@vzakhari I'm very sorry. The root cause of the segmentation fault does not seem to be your commit. Reverting your commit (2e637db) from latest I'll take a look and create a new issue if needed. |
Thank you for figuring it out! No worries! |
This patch adds a canonicalization pattern for optimizing redundant
"pointer" fir.converts. Such converts prevent the StackArrays pass
to recognize fir.freemem for the corresponding fir.allocmem, e.g.:
I found this in
tonto
, but the change does not affect performance at all.Anyway, it looks like a reasonable thing to do, and it makes easier
to compare the performance profiles with other compilers'.