Skip to content

[flang] Canonicalize redundant pointer converts. #121864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

vzakhari
Copy link
Contributor

@vzakhari vzakhari commented Jan 7, 2025

This patch adds a canonicalization pattern for optimizing redundant
"pointer" fir.converts. Such converts prevent the StackArrays pass
to recognize fir.freemem for the corresponding fir.allocmem, e.g.:

    %69 = fir.allocmem !fir.array<2xi32>
    %71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} :
        (!fir.heap<!fir.array<2xi32>>, !fir.shape<1>) ->
        (!fir.heap<!fir.array<2xi32>>, !fir.heap<!fir.array<2xi32>>)
    %95 = fir.convert %71#1 :
        (!fir.heap<!fir.array<2xi32>>) -> !fir.ref<!fir.array<2xi32>>
    %100 = fir.convert %95 :
        (!fir.ref<!fir.array<2xi32>>) -> !fir.heap<!fir.array<2xi32>>
    fir.freemem %100 : !fir.heap<!fir.array<2xi32>>

I found this in tonto, but the change does not affect performance at all.
Anyway, it looks like a reasonable thing to do, and it makes easier
to compare the performance profiles with other compilers'.

This patch adds a canonicalization pattern for optimizing redundant
"pointer" fir.converts. Such converts prevent the StackArrays pass
to recognize fir.freemem for the corresponding fir.allocmem, e.g.:
```
    %69 = fir.allocmem !fir.array<2xi32>
    %71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} :
        (!fir.heap<!fir.array<2xi32>>, !fir.shape<1>) ->
        (!fir.heap<!fir.array<2xi32>>, !fir.heap<!fir.array<2xi32>>)
    %95 = fir.convert %71#1 :
        (!fir.heap<!fir.array<2xi32>>) -> !fir.ref<!fir.array<2xi32>>
    %100 = fir.convert %95 :
        (!fir.ref<!fir.array<2xi32>>) -> !fir.heap<!fir.array<2xi32>>
    fir.freemem %100 : !fir.heap<!fir.array<2xi32>>
```
I found this in `tonto`, but the change does not affect performance at all.
Anyway, it looks like a reasonable thing to do, and it makes easier
to compare the performance profiles with other compilers'.
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jan 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Slava Zakharin (vzakhari)

Changes

This patch adds a canonicalization pattern for optimizing redundant
"pointer" fir.converts. Such converts prevent the StackArrays pass
to recognize fir.freemem for the corresponding fir.allocmem, e.g.:

    %69 = fir.allocmem !fir.array&lt;2xi32&gt;
    %71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} :
        (!fir.heap&lt;!fir.array&lt;2xi32&gt;&gt;, !fir.shape&lt;1&gt;) -&gt;
        (!fir.heap&lt;!fir.array&lt;2xi32&gt;&gt;, !fir.heap&lt;!fir.array&lt;2xi32&gt;&gt;)
    %95 = fir.convert %71#<!-- -->1 :
        (!fir.heap&lt;!fir.array&lt;2xi32&gt;&gt;) -&gt; !fir.ref&lt;!fir.array&lt;2xi32&gt;&gt;
    %100 = fir.convert %95 :
        (!fir.ref&lt;!fir.array&lt;2xi32&gt;&gt;) -&gt; !fir.heap&lt;!fir.array&lt;2xi32&gt;&gt;
    fir.freemem %100 : !fir.heap&lt;!fir.array&lt;2xi32&gt;&gt;

I found this in tonto, but the change does not affect performance at all.
Anyway, it looks like a reasonable thing to do, and it makes easier
to compare the performance profiles with other compilers'.


Full diff: https://github.com/llvm/llvm-project/pull/121864.diff

5 Files Affected:

  • (modified) flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td (+17)
  • (modified) flang/lib/Optimizer/Dialect/FIROps.cpp (+2-1)
  • (modified) flang/test/Fir/convert-fold.fir (+9)
  • (modified) flang/test/Lower/array-substring.f90 (+1-2)
  • (modified) flang/test/Lower/vector-subscript-io.f90 (+1-2)
diff --git a/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td b/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td
index 1dbde5c1c73024..2414de496d45bc 100644
--- a/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td
+++ b/flang/include/flang/Optimizer/Dialect/CanonicalizationPatterns.td
@@ -57,6 +57,9 @@ def StrictSmallerWidthPred : Constraint<CPred<
                        "$0.getType().getIntOrFloatBitWidth() < "
                        "$1.getType().getIntOrFloatBitWidth()">>;
 
+def PointerCompatiblePred
+    : Constraint<CPred<"fir::ConvertOp::isPointerCompatible($0.getType())">>;
+
 // floats or ints that undergo successive extensions or successive truncations.
 def ConvertConvertOptPattern
     : Pat<(fir_ConvertOp:$res (fir_ConvertOp:$irm $arg)),
@@ -112,4 +115,18 @@ def ForwardConstantConvertPattern
           (createConstantOp $res, $attr),
           [(IndexTypePred $res), (IntegerTypePred $cnt)]>;
 
+// Optimize redundant pointer conversions, e.g.:
+// %1 = fir.convert %0 :
+//     (!fir.heap<!fir.array<2xf32>>) -> !fir.ref<!fir.array<2xf32>>
+// %2 = fir.convert %1 :
+//     (!fir.ref<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>>
+// Will be optimized into:
+// %2 = fir.convert %0 :
+//     (!fir.heap<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>>
+// which is redundant due to RedundantConvertOptPattern.
+def ChainedPointerConvertsPattern
+    : Pat<(fir_ConvertOp:$res(fir_ConvertOp:$irm $arg)), (fir_ConvertOp $arg),
+          [(PointerCompatiblePred $arg), (PointerCompatiblePred $irm),
+           (PointerCompatiblePred $res)]>;
+
 #endif // FORTRAN_FIR_REWRITE_PATTERNS
diff --git a/flang/lib/Optimizer/Dialect/FIROps.cpp b/flang/lib/Optimizer/Dialect/FIROps.cpp
index cdcf9bda49a627..fa83aa380e489c 100644
--- a/flang/lib/Optimizer/Dialect/FIROps.cpp
+++ b/flang/lib/Optimizer/Dialect/FIROps.cpp
@@ -1313,7 +1313,8 @@ void fir::ConvertOp::getCanonicalizationPatterns(
   results.insert<ConvertConvertOptPattern, ConvertAscendingIndexOptPattern,
                  ConvertDescendingIndexOptPattern, RedundantConvertOptPattern,
                  CombineConvertOptPattern, CombineConvertTruncOptPattern,
-                 ForwardConstantConvertPattern>(context);
+                 ForwardConstantConvertPattern, ChainedPointerConvertsPattern>(
+      context);
 }
 
 mlir::OpFoldResult fir::ConvertOp::fold(FoldAdaptor adaptor) {
diff --git a/flang/test/Fir/convert-fold.fir b/flang/test/Fir/convert-fold.fir
index ebb6c8db7c891c..fb30e634ba5e6d 100644
--- a/flang/test/Fir/convert-fold.fir
+++ b/flang/test/Fir/convert-fold.fir
@@ -35,3 +35,12 @@ func.func @ctest() -> index {
   // CHECK-NEXT: return %{{.*}} : index
   return %2 : index
 }
+
+// CHECK-LABEL:   func.func @ptrtest(
+// CHECK-SAME:                       %[[VAL_0:.*]]: !fir.heap<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>> {
+func.func @ptrtest(%0 : !fir.heap<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>> {
+  %1 = fir.convert %0 : (!fir.heap<!fir.array<2xf32>>) -> !fir.ref<!fir.array<2xf32>>
+  %2 = fir.convert %1 : (!fir.ref<!fir.array<2xf32>>) -> !fir.heap<!fir.array<2xf32>>
+// CHECK:           return %[[VAL_0]] : !fir.heap<!fir.array<2xf32>>
+  return %2 : !fir.heap<!fir.array<2xf32>>
+}
diff --git a/flang/test/Lower/array-substring.f90 b/flang/test/Lower/array-substring.f90
index 02101039120e9f..7544fbb989627b 100644
--- a/flang/test/Lower/array-substring.f90
+++ b/flang/test/Lower/array-substring.f90
@@ -24,9 +24,8 @@
 ! CHECK:         %[[VAL_16:.*]] = fir.array_coor %[[VAL_7]](%[[VAL_9]]) {{\[}}%[[VAL_10]]] %[[VAL_15]] : (!fir.ref<!fir.array<1x!fir.char<1,12>>>, !fir.shape<1>, !fir.slice<1>, index) -> !fir.ref<!fir.char<1,12>>
 ! CHECK:         %[[VAL_17:.*]] = fir.convert %[[VAL_16]] : (!fir.ref<!fir.char<1,12>>) -> !fir.ref<!fir.array<12x!fir.char<1>>>
 ! CHECK:         %[[VAL_18:.*]] = fir.coordinate_of %[[VAL_17]], %[[VAL_2]] : (!fir.ref<!fir.array<12x!fir.char<1>>>, index) -> !fir.ref<!fir.char<1>>
-! CHECK:         %[[VAL_19:.*]] = fir.convert %[[VAL_18]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<!fir.char<1,?>>
 ! CHECK:         %[[VAL_20:.*]] = fir.array_coor %[[VAL_11]](%[[VAL_9]]) %[[VAL_15]] : (!fir.ref<!fir.array<1x!fir.char<1,8>>>, !fir.shape<1>, index) -> !fir.ref<!fir.char<1,8>>
-! CHECK:         %[[VAL_21:.*]] = fir.convert %[[VAL_19]] : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+! CHECK:         %[[VAL_21:.*]] = fir.convert %[[VAL_18]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<i8>
 ! CHECK:         %[[VAL_22:.*]] = fir.convert %[[VAL_20]] : (!fir.ref<!fir.char<1,8>>) -> !fir.ref<i8>
 ! CHECK:         %[[VAL_23:.*]] = fir.convert %[[VAL_4]] : (index) -> i64
 ! CHECK:         %[[VAL_24:.*]] = fir.call @_FortranACharacterCompareScalar1(%[[VAL_21]], %[[VAL_22]], %[[VAL_23]], %[[VAL_23]]) {{.*}}: (!fir.ref<i8>, !fir.ref<i8>, i64, i64) -> i32
diff --git a/flang/test/Lower/vector-subscript-io.f90 b/flang/test/Lower/vector-subscript-io.f90
index 372130fd099074..9a041af16c88cc 100644
--- a/flang/test/Lower/vector-subscript-io.f90
+++ b/flang/test/Lower/vector-subscript-io.f90
@@ -325,12 +325,11 @@ subroutine substring(x, y, i, j)
 ! CHECK:   %[[VAL_230:.*]] = arith.subi %[[VAL_216]], %[[VAL_210]] : index
 ! CHECK:   %[[VAL_231:.*]] = fir.convert %[[VAL_228]] : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<!fir.array<?x!fir.char<1>>>
 ! CHECK:   %[[VAL_232:.*]] = fir.coordinate_of %[[VAL_231]], %[[VAL_230]] : (!fir.ref<!fir.array<?x!fir.char<1>>>, index) -> !fir.ref<!fir.char<1>>
-! CHECK:   %[[VAL_233:.*]] = fir.convert %[[VAL_232]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<!fir.char<1,?>>
 ! CHECK:   %[[VAL_234:.*]] = arith.subi %[[VAL_219]], %[[VAL_216]] : index
 ! CHECK:   %[[VAL_235:.*]] = arith.addi %[[VAL_234]], %[[VAL_210]] : index
 ! CHECK:   %[[VAL_236:.*]] = arith.cmpi slt, %[[VAL_235]], %[[VAL_209]] : index
 ! CHECK:   %[[VAL_237:.*]] = arith.select %[[VAL_236]], %[[VAL_209]], %[[VAL_235]] : index
-! CHECK:   %[[VAL_238:.*]] = fir.convert %[[VAL_233]] : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+! CHECK:   %[[VAL_238:.*]] = fir.convert %[[VAL_232]] : (!fir.ref<!fir.char<1>>) -> !fir.ref<i8>
 ! CHECK:   %[[VAL_239:.*]] = fir.convert %[[VAL_237]] : (index) -> i64
 ! CHECK:   %[[VAL_240:.*]] = fir.call @_FortranAioInputAscii(%[[VAL_213]], %[[VAL_238]], %[[VAL_239]]) {{.*}}: (!fir.ref<i8>, !fir.ref<i8>, i64) -> i1
 ! CHECK:   %[[VAL_241:.*]] = arith.addi %[[VAL_221]], %[[VAL_210]] overflow<nsw> : index

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Makes sense to me to make the IR cleaner here.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but maybe it is a broader problem that stack arrays doesn't understand fir.convert because it could still get confused by non-redundant converts. I'll add that to my to-do list.

@vzakhari
Copy link
Contributor Author

vzakhari commented Jan 7, 2025

This LGTM, but maybe it is a broader problem that stack arrays doesn't understand fir.convert because it could still get confused by non-redundant converts. I'll add that to my to-do list.

Thanks for the follow-up! I did not want to modify just the StackArrays pass. It seems to me that the conversions canonicalization potentially helps other passes, so I decided to go this route instead.

@vzakhari vzakhari merged commit 2e637db into llvm:main Jan 7, 2025
11 checks passed
@kawashima-fj
Copy link
Member

@vzakhari After this PR is merged, https://github.com/fujitsu/compiler-test-suite/tree/2024-12/Fortran/0393/0393_0171.f90 and https://github.com/fujitsu/compiler-test-suite/blob/2024-12/Fortran/0393/0393_0174.f90 started to fail by segmentation fault on AArch64.

I can see the error with -O2.

$ flang -O2 Fortran/0393/0393_0171.f90
$ ./a.out
Segmentation fault (core dumped)

I confirmed the error occurs with this commit (2e637db) and the latest main branch (f30ff0b) but doesn't occur with the previous commit (51c9c82).

Could you check? If you cannot reproduce the error, please let me know.

@vzakhari
Copy link
Contributor Author

Thanks for reporting, @kawashima-fj! I saw the bug report, but could not reproduce on my side. I will try again today and will let you know if I still cannot.

@vzakhari
Copy link
Contributor Author

vzakhari commented Jan 17, 2025

I tried differently built compilers today, but could not reproduce it. @kawashima-fj, can you please run it under gdb and let me know what the backtrace is?

@kawashima-fj
Copy link
Member

@vzakhari Assembly output by flang -S -g -O2 (Flang/LLVM: commit f30ff0b) and GDB outputs are attached.

0393_0171.s.txt
0393_0174.s.txt
0393_0171.gdb.txt
0393_0174.gdb.txt

I don't know why 0393_0174.gdb.txt cannot get a program counter.

I run it on Ubuntu 24.04.1 LTS docker container in Rocky Linux 8.6 host on Grace (Neoverse V2).

Please let me know you need more information.

@vzakhari
Copy link
Contributor Author

Thank you! I will take a closer look tomorrow. I would appreciate it if you can collect the assembly files with my commit reverted. No problem, if you do not have time for this.

@kawashima-fj
Copy link
Member

I'll do it after some meetings are over. You can obtan the result tomorrow in your timezone.

@kawashima-fj
Copy link
Member

@vzakhari I'm very sorry. The root cause of the segmentation fault does not seem to be your commit. Reverting your commit (2e637db) from latest main branch (f30ff0b) results in the same assembly file and the same segmentation fault. Your commit 2e637db seems to have triggered an existing bug at that time, and the latest main branch can generate the segmentation fault without your change.

I'll take a look and create a new issue if needed.

@vzakhari
Copy link
Contributor Author

Thank you for figuring it out! No worries!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants