[AArch64][GISel] Always fold G_SHL into addressing mode where possible, unless the subtarget has addr-lsl-slow-14 #96603

Merged

merged 1 commit into llvm:main from tguan/gisel-fold-shl on Jul 18, 2024

Conversation

Him188
Member

@Him188 Him188 commented Jun 25, 2024

Before this patch, we folded G_SHL into the addressing-mode lsl only when it had exactly one use, all of its uses were memory ops, or we were optimizing for size. However, lsl is free on all AArch64 targets except those with FeatureAddrLSLSlow14.

This patch uses that fact and always folds G_SHL into lsl for memory ops, with an exception for FeatureAddrLSLSlow14.

This patch also fixes a 15% GISel regression in the TSVC kernel s482 and reduces the regression in s291 from 20% to 10%.
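
To illustrate the trade-off (a hand-written AArch64 assembly sketch, not taken from the patch's tests): with multiple memory uses of one shifted index, the old heuristic kept the shift as a separate instruction, while this patch folds it into each access, which is free wherever address lsl is fast.

    // before: shift materialized once and reused
    lsl  x8, x0, #3
    ldr  x9, [x1, x8]
    str  x9, [x1, x8]

    // after: shift folded into each addressing mode
    ldr  x9, [x1, x0, lsl #3]
    str  x9, [x1, x0, lsl #3]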

@llvmbot
Member

llvmbot commented Jun 25, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Him188 (Him188)

Changes

This patch fixes a 15% GISel regression in the TSVC kernel s482. It also reduces the regression in s291 from 20% to 10%.


Full diff: https://github.com/llvm/llvm-project/pull/96603.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp (+15-2)
  • (added) llvm/test/CodeGen/AArch64/GlobalISel/addressing-modes-multiple.mir (+29)
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 0357a7206c478..efbef473a9e5a 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -6717,6 +6717,20 @@ AArch64InstructionSelector::selectNegArithImmed(MachineOperand &Root) const {
   return select12BitValueWithLeftShift(Immed);
 }
 
+/// Returns true if the def of MI is only used by memory operations.
+/// If the def is G_SHL, we also check indirect usages through G_PTR_ADD.
+static bool onlyUsedInMemoryOps(MachineInstr &MI, const MachineRegisterInfo &MRI) {
+  const Register DefReg = MI.getOperand(0).getReg();
+  return all_of(MRI.use_nodbg_instructions(DefReg),
+                [&](MachineInstr &Use) {
+                  if (MI.getOpcode() == AArch64::G_SHL &&
+                    Use.getOpcode() == AArch64::G_PTR_ADD &&
+                    onlyUsedInMemoryOps(Use, MRI))
+                    return true;
+                  return Use.mayLoadOrStore();
+                });
+}
+
 /// Return true if it is worth folding MI into an extended register. That is,
 /// if it's safe to pull it into the addressing mode of a load or store as a
 /// shift.
@@ -6734,8 +6748,7 @@ bool AArch64InstructionSelector::isWorthFoldingIntoExtendedReg(
   // We have a fastpath, so folding a shift in and potentially computing it
   // many times may be beneficial. Check if this is only used in memory ops.
   // If it is, then we should fold.
-  return all_of(MRI.use_nodbg_instructions(DefReg),
-                [](MachineInstr &Use) { return Use.mayLoadOrStore(); });
+  return onlyUsedInMemoryOps(MI, MRI);
 }
 
 static bool isSignExtendShiftType(AArch64_AM::ShiftExtendType Type) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/addressing-modes-multiple.mir b/llvm/test/CodeGen/AArch64/GlobalISel/addressing-modes-multiple.mir
new file mode 100644
index 0000000000000..cfcdcf84be9dd
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/addressing-modes-multiple.mir
@@ -0,0 +1,29 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64-linux-gnu -O3 -run-pass=instruction-select -verify-machineinstrs %s -global-isel-abort=1 -o - | FileCheck %s
+
+---
+name:            shl_multiple_ptr_add
+alignment:       4
+legalized:       true
+regBankSelected: true
+tracksRegLiveness: true
+machineFunctionInfo: {}
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2
+    ; CHECK-LABEL: name: shl_multiple_ptr_add
+    ; CHECK: liveins: $x0, $x1, $x2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr64sp = COPY $x1
+    ; CHECK-NEXT: [[LDRXroX:%[0-9]+]]:gpr64 = LDRXroX [[COPY1]], [[COPY]], 0, 1 :: (load (s64))
+    ; CHECK-NEXT: STRXroX [[LDRXroX]], [[COPY1]], [[COPY]], 0, 1 :: (store (s64))
+    %0:gpr(s64) = COPY $x0
+    %1:gpr(s64) = G_CONSTANT i64 3
+    %2:gpr(s64) = G_SHL %0, %1(s64) ; %2 used by multiple G_PTR_ADD
+    %3:gpr(p0) = COPY $x1
+    %4:gpr(p0) = G_PTR_ADD %3, %2
+    %5:gpr(s64) = G_LOAD %4(p0) :: (load (s64))
+    %ptr:gpr(p0) = G_PTR_ADD %3, %2
+    G_STORE %5, %ptr :: (store (s64))
+...

@llvmbot

Member

llvmbot commented Jun 25, 2024

@llvm/pr-subscribers-backend-aarch64

(Duplicate of the @llvm/pr-subscribers-llvm-globalisel comment above: same summary, same diff.)

@davemgreen
Collaborator

See #86894 for how this should work. Some addressing modes are effectively free, so I don't believe it should matter if there are other uses. There are some target features that control what is considered free (as in the FIXME: Consider checking HasAddrLSLSlow14 and HasALULSLFast as appropriate.)
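
For context, an illustrative sketch (mine, not from the thread; the shift amounts reflect FeatureAddrLSLSlow14's description, under which folded address shifts of 1 or 4 — i.e. scaled 16-bit and 128-bit accesses — are slow):

    ldr  x0, [x1, x2, lsl #3]    // 64-bit access, shift 3: free on all subtargets
    ldrh w0, [x1, x2, lsl #1]    // 16-bit access, shift 1: slow with addr-lsl-slow-14
    ldr  q0, [x1, x2, lsl #4]    // 128-bit access, shift 4: slow with addr-lsl-slow-14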

@Him188
Member Author

Him188 commented Jun 27, 2024

@davemgreen Does the updated solution resolve your concerns? I will polish it if we are on the right track.

@davemgreen
Collaborator

I will have to check the details, but yeah, I think this looks decent. It would be good to have a .ll test file that tests various cases, with GISel and SDAG RUN lines, to check that they fold in the same cases.

Comment on lines 6784 to 6785
if (const auto Worth = isWorthFoldingIntoAddrMode(*OffsetInst, MRI);
    Worth.has_value())
Contributor

The .has_value check is redundant

There are more oddities, but let's first wait for convergence on the solution.

Member Author

done

@Him188
Member Author

Him188 commented Jun 28, 2024

@davemgreen I have included a .ll test, part of which is copied from the SDAG test aarch64-fold-lslfast.ll. I also added MIR tests for s128 and stores.

@Him188 Him188 changed the title [AArch64][GISel] Fold G_SHL used by multiple G_PTR_ADD into load/store addressing mode [AArch64][GISel] Always fold G_SHL into addressing mode where possible, unless the subtarget has addr-lsl-slow-14 Jun 28, 2024
/// beneficial or not.
///
/// Returns:
/// - true if fodling MI would be beneficial.

folding

Member Author

fixed

if (IsAddrOperand) {
  // If we are already sure that folding MI is good or bad, return the result.
  if (const auto Worth = isWorthFoldingIntoAddrMode(MI, MRI))
    return Worth.value();

optionals are odd, see https://en.cppreference.com/w/cpp/utility/optional.

return *Worth;

Member Author

I think the optional is fine here, as we need to represent a state where we are not sure whether folding is beneficial.
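
For reference, a sketch of the tri-state helper under discussion, reconstructed from the snippets quoted in this review (hedged; the merged commit is authoritative — getIConstantVRegValWithLookThrough is the GlobalISel utility for reading a constant vreg):

    // Returns std::nullopt when there is no definitive answer, so callers
    // fall back to the existing use-based heuristics; true/false are
    // definitive verdicts. (Reconstruction, not the exact merged code.)
    std::optional<bool> AArch64InstructionSelector::isWorthFoldingIntoAddrMode(
        MachineInstr &MI, const MachineRegisterInfo &MRI) const {
      if (MI.getOpcode() == AArch64::G_SHL) {
        // Address operands with shifts are free, except on subtargets with
        // FeatureAddrLSLSlow14, where shift amounts of 1 and 4 (scaled
        // 16-bit and 128-bit accesses) are slow.
        if (auto ValAndVReg = getIConstantVRegValWithLookThrough(
                MI.getOperand(2).getReg(), MRI))
          return !(STI.hasAddrLSLSlow14() &&
                   (ValAndVReg->Value == 1 || ValAndVReg->Value == 4));
      }
      return std::nullopt;
    }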

// If we are also sure about whether folding is beneficial or not,
// return the result.
if (const auto Worth = isWorthFoldingIntoAddrMode(*OffsetInst, MRI))
  return Worth.value();

return *Worth;

Member Author

done

@aemerson
Contributor

You can pre-commit the tests (they don't need their own PR) so that this change shows the code differences more clearly.

@Him188 Him188 force-pushed the tguan/gisel-fold-shl branch from 1cea7af to c6ec76e on July 1, 2024 12:43
@Him188
Member Author

Him188 commented Jul 1, 2024

You can pre-commit the tests

Nice suggestion. I have prepared a test commit before the code commit. You can review the second commit (the code commit), c6ec76e, to see what's changed. @aemerson

@@ -12,17 +12,16 @@ declare void @foo()
define i16 @halfword(ptr %ctx, i32 %xor72) nounwind {
Collaborator

This test can be moved up a folder if it's not GISel-specific. I would add the SDAG lines first too.

Can you remove the calls? It should still be possible to have different uses if it stores a different value.

Member Author

@Him188 Him188 Jul 11, 2024

This was actually copied from (part of) AArch64/aarch64-fold-lslfast.ll. Let's add GISEL RUN lines to the original file and delete the new file.

Member Author

@Him188 Him188 Jul 11, 2024

The calls were originally there, so I would prefer keeping them. Instead, I added a new case, multi_use_half_word, to show multiple uses with stores.

The original case with calls is renamed from multi_use_non_memory to multi_use_non_memory_call.

@Him188 Him188 force-pushed the tguan/gisel-fold-shl branch from c6ec76e to 93edeea on July 11, 2024 10:31
@Him188 Him188 requested a review from davemgreen July 11, 2024 10:33
Collaborator

@davemgreen davemgreen left a comment

Thanks. I think this looks OK to me.

std::optional<bool> AArch64InstructionSelector::isWorthFoldingIntoAddrMode(
    MachineInstr &MI, const MachineRegisterInfo &MRI) const {
  if (MI.getOpcode() == AArch64::G_SHL) {
    // Address operands with shifts are free, except for running on sub targets
Collaborator

"sub targets" -> subtargets

@@ -1,43 +1,74 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+addr-lsl-slow-14 | FileCheck %s --check-prefixes=CHECK,CHECK0
; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s --check-prefixes=CHECK,CHECK3
; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+addr-lsl-slow-14 -global-isel=1 -global-isel-abort=1 | FileCheck %s --check-prefixes=CHECK,CHECK0,CHECK0-GISEL
Collaborator

Can you put the SDAG tests first, for consistency with the other tests we have?

@Him188 Him188 force-pushed the tguan/gisel-fold-shl branch from 93edeea to 655cb6a on July 16, 2024 10:46
Collaborator

@davemgreen davemgreen left a comment

Thanks. LGTM

Contributor

@aemerson aemerson left a comment

It would be good to understand, at some point, why we still have differences from SDAG in the tests.

[AArch64][GISel] Always fold G_SHL into addressing mode where possible, unless the subtarget has addr-lsl-slow-14
@Him188 Him188 force-pushed the tguan/gisel-fold-shl branch from 655cb6a to ee8a1af on July 17, 2024 10:54
@Him188
Member Author

Him188 commented Jul 17, 2024

Thanks. The NFC test cases were merged separately. I will merge this after CI passes.

@Him188 Him188 merged commit e4a2d74 into llvm:main Jul 18, 2024
7 checks passed
@Him188 Him188 deleted the tguan/gisel-fold-shl branch July 18, 2024 09:22
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
[AArch64][GISel] Always fold G_SHL into addressing mode where possible, unless the subtarget has addr-lsl-slow-14 (#96603)

Summary:
Before this patch, we folded G_SHL into the addressing-mode lsl only when
it had exactly one use, all of its uses were memory ops, or we were
optimizing for size. However, lsl is free on all AArch64 targets except
those with FeatureAddrLSLSlow14.

This patch uses that fact and always folds G_SHL into lsl for memory
ops, with an exception for FeatureAddrLSLSlow14.

This patch also fixes a 15% GISel regression in the TSVC kernel s482 and
reduces the regression in s291 from 20% to 10%.

Differential Revision: https://phabricator.intern.facebook.com/D60250816