Skip to content

[AArch64] Only clear kill flags if necessary when merging str #69680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 3, 2023

Conversation

nocchijiang
Copy link
Contributor

Previously the kill flags of the source register were unconditionally cleared when a str pair was merged, which results in suboptimal register allocation and inhibits some renaming opportunities which may allow further merging str.

Previously the kill flags of the source register were unconditionally
cleared when a str pair was merged, which results in suboptimal register
allocation and inhibits some renaming opportunities which may allow
further merging str.
@llvmbot
Copy link
Member

llvmbot commented Oct 20, 2023

@llvm/pr-subscribers-backend-aarch64

Author: Zhaoxuan Jiang (nocchijiang)

Changes

Previously the kill flags of the source register were unconditionally cleared when a str pair was merged, which results in suboptimal register allocation and inhibits some renaming opportunities which may allow further merging str.


Full diff: https://github.com/llvm/llvm-project/pull/69680.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+11-2)
  • (modified) llvm/test/CodeGen/AArch64/irg-nomem.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/ldst-opt-aa.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/stp-opt-with-renaming-debug.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-fp128.ll (+2-2)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.generated.expected (+2-2)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index c93fd02a821dcf9..0dc787c524daa96 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -997,8 +997,17 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
       //   STRWui %w0, ...
       //   USE %w1
       //   STRWui kill %w1  ; need to clear kill flag when moving STRWui upwards
-      RegOp0.setIsKill(false);
-      RegOp1.setIsKill(false);
+      for (auto It = std::next(I);
+           It != Paired && (RegOp0.isKill() || RegOp1.isKill()); ++It) {
+        auto ClearKill = [](MachineInstr &MI, MachineOperand &MOP,
+                            const TargetRegisterInfo *TRI) {
+          Register Reg = MOP.getReg();
+          if (MI.readsRegister(Reg, TRI) || MI.modifiesRegister(Reg, TRI))
+            MOP.setIsKill(false);
+        };
+        ClearKill(*It, RegOp0, TRI);
+        ClearKill(*It, RegOp1, TRI);
+      }
     } else {
       // Clear kill flags of the first stores register. Example:
       //   STRWui %w1, ...
diff --git a/llvm/test/CodeGen/AArch64/irg-nomem.mir b/llvm/test/CodeGen/AArch64/irg-nomem.mir
index d428f16011a700e..bc247b0dbf9e307 100644
--- a/llvm/test/CodeGen/AArch64/irg-nomem.mir
+++ b/llvm/test/CodeGen/AArch64/irg-nomem.mir
@@ -62,7 +62,7 @@ body:             |
     $w9 = MOVZWi 1, 0, implicit-def $x9
 
     ; Check that stores are merged across IRG.
-    ; CHECK: STPXi renamable $x9, renamable $x9, renamable $x0, 0
+    ; CHECK: STPXi renamable $x9, killed renamable $x9, renamable $x0, 0
 
     STRXui renamable $x9, renamable $x0, 0 :: (store (s64) into %ir.x)
     dead renamable $x10 = IRG renamable $x8, $xzr
diff --git a/llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir b/llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
index 647130ba7d908fc..b76570da781f5a6 100644
--- a/llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
+++ b/llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
@@ -17,7 +17,7 @@ body:             |
     ; CHECK: liveins: $w0, $w1, $x1
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: early-clobber $x1, renamable $w0, renamable $w2 = LDPWpre renamable $x1, 5 :: (load (s32))
-    ; CHECK-NEXT: STPWi renamable $w0, renamable $w2, renamable $x1, 0 :: (store (s32))
+    ; CHECK-NEXT: STPWi renamable $w0, killed renamable $w2, renamable $x1, 0 :: (store (s32))
     ; CHECK-NEXT: RET undef $lr
     early-clobber renamable $x1, renamable $w0 = LDRWpre killed renamable $x1, 20 :: (load (s32))
     renamable $w2 = LDRWui renamable $x1, 1 :: (load (s32))
diff --git a/llvm/test/CodeGen/AArch64/ldst-opt-aa.mir b/llvm/test/CodeGen/AArch64/ldst-opt-aa.mir
index 9f2680aca970e9d..16369af9ccc6dd3 100644
--- a/llvm/test/CodeGen/AArch64/ldst-opt-aa.mir
+++ b/llvm/test/CodeGen/AArch64/ldst-opt-aa.mir
@@ -15,7 +15,7 @@
 ---
 # CHECK-LABEL: name: ldr_str_aa
 # CHECK: $w8, $w9 = LDPWi $x1, 0
-# CHECK: STPWi $w8, $w9, $x0, 0
+# CHECK: STPWi killed $w8, killed $w9, $x0, 0
 name:            ldr_str_aa
 tracksRegLiveness: true
 body:             |
diff --git a/llvm/test/CodeGen/AArch64/stp-opt-with-renaming-debug.mir b/llvm/test/CodeGen/AArch64/stp-opt-with-renaming-debug.mir
index 2c24383e6baa349..41d84a0d1cb079c 100644
--- a/llvm/test/CodeGen/AArch64/stp-opt-with-renaming-debug.mir
+++ b/llvm/test/CodeGen/AArch64/stp-opt-with-renaming-debug.mir
@@ -61,7 +61,7 @@ body:             |
 # CHECK-NEXT:  DBG_VALUE $x0, $noreg,
 # CHECK-NEXT:  STRXui killed renamable $x8, renamable $x19, 2 :: (store (s64))
 # CHECK-NEXT:  $x8 = ADDXrs renamable $x0, killed renamable $x20, 0
-# CHECK-NEXT:  STPXi $xzr, renamable $x8, renamable $x19, 0 :: (store (s64))
+# CHECK-NEXT:  STPXi $xzr, killed renamable $x8, renamable $x19, 0 :: (store (s64))
 # CHECK-NEXT:  RET undef $lr, implicit $x0
 name:            test_dbg_value2
 alignment:       4
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-fp128.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-fp128.ll
index 7ac1b58c626d191..da7e772461e28bc 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-fp128.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-fp128.ll
@@ -38,10 +38,10 @@ define void @fcvt_v4f64_v4f128(ptr %a, ptr %b) vscale_range(2,0) #0 {
 ; CHECK-NEXT:    fmov d0, d1
 ; CHECK-NEXT:    bl __extenddftf2
 ; CHECK-NEXT:    ldr q1, [sp] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr q2, [sp, #16] // 16-byte Folded Reload
 ; CHECK-NEXT:    stp q1, q0, [x19]
+; CHECK-NEXT:    ldr q1, [sp, #16] // 16-byte Folded Reload
 ; CHECK-NEXT:    ldr q0, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    stp q0, q2, [x19, #32]
+; CHECK-NEXT:    stp q0, q1, [x19, #32]
 ; CHECK-NEXT:    addvl sp, sp, #2
 ; CHECK-NEXT:    add sp, sp, #48
 ; CHECK-NEXT:    ldp x30, x19, [sp, #16] // 16-byte Folded Reload
diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.generated.expected b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.generated.expected
index 110fa4d9b1cf1f4..97b17d98d347209 100644
--- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.generated.expected
+++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.generated.expected
@@ -133,8 +133,8 @@ attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }
 ; CHECK-LABEL: OUTLINED_FUNCTION_0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w9, #2 // =0x2
-; CHECK-NEXT:    mov w10, #3 // =0x3
 ; CHECK-NEXT:    stp w9, w8, [x29, #-12]
+; CHECK-NEXT:    mov w9, #3 // =0x3
 ; CHECK-NEXT:    mov w8, #4 // =0x4
-; CHECK-NEXT:    stp w8, w10, [sp, #12]
+; CHECK-NEXT:    stp w8, w9, [sp, #12]
 ; CHECK-NEXT:    ret

@nocchijiang
Copy link
Contributor Author

I don't have commit access and I am trying to find some reviewers who reviewed/approved similar patches in the history. Sorry if this ping bothers you.

@davemgreen @fhahn

Blaming the lines leads me to @MatzeB who I believe is the original author to clear the kill flags. Please also take a look at this when you are free, thanks.

@davemgreen
Copy link
Collaborator

I think this sounds OK. I'm a little surprised that it can't recalc the info if needed.
Is it worth adding a test for the specific case you had - "which results in suboptimal register allocation"? Or do you think the existing tests that change enough to show the issue?

Copy link
Contributor

@MatzeB MatzeB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, but you should take the time to update the comment and explain your algorithm.

@nocchijiang
Copy link
Contributor Author

Is it worth adding a test for the specific case you had - "which results in suboptimal register allocation"? Or do you think the existing tests that change enough to show the issue?

Initially I found that a few seemingly mergable strs were not merged in the generated machine instructions from the codebase I am working on, then I tried debugging the pass to locate the issue which is the kill flags being unconditionally cleared that prevents register renaming. Slightly better register usages can be observed from the affected tests (sve-fixed-length-fp128.ll and aarch64_generated_funcs.ll.generated.expected), which is enough to show the issue IMO, but it may still be an overstatement to call it a "register allocation" issue. Please let me know if I should make changes in the commit message and/or add a test to better describe the issue.

@nocchijiang nocchijiang requested a review from MatzeB October 30, 2023 06:37
@nocchijiang
Copy link
Contributor Author

Removed redundant flag clearing by only manipulating the operand of the MI being moved; added comment to explain the idea.

Copy link
Contributor

@MatzeB MatzeB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nocchijiang
Copy link
Contributor Author

@MatzeB Thanks for the review! Could you please merge the PR for me since I don't have the commit access?

@MatzeB MatzeB merged commit 1f54ef7 into llvm:main Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants