Skip to content

[Coalescer] Consider NewMI's subreg index when updating lanemask. #121780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

sdesmalen-arm
Copy link
Collaborator

The code added in #116191 that updated the lanemasks for rematerialized
values checked if DefMI's destination register had a subreg index.
This seems to have missed the following case:

  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32

which during rematerialization would have the following variables set:

  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we should
look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.

@llvmbot
Copy link
Member

llvmbot commented Jan 6, 2025

@llvm/pr-subscribers-llvm-regalloc

Author: Sander de Smalen (sdesmalen-arm)

Changes

The code added in #116191 that updated the lanemasks for rematerialized
values checked if DefMI's destination register had a subreg index.
This seems to have missed the following case:

  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32

which during rematerialization would have the following variables set:

  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we should
look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.


Full diff: https://github.com/llvm/llvm-project/pull/121780.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (+92-1)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 712c64bb9cf6d5..df7193eae6fbc3 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1508,14 +1508,14 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
     //    undef %2.subreg:reg = INST %1:reg         ; DefMI (rematerializable),
     //                                              ; DefSubIdx = subreg
     //    %3:reg = COPY %2                          ; SrcIdx = DstIdx = 0
-    //    .... = SOMEINSTR %3:reg
+    //    .... = SOMEINSTR %3:reg, %2
     //
     // there are no subranges for %3 so after rematerialization we need
     // to explicitly create them. Undefined subranges are removed later on.
-    if (DefSubIdx && !CP.getSrcIdx() && !CP.getDstIdx() &&
-        MRI->shouldTrackSubRegLiveness(DstReg) && !DstInt.hasSubRanges()) {
+    if (NewIdx && !DstInt.hasSubRanges() &&
+        MRI->shouldTrackSubRegLiveness(DstReg)) {
       LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstReg);
-      LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(DefSubIdx);
+      LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(NewIdx);
       LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
       VNInfo::Allocator &Alloc = LIS->getVNInfoAllocator();
       DstInt.createSubRangeFrom(Alloc, UsedLanes, DstInt);
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
index b61fa4be040070..08fc47d9480ce9 100644
--- a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -1,5 +1,5 @@
+# RUN: llc -mtriple=aarch64 -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
-# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # REQUIRES: asserts
 
 # CHECK-DBG: ********** REGISTER COALESCER **********
@@ -36,3 +36,94 @@ body:             |
     RET_ReallyLR
 
 ...
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: reproducer
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %1 [32r,48B:2)[48B,320r:0)[320r,368B:1) 0@48B-phi 1@320r 2@32r
+# CHECK-DBG-SAME: weight:0.000000e+00
+# CHECK-DBG: %3 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: L0000000000000080 [288r,304B:0)[304B,320r:3) 0@288r 1@x 2@x 3@304B-phi
+# CHECK-DBG-SAME: L0000000000000040 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: weight:0.000000e+00
+---
+name:              reproducer
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    %0:gpr32 = MOVi32imm 1
+    %1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+    %3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+    $nzcv = IMPLICIT_DEF
+    %4:gpr64 = COPY killed %3
+    Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+    $nzcv = IMPLICIT_DEF
+    Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+    %5:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+    %4:gpr64 = COPY killed %5
+    B %bb.7
+
+  bb.6:
+    %4:gpr64 = COPY $xzr
+
+  bb.7:
+    %7:gpr64 = ADDXrs killed %1, killed %4, 1
+    %1:gpr64 = COPY killed %7
+    B %bb.1
+
+...
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: reproducer2
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %1 [32r,48B:2)[48B,304r:0)[304r,352B:1) 0@48B-phi 1@304r 2@32r
+# CHECK-DBG-SAME: weight:0.000000e+00
+# CHECK-DBG: %3 [80r,160B:2)[224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@80r 3@288B-phi
+# CHECK-DBG-SAME: L0000000000000080 [224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@x 3@288B-phi
+# CHECK-DBG-SAME: L0000000000000040 [80r,160B:2)[224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@80r 3@288B-phi
+# CHECK-DBG-SAME: weight:0.000000e+00
+---
+name:              reproducer2
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    %0:gpr32 = MOVi32imm 1
+    %1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+    %3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+    $nzcv = IMPLICIT_DEF
+    %4:gpr64 = COPY killed %3
+    Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+    $nzcv = IMPLICIT_DEF
+    Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+    %4:gpr64 = IMPLICIT_DEF
+    B %bb.7
+
+  bb.6:
+    %4:gpr64 = COPY $xzr
+
+  bb.7:
+    %5:gpr64 = ADDXrs killed %1, killed %4, 1
+    %1:gpr64 = COPY killed %5
+    B %bb.1
+
+...

@llvmbot
Copy link
Member

llvmbot commented Jan 6, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

The code added in #116191 that updated the lanemasks for rematerialized
values checked if DefMI's destination register had a subreg index.
This seems to have missed the following case:

  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32

which during rematerialization would have the following variables set:

  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we should
look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.


Full diff: https://github.com/llvm/llvm-project/pull/121780.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (+92-1)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 712c64bb9cf6d5..df7193eae6fbc3 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1508,14 +1508,14 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
     //    undef %2.subreg:reg = INST %1:reg         ; DefMI (rematerializable),
     //                                              ; DefSubIdx = subreg
     //    %3:reg = COPY %2                          ; SrcIdx = DstIdx = 0
-    //    .... = SOMEINSTR %3:reg
+    //    .... = SOMEINSTR %3:reg, %2
     //
     // there are no subranges for %3 so after rematerialization we need
     // to explicitly create them. Undefined subranges are removed later on.
-    if (DefSubIdx && !CP.getSrcIdx() && !CP.getDstIdx() &&
-        MRI->shouldTrackSubRegLiveness(DstReg) && !DstInt.hasSubRanges()) {
+    if (NewIdx && !DstInt.hasSubRanges() &&
+        MRI->shouldTrackSubRegLiveness(DstReg)) {
       LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstReg);
-      LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(DefSubIdx);
+      LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(NewIdx);
       LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
       VNInfo::Allocator &Alloc = LIS->getVNInfoAllocator();
       DstInt.createSubRangeFrom(Alloc, UsedLanes, DstInt);
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
index b61fa4be040070..08fc47d9480ce9 100644
--- a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -1,5 +1,5 @@
+# RUN: llc -mtriple=aarch64 -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
-# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
 # REQUIRES: asserts
 
 # CHECK-DBG: ********** REGISTER COALESCER **********
@@ -36,3 +36,94 @@ body:             |
     RET_ReallyLR
 
 ...
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: reproducer
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %1 [32r,48B:2)[48B,320r:0)[320r,368B:1) 0@48B-phi 1@320r 2@32r
+# CHECK-DBG-SAME: weight:0.000000e+00
+# CHECK-DBG: %3 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: L0000000000000080 [288r,304B:0)[304B,320r:3) 0@288r 1@x 2@x 3@304B-phi
+# CHECK-DBG-SAME: L0000000000000040 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
+# CHECK-DBG-SAME: weight:0.000000e+00
+---
+name:              reproducer
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    %0:gpr32 = MOVi32imm 1
+    %1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+    %3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+    $nzcv = IMPLICIT_DEF
+    %4:gpr64 = COPY killed %3
+    Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+    $nzcv = IMPLICIT_DEF
+    Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+    %5:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+    %4:gpr64 = COPY killed %5
+    B %bb.7
+
+  bb.6:
+    %4:gpr64 = COPY $xzr
+
+  bb.7:
+    %7:gpr64 = ADDXrs killed %1, killed %4, 1
+    %1:gpr64 = COPY killed %7
+    B %bb.1
+
+...
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: reproducer2
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %1 [32r,48B:2)[48B,304r:0)[304r,352B:1) 0@48B-phi 1@304r 2@32r
+# CHECK-DBG-SAME: weight:0.000000e+00
+# CHECK-DBG: %3 [80r,160B:2)[224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@80r 3@288B-phi
+# CHECK-DBG-SAME: L0000000000000080 [224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@x 3@288B-phi
+# CHECK-DBG-SAME: L0000000000000040 [80r,160B:2)[224r,256B:1)[272r,288B:0)[288B,304r:3) 0@272r 1@224r 2@80r 3@288B-phi
+# CHECK-DBG-SAME: weight:0.000000e+00
+---
+name:              reproducer2
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    %0:gpr32 = MOVi32imm 1
+    %1:gpr64 = IMPLICIT_DEF
+
+  bb.1:
+
+  bb.2:
+    %3:gpr64all = SUBREG_TO_REG 0, %0, %subreg.sub_32
+
+  bb.3:
+    $nzcv = IMPLICIT_DEF
+    %4:gpr64 = COPY killed %3
+    Bcc 1, %bb.7, implicit killed $nzcv
+
+  bb.4:
+    $nzcv = IMPLICIT_DEF
+    Bcc 1, %bb.6, implicit killed $nzcv
+
+  bb.5:
+    %4:gpr64 = IMPLICIT_DEF
+    B %bb.7
+
+  bb.6:
+    %4:gpr64 = COPY $xzr
+
+  bb.7:
+    %5:gpr64 = ADDXrs killed %1, killed %4, 1
+    %1:gpr64 = COPY killed %5
+    B %bb.1
+
+...

Copy link
Collaborator

@qcolombet qcolombet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@@ -1508,14 +1508,14 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
// undef %2.subreg:reg = INST %1:reg ; DefMI (rematerializable),
// ; DefSubIdx = subreg
// %3:reg = COPY %2 ; SrcIdx = DstIdx = 0
// .... = SOMEINSTR %3:reg
// .... = SOMEINSTR %3:reg, %2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the change intended?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't, thanks. Your question did trigger me to clarify the existing comment a bit though! :)

Base automatically changed from users/sdesmalen-arm/srlt-fix-coalescer-the-sequel-nfci to main January 7, 2025 09:57
The code added in #116191 that updated the lanemasks for rematerialized
values, checked if DefMI's destination reg had a subreg index.
This seems to have missed the following case:

  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32

which during rematerialization would have the following variables set:

  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we should
look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.
@sdesmalen-arm sdesmalen-arm force-pushed the users/sdesmalen-arm/srlt-fix-coalescer-the-sequel branch from f9e3ada to 912ab02 Compare January 7, 2025 12:32
Copy link

github-actions bot commented Jan 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@sdesmalen-arm sdesmalen-arm force-pushed the users/sdesmalen-arm/srlt-fix-coalescer-the-sequel branch from 912ab02 to fa17b74 Compare January 7, 2025 14:31
@sdesmalen-arm sdesmalen-arm merged commit 82ec2d6 into main Jan 7, 2025
5 of 7 checks passed
@sdesmalen-arm sdesmalen-arm deleted the users/sdesmalen-arm/srlt-fix-coalescer-the-sequel branch January 7, 2025 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants