Skip to content

[GlobalISel] Check whether G_CTLZ is legal in matchUMulHToLShr #126457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 10, 2025

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Feb 10, 2025

We need to check G_CTLZ because the combine uses G_CTLZ to get log base 2,
and it is not always legal for on a target.

Fixes SWDEV-512440.

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@shiltian shiltian requested a review from arsenm February 10, 2025 03:26
@llvmbot
Copy link
Member

llvmbot commented Feb 10, 2025

@llvm/pr-subscribers-llvm-globalisel

Author: Shilei Tian (shiltian)

Changes

We need to check G_CTLZ because the combine uses G_CTLZ to get log base 2,
and it is not always legal for on a target.


Full diff: https://github.com/llvm/llvm-project/pull/126457.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+5-1)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir (+23)
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 4648414cc46aef6..0dfbb91f2ac5437 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -5641,6 +5641,7 @@ bool CombinerHelper::matchUMulHToLShr(MachineInstr &MI) const {
   Register RHS = MI.getOperand(2).getReg();
   Register Dst = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(Dst);
+  LLT RHSTy = MRI.getType(RHS);
   LLT ShiftAmtTy = getTargetLowering().getPreferredShiftAmountTy(Ty);
   auto MatchPow2ExceptOne = [&](const Constant *C) {
     if (auto *CI = dyn_cast<ConstantInt>(C))
@@ -5649,7 +5650,10 @@ bool CombinerHelper::matchUMulHToLShr(MachineInstr &MI) const {
   };
   if (!matchUnaryPredicate(MRI, RHS, MatchPow2ExceptOne, false))
     return false;
-  return isLegalOrBeforeLegalizer({TargetOpcode::G_LSHR, {Ty, ShiftAmtTy}});
+  // We need to check both G_LSHR and G_CTLZ because the combine uses G_CTLZ to
+  // get log base 2, and it is not always legal for on a target.
+  return isLegalOrBeforeLegalizer({TargetOpcode::G_LSHR, {Ty, ShiftAmtTy}}) &&
+         isLegalOrBeforeLegalizer({TargetOpcode::G_CTLZ, {RHSTy, RHSTy}});
 }
 
 void CombinerHelper::applyUMulHToLShr(MachineInstr &MI) const {
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir
new file mode 100644
index 000000000000000..00ead74cb37bb84
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir
@@ -0,0 +1,23 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -run-pass=amdgpu-postlegalizer-combiner %s -o - | FileCheck %s
+
+---
+name: test
+tracksRegLiveness: true
+legalized: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; CHECK-LABEL: name: test
+    ; CHECK: liveins: $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4
+    ; CHECK-NEXT: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[C]]
+    ; CHECK-NEXT: $vgpr0 = COPY [[UMULH]](s32)
+    ; CHECK-NEXT: SI_RETURN implicit $vgpr0
+    %0:_(s32) = COPY $vgpr0
+    %1:_(s32) = G_CONSTANT i32 4
+    %2:_(s32) = G_UMULH %0:_, %1:_
+    $vgpr0 = COPY %2:_(s32)
+    SI_RETURN implicit $vgpr0

@llvmbot
Copy link
Member

llvmbot commented Feb 10, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

We need to check G_CTLZ because the combine uses G_CTLZ to get log base 2,
and it is not always legal for on a target.


Full diff: https://github.com/llvm/llvm-project/pull/126457.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+5-1)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir (+23)
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 4648414cc46aef6..0dfbb91f2ac5437 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -5641,6 +5641,7 @@ bool CombinerHelper::matchUMulHToLShr(MachineInstr &MI) const {
   Register RHS = MI.getOperand(2).getReg();
   Register Dst = MI.getOperand(0).getReg();
   LLT Ty = MRI.getType(Dst);
+  LLT RHSTy = MRI.getType(RHS);
   LLT ShiftAmtTy = getTargetLowering().getPreferredShiftAmountTy(Ty);
   auto MatchPow2ExceptOne = [&](const Constant *C) {
     if (auto *CI = dyn_cast<ConstantInt>(C))
@@ -5649,7 +5650,10 @@ bool CombinerHelper::matchUMulHToLShr(MachineInstr &MI) const {
   };
   if (!matchUnaryPredicate(MRI, RHS, MatchPow2ExceptOne, false))
     return false;
-  return isLegalOrBeforeLegalizer({TargetOpcode::G_LSHR, {Ty, ShiftAmtTy}});
+  // We need to check both G_LSHR and G_CTLZ because the combine uses G_CTLZ to
+  // get log base 2, and it is not always legal for on a target.
+  return isLegalOrBeforeLegalizer({TargetOpcode::G_LSHR, {Ty, ShiftAmtTy}}) &&
+         isLegalOrBeforeLegalizer({TargetOpcode::G_CTLZ, {RHSTy, RHSTy}});
 }
 
 void CombinerHelper::applyUMulHToLShr(MachineInstr &MI) const {
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir
new file mode 100644
index 000000000000000..00ead74cb37bb84
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/no-ctlz-from-umul-to-lshr-in-postlegalizer.mir
@@ -0,0 +1,23 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -run-pass=amdgpu-postlegalizer-combiner %s -o - | FileCheck %s
+
+---
+name: test
+tracksRegLiveness: true
+legalized: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+    ; CHECK-LABEL: name: test
+    ; CHECK: liveins: $vgpr0, $vgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4
+    ; CHECK-NEXT: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[C]]
+    ; CHECK-NEXT: $vgpr0 = COPY [[UMULH]](s32)
+    ; CHECK-NEXT: SI_RETURN implicit $vgpr0
+    %0:_(s32) = COPY $vgpr0
+    %1:_(s32) = G_CONSTANT i32 4
+    %2:_(s32) = G_UMULH %0:_, %1:_
+    $vgpr0 = COPY %2:_(s32)
+    SI_RETURN implicit $vgpr0

@shiltian shiltian requested a review from aemerson February 10, 2025 03:26
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add and end to end IR test where this reproduced

We need to check `G_CTLZ` because the combine uses `G_CTLZ` to get log base 2,
and it is not always legal for on a target.
@shiltian shiltian force-pushed the users/shiltian/check-ctlz-in-matchUMulHToLShr branch from 9760a8d to bf21d87 Compare February 10, 2025 03:55
@@ -0,0 +1,98 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
Copy link
Contributor Author

@shiltian shiltian Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two interesting things about this test case:

  1. It is not reproducible w/o -O0 argument
  2. The udiv is actually a UB, and replacing either of its operand will make the crash go away (or the pattern not recognized such that the issue will not happen).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have an easier time reproducing this by directly emitting a multiply + shift by half bitwidth. it just happens one appears inside the div expansion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this? it doesn't seem to be able to reproduce the issue.

define void @test(ptr %p, i32 %a, i32 %b) {
  %mul = mul i32 %a, %b
  %lshr = lshr i32 %mul, 16
  store i32 %lshr, ptr %p, align 4
  ret void
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want 64-bit multiply, but I don't see the general combine to form umulh from mul + shift

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I'm gonna merge as it is. Didn't get the luck with 64 bit as well.

@shiltian shiltian merged commit 70fdd9f into main Feb 10, 2025
8 checks passed
@shiltian shiltian deleted the users/shiltian/check-ctlz-in-matchUMulHToLShr branch February 10, 2025 05:11
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…lvm#126457)

We need to check `G_CTLZ` because the combine uses `G_CTLZ` to get log
base 2,
and it is not always legal for on a target.

Fixes SWDEV-512440.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants