[AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax] #116733

Merged
merged 1 commit into llvm:main from choikwa:udiv-neg on Nov 20, 2024

Conversation

@choikwa (Contributor) commented Nov 19, 2024

Do this by using computeKnownBits and checking !isNonNegative on each operand when the division is unsigned. This rejects shrinking an unsigned div/rem when either operand may exceed the signed max for the bit width: such a value may have its top bit set, so it has no known leading zero bits that shrinking could strip.
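A concrete illustration (the numbers here are ours, not from the patch): for udiv i64 -10, 10 the numerator is 0xFFFFFFFFFFFFFFF6 = 2^64 - 10, which lies in (SignedMax, UnsignedMax]. ComputeNumSignBits reports 60 sign bits for it (copies of the set top bit), so the old path would conclude the division fits in a much narrower type, yet the unsigned quotient (2^64 - 10) / 10 still needs 61 bits. Redundant sign-bit copies are only strippable for signed division; for udiv/urem only known leading zeros are.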

@llvmbot (Member) commented Nov 19, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: choikwa (choikwa)

Changes

Do this by using computeKnownBits and checking !isNonNegative on each operand when the division is unsigned. This rejects shrinking an unsigned div/rem when either operand may exceed the signed max for the bit width: such a value may have its top bit set, so it has no known leading zero bits that shrinking could strip.


Full diff: https://github.com/llvm/llvm-project/pull/116733.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (+12)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+57)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index c49aab823b44a4..bb9dbda200a524 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -1193,6 +1193,18 @@ int AMDGPUCodeGenPrepareImpl::getDivNumBits(BinaryOperator &I, Value *Num,
                                             Value *Den, unsigned AtLeast,
                                             bool IsSigned) const {
   const DataLayout &DL = Mod->getDataLayout();
+  if (!IsSigned) {
+    KnownBits Known = computeKnownBits(Num, DL, 0, AC, &I);
+    // We know all bits are used for division for Operand > smax_bitwidth
+    if (!Known.isNonNegative())
+      return -1;
+
+    Known = computeKnownBits(Den, DL, 0, AC, &I);
+    if (!Known.isNonNegative())
+      return -1;
+  }
+
+
   unsigned LHSSignBits = ComputeNumSignBits(Num, DL, 0, AC, &I);
   if (LHSSignBits < AtLeast)
     return -1;
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
index b7436aeb1d5302..283a3524c2ff93 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
@@ -9999,3 +9999,60 @@ define <2 x i64> @v_udiv_i64_exact(<2 x i64> %num) {
    %result = udiv exact <2 x i64> %num, <i64 4096, i64 1024>
    ret <2 x i64> %result
 }
+
+define i64 @udiv_i64_gt_smax(i8 %size) {
+; GFX6-LABEL: udiv_i64_gt_smax:
+; GFX6:       ; %bb.0:
+; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 8
+; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
+; GFX6-NEXT:    v_not_b32_e32 v1, v1
+; GFX6-NEXT:    v_not_b32_e32 v0, v0
+; GFX6-NEXT:    s_mov_b32 s4, 0xcccccccd
+; GFX6-NEXT:    v_mul_lo_u32 v3, v1, s4
+; GFX6-NEXT:    v_mul_hi_u32 v4, v0, s4
+; GFX6-NEXT:    s_mov_b32 s6, 0xcccccccc
+; GFX6-NEXT:    v_mul_hi_u32 v5, v1, s4
+; GFX6-NEXT:    v_mul_hi_u32 v2, v0, s6
+; GFX6-NEXT:    v_mul_lo_u32 v0, v0, s6
+; GFX6-NEXT:    v_add_i32_e32 v3, vcc, v3, v4
+; GFX6-NEXT:    v_addc_u32_e32 v4, vcc, 0, v5, vcc
+; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v3
+; GFX6-NEXT:    v_addc_u32_e32 v0, vcc, 0, v2, vcc
+; GFX6-NEXT:    v_mul_lo_u32 v2, v1, s6
+; GFX6-NEXT:    v_mul_hi_u32 v1, v1, s6
+; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; GFX6-NEXT:    v_addc_u32_e64 v3, s[4:5], 0, 0, vcc
+; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
+; GFX6-NEXT:    v_addc_u32_e32 v1, vcc, v1, v3, vcc
+; GFX6-NEXT:    v_alignbit_b32 v0, v1, v0, 3
+; GFX6-NEXT:    v_lshrrev_b32_e32 v1, 3, v1
+; GFX6-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: udiv_i64_gt_smax:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_mov_b32_e32 v1, 31
+; GFX9-NEXT:    v_not_b32_sdwa v4, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0
+; GFX9-NEXT:    s_mov_b32 s4, 0xcccccccd
+; GFX9-NEXT:    v_ashrrev_i32_sdwa v1, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
+; GFX9-NEXT:    v_mul_hi_u32 v0, v4, s4
+; GFX9-NEXT:    v_not_b32_e32 v5, v1
+; GFX9-NEXT:    v_mov_b32_e32 v1, 0
+; GFX9-NEXT:    s_mov_b32 s6, 0xcccccccc
+; GFX9-NEXT:    v_mad_u64_u32 v[2:3], s[4:5], v5, s4, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v6, v3
+; GFX9-NEXT:    v_mov_b32_e32 v3, v1
+; GFX9-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v4, s6, v[2:3]
+; GFX9-NEXT:    v_mov_b32_e32 v0, v1
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v6, v0
+; GFX9-NEXT:    v_addc_co_u32_e64 v1, s[4:5], 0, 0, vcc
+; GFX9-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v5, s6, v[0:1]
+; GFX9-NEXT:    v_alignbit_b32 v0, v1, v0, 3
+; GFX9-NEXT:    v_lshrrev_b32_e32 v1, 3, v1
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+  %esize = sext i8 %size to i64
+  %minus = sub nuw nsw i64 -1, %esize
+  %div = udiv i64 %minus, 10
+  ret i64 %div
+}

github-actions bot commented Nov 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@changpeng (Contributor) left a comment


LGTM

@jayfoad (Contributor) left a comment


I think it would make more sense for this function to have separate paths for signed (based on computeNumSignBits) and unsigned (based on leading zeros, from computeKnownBits). The way it is written now, the unsigned case will potentially call both computeKnownBits and computeNumSignBits on both arguments, which seems like overkill.
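For illustration, a minimal sketch of that split, reusing the helper names already visible in the diff (ComputeNumSignBits, computeKnownBits, KnownBits::countMinLeadingZeros). This shows the suggested direction only, not the committed patch:

if (IsSigned) {
  // Signed path: redundant copies of the sign bit are shrinkable.
  unsigned RHSSignBits = ComputeNumSignBits(Den, DL, 0, AC, &I);
  if (RHSSignBits < AtLeast)
    return -1;
  unsigned LHSSignBits = ComputeNumSignBits(Num, DL, 0, AC, &I);
  if (LHSSignBits < AtLeast)
    return -1;
  unsigned SignBits = std::min(LHSSignBits, RHSSignBits);
  return Num->getType()->getScalarSizeInBits() - SignBits + 1;
}
// Unsigned path: known leading zeros play the role of sign bits. An operand
// in (SignedMax, UnsignedMax] may have its top bit set, so it has no known
// leading zeros and fails the AtLeast test naturally; one computeKnownBits
// query per operand suffices.
KnownBits Known = computeKnownBits(Den, DL, 0, AC, &I);
unsigned RHSZeros = Known.countMinLeadingZeros();
if (RHSZeros < AtLeast)
  return -1;
Known = computeKnownBits(Num, DL, 0, AC, &I);
unsigned LHSZeros = Known.countMinLeadingZeros();
if (LHSZeros < AtLeast)
  return -1;
unsigned ZeroBits = std::min(LHSZeros, RHSZeros);
return Num->getType()->getScalarSizeInBits() - ZeroBits + 1;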

@choikwa (Contributor, Author) commented Nov 19, 2024

Separated into unsigned/signed paths in getDivNumBits and added/verified a positive test case for shrinking.

@choikwa force-pushed the udiv-neg branch 2 times, most recently from 3565d13 to 5ab2888 on November 19, 2024 at 16:44
… max

Handle signed and unsigned path differently in getDivNumBits.
Using computeKnownBits, this rejects shrinking unsigned div/rem if
operands exceed signed max since we know NumSignBits will be always 0.
@choikwa changed the title from "[AMDGPU] prevent shrinking udiv/urem if operands exceed smax_bitwidth" to "[AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax]" on Nov 19, 2024
@choikwa merged commit b8e1d4d into llvm:main on Nov 20, 2024 (6 of 8 checks passed)
@jplehr (Contributor) commented Nov 20, 2024

I believe this broke one of our bots: https://lab.llvm.org/buildbot/#/builders/73/builds/8871

Sorry for the delay in reporting; I'm currently traveling.
@jhuber6

@jhuber6 (Contributor) commented Nov 20, 2024

Definitely seems like it. The LLVM libc tests tend to trigger a lot of different edge cases because it compiles a lot of non-standard GPU code.

@shiltian (Contributor) commented

> Definitely seems like it. The LLVM libc tests tend to trigger a lot of different edge cases because it compiles a lot of non-standard GPU code.

👍 which is really nice to have :-)

@jhuber6 (Contributor) commented Nov 20, 2024

I'll try to do a local test and confirm if it's this patch and request a revert if so. If anyone else wants to try you should just need to add the options documented in https://libc.llvm.org/gpu/building.html and do ninja check-libc-amdgcn-amd-amdhsa.

@changpeng (Contributor) commented

> I'll try to do a local test and confirm if it's this patch and request a revert if so. If anyone else wants to try you should just need to add the options documented in https://libc.llvm.org/gpu/building.html and do ninja check-libc-amdgcn-amd-amdhsa.

Could it possibly be a smoke-tester error? I think this change just results in less optimized udiv/urem.

@jhuber6 (Contributor) commented Nov 20, 2024

> Could it possibly be a smoke-tester error? I think this change just results in less optimized udiv/urem.

I'd guess not, because the failures are consistent and started just after this PR. I'll confirm later once I finish something up.

@jhuber6 (Contributor) commented Nov 21, 2024

Failures go away when I revert this patch locally, so I'm going to revert.

jhuber6 added a commit that referenced this pull request Nov 21, 2024
…(SignedMax,UnsignedMax] (#116733)"

This reverts commit b8e1d4d.

Causes failures on the `libc` test suite https://lab.llvm.org/buildbot/#/builders/73/builds/8871
Comment on lines +1201 to +1203
unsigned RHSSignBits = ComputeNumSignBits(Den, DL, 0, AC, &I);
if (RHSSignBits < AtLeast)
return -1;
Contributor:

Should try RHS first as it's canonically simpler

return -1;
unsigned LHSSignBits = Known.countMinLeadingZeros();

Known = computeKnownBits(Den, DL, 0, AC, &I);
Contributor:

Try RHS first

unsigned SignBits = std::min(LHSSignBits, RHSSignBits);
unsigned DivBits = Num->getType()->getScalarSizeInBits() - SignBits;
return DivBits + 1;
} else {
Contributor:

No else after return

Comment on lines +1210 to +1213
// We know all bits are used for division for Num or Den in range
// (SignedMax, UnsignedMax]
if (Known.isNegative() || !Known.isNonNegative())
return -1;
Contributor:

Is this reproducing the logic of computeKnownBits for division? Can you just do KnownLHS.udiv/urem(KnownRHS)?

Contributor Author:

Unfortunately we need the KnownBits of the operands for shrinking; KnownBits::udiv/urem would only give us the known bits of the final output.
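For reference, a sketch of the distinction (KnownBits::udiv and KnownBits::urem are real static helpers on llvm::KnownBits; the variable names here are ours):

KnownBits KnownNum = computeKnownBits(Num, DL, 0, AC, &I);
KnownBits KnownDen = computeKnownBits(Den, DL, 0, AC, &I);
// Folds operand facts into facts about the quotient itself...
KnownBits KnownQuot = KnownBits::udiv(KnownNum, KnownDen);
// ...whereas shrinking needs the operands' own leading zeros to choose a
// narrower width that both inputs still fit into.
unsigned ZeroBits = std::min(KnownNum.countMinLeadingZeros(),
                             KnownDen.countMinLeadingZeros());
unsigned DivBits = Num->getType()->getScalarSizeInBits() - ZeroBits;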

@choikwa (Contributor, Author) commented Nov 21, 2024

Going to try using computeKnownBits for both the signed and unsigned cases, for consistency and wider available information. Will also look at the libc failure.

choikwa added a commit to choikwa/llvm-project that referenced this pull request Dec 6, 2024
… (SignedMax,UnsignedMax] (llvm#116733)"

This reverts commit 905e831.

Handle signed and unsigned path differently in getDivNumBits.
Using computeKnownBits, this rejects shrinking unsigned div/rem if
operands exceed signed max since we know NumSignBits will be always 0.
choikwa added a commit that referenced this pull request Dec 7, 2024
… (SignedMax,UnsignedMax] (#116733)" (#118928)

This reverts commit 905e831.

Handle signed and unsigned path differently in getDivNumBits. Using
computeKnownBits, this rejects shrinking unsigned div/rem if operands
exceed signed max since we know NumSignBits will be always 0.

Rebased re-attempt after the first one was reverted due to an unrelated failure in libc (should be fixed by now, I'm told).