[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. #71556

vpykhtin · 2023-11-07T16:24:07Z

Substitute with zero-extended to i64 ballot.i32 intrinsic.

llvmbot · 2023-11-07T16:24:36Z

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Valery Pykhtin (vpykhtin)

Changes

Currently ballot intrinsic is lowered to AMDGPUISD::SETCC using bit width of the lowered intrinsic mnemonic. Instead use the bit width of the current wave mode converted to the required bit width. This is similar to ICMP/FCMP lowering.

I haven't added tests for ballot.i32 in wave64 mode because I'm not sure this should be a compilation error?

Full diff: https://github.com/llvm/llvm-project/pull/71556.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+9-4)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll (+4-10)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index cff7e4bc66218ca..0b1a8bf0861720b 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -5586,10 +5586,14 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
   SDValue Src = N->getOperand(1);
   SDLoc SL(N);
 
+  unsigned WavefrontSize = TLI.getSubtarget()->getWavefrontSize();
+  EVT CCVT = EVT::getIntegerVT(*DAG.getContext(), WavefrontSize);
+
   if (Src.getOpcode() == ISD::SETCC) {
     // (ballot (ISD::SETCC ...)) -> (AMDGPUISD::SETCC ...)
-    return DAG.getNode(AMDGPUISD::SETCC, SL, VT, Src.getOperand(0),
-                       Src.getOperand(1), Src.getOperand(2));
+    SDValue SetCC = DAG.getNode(AMDGPUISD::SETCC, SL, CCVT, Src.getOperand(0),
+                                Src.getOperand(1), Src.getOperand(2));
+    return VT.bitsEq(CCVT) ? SetCC : DAG.getZExtOrTrunc(SetCC, SL, VT);
   }
   if (const ConstantSDNode *Arg = dyn_cast<ConstantSDNode>(Src)) {
     // (ballot 0) -> 0
@@ -5612,9 +5616,10 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
 
   // (ballot (i1 $src)) -> (AMDGPUISD::SETCC (i32 (zext $src)) (i32 0)
   // ISD::SETNE)
-  return DAG.getNode(
-      AMDGPUISD::SETCC, SL, VT, DAG.getZExtOrTrunc(Src, SL, MVT::i32),
+  SDValue SetCC = DAG.getNode(
+      AMDGPUISD::SETCC, SL, CCVT, DAG.getZExtOrTrunc(Src, SL, MVT::i32),
       DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
+  return VT.bitsEq(CCVT) ? SetCC : DAG.getZExtOrTrunc(SetCC, SL, VT);
 }
 
 void SITargetLowering::ReplaceNodeResults(SDNode *N,
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
index 3337d053eb930b9..563899b6bcce117 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
@@ -202,10 +202,8 @@ false:
 define amdgpu_cs i32 @branch_divergent_ballot64_ne_zero_compare(i32 %v) {
 ; CHECK-LABEL: branch_divergent_ballot64_ne_zero_compare:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    v_cmp_gt_u32_e64 s0, 12, v0
-; CHECK-NEXT:    s_mov_b32 s1, 0
-; CHECK-NEXT:    s_cmp_eq_u64 s[0:1], 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB12_2
+; CHECK-NEXT:    v_cmp_gt_u32_e32 vcc_lo, 12, v0
+; CHECK-NEXT:    s_cbranch_vccz .LBB12_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB12_3
@@ -320,12 +318,8 @@ define amdgpu_cs i32 @branch_divergent_ballot64_ne_zero_and(i32 %v1, i32 %v2) {
 ; CHECK:       ; %bb.0:
 ; CHECK-NEXT:    v_cmp_gt_u32_e32 vcc_lo, 12, v0
 ; CHECK-NEXT:    v_cmp_lt_u32_e64 s0, 34, v1
-; CHECK-NEXT:    s_mov_b32 s1, 0
-; CHECK-NEXT:    s_and_b32 s0, vcc_lo, s0
-; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s0
-; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, v0
-; CHECK-NEXT:    s_cmp_eq_u64 s[0:1], 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB17_2
+; CHECK-NEXT:    s_and_b32 vcc_lo, vcc_lo, s0
+; CHECK-NEXT:    s_cbranch_vccz .LBB17_2
 ; CHECK-NEXT:  ; %bb.1: ; %true
 ; CHECK-NEXT:    s_mov_b32 s0, 42
 ; CHECK-NEXT:    s_branch .LBB17_3

jayfoad · 2023-11-07T16:27:27Z

I would expect it to fail to select if the type does not match the wave size. Is there a real use case for it?

vpykhtin · 2023-11-07T16:43:28Z

I would expect it to fail to select if the type does not match the wave size. Is there a real use case for it?

I would like to use ballot.i64 in both wave modes because I don't want to select bit width in AddressSanitizer code, it's relatively high in the pass chain and wave mode isn't readily available there.

arsenm · 2023-11-08T03:23:45Z

I would expect it to fail to select if the type does not match the wave size. Is there a real use case for it?

It's a synthetic operation to begin with and has a pretty clear meaning, just 0 the high bits. It's still valid to read exec/exec_hi in wave32 mode, it's just the high bits are always 0. This mirrors the same behavior (such that we are now using this to implement the read_exec builtins)

arsenm

Is this just about code quality? This probably isn't necessary if you consider the pipeline as a whole. I believe InstCombine is already reducing i64 ballot to i32 based on the subtarget's wave size

vpykhtin · 2023-11-08T09:45:18Z

Is this just about code quality? This probably isn't necessary if you consider the pipeline as a whole. I believe InstCombine is already reducing i64 ballot to i32 based on the subtarget's wave size

Right, but InstCombine is a bit late. This is needed after #68714, otherwise I need special processing of i64 mask in wave32 mode there. I've implemented the TODO with 51578ea.

vpykhtin · 2023-11-13T13:11:10Z

Ping.

arsenm · 2023-11-15T07:04:07Z

Right, but InstCombine is a bit late.

I'm confused, InstCombine can't be too late, it runs many times after all IR producers. There's no practical path between the sanitizer passes and the backend where it won't run

vpykhtin · 2023-11-15T07:23:20Z

I'm confused, InstCombine can't be too late, it runs many times after all IR producers. There's no practical path between the sanitizer passes and the backend where it won't run

I'm sorry, I thought it's machine IR pass. I don't see wavesize type correction for ballot in GCNTTIImpl::instCombineIntrinsic, do you beleive it's better to add it there?

arsenm · 2023-11-15T07:25:21Z

I'm confused, InstCombine can't be too late, it runs many times after all IR producers. There's no practical path between the sanitizer passes and the backend where it won't run

I'm sorry, I thought it's machine IR pass. I don't see wavesize type correction for ballot in GCNTTIImpl::instCombineIntrinsic, do you beleive it's better to add it there?

Oh yes, it is missing. I recently added it for amdgcn_mbcnt_hi above. We should do the same for ballot

vpykhtin · 2023-11-20T15:02:48Z

Reimplemented using instcombine rule. I had to move i64/wave32 tests to the existing dedicated test.

Please see the differences for individual commits in this branch to follow the changes made.

vpykhtin · 2023-11-21T07:43:04Z

Removed unrelated changes.

jayfoad · 2023-11-21T13:25:03Z

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

-        // TODO: make condition below an assert after fixing ballot bitwidth.
-        VCMP.getValueType().getSizeInBits() == ST->getWavefrontSize()) {
+    if ((CC == ISD::SETEQ || CC == ISD::SETNE) && CRHS && CRHS->isZero()) {
+      assert(VCMP.getValueType().getSizeInBits() == ST->getWavefrontSize());


You are asserting that instcombine has been run? That seems wrong. What about -O0 compiles?

Good point. It looks like I need to perform the transformation in both places - instcombiner and SelectionDAG combiner if I want to keep ballot folding on -O0 but I'm not sure this is really needed.

Also I need to keep original test running without opt and just add runs with opt.

vpykhtin · 2023-11-21T15:08:07Z

Undo the assert per Jay's comment for -O0 mode.
Restored the original test and added runs with opt to it.

Pierre-vh · 2023-11-29T09:04:00Z

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

+      CallInst *NewCall = IC.Builder.CreateCall(NewF, {II.getArgOperand(0)});
+      Value *CastedCall = IC.Builder.CreateZExtOrBitCast(NewCall, II.getType());
+      CastedCall->takeName(&II);
+      return IC.replaceInstUsesWith(II, CastedCall);


Nit: Can just reuse the same value, e.g.

Value *Call = IC.Builder.CreateCall(NewF, {II.getArgOperand(0)}); Call = IC.Builder.CreateZExtOrBitCast(NewCall, II.getType()); Call->takeName(&II); return IC.replaceInstUsesWith(II, Call);

I also think you should be able to use something like IC.Builder.CreateIntrinsic ? There should be a function to create an intrinsic call directly.

Thanks, changed to IC.Builder.CreateIntrinsic. I decided to use it as a temp for CreateZExtOrBitCast.

vpykhtin · 2023-11-29T10:58:39Z

I'm going to rebase it on top of #73779 to show test changes.

vpykhtin · 2023-11-30T11:34:43Z

This is ready to be submitted if there're no objections.

vpykhtin · 2023-12-12T09:49:51Z

Gentle ping.

arsenm

Description should be adjusted, this isn't really changing the selection anymore

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

dyung · 2024-01-17T11:16:20Z

@vpykhtin the test you modified builtins-amdgcn-wave32.cl seems to be failing on a few bots. Can you take a look?

https://lab.llvm.org/buildbot/#/builders/188/builds/40541
https://lab.llvm.org/buildbot/#/builders/91/builds/21847
https://lab.llvm.org/buildbot/#/builders/98/builds/31671
https://lab.llvm.org/buildbot/#/builders/139/builds/57289

vpykhtin · 2024-01-17T11:35:42Z

@vpykhtin the test you modified builtins-amdgcn-wave32.cl seems to be failing on a few bots. Can you take a look?

Sure, I'm going to revert and resubmit with the tests updated.

…e32 mode." (#78429) Reverts #71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289

…e32 mode." (llvm#78429) Reverts llvm#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289

…ve32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54.

…ve32 mode." (llvm#80303) Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54.

…ve32 mode." (llvm#80303) Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54. (cherry picked from commit b8025d1) Change-Id: I03aafda08ca433456f6e82accd6b702a307bfd0b

vpykhtin requested review from Flakebi and rampitec November 7, 2023 16:24

llvmbot added the backend:AMDGPU label Nov 7, 2023

arsenm reviewed Nov 8, 2023

View reviewed changes

vpykhtin force-pushed the fix_ballot_width branch from 51578ea to 4cf6ac7 Compare November 20, 2023 14:59

llvmbot added llvm:globalisel llvm:transforms labels Nov 20, 2023

vpykhtin requested a review from Pierre-vh November 20, 2023 15:12

vpykhtin changed the title ~~[AMDGPU] Improve selection of ballot.i64 intrinsic in wave32 mode in SelectionDAG.~~ [AMDGPU] Improve selection of ballot.i64 intrinsic in wave32 mode. Nov 20, 2023

llvmbot added the clang Clang issues not falling into any other category label Nov 20, 2023

vpykhtin requested review from arsenm and ranapratap55 November 20, 2023 17:12

vpykhtin force-pushed the fix_ballot_width branch from 5543d14 to e97e128 Compare November 21, 2023 07:41

jayfoad reviewed Nov 21, 2023

View reviewed changes

vpykhtin force-pushed the fix_ballot_width branch from e97e128 to 526c635 Compare November 21, 2023 15:05

Pierre-vh reviewed Nov 29, 2023

View reviewed changes

vpykhtin force-pushed the fix_ballot_width branch from 0ea1052 to b6204d3 Compare December 8, 2023 09:30

vpykhtin requested a review from kazutakahirata December 8, 2023 12:45

Flakebi approved these changes Dec 18, 2023

View reviewed changes

arsenm reviewed Dec 18, 2023

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp Outdated Show resolved Hide resolved

add instcombine rule

871f5ec

vpykhtin force-pushed the fix_ballot_width branch from b6204d3 to 871f5ec Compare December 18, 2023 14:44

vpykhtin changed the title ~~[AMDGPU] Improve selection of ballot.i64 intrinsic in wave32 mode.~~ [AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. Dec 18, 2023

arsenm approved these changes Jan 17, 2024

View reviewed changes

arsenm merged commit 57b50ef into llvm:main Jan 17, 2024

vpykhtin mentioned this pull request Jan 17, 2024

Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." #78429

Merged

vpykhtin mentioned this pull request Feb 1, 2024

Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." #80303

Merged

vpykhtin added a commit that referenced this pull request Feb 2, 2024

Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wa…

b8025d1

…ve32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54.

[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. #71556

[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. #71556

Uh oh!

Conversation

vpykhtin commented Nov 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jayfoad commented Nov 7, 2023

Uh oh!

vpykhtin commented Nov 7, 2023

Uh oh!

arsenm commented Nov 8, 2023

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

vpykhtin commented Nov 8, 2023

Uh oh!

vpykhtin commented Nov 13, 2023

Uh oh!

arsenm commented Nov 15, 2023

Uh oh!

vpykhtin commented Nov 15, 2023

Uh oh!

arsenm commented Nov 15, 2023

Uh oh!

vpykhtin commented Nov 20, 2023

Uh oh!

vpykhtin commented Nov 21, 2023

Uh oh!

jayfoad Nov 21, 2023

Choose a reason for hiding this comment

Uh oh!

vpykhtin Nov 21, 2023

Choose a reason for hiding this comment

Uh oh!

vpykhtin commented Nov 21, 2023

Uh oh!

Pierre-vh Nov 29, 2023

Choose a reason for hiding this comment

Uh oh!

vpykhtin Nov 29, 2023

Choose a reason for hiding this comment

Uh oh!

vpykhtin commented Nov 29, 2023

Uh oh!

vpykhtin commented Nov 30, 2023

Uh oh!

vpykhtin commented Dec 12, 2023

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dyung commented Jan 17, 2024

Uh oh!

vpykhtin commented Jan 17, 2024

Uh oh!

Uh oh!

vpykhtin commented Nov 7, 2023 •

edited

Loading

llvmbot commented Nov 7, 2023 •

edited

Loading