[AMDGPU] Implement hasAndNot for scalar bitwise AND-NOT operations. #112647

Open
wants to merge 12 commits into main from harrison/amdgpu

Conversation


@harrisonGPU harrisonGPU commented Oct 17, 2024

This PR implements the hasAndNot function for AMDGPU in the TargetLowering class, enabling LLVM to recognize and optimize bitwise AND-NOT operations for scalar 32-bit and 64-bit integer types (i32 and i64).

For example:

if ((X & Y) == Y) {
  // Perform action if all bits set in Y are also set in X
}

In such cases, the condition (X & Y) == Y can be optimized to (~X & Y) == 0 if hasAndNot returns true.
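As a standalone sanity check (plain C++, not part of the patch; the function names are ours), the two forms agree for all inputs:

```cpp
#include <cassert>
#include <cstdint>

// (x & y) == y holds exactly when every bit set in y is also set in x.
bool subsetViaAnd(uint32_t x, uint32_t y) { return (x & y) == y; }

// Equivalent form: no bit of y may survive in ~x.
bool subsetViaAndNot(uint32_t x, uint32_t y) { return (~x & y) == 0; }
```

The rewritten form stays at one ALU op plus a compare only because s_andn2_b32 fuses the NOT and the AND into a single scalar instruction, which is exactly what hasAndNot reports to the combiner.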

Closes #112550


llvmbot commented Oct 17, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Harrison Hao (harrisonGPU)

Changes

From #112550


Full diff: https://github.com/llvm/llvm-project/pull/112647.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+8)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h (+2)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 0f65df0763cc83..b746b94a60be21 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -3721,6 +3721,14 @@ SDValue AMDGPUTargetLowering::LowerSIGN_EXTEND_INREG(SDValue Op,
   return DAG.getBuildVector(VT, DL, Args);
 }
 
+bool AMDGPUTargetLowering::hasAndNot(SDValue Op) const {
+  if (Op->isDivergent())
+    return false;
+
+  EVT VT = Op.getValueType();
+  return VT == MVT::i32 || VT == MVT::i64;
+}
+
 //===----------------------------------------------------------------------===//
 // Custom DAG optimizations
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
index b2fd31cb2346eb..1289458570358b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
@@ -99,6 +99,8 @@ class AMDGPUTargetLowering : public TargetLowering {
 
   SDValue LowerSIGN_EXTEND_INREG(SDValue Op, SelectionDAG &DAG) const;
 
+  bool hasAndNot(SDValue Y) const override;
+
 protected:
   bool shouldCombineMemoryType(EVT VT) const;
   SDValue performLoadCombine(SDNode *N, DAGCombinerInfo &DCI) const;

@shiltian
Contributor

test

@harrisonGPU
Contributor Author

test

Okay, thanks Shilei. :)

Contributor

@arsenm arsenm left a comment

Needs tests (most of the effort on this patch is adding the requisite tests).

We need coverage with all the combinations of SGPR / VGPR inputs in i1 / i8 / i16 / i32 / i64 for the basic and-not pattern.

Additionally we need some tests for the optimizations enabled by this hook.

define amdgpu_ps float @xor3_vgpr_b(i32 inreg %a, i32 %b, i32 inreg %c) {
is one example.

Use arguments with inreg to get sample SGPR inputs, otherwise VGPR

bool SITargetLowering::hasAndNot(SDValue Op) const {
if (Op->isDivergent())
return false;

Contributor

Comment why this is the set of cases

Contributor Author

Done.

Contributor

Shouldn't really need to consider the machine opcode case

Contributor Author

Sorry for the late update on this PR. Last month, I was still thinking about this patch and forgot to push it to the origin branch. I'm still considering this issue, because some lit tests show an increase in the number of instructions, while others show a decrease. So, I'm not yet sure whether it impacts performance.

Contributor

Most of the work of this change is avoiding the regressions

Contributor Author

Thanks! Do you have any further suggestions? Also, do you think it's ready to be merged now? :-)

@harrisonGPU
Contributor Author

harrisonGPU commented Oct 17, 2024

Needs tests (most of the effort on this patch is adding the requisite tests).

We need coverage with all the combinations of SGPR / VGPR inputs in i1 / i8 / i16 / i32 / i64 for the basic and-not pattern.

Additionally we need some tests for the optimizations enabled by this hook.

define amdgpu_ps float @xor3_vgpr_b(i32 inreg %a, i32 %b, i32 inreg %c) {

is one example.
Use arguments with inreg to get sample SGPR inputs, otherwise VGPR

Okay, I need to consider how to add tests for this patch. :)

@harrisonGPU harrisonGPU marked this pull request as draft October 17, 2024 13:01

github-actions bot commented Oct 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -6822,6 +6822,81 @@ static unsigned getExtOpcodeForPromotedOp(SDValue Op) {
}
}

SDValue SITargetLowering::combineAnd(SDValue Op,
Contributor

You should not be implementing any new combine in this patch. The point of this is to enable the existing optimizations. If it's worth adding something else, it would be a separate PR

Contributor Author

@harrisonGPU harrisonGPU Oct 18, 2024

Okay, but I think hasAndNot can be used for the AND operation in this scenario: (and LHS, (or Y, ~Z)). X86 handles it this way, so I tried to implement it. :)
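The algebra behind that shape can be sketched in standalone C++ (illustrative only; whether this matches the exact x86 combine is an assumption on our part):

```cpp
#include <cassert>
#include <cstdint>

// and(LHS, or(Y, ~Z)) distributes into an OR whose second term is an
// and-not (LHS & ~Z), which is why hasAndNot gets consulted for this shape.
uint32_t combined(uint32_t lhs, uint32_t y, uint32_t z) {
  return lhs & (y | ~z);
}
uint32_t distributed(uint32_t lhs, uint32_t y, uint32_t z) {
  return (lhs & y) | (lhs & ~z);
}
```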

Contributor

I don't know why x86 has that in the backend but it belongs in a generic combine

@harrisonGPU harrisonGPU marked this pull request as ready for review October 18, 2024 05:38
@harrisonGPU harrisonGPU marked this pull request as draft October 18, 2024 07:14
@harrisonGPU harrisonGPU marked this pull request as ready for review October 18, 2024 10:32
@harrisonGPU
Contributor Author

Needs tests (most of the effort on this patch is adding the requisite tests).

We need coverage with all the combinations of SGPR / VGPR inputs in i1 / i8 / i16 / i32 / i64 for the basic and-not pattern.

Additionally we need some tests for the optimizations enabled by this hook.

define amdgpu_ps float @xor3_vgpr_b(i32 inreg %a, i32 %b, i32 inreg %c) {

is one example.
Use arguments with inreg to get sample SGPR inputs, otherwise VGPR

Do I need to add cases for i1, i8, and i16? I think the hasAndNot function does not affect the i1, i8, or i16 cases.

@@ -25,6 +25,28 @@ entry:
ret void
}

; GCN-LABEL: {{^}}scalar_andn2_i32_one_sgpr
; GCN: s_andn2_b32
define amdgpu_kernel void @scalar_andn2_i32_one_sgpr(
Contributor

better to use a shader calling convention and use the return value. the style of tests with the output pointer and a kernel is older and predates function support. kernels have a lot of extra noise in the output

Contributor Author

Okay, I updated it. :)

bool SITargetLowering::hasAndNot(SDValue Op) const {
if (Op->isDivergent())
return false;

Contributor

Shouldn't really need to consider the machine opcode case

@@ -25,6 +25,24 @@ entry:
ret void
}

; GCN-LABEL: {{^}}scalar_andn2_i32_one_sgpr
; GCN: s_andn2_b32
define i32 @scalar_andn2_i32_one_sgpr(i32 inreg %a, i32 inreg %b) {
Contributor

These tests only show the and-not pattern to begin with; they do not show the transforms enabled by this hook.

Contributor Author

I have removed it. Sorry for the late update on this PR. Last month, I was still thinking about this patch and forgot to push it to the origin branch. I'm still considering this issue, because some lit tests show an increase in the number of instructions, while others show a decrease. So, I'm not yet sure whether it impacts performance.

@harrisonGPU harrisonGPU force-pushed the harrison/amdgpu branch 2 times, most recently from 5e64801 to f46d274 Compare May 13, 2025 12:39
@@ -0,0 +1,764 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefix=GCN %s
Contributor

Suggested change
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefix=GCN %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck --check-prefix=GCN %s

Should test with a few sub targets

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefix=GCN %s

define i32 @out32(i32 inreg %x, i32 inreg %y, i32 inreg %mask) {
Contributor

Not sure what "out32" refers to. Need SGPR and VGPR versions of the tests (which usually use s_ and v_ prefixes)

Contributor Author

I have a question. In hasAndNot we have:

  if (Op->isDivergent())
    return false;

Values in VGPRs are divergent, so the check returns false right away; the hook only fires for uniform (inreg) values. With that in mind, do we really need a VGPR test case?

%n1 = and i32 %n0, %mask
%r = xor i32 %n1, %z
ret i32 %r
}
Contributor

Also test vector types

@@ -1,3 +1,4 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
Contributor

Shouldn't convert a test to generated checks in a functional change

Comment on lines 17566 to 17583
// Return false if the operation is divergent, as AND-NOT is a scalar-only
// instruction.
Contributor

First part of the comment is describing the mechanics, not the reason

@arsenm
Contributor

arsenm commented May 13, 2025

Also it would be better to precommit the tests separately so the diff is obvious here

@harrisonGPU
Contributor Author

Also it would be better to precommit the tests separately so the diff is obvious here

Okay, I will try to separate it! Thanks.

return false;

EVT VT = Op.getValueType();
return VT == MVT::i32 || VT == MVT::i64;
Contributor

I'm not sure we need to check types here. How about just return !Op->isDivergent();? If the types are not legal they will get legalized, but that should not affect the decision of whether to form an and-not pattern.

Contributor

If it's a different type and then is legalized, there will be intermediate instructions that break the and not pattern
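For what it's worth, the identity itself is width-independent, as a standalone C++ check shows (illustrative, not part of the patch); the regression risk discussed above comes purely from the extension/truncation instructions legalization inserts around illegal types:

```cpp
#include <cassert>
#include <cstdint>

// The same subset test at 16 bits. C++ integer promotion widens the
// operands to int here, loosely mirroring how an illegal i16 would be
// legalized to i32 in the DAG; the result is unchanged either way.
bool viaAnd16(uint16_t x, uint16_t y) { return (x & y) == y; }
bool viaAndNot16(uint16_t x, uint16_t y) {
  return static_cast<uint16_t>(~x & y) == 0; // cast models truncation
}
```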

@arsenm arsenm requested a review from Pierre-vh May 21, 2025 10:05

Successfully merging this pull request may close these issues.

AMDGPU should implement TargetLowering::hasAndNot
7 participants