[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274

MacDue · 2025-05-16T16:14:17Z

There is a DAG combine, that folds:

t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1

->

t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1

This creates an issue on AArch64 when attempting to widen the result to v4i16. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with DAG.WidenVector is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR.

Fixes #136540

There is a DAG combine, that folds: ``` t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = zero_extend t1 ``` -> ``` t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = any_extend t1 ``` This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR. Fixes llvm#136540

llvmbot · 2025-05-16T16:14:50Z

@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

There is a DAG combine, that folds:

t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1

->

t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1

This creates an issue on AArch64 when attempting to widen the result to v4i16. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with DAG.WidenVector is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR.

Fixes #136540

Full diff: https://github.com/llvm/llvm-project/pull/140274.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+6-2)
(modified) llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll (+16)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0c0e700f6abca..ac7e4f5ab9e20 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -6623,8 +6623,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
     InOp1 = GetWidenedVector(InOp1);
     InOp2 = GetWidenedVector(InOp2);
   } else {
-    InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
-    InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+    SDValue Undef = DAG.getUNDEF(WidenInVT);
+    SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+    InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp1, ZeroIdx);
+    InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp2, ZeroIdx);
   }
 
   // Assume that the input and output will be widen appropriately.  If not,
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
index 6c70d19a977a5..5ed362604dc5f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
@@ -249,3 +249,19 @@ if.then:
 if.end:
   ret i32 1;
 }
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT:    mov w8, #1 // =0x1
+; CHECK-NEXT:    fcvtl v0.4s, v0.4h
+; CHECK-NEXT:    fmov d1, x8
+; CHECK-NEXT:    fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT:    xtn v0.4h, v0.4s
+; CHECK-NEXT:    and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+  %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+  %2 = zext <1 x i1> %1 to <1 x i64>
+  ret <1 x i64> %2
+}

llvmbot · 2025-05-16T16:14:51Z

@llvm/pr-subscribers-llvm-selectiondag

Author: Benjamin Maxwell (MacDue)

Changes

There is a DAG combine, that folds:

t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1

->

t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1

This creates an issue on AArch64 when attempting to widen the result to v4i16. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with DAG.WidenVector is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR.

Fixes #136540

Full diff: https://github.com/llvm/llvm-project/pull/140274.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+6-2)
(modified) llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll (+16)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0c0e700f6abca..ac7e4f5ab9e20 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -6623,8 +6623,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
     InOp1 = GetWidenedVector(InOp1);
     InOp2 = GetWidenedVector(InOp2);
   } else {
-    InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
-    InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+    SDValue Undef = DAG.getUNDEF(WidenInVT);
+    SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+    InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp1, ZeroIdx);
+    InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp2, ZeroIdx);
   }
 
   // Assume that the input and output will be widen appropriately.  If not,
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
index 6c70d19a977a5..5ed362604dc5f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
@@ -249,3 +249,19 @@ if.then:
 if.end:
   ret i32 1;
 }
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT:    mov w8, #1 // =0x1
+; CHECK-NEXT:    fcvtl v0.4s, v0.4h
+; CHECK-NEXT:    fmov d1, x8
+; CHECK-NEXT:    fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT:    xtn v0.4h, v0.4s
+; CHECK-NEXT:    and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+  %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+  %2 = zext <1 x i1> %1 to <1 x i64>
+  ret <1 x i64> %2
+}

davemgreen

Thanks for looking into this, I didn't get too far when looking into it before. Should it be scalarizing somewhere along the line? I would expect that to produce better codegen.

MacDue · 2025-05-20T09:53:37Z

Thanks for looking into this, I didn't get too far when looking into it before. Should it be scalarizing somewhere along the line? I would expect that to produce better codegen.

It seems like this was deliberately done back in https://reviews.llvm.org/D4322 (apparently to avoid some legalization issues).

davemgreen · 2025-05-21T08:22:21Z

It seems like this was deliberately done back in https://reviews.llvm.org/D4322 (apparently to avoid some legalization issues).

Oh - it was setcc specifically I was thinking of, not all operations. setcc is a little odd because the return type is comparatively less important to the type of the operands, which specifies whether the operation is float or integer.

If the DAG combine that creates the v1i16 is the transform that is causing this to go wrong then maybe it would be better to disable it somehow. It should be possible to mark that specific type as v1i1 in getSetCCResultType to encourage it to scalarize instead. The vector codegen isn't as efficient as it could be so maybe it doesn't matter a lot whether we scalarize or not, and your explanation as to what goes wrong makes sense.

davemgreen

I think this is OK. LGTM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

MacDue · 2025-05-21T12:27:58Z

If the DAG combine that creates the v1i16 is the transform that is causing this to go wrong then maybe it would be better to disable it somehow. It should be possible to mark that specific type as v1i1 in getSetCCResultType to encourage it to scalarize instead. The vector codegen isn't as efficient as it could be so maybe it doesn't matter a lot whether we scalarize or not, and your explanation as to what goes wrong makes sense.

I believe it's this DAG combine that results in the return type changing to v1i16: https://github.com/llvm/llvm-project/blob/9f8d798942e7679ca265efb23a7f655936d19a49/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14631-L14638

However, I think using DAG.WidenVector here is a valid issue that can be fixed independently of possibly disabling/changing the combine.

arsenm · 2025-05-21T12:36:55Z

I believe it's this DAG combine that results in the return type changing to v1i16: https://github.com/llvm/llvm-project/blob/9f8d798942e7679ca265efb23a7f655936d19a49/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14631-L14638

That combine is busted, it shouldn't be picking a random type for the return value. It should be using getSetCCResultType

MacDue · 2025-06-06T10:12:45Z

That combine is busted, it shouldn't be picking a random type for the return value. It should be using getSetCCResultType

I just had another look at this, and I believe that this patch is a valid fix for this case, as v1i16 is the result of getSetCCResultType here. Restricting those DAG combines may also be desirable, but that's not relevant here AFAIK.

There is a DAG combine, that folds: ``` t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = zero_extend t1 ``` -> ``` t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = any_extend t1 ``` This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (`v1f16`) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them, however, this only widens to the next power-of-2, so returns `v2f16`, which does not match the result VF. The fix is to manually construct the widened inputs using `INSERT_SUBVECTOR`. Fixes llvm#136540

MacDue requested a review from davemgreen May 16, 2025 16:14

llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels May 16, 2025

MacDue changed the title ~~[AArch64] Fix selection of extend of v1if16 SETCC~~ [AArch64][SDAG] Fix selection of extend of v1if16 SETCC May 16, 2025

davemgreen reviewed May 20, 2025

View reviewed changes

davemgreen approved these changes May 21, 2025

View reviewed changes

arsenm approved these changes May 21, 2025

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp Outdated Show resolved Hide resolved

Use poison

dc4bf0c

arsenm approved these changes May 21, 2025

View reviewed changes

MacDue merged commit c95bc41 into llvm:main Jun 6, 2025
11 checks passed

MacDue deleted the fix_i16v_fcmp branch June 6, 2025 10:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274

[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274

Uh oh!

MacDue commented May 16, 2025

Uh oh!

llvmbot commented May 16, 2025

Uh oh!

llvmbot commented May 16, 2025

Uh oh!

davemgreen left a comment

Uh oh!

MacDue commented May 20, 2025 •

edited

Loading

Uh oh!

davemgreen commented May 21, 2025

Uh oh!

davemgreen left a comment

Uh oh!

Uh oh!

MacDue commented May 21, 2025

Uh oh!

arsenm commented May 21, 2025

Uh oh!

MacDue commented Jun 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274

[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274

Uh oh!

Conversation

MacDue commented May 16, 2025

Uh oh!

llvmbot commented May 16, 2025

Uh oh!

llvmbot commented May 16, 2025

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

MacDue commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen commented May 21, 2025

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MacDue commented May 21, 2025

Uh oh!

arsenm commented May 21, 2025

Uh oh!

MacDue commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MacDue commented May 20, 2025 •

edited

Loading

MacDue commented Jun 6, 2025 •

edited

Loading