Skip to content

[AArch64][SDAG] Fix selection of extend of v1if16 SETCC #140274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 6, 2025

Conversation

MacDue
Copy link
Member

@MacDue MacDue commented May 16, 2025

There is a DAG combine, that folds:

t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1

->

t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1

This creates an issue on AArch64 when attempting to widen the result to v4i16. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with DAG.WidenVector is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR.

Fixes #136540

There is a DAG combine, that folds:

```
t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1
```

->

```
t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1
```

This creates an issue on AArch64 when attempting to widen the result
to `v4i16`. The operand types (v1f16) are set to be scalarized, so the
"by hand" widening with `DAG.WidenVector` is used for them, however,
this only widens to the next power-of-2, so returns v2f16, which does
not match the result VF. The fix is to manually construct the widened
inputs using INSERT_SUBVECTOR.

Fixes llvm#136540
@MacDue MacDue requested a review from davemgreen May 16, 2025 16:14
@llvmbot llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels May 16, 2025
@llvmbot
Copy link
Member

llvmbot commented May 16, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

There is a DAG combine, that folds:

t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1

->

t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1

This creates an issue on AArch64 when attempting to widen the result to v4i16. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with DAG.WidenVector is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR.

Fixes #136540


Full diff: https://github.com/llvm/llvm-project/pull/140274.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+6-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0c0e700f6abca..ac7e4f5ab9e20 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -6623,8 +6623,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
     InOp1 = GetWidenedVector(InOp1);
     InOp2 = GetWidenedVector(InOp2);
   } else {
-    InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
-    InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+    SDValue Undef = DAG.getUNDEF(WidenInVT);
+    SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+    InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp1, ZeroIdx);
+    InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp2, ZeroIdx);
   }
 
   // Assume that the input and output will be widen appropriately.  If not,
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
index 6c70d19a977a5..5ed362604dc5f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
@@ -249,3 +249,19 @@ if.then:
 if.end:
   ret i32 1;
 }
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT:    mov w8, #1 // =0x1
+; CHECK-NEXT:    fcvtl v0.4s, v0.4h
+; CHECK-NEXT:    fmov d1, x8
+; CHECK-NEXT:    fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT:    xtn v0.4h, v0.4s
+; CHECK-NEXT:    and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+  %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+  %2 = zext <1 x i1> %1 to <1 x i64>
+  ret <1 x i64> %2
+}

@llvmbot
Copy link
Member

llvmbot commented May 16, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Benjamin Maxwell (MacDue)

Changes

There is a DAG combine, that folds:

t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1

->

t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1

This creates an issue on AArch64 when attempting to widen the result to v4i16. The operand types (v1f16) are set to be scalarized, so the "by hand" widening with DAG.WidenVector is used for them, however, this only widens to the next power-of-2, so returns v2f16, which does not match the result VF. The fix is to manually construct the widened inputs using INSERT_SUBVECTOR.

Fixes #136540


Full diff: https://github.com/llvm/llvm-project/pull/140274.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+6-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0c0e700f6abca..ac7e4f5ab9e20 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -6623,8 +6623,12 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
     InOp1 = GetWidenedVector(InOp1);
     InOp2 = GetWidenedVector(InOp2);
   } else {
-    InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
-    InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
+    SDValue Undef = DAG.getUNDEF(WidenInVT);
+    SDValue ZeroIdx = DAG.getVectorIdxConstant(0, SDLoc(N));
+    InOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp1, ZeroIdx);
+    InOp2 = DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), WidenInVT, Undef,
+                        InOp2, ZeroIdx);
   }
 
   // Assume that the input and output will be widen appropriately.  If not,
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
index 6c70d19a977a5..5ed362604dc5f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
@@ -249,3 +249,19 @@ if.then:
 if.end:
   ret i32 1;
 }
+
+define <1 x i64> @test_zext_half(<1 x half> %v1) {
+; CHECK-LABEL: test_zext_half:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $h0 killed $h0 def $d0
+; CHECK-NEXT:    mov w8, #1 // =0x1
+; CHECK-NEXT:    fcvtl v0.4s, v0.4h
+; CHECK-NEXT:    fmov d1, x8
+; CHECK-NEXT:    fcmgt v0.4s, v0.4s, #0.0
+; CHECK-NEXT:    xtn v0.4h, v0.4s
+; CHECK-NEXT:    and v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+  %1 = fcmp ogt <1 x half> %v1, zeroinitializer
+  %2 = zext <1 x i1> %1 to <1 x i64>
+  ret <1 x i64> %2
+}

@MacDue MacDue changed the title [AArch64] Fix selection of extend of v1if16 SETCC [AArch64][SDAG] Fix selection of extend of v1if16 SETCC May 16, 2025
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this, I didn't get too far when looking into it before. Should it be scalarizing somewhere along the line? I would expect that to produce better codegen.

@MacDue
Copy link
Member Author

MacDue commented May 20, 2025

Thanks for looking into this, I didn't get too far when looking into it before. Should it be scalarizing somewhere along the line? I would expect that to produce better codegen.

It seems like this was deliberately done back in https://reviews.llvm.org/D4322 (apparently to avoid some legalization issues).

@davemgreen
Copy link
Collaborator

It seems like this was deliberately done back in https://reviews.llvm.org/D4322 (apparently to avoid some legalization issues).

Oh - it was setcc specifically I was thinking of, not all operations. setcc is a little odd because the return type is comparatively less important to the type of the operands, which specifies whether the operation is float or integer.

If the DAG combine that creates the v1i16 is the transform that is causing this to go wrong then maybe it would be better to disable it somehow. It should be possible to mark that specific type as v1i1 in getSetCCResultType to encourage it to scalarize instead. The vector codegen isn't as efficient as it could be so maybe it doesn't matter a lot whether we scalarize or not, and your explanation as to what goes wrong makes sense.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK. LGTM

@MacDue
Copy link
Member Author

MacDue commented May 21, 2025

If the DAG combine that creates the v1i16 is the transform that is causing this to go wrong then maybe it would be better to disable it somehow. It should be possible to mark that specific type as v1i1 in getSetCCResultType to encourage it to scalarize instead. The vector codegen isn't as efficient as it could be so maybe it doesn't matter a lot whether we scalarize or not, and your explanation as to what goes wrong makes sense.

I believe it's this DAG combine that results in the return type changing to v1i16: https://github.com/llvm/llvm-project/blob/9f8d798942e7679ca265efb23a7f655936d19a49/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14631-L14638

However, I think using DAG.WidenVector here is a valid issue that can be fixed independently of possibly disabling/changing the combine.

@arsenm
Copy link
Contributor

arsenm commented May 21, 2025

I believe it's this DAG combine that results in the return type changing to v1i16: https://github.com/llvm/llvm-project/blob/9f8d798942e7679ca265efb23a7f655936d19a49/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14631-L14638

That combine is busted, it shouldn't be picking a random type for the return value. It should be using getSetCCResultType

@MacDue
Copy link
Member Author

MacDue commented Jun 6, 2025

That combine is busted, it shouldn't be picking a random type for the return value. It should be using getSetCCResultType

I just had another look at this, and I believe that this patch is a valid fix for this case, as v1i16 is the result of getSetCCResultType here. Restricting those DAG combines may also be desirable, but that's not relevant here AFAIK.

@MacDue MacDue merged commit c95bc41 into llvm:main Jun 6, 2025
11 checks passed
@MacDue MacDue deleted the fix_i16v_fcmp branch June 6, 2025 10:20
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
There is a DAG combine, that folds:

```
t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1
```

->

```
t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1
```

This creates an issue on AArch64 when attempting to widen the result to
`v4i16`. The operand types (`v1f16`) are set to be scalarized, so the
"by hand" widening with `DAG.WidenVector` is used for them, however,
this only widens to the next power-of-2, so returns `v2f16`, which does
not match the result VF. The fix is to manually construct the widened
inputs using `INSERT_SUBVECTOR`.

Fixes llvm#136540
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
There is a DAG combine, that folds:

```
t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = zero_extend t1
```

->

```
t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
	t2: v1i64 = any_extend t1
```

This creates an issue on AArch64 when attempting to widen the result to
`v4i16`. The operand types (`v1f16`) are set to be scalarized, so the
"by hand" widening with `DAG.WidenVector` is used for them, however,
this only widens to the next power-of-2, so returns `v2f16`, which does
not match the result VF. The fix is to manually construct the widened
inputs using `INSERT_SUBVECTOR`.

Fixes llvm#136540
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[clang] compilation (instruction selection) failure (crash) while targeting AArch64 architecture
4 participants