Skip to content

[SDAG] Harden assumption in getMemsetStringVal #126207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 13, 2025

Conversation

c-rhodes
Copy link
Collaborator

@c-rhodes c-rhodes commented Feb 7, 2025

In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: #20521 [1]

c-rhodes added a commit to c-rhodes/llvm-project that referenced this pull request Feb 7, 2025
Add missing test coverage for codepaths touched by llvm#126207.
@c-rhodes c-rhodes marked this pull request as ready for review February 7, 2025 09:38
@c-rhodes c-rhodes requested a review from davemgreen February 7, 2025 09:38
@llvmbot
Copy link
Member

llvmbot commented Feb 7, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-arm

Author: Cullen Rhodes (c-rhodes)

Changes

In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: #20521 [1]


Full diff: https://github.com/llvm/llvm-project/pull/126207.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+1-9)
  • (modified) llvm/test/CodeGen/AArch64/memcpy-f128.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/memcpy-inline.ll (+27)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 16c3b295426c648..445edf1dea41d24 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8015,16 +8015,8 @@ static SDValue getMemsetStringVal(EVT VT, const SDLoc &dl, SelectionDAG &DAG,
   if (Slice.Array == nullptr) {
     if (VT.isInteger())
       return DAG.getConstant(0, dl, VT);
-    if (VT == MVT::f32 || VT == MVT::f64 || VT == MVT::f128)
+    if (VT.isFloatingPoint())
       return DAG.getConstantFP(0.0, dl, VT);
-    if (VT.isVector()) {
-      unsigned NumElts = VT.getVectorNumElements();
-      MVT EltVT = (VT.getVectorElementType() == MVT::f32) ? MVT::i32 : MVT::i64;
-      return DAG.getNode(ISD::BITCAST, dl, VT,
-                         DAG.getConstant(0, dl,
-                                         EVT::getVectorVT(*DAG.getContext(),
-                                                          EltVT, NumElts)));
-    }
     llvm_unreachable("Expected type!");
   }
 
diff --git a/llvm/test/CodeGen/AArch64/memcpy-f128.ll b/llvm/test/CodeGen/AArch64/memcpy-f128.ll
index 5b354dd23e01dfa..229c0034b6f636c 100644
--- a/llvm/test/CodeGen/AArch64/memcpy-f128.ll
+++ b/llvm/test/CodeGen/AArch64/memcpy-f128.ll
@@ -1,4 +1,4 @@
-; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=-neon | FileCheck %s
 
 %structA = type { i128 }
 @stubA = internal unnamed_addr constant %structA zeroinitializer, align 8
diff --git a/llvm/test/CodeGen/ARM/memcpy-inline.ll b/llvm/test/CodeGen/ARM/memcpy-inline.ll
index 596a58afe46e5bb..89db22d1f0ed53c 100644
--- a/llvm/test/CodeGen/ARM/memcpy-inline.ll
+++ b/llvm/test/CodeGen/ARM/memcpy-inline.ll
@@ -12,6 +12,7 @@
 @.str4 = private unnamed_addr constant [18 x i8] c"DHRYSTONE PROGR  \00", align 1
 @.str5 = private unnamed_addr constant [7 x i8] c"DHRYST\00", align 1
 @.str6 = private unnamed_addr constant [14 x i8] c"/tmp/rmXXXXXX\00", align 1
+@empty = private unnamed_addr constant [31 x i8] zeroinitializer, align 1
 @spool.splbuf = internal global [512 x i8] zeroinitializer, align 16
 
 define i32 @t0() {
@@ -282,5 +283,31 @@ entry:
   ret void
 }
 
+define void @copy_from_zero_constant(ptr nocapture %C) nounwind {
+; CHECK-LABEL: copy_from_zero_constant:
+; CHECK:       @ %bb.0: @ %entry
+; CHECK-NEXT:    vmov.i32 q8, #0x0
+; CHECK-NEXT:    movs r1, #15
+; CHECK-NEXT:    vst1.8 {d16, d17}, [r0], r1
+; CHECK-NEXT:    vst1.8 {d16, d17}, [r0]
+; CHECK-NEXT:    bx lr
+;
+; CHECK-T1-LABEL: copy_from_zero_constant:
+; CHECK-T1:       @ %bb.0: @ %entry
+; CHECK-T1-NEXT:    .save {r7, lr}
+; CHECK-T1-NEXT:    push {r7, lr}
+; CHECK-T1-NEXT:    ldr r1, .LCPI8_0
+; CHECK-T1-NEXT:    movs r2, #31
+; CHECK-T1-NEXT:    bl __aeabi_memcpy
+; CHECK-T1-NEXT:    pop {r7, pc}
+; CHECK-T1-NEXT:    .p2align 2
+; CHECK-T1-NEXT:  @ %bb.1:
+; CHECK-T1-NEXT:  .LCPI8_0:
+; CHECK-T1-NEXT:    .long .Lempty
+entry:
+  tail call void @llvm.memcpy.p0.p0.i64(ptr %C, ptr @empty, i64 31, i1 false)
+  ret void
+}
+
 declare void @llvm.memcpy.p0.p0.i32(ptr nocapture, ptr nocapture, i32, i1) nounwind
 declare void @llvm.memcpy.p0.p0.i64(ptr nocapture, ptr nocapture, i64, i1) nounwind

@llvmbot
Copy link
Member

llvmbot commented Feb 7, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Cullen Rhodes (c-rhodes)

Changes

In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: #20521 [1]


Full diff: https://github.com/llvm/llvm-project/pull/126207.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+1-9)
  • (modified) llvm/test/CodeGen/AArch64/memcpy-f128.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/memcpy-inline.ll (+27)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 16c3b295426c648..445edf1dea41d24 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8015,16 +8015,8 @@ static SDValue getMemsetStringVal(EVT VT, const SDLoc &dl, SelectionDAG &DAG,
   if (Slice.Array == nullptr) {
     if (VT.isInteger())
       return DAG.getConstant(0, dl, VT);
-    if (VT == MVT::f32 || VT == MVT::f64 || VT == MVT::f128)
+    if (VT.isFloatingPoint())
       return DAG.getConstantFP(0.0, dl, VT);
-    if (VT.isVector()) {
-      unsigned NumElts = VT.getVectorNumElements();
-      MVT EltVT = (VT.getVectorElementType() == MVT::f32) ? MVT::i32 : MVT::i64;
-      return DAG.getNode(ISD::BITCAST, dl, VT,
-                         DAG.getConstant(0, dl,
-                                         EVT::getVectorVT(*DAG.getContext(),
-                                                          EltVT, NumElts)));
-    }
     llvm_unreachable("Expected type!");
   }
 
diff --git a/llvm/test/CodeGen/AArch64/memcpy-f128.ll b/llvm/test/CodeGen/AArch64/memcpy-f128.ll
index 5b354dd23e01dfa..229c0034b6f636c 100644
--- a/llvm/test/CodeGen/AArch64/memcpy-f128.ll
+++ b/llvm/test/CodeGen/AArch64/memcpy-f128.ll
@@ -1,4 +1,4 @@
-; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=-neon | FileCheck %s
 
 %structA = type { i128 }
 @stubA = internal unnamed_addr constant %structA zeroinitializer, align 8
diff --git a/llvm/test/CodeGen/ARM/memcpy-inline.ll b/llvm/test/CodeGen/ARM/memcpy-inline.ll
index 596a58afe46e5bb..89db22d1f0ed53c 100644
--- a/llvm/test/CodeGen/ARM/memcpy-inline.ll
+++ b/llvm/test/CodeGen/ARM/memcpy-inline.ll
@@ -12,6 +12,7 @@
 @.str4 = private unnamed_addr constant [18 x i8] c"DHRYSTONE PROGR  \00", align 1
 @.str5 = private unnamed_addr constant [7 x i8] c"DHRYST\00", align 1
 @.str6 = private unnamed_addr constant [14 x i8] c"/tmp/rmXXXXXX\00", align 1
+@empty = private unnamed_addr constant [31 x i8] zeroinitializer, align 1
 @spool.splbuf = internal global [512 x i8] zeroinitializer, align 16
 
 define i32 @t0() {
@@ -282,5 +283,31 @@ entry:
   ret void
 }
 
+define void @copy_from_zero_constant(ptr nocapture %C) nounwind {
+; CHECK-LABEL: copy_from_zero_constant:
+; CHECK:       @ %bb.0: @ %entry
+; CHECK-NEXT:    vmov.i32 q8, #0x0
+; CHECK-NEXT:    movs r1, #15
+; CHECK-NEXT:    vst1.8 {d16, d17}, [r0], r1
+; CHECK-NEXT:    vst1.8 {d16, d17}, [r0]
+; CHECK-NEXT:    bx lr
+;
+; CHECK-T1-LABEL: copy_from_zero_constant:
+; CHECK-T1:       @ %bb.0: @ %entry
+; CHECK-T1-NEXT:    .save {r7, lr}
+; CHECK-T1-NEXT:    push {r7, lr}
+; CHECK-T1-NEXT:    ldr r1, .LCPI8_0
+; CHECK-T1-NEXT:    movs r2, #31
+; CHECK-T1-NEXT:    bl __aeabi_memcpy
+; CHECK-T1-NEXT:    pop {r7, pc}
+; CHECK-T1-NEXT:    .p2align 2
+; CHECK-T1-NEXT:  @ %bb.1:
+; CHECK-T1-NEXT:  .LCPI8_0:
+; CHECK-T1-NEXT:    .long .Lempty
+entry:
+  tail call void @llvm.memcpy.p0.p0.i64(ptr %C, ptr @empty, i64 31, i1 false)
+  ret void
+}
+
 declare void @llvm.memcpy.p0.p0.i32(ptr nocapture, ptr nocapture, i32, i1) nounwind
 declare void @llvm.memcpy.p0.p0.i64(ptr nocapture, ptr nocapture, i64, i1) nounwind

Comment on lines -8020 to -8027
if (VT.isVector()) {
unsigned NumElts = VT.getVectorNumElements();
MVT EltVT = (VT.getVectorElementType() == MVT::f32) ? MVT::i32 : MVT::i64;
return DAG.getNode(ISD::BITCAST, dl, VT,
DAG.getConstant(0, dl,
EVT::getVectorVT(*DAG.getContext(),
EltVT, NumElts)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just bitcast to VT.changeVectorElementTypeToInteger?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would also work I guess, although it's not obvious to me that's any better than calling getConstantFP?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean instead of having this unreachable case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if my understanding is correct, but I've updated to what I believe you're suggesting, except I've used changeTypeToInteger since it handles both scalar/vector.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds OK, but could it use a fp type that doesn't have a representation of 0.0, or where 0.0 != 0x0000? I doubt that would be used in practice, but is it worth having an assert to check?

c-rhodes added a commit that referenced this pull request Feb 10, 2025
Add missing test coverage for codepaths touched by #126207.
In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: llvm#20521 [1]
@c-rhodes
Copy link
Collaborator Author

This sounds OK, but could it use a fp type that doesn't have a representation of 0.0, or where 0.0 != 0x0000? I doubt that would be used in practice, but is it worth having an assert to check?

any pointers on how i can check that?

@arsenm
Copy link
Contributor

arsenm commented Feb 10, 2025

This sounds OK, but could it use a fp type that doesn't have a representation of 0.0, or where 0.0 != 0x0000? I doubt that would be used in practice, but is it worth having an assert to check?

any pointers on how i can check that?

We don't have any of those in the IR, if they exist.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have kept the constant fp case, I guess. Also could unconditionally go through the bitcast

@c-rhodes
Copy link
Collaborator Author

Could have kept the constant fp case, I guess. Also could unconditionally go through the bitcast

Thanks for review. By unconditionally go through the bitcast, do you mean remove the int case?

}
llvm_unreachable("Expected type!");
return DAG.getNode(ISD::BITCAST, dl, VT,
DAG.getConstant(0, dl, VT.changeTypeToInteger()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this can be used for the isInteger() path above as well, but I'm not sure if that just gets confusing?

Copy link
Collaborator Author

@c-rhodes c-rhodes Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that crossed my mind also and is perhaps what @arsenm was referring to as well, I don't mind either way.

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Add missing test coverage for codepaths touched by llvm#126207.
@c-rhodes
Copy link
Collaborator Author

I'll land this tomorrow if there's no further comments by then.

@c-rhodes c-rhodes merged commit 9b2fc66 into llvm:main Feb 13, 2025
8 checks passed
flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025
In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: llvm#20521 [1]
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: llvm#20521 [1]
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
In 5235973, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973 to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
2920061, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: llvm#20521 [1]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AArch64] Suspect code near ICE on memset call.
5 participants