-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SystemZ] Fix bitwidth problem in FindReplicatedImm() (NFC). #115383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-systemz Author: Jonas Paulsson (JonPsson1) ChangesA test case emerged with an i32 truncating store of an i64 constant operand, where the i64 constant did not fit in 32 bits, which caused FindReplicatedImm() to crash. As SystemZVectorConstantInfo() does not really care about the bitwidth of its initial APInt, it seems simple enough to just not recreate it with a limited bitwidth (corresponding to MemVT or Splat VT), but rather just pass the APInt of the ConstantSDNode directly instead. Full diff: https://github.com/llvm/llvm-project/pull/115383.diff 2 Files Affected:
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 3999b54de81b65..ed170bae6aec30 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -7218,12 +7218,12 @@ SDValue SystemZTargetLowering::combineSTORE(
// Find a replicated immediate and return it if found in Word and its
// type in WordVT.
- auto FindReplicatedImm = [&](ConstantSDNode *C, unsigned TotBytes) {
+ auto FindReplicatedImm = [&](ConstantSDNode *C) {
// Some constants are better handled with a scalar store.
if (C->getAPIntValue().getBitWidth() > 64 || C->isAllOnes() ||
isInt<16>(C->getSExtValue()) || MemVT.getStoreSize() <= 2)
return;
- SystemZVectorConstantInfo VCI(APInt(TotBytes * 8, C->getZExtValue()));
+ SystemZVectorConstantInfo VCI(C->getAPIntValue());
if (VCI.isVectorConstantLegal(Subtarget) &&
VCI.Opcode == SystemZISD::REPLICATE) {
Word = DAG.getConstant(VCI.OpVals[0], SDLoc(SN), MVT::i32);
@@ -7261,12 +7261,12 @@ SDValue SystemZTargetLowering::combineSTORE(
DAG.isSplatValue(Op1, true/*AllowUndefs*/)) {
SDValue SplatVal = Op1->getOperand(0);
if (auto *C = dyn_cast<ConstantSDNode>(SplatVal))
- FindReplicatedImm(C, SplatVal.getValueType().getStoreSize());
+ FindReplicatedImm(C);
else
FindReplicatedReg(SplatVal);
} else {
if (auto *C = dyn_cast<ConstantSDNode>(Op1))
- FindReplicatedImm(C, MemVT.getStoreSize());
+ FindReplicatedImm(C);
else
FindReplicatedReg(Op1);
}
diff --git a/llvm/test/CodeGen/SystemZ/dag-combine-07.ll b/llvm/test/CodeGen/SystemZ/dag-combine-07.ll
new file mode 100644
index 00000000000000..11359009c2899b
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/dag-combine-07.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z16 | FileCheck %s
+;
+; Test that SystemZTargetLowering::combineSTORE() does not crash on a
+; truncated immediate.
+
+@G1 = external global i64, align 8
+@G2 = external global i64, align 8
+
+define void @func_5(ptr %Dst) {
+; CHECK-LABEL: func_5:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lgrl %r1, G2@GOT
+; CHECK-NEXT: llihl %r0, 50
+; CHECK-NEXT: oill %r0, 2
+; CHECK-NEXT: stg %r0, 0(%r1)
+; CHECK-NEXT: lgrl %r1, G1@GOT
+; CHECK-NEXT: stg %r0, 0(%r1)
+; CHECK-NEXT: st %r0, 0(%r2)
+; CHECK-NEXT: br %r14
+ store i64 214748364802, ptr @G2, align 8
+ store i64 214748364802, ptr @G1, align 8
+ %1 = load i32, ptr getelementptr inbounds (i8, ptr @G1, i64 4), align 4
+ store i32 %1, ptr %Dst, align 4
+ ret void
+}
|
I don't think this is quite true. It does use the initial bitwidth to decide whether the value can be generated as a splat in some cases. For example, it would appear that the value 0x01010101 can be created via VREPIB if the initial bitwidth is 32 (because then it is a splat), but not if the initial bitwidth is 64 (then it is not). |
Ouch... ofc, that is needed for the SplatBits. Patch updated to instead truncate the APInt if the store is truncating. This should work also for the case of a BuildVector being truncated to less than the element width. This is NFC on benchmarks, so not sure if it is worth taking the care to have a handling there. Alternatively FindReplicatedImm() could just bail if the immediate does not fit..? |
Val = Val.trunc(TotBytes * 8); | ||
} | ||
|
||
SystemZVectorConstantInfo VCI(Val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still uses the size of Val rather than TotBytes, so it still pessimizes some cases (where the original Val.getExtValue() does fit, like in my example of a 64->32 truncating store of 0x01010101, which should use VREPI).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, was just about to change that.
I guess in the non-truncating case the original APInt should be of the matching bitwidth, but it doesn't hurt to be explicit and use TotBytes * 8.
And in the truncating case, it would be truncated where also the Bitwidth of the new APInt should be reduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes, if there is a truncating store with a value that fits, this also makes a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
A test case emerged with an i32 truncating store of an i64 constant operand, where the i64 constant did not fit in 32 bits, which caused FindReplicatedImm() to crash. Make sure to truncate the APInt in these cases.
A test case emerged with an i32 truncating store of an i64 constant operand, where the i64 constant did not fit in 32 bits, which caused FindReplicatedImm() to crash.
As SystemZVectorConstantInfo() does not really care about the bitwidth of its initial APInt, it seems simple enough to just not recreate it with a limited bitwidth (corresponding to MemVT or Splat VT), but rather just pass the APInt of the ConstantSDNode directly instead.