-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Calculate KnownBits from Metadata correctly for vector loads #128908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-llvm-analysis Author: None (LU-JOHN) ChangesCalculate KnownBits correctly from metadata for vector loads. Full diff: https://github.com/llvm/llvm-project/pull/128908.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index e3e026f7979da..1d6368b408811 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -429,11 +429,16 @@ void llvm::computeKnownBitsFromRangeMetadata(const MDNode &Ranges,
ConstantInt *Upper =
mdconst::extract<ConstantInt>(Ranges.getOperand(2 * i + 1));
ConstantRange Range(Lower->getValue(), Upper->getValue());
+ unsigned RangeBitWidth = Lower->getBitWidth();
+ // BitWidth > RangeBitWidth can happen if Known is set to the width of a
+ // vector load but Ranges describes a vector element.
+ assert(BitWidth >= RangeBitWidth);
// The first CommonPrefixBits of all values in Range are equal.
unsigned CommonPrefixBits =
(Range.getUnsignedMax() ^ Range.getUnsignedMin()).countl_zero();
- APInt Mask = APInt::getHighBitsSet(BitWidth, CommonPrefixBits);
+ APInt Mask = APInt::getBitsSet(BitWidth, RangeBitWidth - CommonPrefixBits,
+ RangeBitWidth);
APInt UnsignedMax = Range.getUnsignedMax().zextOrTrunc(BitWidth);
Known.One &= UnsignedMax & Mask;
Known.Zero &= ~UnsignedMax & Mask;
diff --git a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
index 69242f4e44840..3f8553180df30 100644
--- a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
+++ b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
@@ -79,8 +79,9 @@ define <2 x i64> @shl_v2_metadata(<2 x i64> %arg0, ptr %arg1.ptr) {
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: flat_load_dwordx4 v[4:7], v[4:5]
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_lshlrev_b64 v[0:1], v4, v[0:1]
; CHECK-NEXT: v_lshlrev_b64 v[2:3], v6, v[2:3]
+; CHECK-NEXT: v_lshlrev_b32_e32 v1, v4, v0
+; CHECK-NEXT: v_mov_b32_e32 v0, 0
; CHECK-NEXT: s_setpc_b64 s[30:31]
%shift.amt = load <2 x i64>, ptr %arg1.ptr, !range !0, !noundef !{}
%shl = shl <2 x i64> %arg0, %shift.amt
|
@llvm/pr-subscribers-backend-amdgpu Author: None (LU-JOHN) ChangesCalculate KnownBits correctly from metadata for vector loads. Full diff: https://github.com/llvm/llvm-project/pull/128908.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index e3e026f7979da..1d6368b408811 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -429,11 +429,16 @@ void llvm::computeKnownBitsFromRangeMetadata(const MDNode &Ranges,
ConstantInt *Upper =
mdconst::extract<ConstantInt>(Ranges.getOperand(2 * i + 1));
ConstantRange Range(Lower->getValue(), Upper->getValue());
+ unsigned RangeBitWidth = Lower->getBitWidth();
+ // BitWidth > RangeBitWidth can happen if Known is set to the width of a
+ // vector load but Ranges describes a vector element.
+ assert(BitWidth >= RangeBitWidth);
// The first CommonPrefixBits of all values in Range are equal.
unsigned CommonPrefixBits =
(Range.getUnsignedMax() ^ Range.getUnsignedMin()).countl_zero();
- APInt Mask = APInt::getHighBitsSet(BitWidth, CommonPrefixBits);
+ APInt Mask = APInt::getBitsSet(BitWidth, RangeBitWidth - CommonPrefixBits,
+ RangeBitWidth);
APInt UnsignedMax = Range.getUnsignedMax().zextOrTrunc(BitWidth);
Known.One &= UnsignedMax & Mask;
Known.Zero &= ~UnsignedMax & Mask;
diff --git a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
index 69242f4e44840..3f8553180df30 100644
--- a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
+++ b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
@@ -79,8 +79,9 @@ define <2 x i64> @shl_v2_metadata(<2 x i64> %arg0, ptr %arg1.ptr) {
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: flat_load_dwordx4 v[4:7], v[4:5]
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_lshlrev_b64 v[0:1], v4, v[0:1]
; CHECK-NEXT: v_lshlrev_b64 v[2:3], v6, v[2:3]
+; CHECK-NEXT: v_lshlrev_b32_e32 v1, v4, v0
+; CHECK-NEXT: v_mov_b32_e32 v0, 0
; CHECK-NEXT: s_setpc_b64 s[30:31]
%shift.amt = load <2 x i64>, ptr %arg1.ptr, !range !0, !noundef !{}
%shl = shl <2 x i64> %arg0, %shift.amt
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For vectors, the width of KnownBits should be the same as the width in the range metadata: Both are element-wise.
// MemoryVT().getSizeInBits()), which is truncated to the correct elt size | ||
// if it is know. These are then extended to the original VT sizes below. | ||
if (const MDNode *MD = LD->getRanges()) { | ||
ConstantInt *Lower = mdconst::extract<ConstantInt>(MD->getOperand(0)); | ||
|
||
// FIXME: If loads are modified (e.g. type legalization) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're adding the fixme, but this is what this patch is fixing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're adding the fixme, but this is what this patch is fixing?
This is not the right location for the FIXME. Type legalization and perhaps other DAG processing does not maintain the consistency between the range metadata type and the load type/width. This inconsistency causes problems here, but the fix should not be here. I'll remove the FIXME.
Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
…rong place. Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
✅ With the latest revision this PR passed the C/C++ code formatter. |
Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
Signed-off-by: John Lu <[email protected]>
if (const MDNode *MD = LD->getRanges()) { | ||
ConstantInt *Lower = mdconst::extract<ConstantInt>(MD->getOperand(0)); | ||
if (Lower->getBitWidth() != NVT.getScalarSizeInBits()) | ||
LD->getMemOperand()->clearRanges(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle we should cast the range. But since this references IR metadata directly, this may imply mutating the IR which we should avoid doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to attach the range metadata to the subsequent bitcast to the original type?
Signed-off-by: John Lu <[email protected]>
This is causing rustc to no longer be able to build its stage 2 compiler: instead it gets a SIGSEGV partway through. Search for SEGV in https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/35748#0195ce08-82ac-45e8-9161-ae441a1939bd - I'll try and do more debugging to figure out what's going on. |
I keep running out of disk space on my test machine (sigh) when I do debuginfo builds, but with assertions I got as far as |
I suspect that in the code:
the range metadata bitwidth does not == ScalarMemorySize when you see a crash. I thought that this should never happen, and I believe that the range metadata and load are incompatible if this does happen. Can you try changing the code to:
|
That yielded
Hopefully that helps - I've also got things sorted so I can get the invocation open in |
Thanks this helps. The range metadata has a 128-bit type, but the scalar type of the load is 8-bit. This is inconsistent. Can you send me a testcase so I can investigate how this inconsistent metadata was created? |
I'm honestly out of my depth here. My only path to reproducing this is building the Rust crate |
Can you send me instructions on how to build a rustc 2-stage compiler? |
It will easily shrink with llvm-reduce |
I was now also able to reproduce and got reduced IR, which crashes when run through target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"
define void @"_ZN4core3ptr100drop_in_place$LT$core..option..Option$LT$regex_automata..meta..wrappers..ReverseHybridEngine$GT$$GT$17hb293fd3cea85facfE"() {
start:
%0 = load i128, ptr null, align 16, !range !0, !noundef !1
%1 = icmp eq i128 %0, 1
br i1 %1, label %bb1, label %bb1
bb1: ; preds = %start, %start
ret void
}
!0 = !{i128 0, i128 3}
!1 = !{} |
PR to fix this crash made in #133095 |
Calculate KnownBits correctly from metadata for vector loads.