Skip to content

Calculate KnownBits from Metadata correctly for vector loads #128908

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 25, 2025

Conversation

LU-JOHN
Copy link
Contributor

@LU-JOHN LU-JOHN commented Feb 26, 2025

Calculate KnownBits correctly from metadata for vector loads.

@LU-JOHN LU-JOHN requested a review from nikic as a code owner February 26, 2025 16:50
@llvmbot llvmbot added backend:AMDGPU llvm:analysis Includes value tracking, cost tables and constant folding labels Feb 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 26, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-llvm-analysis

Author: None (LU-JOHN)

Changes

Calculate KnownBits correctly from metadata for vector loads.
Utilize KnownBits to reduce 64-bit shl to 32-bit shl.


Full diff: https://github.com/llvm/llvm-project/pull/128908.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/shl64_reduce.ll (+2-1)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index e3e026f7979da..1d6368b408811 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -429,11 +429,16 @@ void llvm::computeKnownBitsFromRangeMetadata(const MDNode &Ranges,
     ConstantInt *Upper =
         mdconst::extract<ConstantInt>(Ranges.getOperand(2 * i + 1));
     ConstantRange Range(Lower->getValue(), Upper->getValue());
+    unsigned RangeBitWidth = Lower->getBitWidth();
+    // BitWidth > RangeBitWidth can happen if Known is set to the width of a
+    // vector load but Ranges describes a vector element.
+    assert(BitWidth >= RangeBitWidth);
 
     // The first CommonPrefixBits of all values in Range are equal.
     unsigned CommonPrefixBits =
         (Range.getUnsignedMax() ^ Range.getUnsignedMin()).countl_zero();
-    APInt Mask = APInt::getHighBitsSet(BitWidth, CommonPrefixBits);
+    APInt Mask = APInt::getBitsSet(BitWidth, RangeBitWidth - CommonPrefixBits,
+                                   RangeBitWidth);
     APInt UnsignedMax = Range.getUnsignedMax().zextOrTrunc(BitWidth);
     Known.One &= UnsignedMax & Mask;
     Known.Zero &= ~UnsignedMax & Mask;
diff --git a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
index 69242f4e44840..3f8553180df30 100644
--- a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
+++ b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
@@ -79,8 +79,9 @@ define <2 x i64> @shl_v2_metadata(<2 x i64> %arg0, ptr %arg1.ptr) {
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    flat_load_dwordx4 v[4:7], v[4:5]
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    v_lshlrev_b64 v[0:1], v4, v[0:1]
 ; CHECK-NEXT:    v_lshlrev_b64 v[2:3], v6, v[2:3]
+; CHECK-NEXT:    v_lshlrev_b32_e32 v1, v4, v0
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-NEXT:    s_setpc_b64 s[30:31]
   %shift.amt = load <2 x i64>, ptr %arg1.ptr, !range !0, !noundef !{}
   %shl = shl <2 x i64> %arg0, %shift.amt

@llvmbot
Copy link
Member

llvmbot commented Feb 26, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: None (LU-JOHN)

Changes

Calculate KnownBits correctly from metadata for vector loads.
Utilize KnownBits to reduce 64-bit shl to 32-bit shl.


Full diff: https://github.com/llvm/llvm-project/pull/128908.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/shl64_reduce.ll (+2-1)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index e3e026f7979da..1d6368b408811 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -429,11 +429,16 @@ void llvm::computeKnownBitsFromRangeMetadata(const MDNode &Ranges,
     ConstantInt *Upper =
         mdconst::extract<ConstantInt>(Ranges.getOperand(2 * i + 1));
     ConstantRange Range(Lower->getValue(), Upper->getValue());
+    unsigned RangeBitWidth = Lower->getBitWidth();
+    // BitWidth > RangeBitWidth can happen if Known is set to the width of a
+    // vector load but Ranges describes a vector element.
+    assert(BitWidth >= RangeBitWidth);
 
     // The first CommonPrefixBits of all values in Range are equal.
     unsigned CommonPrefixBits =
         (Range.getUnsignedMax() ^ Range.getUnsignedMin()).countl_zero();
-    APInt Mask = APInt::getHighBitsSet(BitWidth, CommonPrefixBits);
+    APInt Mask = APInt::getBitsSet(BitWidth, RangeBitWidth - CommonPrefixBits,
+                                   RangeBitWidth);
     APInt UnsignedMax = Range.getUnsignedMax().zextOrTrunc(BitWidth);
     Known.One &= UnsignedMax & Mask;
     Known.Zero &= ~UnsignedMax & Mask;
diff --git a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
index 69242f4e44840..3f8553180df30 100644
--- a/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
+++ b/llvm/test/CodeGen/AMDGPU/shl64_reduce.ll
@@ -79,8 +79,9 @@ define <2 x i64> @shl_v2_metadata(<2 x i64> %arg0, ptr %arg1.ptr) {
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    flat_load_dwordx4 v[4:7], v[4:5]
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    v_lshlrev_b64 v[0:1], v4, v[0:1]
 ; CHECK-NEXT:    v_lshlrev_b64 v[2:3], v6, v[2:3]
+; CHECK-NEXT:    v_lshlrev_b32_e32 v1, v4, v0
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-NEXT:    s_setpc_b64 s[30:31]
   %shift.amt = load <2 x i64>, ptr %arg1.ptr, !range !0, !noundef !{}
   %shl = shl <2 x i64> %arg0, %shift.amt

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For vectors, the width of KnownBits should be the same as the width in the range metadata: Both are element-wise.

@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Mar 7, 2025
// MemoryVT().getSizeInBits()), which is truncated to the correct elt size
// if it is know. These are then extended to the original VT sizes below.
if (const MDNode *MD = LD->getRanges()) {
ConstantInt *Lower = mdconst::extract<ConstantInt>(MD->getOperand(0));

// FIXME: If loads are modified (e.g. type legalization)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're adding the fixme, but this is what this patch is fixing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're adding the fixme, but this is what this patch is fixing?

This is not the right location for the FIXME. Type legalization and perhaps other DAG processing does not maintain the consistency between the range metadata type and the load type/width. This inconsistency causes problems here, but the fix should not be here. I'll remove the FIXME.

Copy link

github-actions bot commented Mar 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: John Lu <[email protected]>
@LU-JOHN LU-JOHN requested a review from arsenm March 18, 2025 15:11
@LU-JOHN LU-JOHN requested a review from arsenm March 25, 2025 01:54
Comment on lines 695 to 699
if (const MDNode *MD = LD->getRanges()) {
ConstantInt *Lower = mdconst::extract<ConstantInt>(MD->getOperand(0));
if (Lower->getBitWidth() != NVT.getScalarSizeInBits())
LD->getMemOperand()->clearRanges();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle we should cast the range. But since this references IR metadata directly, this may imply mutating the IR which we should avoid doing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to attach the range metadata to the subsequent bitcast to the original type?

@arsenm arsenm merged commit 70aeb89 into llvm:main Mar 25, 2025
11 checks passed
@durin42
Copy link
Contributor

durin42 commented Mar 25, 2025

This is causing rustc to no longer be able to build its stage 2 compiler: instead it gets a SIGSEGV partway through. Search for SEGV in https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/35748#0195ce08-82ac-45e8-9161-ae441a1939bd - I'll try and do more debugging to figure out what's going on.

@durin42
Copy link
Contributor

durin42 commented Mar 25, 2025

I keep running out of disk space on my test machine (sigh) when I do debuginfo builds, but with assertions I got as far as src/llvm-project/llvm/include/llvm/ADT/APInt.h:1369: void llvm::APInt::setBits(unsigned int, unsigned int): Assertion `loBit <= BitWidth && "loBit out of range"' failed. Sadly no stack trace on that though.

@LU-JOHN
Copy link
Contributor Author

LU-JOHN commented Mar 25, 2025

I keep running out of disk space on my test machine (sigh) when I do debuginfo builds, but with assertions I got as far as src/llvm-project/llvm/include/llvm/ADT/APInt.h:1369: void llvm::APInt::setBits(unsigned int, unsigned int): Assertion `loBit <= BitWidth && "loBit out of range"' failed. Sadly no stack trace on that though.

I suspect that in the code:

  KnownBits KnownScalarMemory(ScalarMemorySize);
  if (const MDNode *MD = LD->getRanges())
    computeKnownBitsFromRangeMetadata(*MD, KnownScalarMemory);

the range metadata bitwidth does not == ScalarMemorySize when you see a crash. I thought that this should never happen, and I believe that the range metadata and load are incompatible if this does happen. Can you try changing the code to:

  if (const MDNode *MD = LD->getRanges()) {
    ConstantInt *Lower = mdconst::extract<ConstantInt>(MD->getOperand(0));
    if (Lower->getBitWidth() != ScalarMemorySize) {
      LD->dump();
      MD->dump();
      assert(0);
    }
    computeKnownBitsFromRangeMetadata(*MD, KnownScalarMemory);
  }

@durin42
Copy link
Contributor

durin42 commented Mar 25, 2025

That yielded

t24: v16i8,ch = load<(load (s128) from %ir.<badref>, !alias.scope <0x7fd6d812dc28>, !noalias <0x7fd6d812dc58>, !range <0x7fd6d80869d0>)> t0, t7, undef:i64
<0x7fd6d80869d0> = !{i128 0, i128 3}
rustc: /usr/local/google/home/augie/Programming/big/rust/src/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4016: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `0' failed.

Hopefully that helps - I've also got things sorted so I can get the invocation open in lldb if that would help for me to inspect some stuff here.

@LU-JOHN
Copy link
Contributor Author

LU-JOHN commented Mar 25, 2025

That yielded

t24: v16i8,ch = load<(load (s128) from %ir.<badref>, !alias.scope <0x7fd6d812dc28>, !noalias <0x7fd6d812dc58>, !range <0x7fd6d80869d0>)> t0, t7, undef:i64
<0x7fd6d80869d0> = !{i128 0, i128 3}
rustc: /usr/local/google/home/augie/Programming/big/rust/src/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4016: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `0' failed.

Hopefully that helps - I've also got things sorted so I can get the invocation open in lldb if that would help for me to inspect some stuff here.

Thanks this helps. The range metadata has a 128-bit type, but the scalar type of the load is 8-bit. This is inconsistent. Can you send me a testcase so I can investigate how this inconsistent metadata was created?

@durin42
Copy link
Contributor

durin42 commented Mar 25, 2025

I'm honestly out of my depth here. My only path to reproducing this is building the Rust crate regex-automata and the resulting text IR is 42 megs, which I assume is way too much for you to look at usefully?

@LU-JOHN
Copy link
Contributor Author

LU-JOHN commented Mar 25, 2025

I'm honestly out of my depth here. My only path to reproducing this is building the Rust crate regex-automata and the resulting text IR is 42 megs, which I assume is way too much for you to look at usefully?

Can you send me instructions on how to build a rustc 2-stage compiler?

@arsenm
Copy link
Contributor

arsenm commented Mar 25, 2025

I'm honestly out of my depth here. My only path to reproducing this is building the Rust crate regex-automata and the resulting text IR is 42 megs, which I assume is way too much for you to look at usefully?

It will easily shrink with llvm-reduce

@TimNN
Copy link
Contributor

TimNN commented Mar 26, 2025

I was now also able to reproduce and got reduced IR, which crashes when run through llc -O0:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

define void @"_ZN4core3ptr100drop_in_place$LT$core..option..Option$LT$regex_automata..meta..wrappers..ReverseHybridEngine$GT$$GT$17hb293fd3cea85facfE"() {
start:
  %0 = load i128, ptr null, align 16, !range !0, !noundef !1
  %1 = icmp eq i128 %0, 1
  br i1 %1, label %bb1, label %bb1

bb1:                                              ; preds = %start, %start
  ret void
}

!0 = !{i128 0, i128 3}
!1 = !{}

@LU-JOHN
Copy link
Contributor Author

LU-JOHN commented Mar 26, 2025

PR to fix this crash made in #133095

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU llvm:analysis Includes value tracking, cost tables and constant folding llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants