Skip to content

[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible #135788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 10, 2025

Conversation

sgundapa
Copy link
Contributor

@sgundapa sgundapa commented Apr 15, 2025

Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) becomes
the base for memory operations. This transformation is true under the
following conditions
Check 1 - B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint.

This transformation is beneficial particularly for GEPs because
Disjoint OR operations often map better to addressing modes than XOR.
This can enable further optimizations in the GEP offset folding pipeline

NOTE: This patch is not to be merged, just for evaluation.
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-transforms

Author: Sumanth Gundapaneni (sgundapa)

Changes

Try to transform I = xor(A, C1) into or disjoint(Y, C2) where Y = xor(A, C0) is another existing instruction dominating I,
C2 = C0 ^ C1, and A is known to be disjoint with C2.

%157 = xor i32 %155, 32 // %155 is disjoint with 2048
%161 = xor i32 %155, 2080

is decomposed in to
%157 = xor i32 %155, 32
%161 = or disjoint i32 %157, 2048

This will help GEPs with indexes fed with "xor" be simplified and use a single register for base instead of recomputing for each memory operation.


Full diff: https://github.com/llvm/llvm-project/pull/135788.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp (+166)
diff --git a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
index e048015298461..b0f7c7d862519 100644
--- a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+++ b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
@@ -160,6 +160,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DepthFirstIterator.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
@@ -198,6 +199,8 @@
 using namespace llvm;
 using namespace llvm::PatternMatch;
 
+#define DEBUG_TYPE "separate-offset-gep"
+
 static cl::opt<bool> DisableSeparateConstOffsetFromGEP(
     "disable-separate-const-offset-from-gep", cl::init(false),
     cl::desc("Do not separate the constant offset from a GEP instruction"),
@@ -484,6 +487,9 @@ class SeparateConstOffsetFromGEP {
 
   DenseMap<ExprKey, SmallVector<Instruction *, 2>> DominatingAdds;
   DenseMap<ExprKey, SmallVector<Instruction *, 2>> DominatingSubs;
+
+  bool decomposeXor(Function &F);
+  Value *tryFoldXorToOrDisjoint(Instruction &I);
 };
 
 } // end anonymous namespace
@@ -1162,6 +1168,162 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
   return true;
 }
 
+bool SeparateConstOffsetFromGEP::decomposeXor(Function &F) {
+  bool FunctionChanged = false;
+  SmallVector<std::pair<Instruction *, Value *>, 16> ReplacementsToMake;
+
+  for (BasicBlock &BB : F) {
+    for (Instruction &I : BB) {
+      if (I.getOpcode() == Instruction::Xor) {
+        if (Value *Replacement = tryFoldXorToOrDisjoint(I)) {
+          ReplacementsToMake.push_back({&I, Replacement});
+          FunctionChanged = true;
+        }
+      }
+    }
+  }
+
+  if (!ReplacementsToMake.empty()) {
+    LLVM_DEBUG(dbgs() << "Applying " << ReplacementsToMake.size()
+                      << " XOR->OR Disjoint replacements in " << F.getName()
+                      << "\n");
+    for (auto &Pair : ReplacementsToMake) {
+      Pair.first->replaceAllUsesWith(Pair.second);
+    }
+    for (auto &Pair : ReplacementsToMake) {
+      Pair.first->eraseFromParent();
+    }
+  }
+
+  return FunctionChanged;
+}
+
+static llvm::Instruction *findClosestSequentialXor(Value *A, Instruction &I) {
+  llvm::Instruction *ClosestUser = nullptr;
+  for (llvm::User *User : A->users()) {
+    if (auto *UserInst = llvm::dyn_cast<llvm::Instruction>(User)) {
+      if (UserInst->getOpcode() != Instruction::Xor || UserInst == &I)
+        continue;
+      if (!ClosestUser) {
+        ClosestUser = UserInst;
+      } else {
+        // Compare instruction positions.
+        if (UserInst->comesBefore(ClosestUser)) {
+          ClosestUser = UserInst;
+        }
+      }
+    }
+  }
+  return ClosestUser;
+}
+
+/// Try to transform I = xor(A, C1) into or disjoint(Y, C2)
+/// where Y = xor(A, C0) is another existing instruction dominating I,
+/// C2 = C0 ^ C1, and A is known to be disjoint with C2.
+///
+/// @param I  The XOR instruction being visited.
+/// @return The replacement Value* if successful, nullptr otherwise.
+Value *SeparateConstOffsetFromGEP::tryFoldXorToOrDisjoint(Instruction &I) {
+  assert(I.getOpcode() == Instruction::Xor && "Instruction must be XOR");
+
+  // Check if I has at least one GEP user.
+  bool HasGepUser = false;
+  for (User *U : I.users()) {
+    if (isa<GetElementPtrInst>(U)) {
+      HasGepUser = true;
+      break;
+    }
+  }
+  // If no user is a GEP instruction, abort the transformation.
+  if (!HasGepUser) {
+    LLVM_DEBUG(
+        dbgs() << "SeparateConstOffsetFromGEP: Skipping XOR->OR DISJOINT for "
+               << I << " because it has no GEP users.\n");
+    return nullptr;
+  }
+
+  Value *Op0 = I.getOperand(0);
+  Value *Op1 = I.getOperand(1);
+  ConstantInt *C1 = dyn_cast<ConstantInt>(Op1);
+  Value *A = Op0;
+
+  // Bail out of there is not constant operand.
+  if (!C1) {
+    C1 = dyn_cast<ConstantInt>(Op0);
+    if (!C1)
+      return nullptr;
+    A = Op1;
+  }
+
+  if (isa<UndefValue>(A))
+    return nullptr;
+
+  APInt C1_APInt = C1->getValue();
+  unsigned BitWidth = C1_APInt.getBitWidth();
+  Type *Ty = I.getType();
+
+  // --- Step 2: Find Dominating Y = xor A, C0 ---
+  Instruction *FoundUserInst = nullptr; // Instruction Y
+  APInt C0_APInt;
+
+  auto UserInst = findClosestSequentialXor(A, I);
+
+  BinaryOperator *UserBO = cast<BinaryOperator>(UserInst);
+  Value *UserOp0 = UserBO->getOperand(0);
+  Value *UserOp1 = UserBO->getOperand(1);
+  ConstantInt *UserC = nullptr;
+  if (UserOp0 == A)
+    UserC = dyn_cast<ConstantInt>(UserOp1);
+  else if (UserOp1 == A)
+    UserC = dyn_cast<ConstantInt>(UserOp0);
+  if (UserC) {
+    if (DT->dominates(UserInst, &I)) {
+      FoundUserInst = UserInst;
+      C0_APInt = UserC->getValue();
+    }
+  }
+  if (!FoundUserInst)
+    return nullptr;
+
+  // Calculate C2.
+  APInt C2_APInt = C0_APInt ^ C1_APInt;
+
+  // Check Disjointness A & C2 == 0.
+  KnownBits KnownA(BitWidth);
+  AssumptionCache *AC = nullptr;
+  computeKnownBits(A, KnownA, *DL, 0, AC, &I, DT);
+
+  if ((KnownA.Zero & C2_APInt) != C2_APInt)
+    return nullptr;
+
+  IRBuilder<> Builder(&I);
+  Builder.SetInsertPoint(&I); // Access Builder directly
+  Constant *C2_Const = ConstantInt::get(Ty, C2_APInt);
+  Twine Name = I.getName(); // Create Twine explicitly
+  Value *NewOr = BinaryOperator::CreateDisjointOr(FoundUserInst, C2_Const, Name,
+                                                  I.getIterator());
+  // Transformation Conditions Met.
+  LLVM_DEBUG(dbgs() << "SeparateConstOffsetFromGEP: Replacing " << I
+                    << " (used by GEP) with " << *NewOr << " based on "
+                    << *FoundUserInst << "\n");
+
+#if 0
+  // Preserve metadata
+  if (Instruction *NewOrInst = dyn_cast<Instruction>(NewOr)) {
+    NewOrInst->copyMetadata(I);
+  } else {
+    assert(false && "CreateNUWOr did not return an Instruction");
+    if (NewOr)
+      NewOr->deleteValue();
+    return nullptr;
+  }
+#endif
+
+  // Return the replacement value. runOnFunction will handle replacement &
+  // deletion.
+  return NewOr;
+}
+
 bool SeparateConstOffsetFromGEPLegacyPass::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;
@@ -1181,6 +1343,10 @@ bool SeparateConstOffsetFromGEP::run(Function &F) {
 
   DL = &F.getDataLayout();
   bool Changed = false;
+
+  // Decompose xor in to "or disjoint" if possible.
+  Changed |= decomposeXor(F);
+
   for (BasicBlock &B : F) {
     if (!DT->isReachableFromEntry(&B))
       continue;


auto UserInst = findClosestSequentialXor(A, I);

BinaryOperator *UserBO = cast<BinaryOperator>(UserInst);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a nullptr check here before I use this instruction.


// Check Disjointness A & C2 == 0.
KnownBits KnownA(BitWidth);
AssumptionCache *AC = nullptr;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to clean this up

#if 0
// Preserve metadata
if (Instruction *NewOrInst = dyn_cast<Instruction>(NewOr)) {
NewOrInst->copyMetadata(I);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add this to preserve the metadata

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to manually copy the metadata, the IRBuilder will handle the debug-line info. Or ReplaceInstWithInst.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be more metadata than the debug lines, but it's not necessarily trivially correct to preserve it all

Instruction *FoundUserInst = nullptr; // Instruction Y
APInt C0_APInt;

auto UserInst = findClosestSequentialXor(A, I);
Copy link
Contributor

@jrbyrnes jrbyrnes Apr 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we should still decompose the xor (if it can be made disjoint) -- even if there is no preexisting compatible xor. Doing so may end up producing a common base for other xors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this as a follow up patch to this. Let me know what you think

if (I.getOpcode() == Instruction::Xor) {
if (Value *Replacement = tryFoldXorToOrDisjoint(I)) {
ReplacementsToMake.push_back({&I, Replacement});
FunctionChanged = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need FunctionChanged, just check if !ReplacementsToMake.empty()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

#if 0
// Preserve metadata
if (Instruction *NewOrInst = dyn_cast<Instruction>(NewOr)) {
NewOrInst->copyMetadata(I);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to manually copy the metadata, the IRBuilder will handle the debug-line info. Or ReplaceInstWithInst.

#if 0
// Preserve metadata
if (Instruction *NewOrInst = dyn_cast<Instruction>(NewOr)) {
NewOrInst->copyMetadata(I);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be more metadata than the debug lines, but it's not necessarily trivially correct to preserve it all


LLVM_DEBUG(dbgs() << "SeparateConstOffsetFromGEP: Replacing" << I
<< " (used by GEP) with" << *NewOr << " based on"
<< *FoundUserInst << "\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< *FoundUserInst << "\n");
<< *FoundUserInst << '\n');

Comment on lines 1284 to 1288
ConstantInt *UserC = nullptr;
if (UserOp0 == A)
UserC = dyn_cast<ConstantInt>(UserOp1);
else if (UserOp1 == A)
UserC = dyn_cast<ConstantInt>(UserOp0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle the vector case, use PatternMatch to make it easier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it.

Comment on lines 1232 to 1239
// Check if I has at least one GEP user.
bool HasGepUser = false;
for (User *U : I.users()) {
if (isa<GetElementPtrInst>(U)) {
HasGepUser = true;
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally shouldn't be looking at users but I also don't see why this cares

Comment on lines 1218 to 1220
/// Try to transform I = xor(A, C1) into or_disjoint(Y, C2)
/// where Y = xor(A, C0) is another existing instruction dominating I,
/// C2 = C1 - C0, and A is known to be disjoint with C2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't core to what this pass is doing. This should have occurred in previous passes

@sgundapa
Copy link
Contributor Author

sgundapa commented Apr 23, 2025

Thanks for reviewing and I will address them. This patch is a work in progress and is quickly whipped up to validate performance for a customer.

The transformation is not mathematically correct and it happens to work for customer.
In order for the transformation to be correct, more constraints needs to be imposed.

Essentially, this is what I am going to change the patch to.
XOR(A, B+C) = XOR(A,C) + B --> Where XOR(A,C) becomes base for memory operations and B becomes offset.
This is true under these conditions
A , B are disjoint
B, C are disjoint
xor(A,C), B are disjoint.
These constraints can be proved easily, atleast in the ir I have.

I tried doing this in InstCombine and the pass fail to reach a fixed point state and the pass is trying to merge back the two constants in to one. This is one of the reasons why I chose to do it SeparateConstOffset pass.

Sample IR code:

%155 = or disjoint i32 %153, %154
%156 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %155
%157 = xor i32 %155, 32
%158 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %157
%159 = or disjoint i32 %155, 2048
%160 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %159
%161 = xor i32 %155, 2080
%162 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %161
%163 = or disjoint i32 %155, 4096
%164 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %163
%165 = xor i32 %155, 4128
%166 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %165
%167 = or disjoint i32 %155, 6144
%168 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %167
%169 = xor i32 %155, 6176
%170 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %169
%171 = or disjoint i32 %155, 8192
%172 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %171
%173 = xor i32 %155, 8224
%174 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %173

@@ -1181,6 +1359,10 @@ bool SeparateConstOffsetFromGEP::run(Function &F) {

DL = &F.getDataLayout();
bool Changed = false;

// Decompose xor in to "or disjoint" if possible.
Changed |= decomposeXor(F);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to integrate this with the main function walk? I don't see why we need another walk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look at it. I can change the main function walk, but it may it more complex. decomposeXor modifies the IR in such a way that the rest of the pass does its things.

@arsenm
Copy link
Contributor

arsenm commented Apr 24, 2025

I tried doing this in InstCombine and the pass fail to reach a fixed point state and the pass is trying to merge back the two constants in to one. This is one of the reasons why I chose to do it SeparateConstOffset pass.

Do you mean cases like the add 0, 0 from the tests? I'm confused about what this is trying to do . You should not be trying to create these. These also should not be in these tests (other than maybe one if you need to check a degenerate case doesn't crash)

@sgundapa
Copy link
Contributor Author

Do you mean cases like the add 0, 0 from the tests?

I added this (add 0,0) as a short cut for the knownbits to compute bits easily. I can rewrite them with variable, and, or to set the values i need.

sgundapa added 2 commits May 13, 2025 14:11
Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) becomes
the base for memory operations. This transformation is true under the
following conditions
Check 1 -  B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint.

This transformation is beneficial particularly for GEPs because
Disjoint OR operations often map better to addressing modes than XOR.
This can enable further optimizations in the GEP offset folding pipeline
@sgundapa sgundapa requested a review from jmmartinez May 20, 2025 14:47
@sgundapa sgundapa requested review from jmmartinez and jrbyrnes May 21, 2025 22:31
Copy link
Contributor

@jmmartinez jmmartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but check my last comment to confirm we either handle unreachable basic-blocks / or that they are impossible.

@jayfoad
Copy link
Contributor

jayfoad commented May 22, 2025

Try to transform I = xor(A, C1) into or disjoint(Y, C2) where Y = xor(A, C0) is another existing instruction dominating I,
C2 = C1 - C0, and A is known to be disjoint with C2.

This transform also requires C0 and C2 to be disjoint, right? So C2 = C1 - C0 can also be written as C2 = C1 ^ C0.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InstCombine already removes these xors. Your examples have them all fold away when run through instcombine. In general we should not need to worry about non-canonical IR in late optimization passes

@arsenm
Copy link
Contributor

arsenm commented May 22, 2025

I tried doing this in InstCombine and the pass fail to reach a fixed point state and the pass is trying to merge back the two constants in to one.

It would be better to review this attempt

This is one of the reasons why I chose to do it SeparateConstOffset pass.

Sample IR code:

%155 = or disjoint i32 %153, %154 %156 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %155 %157 = xor i32 %155, 32 %158 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %157 %159 = or disjoint i32 %155, 2048 %160 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %159 %161 = xor i32 %155, 2080 %162 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %161 %163 = or disjoint i32 %155, 4096 %164 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %163 %165 = xor i32 %155, 4128 %166 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %165 %167 = or disjoint i32 %155, 6144 %168 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %167 %169 = xor i32 %155, 6176 %170 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %169 %171 = or disjoint i32 %155, 8192 %172 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %171 %173 = xor i32 %155, 8224 %174 = getelementptr inbounds nuw half, ptr addrspace(3) @global_smem, i32 %173

Samples that are compilable are always more helpful, this is just a fragment I can't run experiments with

@sgundapa
Copy link
Contributor Author

Try to transform I = xor(A, C1) into or disjoint(Y, C2) where Y = xor(A, C0) is another existing instruction dominating I,
C2 = C1 - C0, and A is known to be disjoint with C2.

This transform also requires C0 and C2 to be disjoint, right? So C2 = C1 - C0 can also be written as C2 = C1 ^ C0.

I have updated the description in a subsequent commit message. This is what I ended up with
"Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) becomes
the base for memory operations. This transformation is true under the
following conditions
Check 1 - B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint"

I can check for A and B are disjointed as well, but check 1 and check 2 can deduce the same.

@sgundapa sgundapa requested review from Copilot and bcahoon May 27, 2025 14:36
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a transformation that decomposes constant XOR operands into disjoint OR operations, enabling the simplification of GEP indices. Key changes include:

  • New test cases in llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/xor-to-or-disjoint.ll validating the transformation.
  • Addition of the XorToOrDisjointTransformer class and its integration into the existing SeparateConstOffsetFromGEP transformation.
  • Updates to the transformation pass in llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp to support the new optimization.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/xor-to-or-disjoint.ll Adds test cases that verify the xor-to-or disjoint transformation in various scenarios.
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp Implements a new transformer class to decompose XOR instructions into OR disjoint instructions and integrates it into the optimization pass.
Comments suppressed due to low confidence (1)

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1344

  • [nitpick] Adding a comment to clarify the matching pattern used here would help explain its intent and assist future maintainers in understanding the pattern matching logic.
if (match(&I, m_CombineAnd(m_Xor(m_Instruction(Op0), m_ConstantInt(C1)), m_BinOp(MatchedXorOp))) && hasGEPUser(MatchedXorOp))

@sgundapa sgundapa self-assigned this May 27, 2025
@sgundapa sgundapa requested review from jayfoad and arsenm May 29, 2025 19:29
@sgundapa
Copy link
Contributor Author

sgundapa commented Jun 4, 2025

Ping! This is a P1 in downstream ticketing system.

@sgundapa sgundapa merged commit 13ccce2 into llvm:main Jun 10, 2025
11 checks passed
@sgundapa
Copy link
Contributor Author

I will address if the there any concerns, post the merge.

@arsenm
Copy link
Contributor

arsenm commented Jun 10, 2025

I think this should be reverted and abandoned. This has nothing to do with SeparateConstOffsetFromGEP, and is another patch bolted at the front of it. The provided tests are non-canonical IR and I suspect this is just working around the depth limit issue in instcombine that @jrbyrnes was working on

arsenm added a commit that referenced this pull request Jun 10, 2025
…f possible (#135788)"

This reverts commit 13ccce2.

The tests are on non-canonical IR, and adds an extra unrelated
pre-processing step to the pass. I'm assuming this is a workaround
for the known-bits recursion depth limit in instcombine.
@arsenm
Copy link
Contributor

arsenm commented Jun 10, 2025

I have reverted this in ad479dd

@jrbyrnes
Copy link
Contributor

Going to comment on the conceptual side and ignore the implementation details.

I think we need to confirm a few details. Specifically, how does InstCombine handle the relevant cases here when overriding the recursion depth? I believe previous analysis showed that we still have the bad xors. Second, can we combine GEP (xor A, C) -> GEP (xor A, C1), C2. I think previously we have tried to do the "combine", xor A, C -> or disjoint (xor A, C1), C2, but this just gets canonicalized back into a single xor. Maybe if we look into the GEP InstCombine we may have more luck.

Otherwise, I can see how something like this may fit in to SeparateConstOffsetFromGEP. Currently, this pass will look through add,sub and or-disjoint (and ext/trunc of such) to extract a constant. However, I think we found a rule for xors as well: if we can decompose the C in xor A, C to non-disjoint and disjoint components, then we can extract the disjoint component into the GEP. I don't think that feature is too outlandish for a pass which tries to do stuff like that (again, ignoring the implementation details).

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 11, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/30432

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
5.418 [4/29/91] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/lib/Decimal/decimal-to-binary.cpp.o
5.428 [4/28/92] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/external-unit.cpp.o
5.455 [4/27/93] Linking CXX shared library openmp/tools/archer/libarcher.so
5.464 [4/26/94] Building CXX object openmp/tools/archer/CMakeFiles/archer_static.dir/ompt-tsan.cpp.o
5.476 [3/26/95] Linking CXX static library openmp/tools/archer/libarcher_static.a
5.547 [3/25/96] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/assign.cpp.o
5.635 [3/24/97] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/edit-input.cpp.o
5.952 [3/23/98] Building CXX object openmp/runtime/src/CMakeFiles/omp.dir/kmp_runtime.cpp.o
6.144 [3/22/99] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/io-api-minimal.cpp.o
6.215 [3/21/100] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o
FAILED: flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o 
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang++ --target=powerpc64le-unknown-linux-gnu -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/../flang/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/runtimes/runtimes-bins/flang-rt -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=gnu++17 -UNDEBUG -fno-lto -fno-exceptions -fno-rtti -funwind-tables -fno-asynchronous-unwind-tables -U_GLIBCXX_ASSERTIONS -U_LIBCPP_ENABLE_ASSERTIONS -MD -MT flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o -MF flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o.d -o flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/lib/runtime/unit.cpp
PHI nodes not grouped at top of basic block!
  %niter = phi i64 [ 0, %for.cond2.preheader.us.i.new ], [ %niter.next.7, %for.body5.us.i ]
label %for.body5.us.i
in function _ZN7Fortran7runtime2io16ExternalFileUnit4EmitEPKcmmRNS1_14IoErrorHandlerE
fatal error: error in backend: Broken function found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang++ --target=powerpc64le-unknown-linux-gnu -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/../flang/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/runtimes/runtimes-bins/flang-rt -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=gnu++17 -UNDEBUG -fno-lto -fno-exceptions -fno-rtti -funwind-tables -fno-asynchronous-unwind-tables -U_GLIBCXX_ASSERTIONS -U_LIBCPP_ENABLE_ASSERTIONS -MD -MT flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o -MF flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o.d -o flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/unit.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/lib/runtime/unit.cpp
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang-rt/lib/runtime/unit.cpp'.
4.	Running pass 'Module Verifier' on function '@_ZN7Fortran7runtime2io16ExternalFileUnit4EmitEPKcmmRNS1_14IoErrorHandlerE'
 #0 0x0000000133eb5dcc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4e05dcc)
 #1 0x0000000133eb6584 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #2 0x0000000133eb2ee0 llvm::sys::RunSignalHandlers() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4e02ee0)
 #3 0x0000000133eb4ad4 llvm::sys::CleanupOnSignal(unsigned long) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4e04ad4)
 #4 0x0000000133df46bc (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #5 0x0000000133df4614 llvm::CrashRecoveryContext::HandleExit(int) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4d44614)
 #6 0x0000000133eae734 llvm::sys::Process::Exit(int, bool) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4dfe734)
 #7 0x00000001329e6490 LLVMErrorHandler(void*, char const*, bool) cc1_main.cpp:0:0
 #8 0x0000000133dfe804 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4d4e804)
 #9 0x0000000133dfe66c llvm::report_fatal_error(char const*, bool) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4d4e66c)
#10 0x0000000133916414 (anonymous namespace)::VerifierLegacyPass::runOnFunction(llvm::Function&) Verifier.cpp:0:0
#11 0x0000000133832350 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4782350)
#12 0x000000013383c1a8 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x478c1a8)
#13 0x0000000133832df4 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x4782df4)
#14 0x000000013383c79c llvm::legacy::PassManager::run(llvm::Module&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x478c79c)
#15 0x000000013478c4bc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x56dc4bc)
#16 0x00000001347aabd8 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x56fabd8)
#17 0x00000001368edc10 clang::ParseAST(clang::Sema&, bool, bool) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x783dc10)
#18 0x0000000134e67eec clang::ASTFrontendAction::ExecuteAction() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x5db7eec)
#19 0x00000001347b0844 clang::CodeGenAction::ExecuteAction() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x5700844)
#20 0x0000000134e674a0 clang::FrontendAction::Execute() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x5db74a0)
#21 0x0000000134dcb670 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x5d1b670)
#22 0x0000000134f8ce90 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x5edce90)
#23 0x00000001329e5a84 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/clang+++0x3935a84)
#24 0x00000001329e16e0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0

rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
…le (llvm#135788)

Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) becomes
the base for memory operations. This transformation is true under the
following conditions
Check 1 -  B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint.

This transformation is beneficial particularly for GEPs because
Disjoint OR operations often map better to addressing modes than XOR.
This can enable further optimizations in the GEP offset folding pipeline
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
…f possible (llvm#135788)"

This reverts commit 13ccce2.

The tests are on non-canonical IR, and adds an extra unrelated
pre-processing step to the pass. I'm assuming this is a workaround
for the known-bits recursion depth limit in instcombine.
@sgundapa
Copy link
Contributor Author

Increased recursion depth was needed to prove the disjointedness. Jeff has been involved to work on the issues related to recursion depth. For the Triton use case w.r.t Linear Layout, the addressing mode they chose to represent lane/warp/block/reg offset involves xors.
The unit tests are written to test the pass. I can write an unit test that can run both InstCombine and SeparateConstOffset pass, if needed.

@shiltian
Copy link
Contributor

shiltian commented Jun 11, 2025

Ping! This is a P1 in downstream ticketing system.

This PR has NEVER been approved after the request for changes so you can't just merge it, even if it is blocking anything.

tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…le (llvm#135788)

Try to transform XOR(A, B+C) in to XOR(A,C) + B where XOR(A,C) becomes
the base for memory operations. This transformation is true under the
following conditions
Check 1 -  B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint.

This transformation is beneficial particularly for GEPs because
Disjoint OR operations often map better to addressing modes than XOR.
This can enable further optimizations in the GEP offset folding pipeline
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…f possible (llvm#135788)"

This reverts commit 13ccce2.

The tests are on non-canonical IR, and adds an extra unrelated
pre-processing step to the pass. I'm assuming this is a workaround
for the known-bits recursion depth limit in instcombine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants