Skip to content

[DAG] Matched FixedWidth pattern for ISD::AVGFLOORU #84903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Mar 19, 2024
36 changes: 36 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2826,6 +2826,38 @@ SDValue DAGCombiner::visitADDLike(SDNode *N) {
return SDValue();
}

// Attempt to form ext(avgflooru(A, B)) from add(and(A, B), lshr(xor(A, B), 1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the operands of the Add would be treated commutatively - either could be the and and either could be the srl. Same for the and/xor args below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using sd_match would handle commutative matching for us

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the kind of thing I had in mind:

  if (TLI.isOperationLegal(ISD::AVGFLOORU, VT)) {
    SDValue A, B;
    if (sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
                          m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)),
                                m_SpecificInt(1))))) {
      return DAG.getNode(ISD::AVGFLOORU, DL, VT, A, B);
    }
  }

Copy link
Member Author

@Sh0g0-1758 Sh0g0-1758 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since m_And and m_Xor are not available in SDPatternMatch, we will have to use llvm::PatternMatch for them. But I don't think if this will work. This is what I get finally and it crashes the build process.

  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
  SDValue N0 = N->getOperand(0);
  EVT VT = N0.getValueType();
  SDLoc DL(N);
  if (TLI.isOperationLegal(ISD::AVGFLOORU, VT)) {
    SDValue A, B;
    if (sd_match(N, llvm::SDPatternMatch::m_Add(llvm::PatternMatch::m_And(m_Value(A), m_Value(B)),
                          llvm::SDPatternMatch::m_Srl(llvm::PatternMatch::m_Xor(m_Deferred(A), m_Deferred(B)),
                                m_SpecificInt(1))))) {
      return DAG.getNode(ISD::AVGFLOORU, DL, VT, A, B);
    }
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the code for commutative matching, but not using the method as suggested by @RKSimon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get m_And / mXor / m_Or added shortly

Copy link
Member Author

@Sh0g0-1758 Sh0g0-1758 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sure thing. Then we can use the shortened version as part of this PR itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the code accordingly.

static SDValue combineFixedwidthToAVG(SDNode *N, SelectionDAG &DAG) {
assert(N->getOpcode() == ISD::ADD && "ADD node is required here");
SDValue And = N->getOperand(0);
SDValue Lshr = N->getOperand(1);
if (And.getOpcode() != ISD::AND || Lshr.getOpcode() != ISD::SRL)
return SDValue();
SDValue Xor = Lshr.getOperand(0);
if (Xor.getOpcode() != ISD::XOR)
return SDValue();
SDValue And1 = And.getOperand(0);
SDValue And2 = And.getOperand(1);
SDValue Xor1 = Xor.getOperand(0);
SDValue Xor2 = Xor.getOperand(1);
if (Xor1 != And1 && Xor2 != And2)
return SDValue();
// Is the right shift using an immediate value of 1?
ConstantSDNode *N1C = isConstOrConstSplat(Lshr.getOperand(1));
if (!N1C || N1C->getAPIntValue() != 1)
return SDValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to vastly simplify this matching code by using the sd_match pattern helper - #84759 will introduce this very soon, but if you're willing to try it you can just add the include and using lines from the patch you should be able to use it here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#84759 has now been committed if you want to try it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would much rather do it as a separate issue after implementing the fixedwidthpatterns for both floor and ceil. But if you want it to be part of this PR only, please do let me know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using sd_match is the way to go from the start

Copy link
Member Author

@Sh0g0-1758 Sh0g0-1758 Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Please push the code for m_And / mXor / m_Or and I will update accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saw the updates, updating it.

EVT VT = And.getValueType();
if (VT.isVector())
return SDValue();
SDLoc DL(N);
const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (!TLI.isOperationLegalOrCustom(ISD::AVGFLOORU, VT))
return SDValue();
return DAG.getNode(ISD::AVGFLOORU, DL, VT,
DAG.getExtOrTrunc(false, And1, DL, VT),
DAG.getExtOrTrunc(false, And2, DL, VT));
}

SDValue DAGCombiner::visitADD(SDNode *N) {
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
Expand All @@ -2841,6 +2873,10 @@ SDValue DAGCombiner::visitADD(SDNode *N) {
if (SDValue V = foldAddSubOfSignBit(N, DAG))
return V;

// Try to match AVG fixedwidth pattern
if (SDValue V = combineFixedwidthToAVG(N, DAG))
return V;

// fold (a+b) -> (a|b) iff a and b share no bits.
if ((!LegalOperations || TLI.isOperationLegal(ISD::OR, VT)) &&
DAG.haveNoCommonBitsSet(N0, N1))
Expand Down
15 changes: 15 additions & 0 deletions llvm/test/CodeGen/AArch64/hadd-combine.ll
Original file line number Diff line number Diff line change
Expand Up @@ -859,6 +859,21 @@ define <4 x i32> @urhadd_v4i32(<4 x i32> %x) {
ret <4 x i32> %r
}

define <4 x i32> @fixedwidth(<4 x i32> %a0, <4 x i32> %a1) {
; CHECK-LABEL: fixedwidth:
; CHECK: // %bb.0:
; CHECK-NEXT: and v2.16b, v0.16b, v1.16b
; CHECK-NEXT: eor v0.16b, v0.16b, v1.16b
; CHECK-NEXT: usra v2.4s, v0.4s, #1
; CHECK-NEXT: mov v0.16b, v2.16b
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why this is not producing a single uhadd instruction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran the automation script for the tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase that :) - this test case should be creating an uhadd (which is the aarch64 spelling for AVGFLOORU). Currently it's just producing the same as without the patch. I think it's because of the vector type check.

Copy link
Member Author

@Sh0g0-1758 Sh0g0-1758 Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed but whenever I run the update_llc_test_checks.py on the test case :

define <4 x i32> @uhadd_fixedwidth_v4i32(<4 x i32> %a0, <4 x i32> %a1)  {
  %and = and <4 x i32> %a0, %a1
  %xor = xor <4 x i32> %a0, %a1
  %srl = lshr <4 x i32> %xor, <i32 1,i32 1,i32 1,i32 1>
  %res = add <4 x i32> %and, %srl
  ret <4 x i32> %res
}

I get the same check assertions for both when I add the change in VT to vector type and when I don't. The only difference is that in the former, the tests pass. Is there something else that needs to be done to generate the correct assertions or is there some change in the original DAGCombiner.cpp required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wrote the correct check assertions manually now with the original value type and the checks are passing now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the automation script update_llc_test_checks.py needs to be changed?

; CHECK-NEXT: ret
%and = and <4 x i32> %a0, %a1
%xor = xor <4 x i32> %a0, %a1
%srl = lshr <4 x i32> %xor, <i32 1,i32 1,i32 1,i32 1>
%res = add <4 x i32> %and, %srl
ret <4 x i32> %res
}

declare <8 x i8> @llvm.aarch64.neon.shadd.v8i8(<8 x i8>, <8 x i8>)
declare <4 x i16> @llvm.aarch64.neon.shadd.v4i16(<4 x i16>, <4 x i16>)
declare <2 x i32> @llvm.aarch64.neon.shadd.v2i32(<2 x i32>, <2 x i32>)
Expand Down