Skip to content

[SDAG] Adding scalarization of llvm.vector.insert #71614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -809,6 +809,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
// Vector Operand Scalarization: <1 x ty> -> ty.
bool ScalarizeVectorOperand(SDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_BITCAST(SDNode *N);
SDValue ScalarizeVecOp_INSERT_SUBVECTOR(SDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_UnaryOp(SDNode *N);
SDValue ScalarizeVecOp_UnaryOp_StrictFP(SDNode *N);
SDValue ScalarizeVecOp_CONCAT_VECTORS(SDNode *N);
Expand Down
26 changes: 24 additions & 2 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -675,6 +675,9 @@ bool DAGTypeLegalizer::ScalarizeVectorOperand(SDNode *N, unsigned OpNo) {
case ISD::BITCAST:
Res = ScalarizeVecOp_BITCAST(N);
break;
case ISD::INSERT_SUBVECTOR:
Res = ScalarizeVecOp_INSERT_SUBVECTOR(N, OpNo);
break;
case ISD::ANY_EXTEND:
case ISD::ZERO_EXTEND:
case ISD::SIGN_EXTEND:
Expand Down Expand Up @@ -766,6 +769,22 @@ SDValue DAGTypeLegalizer::ScalarizeVecOp_BITCAST(SDNode *N) {
N->getValueType(0), Elt);
}

/// If the value to subvector is a vector that needs to be scalarized, it must
/// be <1 x ty>. Return the element instead.
SDValue DAGTypeLegalizer::ScalarizeVecOp_INSERT_SUBVECTOR(SDNode *N,
unsigned OpNo) {
// If the destination vector is unary, we can just return the source vector
auto Src = GetScalarizedVector(N->getOperand(1));
if (OpNo == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this ever be called with OpNo==0? Wouldn't ScalarVecRes_INSERT_SUBVECTOR be called first? The first operand the result must have the same type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not certain, it would depend where the type propagation occurs. My assumption is that this could iteratively come from a previously operation now scalarized without the result type necessarily being updated (which is only an asumption). Do you have a suggestion for how to verify this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result types are always legalized first and need to update the first operand along with them. You could try just asserting OpNo == 1

return Src;
}

auto Dest = N->getOperand(0);
auto Idx = N->getOperand(2);
return DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(N), N->getValueType(0), Dest,
Src, Idx);
}

/// If the input is a vector that needs to be scalarized, it must be <1 x ty>.
/// Do the operation on the element instead.
SDValue DAGTypeLegalizer::ScalarizeVecOp_UnaryOp(SDNode *N) {
Expand Down Expand Up @@ -5891,8 +5910,11 @@ SDValue DAGTypeLegalizer::WidenVecRes_SETCC(SDNode *N) {
InOp1 = GetWidenedVector(InOp1);
InOp2 = GetWidenedVector(InOp2);
} else {
InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
do {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type legalization doesn't operate incrementally in individual steps. Newly legalized operations are supposed to be re-legalized later if necessary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm

For having looked at this with Rob, the problem we are facing here is:

  • The legalization strategy looks like this:
    • v1i16 is marked as widen by AArch64 backend and the target type is v4i16
    • v1f16 is marked as scalarize and the target type is f16
      So the legalization strategies don't match and we end up in this block.
      Now, the problem is a single call to WidenVector will push v1f16 to v2f16 and not the expected v4f16 and the assert line 5900 (of the old code) would break.

There may be a better way to fix that but the newly legalized operations won't get a chance to be re-legalized.

Open to suggestions, admittedly I went with a quick fix here.

FWIW, GISel doesn't have this issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The widening is only really intended to apply to integer types. A v1i1 setcc v1f16 having been turned into a v1i16 setcc v1f16 would be better to scalarize from there if we can tell it that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct way to handle this is by inserting an explicit conversion instruction from the widened v2f16 to match v4f16. i.e. create a new INSERT_SUBVECTOR v4f16:undef, (v2f16 GetWidenedVector), 0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The widening is only really intended to apply to integer types. A v1i1 setcc v1f16 having been turned into a v1i16 setcc v1f16 would be better to scalarize from there if we can tell it that.

Yes, but that's not what the AArch64 lowering wants (see https://github.com/llvm/llvm-project/blob/f0cdf4b468f6ee48b0d0d51ce78145455e2f07a6/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L24550).
I believe the rational is v1i16 is not natively supported so we widen to the larger supported vector size.
Changing that would fix this particular issue, but I believe it would create worse code in a bunch of cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me re-phrase, we could change the code to explicitly create an insert_subvector here, but I don't see how this is better :).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's not what the AArch64 lowering wants (see https://github.com/llvm/llvm-project/blob/f0cdf4b468f6ee48b0d0d51ce78145455e2f07a6/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L24550).
I believe the rational is v1i16 is not natively supported so we widen to the larger supported vector size.
Changing that would fix this particular issue, but I believe it would create worse code in a bunch of cases.

Yeah, The point of that code is to act upon integer types, to keep them in vector registers as opposed to scalarizing them to gprs. A fcmp uno <1 x half> is really a floating point operation though, even if it's return type becomes a integer when it gets transformed to a v1i16 setcc v1f16. As an operation it still makes sense to scalarize it, not widen.

I agree that we shouldn't be changing the way that integer types are handled in general, but if there was a way to specify that a setcc with fp operands should really be treated like fp types, that would avoid a lot of the awkwardness here where we attempt to widen it only to scalarize again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an operation it still makes sense to scalarize it, not widen.

I agree, but could you refresh my memory: is it something we have a control on?

IIRC, the selection of a strategy for type legalization is agnostic of the particular instruction being legalized and happens before the custom lowering, but it's been a while since I touched SDISel.

Copy link
Collaborator

@davemgreen davemgreen Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats always the question, and Im not sure there is a way to specify the types for operations separately. Looking through the example though, it appears that the v1i16 setcc v1f16 is being created in the AArch64 backend inside performVSelectCombine.

I've put up #72048 as a possible fix. It just tries to prevent the generation of v1i16 setcc v1f16, which so long as they dont get generated from anywhere else seems to work OK, even if the code generated it not necessarily optimal. Let me know what you think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

InOp1 = DAG.WidenVector(InOp1, SDLoc(N));
InOp2 = DAG.WidenVector(InOp2, SDLoc(N));
} while (ElementCount::isKnownLT(
InOp1.getValueType().getVectorElementCount(), WidenEC));
}

// Assume that the input and output will be widen appropriately. If not,
Expand Down
9 changes: 9 additions & 0 deletions llvm/test/CodeGen/AArch64/aarch64-neon-v1i1-setcc.ll
Original file line number Diff line number Diff line change
Expand Up @@ -67,3 +67,12 @@ if.then:
if.end:
ret i32 1;
}

define dso_local <1 x half> @cmp_select(<1 x half> %i105, <1 x half> %in) {
; CHECK-LABEL: @cmp_select
; CHECL: fcmge
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECL->CHECK

newFuncRoot:
%i179 = fcmp uno <1 x half> %i105, zeroinitializer
%i180 = select <1 x i1> %i179, <1 x half> %in, <1 x half> %i105
ret <1 x half> %i180
}