Skip to content

[GlobalISel] Combine [S,U]SUBO #116489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
Original file line number Diff line number Diff line change
Expand Up @@ -939,6 +939,9 @@ class CombinerHelper {
// merge_values(_, zero) -> zext
bool matchMergeXAndZero(const MachineInstr &MI, BuildFnTy &MatchInfo);

// overflow sub
bool matchSuboCarryOut(const MachineInstr &MI, BuildFnTy &MatchInfo);

private:
/// Checks for legality of an indexed variant of \p LdSt.
bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
Expand Down
17 changes: 17 additions & 0 deletions llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,23 @@ class GAddCarryOut : public GBinOpCarryOut {
}
};

/// Represents overflowing sub operations.
/// G_USUBO, G_SSUBO
class GSubCarryOut : public GBinOpCarryOut {
public:
bool isSigned() const { return getOpcode() == TargetOpcode::G_SSUBO; }

static bool classof(const MachineInstr *MI) {
switch (MI->getOpcode()) {
case TargetOpcode::G_USUBO:
case TargetOpcode::G_SSUBO:
return true;
default:
return false;
}
}
};

/// Represents overflowing add/sub operations that also consume a carry-in.
/// G_UADDE, G_SADDE, G_USUBE, G_SSUBE
class GAddSubCarryInOut : public GAddSubCarryOut {
Expand Down
16 changes: 14 additions & 2 deletions llvm/include/llvm/Target/GlobalISel/Combine.td
Original file line number Diff line number Diff line change
Expand Up @@ -1385,6 +1385,12 @@ def match_addos : GICombineRule<
[{ return Helper.matchAddOverflow(*${root}, ${matchinfo}); }]),
(apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>;

def match_subo_no_overflow : GICombineRule<
(defs root:$root, build_fn_matchinfo:$matchinfo),
(match (wip_match_opcode G_SSUBO, G_USUBO):$root,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just use match instead of wip_match_opcode?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, it was copy-pasted from match_addos above. Secondly, I believe we can match only one tree/graph per match. I can split it into two combines if you want.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it either. There is no prior-art for matches with two output operands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big deal, we can try to covert them en-masse in another change.

Copy link
Author

@tschuett tschuett Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My real/only issue with wip_match_opcodeis when the docstring says, e.g. G_FNEG(G_BUILD_VECTOR), and the user is doing wip_match_opcode with G_FNEG. It looses precision.

[{ return Helper.matchSuboCarryOut(*${root}, ${matchinfo}); }]),
(apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>;

def match_extract_of_element_undef_vector: GICombineRule <
(defs root:$root),
(match (G_IMPLICIT_DEF $vector),
Expand Down Expand Up @@ -1901,6 +1907,12 @@ def cmp_combines: GICombineGroup<[
redundant_binop_in_equality
]>;


def overflow_combines: GICombineGroup<[
match_addos,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't artifacts.

Copy link
Author

@tschuett tschuett Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the legalizer builds addos while narrowing, e.g., adds:

OpO = TargetOpcode::G_UADDO;

Maybe artifacts is too strict. I stated in the summary that we import addos and the legalizer builds them while legalizing non addos.

Copy link
Contributor

@aemerson aemerson Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't make them artifacts. There are lots of instructions that the legalizer builds like loads/stores but those aren't artifacts either.

Copy link
Author

@tschuett tschuett Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I need a mechanism to combine addos in AArch64PostLegalizerCombiner and I want to have a group where I/we can add more combines that run in the post legalizer combiner. Is it sufficient to rename artifact_combines to something like post_legalizer_combines ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't name these as post-legalizer combines in this file. Each target defines their own pipeline. Name the groups based on their category and then refer to them in any custom groups needed within AArch64Combine.td

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I splitted them again into two topic groups.

match_subo_no_overflow
]>;

// FIXME: These should use the custom predicate feature once it lands.
def undef_combines : GICombineGroup<[undef_to_fp_zero, undef_to_int_zero,
undef_to_negative_one,
Expand Down Expand Up @@ -1984,9 +1996,9 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
and_or_disjoint_mask, fma_combines, fold_binop_into_select,
sub_add_reg, select_to_minmax,
fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
combine_concat_vector, match_addos,
combine_concat_vector,
sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines,
combine_use_vector_truncate, merge_combines]>;
combine_use_vector_truncate, merge_combines, overflow_combines]>;

// A combine group used to for prelegalizer combiners at -O0. The combines in
// this group have been selected based on experiments to balance code size and
Expand Down
75 changes: 75 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7790,3 +7790,78 @@ bool CombinerHelper::matchShuffleDisjointMask(MachineInstr &MI,

return true;
}

bool CombinerHelper::matchSuboCarryOut(const MachineInstr &MI,
BuildFnTy &MatchInfo) {
const GSubCarryOut *Subo = cast<GSubCarryOut>(&MI);

Register Dst = Subo->getReg(0);
Register LHS = Subo->getLHSReg();
Register RHS = Subo->getRHSReg();
Register Carry = Subo->getCarryOutReg();
LLT DstTy = MRI.getType(Dst);
LLT CarryTy = MRI.getType(Carry);

// Check legality before known bits.
if (!isLegalOrBeforeLegalizer({TargetOpcode::G_SUB, {DstTy}}) ||
!isConstantLegalOrBeforeLegalizer(CarryTy))
return false;

ConstantRange KBLHS =
ConstantRange::fromKnownBits(KB->getKnownBits(LHS),
/* IsSigned=*/Subo->isSigned());
ConstantRange KBRHS =
ConstantRange::fromKnownBits(KB->getKnownBits(RHS),
/* IsSigned=*/Subo->isSigned());

if (Subo->isSigned()) {
// G_SSUBO
switch (KBLHS.signedSubMayOverflow(KBRHS)) {
case ConstantRange::OverflowResult::MayOverflow:
return false;
case ConstantRange::OverflowResult::NeverOverflows: {
MatchInfo = [=](MachineIRBuilder &B) {
B.buildSub(Dst, LHS, RHS, MachineInstr::MIFlag::NoSWrap);
B.buildConstant(Carry, 0);
};
return true;
}
case ConstantRange::OverflowResult::AlwaysOverflowsLow:
case ConstantRange::OverflowResult::AlwaysOverflowsHigh: {
MatchInfo = [=](MachineIRBuilder &B) {
B.buildSub(Dst, LHS, RHS);
B.buildConstant(Carry, getICmpTrueVal(getTargetLowering(),
/*isVector=*/CarryTy.isVector(),
/*isFP=*/false));
};
return true;
}
}
return false;
}

// G_USUBO
switch (KBLHS.unsignedSubMayOverflow(KBRHS)) {
case ConstantRange::OverflowResult::MayOverflow:
return false;
case ConstantRange::OverflowResult::NeverOverflows: {
MatchInfo = [=](MachineIRBuilder &B) {
B.buildSub(Dst, LHS, RHS, MachineInstr::MIFlag::NoUWrap);
B.buildConstant(Carry, 0);
};
return true;
}
case ConstantRange::OverflowResult::AlwaysOverflowsLow:
case ConstantRange::OverflowResult::AlwaysOverflowsHigh: {
MatchInfo = [=](MachineIRBuilder &B) {
B.buildSub(Dst, LHS, RHS);
B.buildConstant(Carry, getICmpTrueVal(getTargetLowering(),
/*isVector=*/CarryTy.isVector(),
/*isFP=*/false));
};
return true;
}
}

return false;
}
6 changes: 3 additions & 3 deletions llvm/lib/Target/AArch64/AArch64Combine.td
Original file line number Diff line number Diff line change
Expand Up @@ -322,13 +322,13 @@ def AArch64PostLegalizerCombiner
extractvecelt_pairwise_add, redundant_or,
mul_const, redundant_sext_inreg,
form_bitfield_extract, rotate_out_of_range,
icmp_to_true_false_known_bits,
select_combines, fold_merge_to_zext,
icmp_to_true_false_known_bits, overflow_combines,
select_combines, fold_merge_to_zext, merge_combines,
constant_fold_binops, identity_combines,
ptr_add_immed_chain, overlapping_and,
split_store_zero_128, undef_combines,
select_to_minmax, or_to_bsp, combine_concat_vector,
commute_constant_to_rhs, merge_combines,
commute_constant_to_rhs,
push_freeze_to_prevent_poison_from_propagating,
combine_mul_cmlt, combine_use_vector_truncate]> {
}
101 changes: 101 additions & 0 deletions llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
Original file line number Diff line number Diff line change
Expand Up @@ -176,3 +176,104 @@ body: |
$q1 = COPY %o_wide
RET_ReallyLR implicit $w0
...
---
name: sub_may
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: sub_may
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: %const:_(s32) = G_CONSTANT i32 512
; CHECK-NEXT: %sub:_(s32), %o:_(s1) = G_SSUBO [[COPY]], %const
; CHECK-NEXT: %o_wide:_(s32) = G_ZEXT %o(s1)
; CHECK-NEXT: $w0 = COPY %sub(s32)
; CHECK-NEXT: $w1 = COPY %o_wide(s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
%0:_(s32) = COPY $w0
%const:_(s32) = G_CONSTANT i32 512
%sub:_(s32), %o:_(s1) = G_SSUBO %0, %const
%o_wide:_(s32) = G_ZEXT %o(s1)
$w0 = COPY %sub(s32)
$w1 = COPY %o_wide
RET_ReallyLR implicit $w0
...
---
name: usub_may
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: usub_may
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: %const:_(s32) = G_CONSTANT i32 512
; CHECK-NEXT: %sub:_(s32), %o:_(s1) = G_USUBO [[COPY]], %const
; CHECK-NEXT: %o_wide:_(s32) = G_ZEXT %o(s1)
; CHECK-NEXT: $w0 = COPY %sub(s32)
; CHECK-NEXT: $w1 = COPY %o_wide(s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
%0:_(s32) = COPY $w0
%const:_(s32) = G_CONSTANT i32 512
%sub:_(s32), %o:_(s1) = G_USUBO %0, %const
%o_wide:_(s32) = G_ZEXT %o(s1)
$w0 = COPY %sub(s32)
$w1 = COPY %o_wide
RET_ReallyLR implicit $w0
...
---
name: usub_may_carry_s11
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: usub_may_carry_s11
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: %const:_(s32) = G_CONSTANT i32 512
; CHECK-NEXT: %sub:_(s32), %o:_(s11) = G_USUBO [[COPY]], %const
; CHECK-NEXT: %o_wide:_(s32) = G_ZEXT %o(s11)
; CHECK-NEXT: $w0 = COPY %sub(s32)
; CHECK-NEXT: $w1 = COPY %o_wide(s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
%0:_(s32) = COPY $w0
%const:_(s32) = G_CONSTANT i32 512
%sub:_(s32), %o:_(s11) = G_USUBO %0, %const
%o_wide:_(s32) = G_ZEXT %o(s11)
$w0 = COPY %sub(s32)
$w1 = COPY %o_wide
RET_ReallyLR implicit $w0
...
---
name: usub_may_carry_s11_vector
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: usub_may_carry_s11_vector
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: %const:_(s32) = G_CONSTANT i32 512
; CHECK-NEXT: %bv:_(<4 x s32>) = G_BUILD_VECTOR %const(s32), [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32)
; CHECK-NEXT: %bv1:_(<4 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), %const(s32)
; CHECK-NEXT: %sub:_(<4 x s32>), %o:_(<4 x s11>) = G_USUBO %bv, %bv1
; CHECK-NEXT: %o_wide:_(<4 x s32>) = G_ZEXT %o(<4 x s11>)
; CHECK-NEXT: $q0 = COPY %sub(<4 x s32>)
; CHECK-NEXT: $q1 = COPY %o_wide(<4 x s32>)
; CHECK-NEXT: RET_ReallyLR implicit $w0
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w0
%2:_(s32) = COPY $w0
%3:_(s32) = COPY $w0
%const:_(s32) = G_CONSTANT i32 512
%bv:_(<4 x s32>) = G_BUILD_VECTOR %const(s32), %0(s32), %1(s32), %2(s32)
%bv1:_(<4 x s32>) = G_BUILD_VECTOR %0(s32), %1(s32), %2(s32), %const(s32)
%sub:_(<4 x s32>), %o:_(<4 x s11>) = G_USUBO %bv, %bv1
%o_wide:_(<4 x s32>) = G_ZEXT %o(<4 x s11>)
$q0 = COPY %sub(<4 x s32>)
$q1 = COPY %o_wide
RET_ReallyLR implicit $w0
...
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AArch64/popcount.ll
Original file line number Diff line number Diff line change
Expand Up @@ -113,9 +113,9 @@ define i16 @popcount256(ptr nocapture nonnull readonly %0) {
;
; GISEL-LABEL: popcount256:
; GISEL: // %bb.0: // %Entry
; GISEL-NEXT: ldp x8, x9, [x0, #16]
; GISEL-NEXT: ldp x8, x9, [x0]
; GISEL-NEXT: mov v0.d[0], x8
; GISEL-NEXT: ldp x8, x10, [x0]
; GISEL-NEXT: ldp x8, x10, [x0, #16]
; GISEL-NEXT: mov v1.d[0], x8
; GISEL-NEXT: mov v0.d[1], x9
; GISEL-NEXT: mov v1.d[1], x10
Expand Down
Loading
Loading