[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps #73778


Merged (6 commits) on Dec 6, 2023.
llvm/include/llvm/CodeGen/TargetInstrInfo.h (10 additions, 0 deletions)

@@ -1414,6 +1414,8 @@ class TargetInstrInfo : public MCInstrInfo {
/// Get the base operand and byte offset of an instruction that reads/writes
/// memory. This is a convenience function for callers that are only prepared
/// to handle a single base operand.
/// FIXME: Move Offset and OffsetIsScalable to some ElementCount-style
/// abstraction that supports negative offsets.
bool getMemOperandWithOffset(const MachineInstr &MI,
const MachineOperand *&BaseOp, int64_t &Offset,
bool &OffsetIsScalable,
@@ -1426,6 +1428,8 @@
/// It returns false if base operands and offset could not be determined.
/// It is not guaranteed to always recognize base operands and offsets in all
/// cases.
/// FIXME: Move Offset and OffsetIsScalable to some ElementCount-style
/// abstraction that supports negative offsets.
virtual bool getMemOperandsWithOffsetWidth(
const MachineInstr &MI, SmallVectorImpl<const MachineOperand *> &BaseOps,
int64_t &Offset, bool &OffsetIsScalable, unsigned &Width,
@@ -1496,12 +1500,18 @@
/// to TargetPassConfig::createMachineScheduler() to have an effect.
///
/// \p BaseOps1 and \p BaseOps2 are memory operands of two memory operations.
/// \p Offset1 and \p Offset2 are the byte offsets for the memory
[Review thread attached to the doc comment above]

Member:

How about using one parameter of type ElementCount for each (int, bool) pair of parameters, so that the signature simplifies to:

virtual bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
                                 ElementCount Offset1,
                                 ArrayRef<const MachineOperand *> BaseOps2,
                                 ElementCount Offset2,
                                 unsigned ClusterSize,
                                 unsigned NumBytes) const {

Contributor (author):

Looking at ElementCount, I'm not sure it's semantically the right abstraction? e.g. for LW %0:gpr, 12 we'd be passing Offset1=12 and OffsetIsScalable1=false. ElementCount doesn't seem right for representing that.

Collaborator:

The docs for ElementCount are a bit confusingly worded, but I think this is the right abstraction here. An Offset1=12 with OffsetIsScalable1=false would be an ElementCount::getFixed(12).
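To make that reading concrete, here is a standalone mock of the (value, scalable) pair that ElementCount represents; this is a sketch for illustration only, not the real class (the real one derives from FixedOrScalableQuantity in llvm/Support/TypeSize.h):

```cpp
#include <cassert>

// Standalone mock of the ElementCount idea: a quantity that is either a
// fixed value, or a value multiplied by the runtime factor vscale.
struct MockElementCount {
  unsigned MinVal;  // compile-time-known part
  bool Scalable;    // true => actual quantity is MinVal * vscale

  static MockElementCount getFixed(unsigned V) { return {V, false}; }
  static MockElementCount getScalable(unsigned V) { return {V, true}; }
};
```

Under this reading, the pair (Offset1=12, OffsetIsScalable1=false) from the LW example corresponds to getFixed(12).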

Collaborator (@topperc, Nov 29, 2023):

Is ElementCount able to represent a negative offset or large offsets? The current signature uses int64_t. I believe ElementCount stores an unsigned value and a bool.

Collaborator:

That is a good point. I don't think ElementCount can be negative. We could either introduce a new subclass of FixedOrScalableQuantity, or just go with what @asb proposed.
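A minimal sketch of what such a signed FixedOrScalableQuantity-style subclass could look like (entirely hypothetical; the name and layout below are invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signed offset type: like ElementCount, but backed by
// int64_t so it can represent negative (and large) byte offsets.
struct MemOffset {
  int64_t Val;    // signed, unlike ElementCount's unsigned payload
  bool Scalable;  // true => actual offset is Val * vscale

  static MemOffset getFixed(int64_t V) { return {V, false}; }
  static MemOffset getScalable(int64_t V) { return {V, true}; }
  bool isNegative() const { return Val < 0; }
};
```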

Member:

Oh right, we do not have an abstraction for negative scalable values. It would be ideal, though; I can see an OffsetInBytes derived from FixedOrScalableQuantity being useful in other places. I won't hold up your work here: please go ahead with your code, and we can fix it later if needed.

Contributor (author):

Thank you all, and sorry for not understanding the suggestion at first, Francesco. The docs for ElementCount indeed weren't clear to me.

I've added a FIXME note that moving to a new abstraction would be better. Would you be happy for me to land this PR as-is? I'm not opposed to introducing this new abstraction, but if that's your desired route, I would appreciate some pointers to where else it might be used.

Member:

I suspect that the whole class StackOffset could be converted to what we need here, but it can be done later.

[End of review thread; the diff continues below]

/// operations.
/// \p OffsetIsScalable1 and \p OffsetIsScalable2 indicate if the offset is
/// scaled by a runtime quantity.
/// \p ClusterSize is the number of operations in the resulting load/store
/// cluster if this hook returns true.
/// \p NumBytes is the number of bytes that will be loaded from all the
/// clustered loads if this hook returns true.
virtual bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
int64_t Offset1, bool OffsetIsScalable1,
ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2,
unsigned ClusterSize,
unsigned NumBytes) const {
llvm_unreachable("target did not implement shouldClusterMemOps()");
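For illustration, a target override could use the newly threaded-through offsets to cluster only accesses that are adjacent in memory. The policy and helper name below are invented for this sketch, not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decide whether two memory ops look clusterable
// based purely on their offsets and the width of the first access.
bool offsetsLookClusterable(int64_t Offset1, bool OffsetIsScalable1,
                            int64_t Offset2, bool OffsetIsScalable2,
                            unsigned AccessWidth) {
  // A fixed and a vscale-scaled offset cannot be compared at compile time.
  if (OffsetIsScalable1 != OffsetIsScalable2)
    return false;
  // Cluster only when the second access starts exactly where the first ends.
  return Offset2 - Offset1 == static_cast<int64_t>(AccessWidth);
}
```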
llvm/lib/CodeGen/MachineScheduler.cpp (9 additions, 5 deletions)

@@ -1698,11 +1698,12 @@ class BaseMemOpClusterMutation : public ScheduleDAGMutation {
SmallVector<const MachineOperand *, 4> BaseOps;
int64_t Offset;
unsigned Width;
bool OffsetIsScalable;

MemOpInfo(SUnit *SU, ArrayRef<const MachineOperand *> BaseOps,
int64_t Offset, unsigned Width)
int64_t Offset, bool OffsetIsScalable, unsigned Width)
: SU(SU), BaseOps(BaseOps.begin(), BaseOps.end()), Offset(Offset),
Width(Width) {}
Width(Width), OffsetIsScalable(OffsetIsScalable) {}

static bool Compare(const MachineOperand *const &A,
const MachineOperand *const &B) {
@@ -1831,8 +1832,10 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps(
SUnit2ClusterInfo[MemOpa.SU->NodeNum].second + MemOpb.Width;
}

if (!TII->shouldClusterMemOps(MemOpa.BaseOps, MemOpb.BaseOps, ClusterLength,
CurrentClusterBytes))
if (!TII->shouldClusterMemOps(MemOpa.BaseOps, MemOpa.Offset,
MemOpa.OffsetIsScalable, MemOpb.BaseOps,
MemOpb.Offset, MemOpb.OffsetIsScalable,
ClusterLength, CurrentClusterBytes))
continue;

SUnit *SUa = MemOpa.SU;
@@ -1899,7 +1902,8 @@ void BaseMemOpClusterMutation::collectMemOpRecords(
unsigned Width;
if (TII->getMemOperandsWithOffsetWidth(MI, BaseOps, Offset,
OffsetIsScalable, Width, TRI)) {
MemOpRecords.push_back(MemOpInfo(&SU, BaseOps, Offset, Width));
MemOpRecords.push_back(
MemOpInfo(&SU, BaseOps, Offset, OffsetIsScalable, Width));

LLVM_DEBUG(dbgs() << "Num BaseOps: " << BaseOps.size() << ", Offset: "
<< Offset << ", OffsetIsScalable: " << OffsetIsScalable
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (3 additions, 2 deletions)

@@ -4230,8 +4230,9 @@ static bool shouldClusterFI(const MachineFrameInfo &MFI, int FI1,
///
/// Only called for LdSt for which getMemOperandWithOffset returns true.
bool AArch64InstrInfo::shouldClusterMemOps(
ArrayRef<const MachineOperand *> BaseOps1,
ArrayRef<const MachineOperand *> BaseOps2, unsigned ClusterSize,
ArrayRef<const MachineOperand *> BaseOps1, int64_t OpOffset1,
bool OffsetIsScalable1, ArrayRef<const MachineOperand *> BaseOps2,
int64_t OpOffset2, bool OffsetIsScalable2, unsigned ClusterSize,
unsigned NumBytes) const {
assert(BaseOps1.size() == 1 && BaseOps2.size() == 1);
const MachineOperand &BaseOp1 = *BaseOps1.front();
llvm/lib/Target/AArch64/AArch64InstrInfo.h (2 additions, 0 deletions)

@@ -179,7 +179,9 @@ class AArch64InstrInfo final : public AArch64GenInstrInfo {
int64_t &MinOffset, int64_t &MaxOffset);

bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
int64_t Offset1, bool OffsetIsScalable1,
ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2,
unsigned ClusterSize,
unsigned NumBytes) const override;

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp (4 additions, 1 deletion)

@@ -232,7 +232,10 @@ class SIInsertHardClauses : public MachineFunctionPass {
// scheduler it limits the size of the cluster to avoid increasing
// register pressure too much, but this pass runs after register
// allocation so there is no need for that kind of limit.
!SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2, 2)))) {
// We also lie about the Offset and OffsetIsScalable parameters,
// as they aren't used in the SIInstrInfo implementation.
!SII->shouldClusterMemOps(CI.BaseOps, 0, false, BaseOps, 0, false,
2, 2)))) {
// Finish the current clause.
Changed |= emitClause(CI, SII);
CI = ClauseInfo();
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (2 additions, 0 deletions)

@@ -541,7 +541,9 @@ static bool memOpsHaveSameBasePtr(const MachineInstr &MI1,
}

bool SIInstrInfo::shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
int64_t Offset1, bool OffsetIsScalable1,
ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2,
unsigned ClusterSize,
unsigned NumBytes) const {
// If the mem ops (to be clustered) do not have the same base ptr, then they
llvm/lib/Target/AMDGPU/SIInstrInfo.h (2 additions, 0 deletions)

@@ -234,7 +234,9 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
const TargetRegisterInfo *TRI) const final;

bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
int64_t Offset1, bool OffsetIsScalable1,
ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2,
unsigned ClusterSize,
unsigned NumBytes) const override;

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (3 additions, 2 deletions)

@@ -2878,8 +2878,9 @@ static bool isClusterableLdStOpcPair(unsigned FirstOpc, unsigned SecondOpc,
}

bool PPCInstrInfo::shouldClusterMemOps(
ArrayRef<const MachineOperand *> BaseOps1,
ArrayRef<const MachineOperand *> BaseOps2, unsigned ClusterSize,
ArrayRef<const MachineOperand *> BaseOps1, int64_t OpOffset1,
bool OffsetIsScalable1, ArrayRef<const MachineOperand *> BaseOps2,
int64_t OpOffset2, bool OffsetIsScalable2, unsigned ClusterSize,
unsigned NumBytes) const {

assert(BaseOps1.size() == 1 && BaseOps2.size() == 1);
llvm/lib/Target/PowerPC/PPCInstrInfo.h (2 additions, 0 deletions)

@@ -530,7 +530,9 @@ class PPCInstrInfo : public PPCGenInstrInfo {
/// Returns true if the two given memory operations should be scheduled
/// adjacent.
bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
int64_t Offset1, bool OffsetIsScalable1,
ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2,
unsigned ClusterSize,
unsigned NumBytes) const override;

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (3 additions, 2 deletions)

@@ -2266,8 +2266,9 @@ static bool memOpsHaveSameBasePtr(const MachineInstr &MI1,
}

bool RISCVInstrInfo::shouldClusterMemOps(
ArrayRef<const MachineOperand *> BaseOps1,
ArrayRef<const MachineOperand *> BaseOps2, unsigned ClusterSize,
ArrayRef<const MachineOperand *> BaseOps1, int64_t Offset1,
bool OffsetIsScalable1, ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2, unsigned ClusterSize,
unsigned NumBytes) const {
// If the mem ops (to be clustered) do not have the same base ptr, then they
// should not be clustered
llvm/lib/Target/RISCV/RISCVInstrInfo.h (2 additions, 0 deletions)

@@ -158,7 +158,9 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
const TargetRegisterInfo *TRI) const override;

bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
int64_t Offset1, bool OffsetIsScalable1,
ArrayRef<const MachineOperand *> BaseOps2,
int64_t Offset2, bool OffsetIsScalable2,
unsigned ClusterSize,
unsigned NumBytes) const override;
