Commit 8a74eca

Author: James Molloy
[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount
Recommit: fix asan errors.

The way MachinePipeliner uses these target hooks is stateful - we reduce trip
count by one per call to reduceLoopCount. It's a little overfit for hardware
loops, where we don't have to worry about stitching a loop induction variable
across prologs and epilogs (the induction variable is implicit).

This patch introduces a new API:

  /// Analyze loop L, which must be a single-basic-block loop, and if the
  /// conditions can be understood enough produce a PipelinerLoopInfo object.
  virtual std::unique_ptr<PipelinerLoopInfo>
  analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const;

The return value is expected to be an implementation of the abstract class:

  /// Object returned by analyzeLoopForPipelining. Allows software pipelining
  /// implementations to query attributes of the loop being pipelined.
  class PipelinerLoopInfo {
  public:
    virtual ~PipelinerLoopInfo();
    /// Return true if the given instruction should not be pipelined and should
    /// be ignored. An example could be a loop comparison, or induction variable
    /// update with no users being pipelined.
    virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;

    /// Create a condition to determine if the trip count of the loop is greater
    /// than TC.
    ///
    /// If the trip count is statically known to be greater than TC, return
    /// true. If the trip count is statically known to be not greater than TC,
    /// return false. Otherwise return nullopt and fill out Cond with the test
    /// condition.
    virtual Optional<bool>
    createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
                                    SmallVectorImpl<MachineOperand> &Cond) = 0;

    /// Modify the loop such that the trip count is
    /// OriginalTC + TripCountAdjust.
    virtual void adjustTripCount(int TripCountAdjust) = 0;

    /// Called when the loop's preheader has been modified to NewPreheader.
    virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0;

    /// Called when the loop is being removed.
    virtual void disposed() = 0;
  };

The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while
allowing the target to hold its own state across all calls.

This API, in particular the disjunction of creating a trip count check
condition and adjusting the loop, improves the code quality in
ModuloSchedule.cpp.

llvm-svn: 372463
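The three-way result of createTripCountGreaterCondition described in the commit message can be illustrated outside LLVM. The following is a minimal sketch under stated assumptions, not code from the patch: `ToyLoop`, its `StaticTripCount` member, and the string-based `Cond` are hypothetical, and `std::optional` stands in for `llvm::Optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a target's PipelinerLoopInfo; not an LLVM type.
struct ToyLoop {
  // Known trip count, or std::nullopt when it is only available at run time.
  std::optional<int> StaticTripCount;

  // Follows the documented contract: return true or false when
  // "trip count > TC" is statically known; otherwise append a description of
  // the runtime test to Cond and return std::nullopt.
  std::optional<bool>
  createTripCountGreaterCondition(int TC,
                                  std::vector<std::string> &Cond) const {
    // Assumption of this sketch: TC is a non-negative iteration count.
    assert(TC >= 0 && "TC is expected to be a non-negative count");
    if (StaticTripCount)
      return *StaticTripCount > TC;
    Cond.push_back("tripcount > " + std::to_string(TC));
    return std::nullopt;
  }
};
```

A caller can then branch on three cases: no value (emit a guarded branch using `Cond`), `false` (the guarded blocks are dead), or `true` (fall through unconditionally). Note that the query itself does not change the loop; the trip count is changed separately through adjustTripCount.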
1 parent c90fda6 commit 8a74eca

10 files changed, +253 -194 lines changed

llvm/include/llvm/CodeGen/ModuloSchedule.h

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,7 @@
 
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include <vector>
 
@@ -168,6 +169,7 @@ class ModuloScheduleExpander {
   MachineBasicBlock *BB;
   MachineBasicBlock *Preheader;
   MachineBasicBlock *NewKernel = nullptr;
+  std::unique_ptr<TargetInstrInfo::PipelinerLoopInfo> LoopInfo;
 
   /// Map for each register and the max difference between its uses and def.
   /// The first element in the pair is the max difference in stages. The

llvm/include/llvm/CodeGen/TargetInstrInfo.h

Lines changed: 44 additions & 0 deletions
@@ -662,6 +662,50 @@ class TargetInstrInfo : public MCInstrInfo {
                         BytesAdded);
   }
 
+  /// Object returned by analyzeLoopForPipelining. Allows software pipelining
+  /// implementations to query attributes of the loop being pipelined and to
+  /// apply target-specific updates to the loop once pipelining is complete.
+  class PipelinerLoopInfo {
+  public:
+    virtual ~PipelinerLoopInfo();
+    /// Return true if the given instruction should not be pipelined and should
+    /// be ignored. An example could be a loop comparison, or induction variable
+    /// update with no users being pipelined.
+    virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;
+
+    /// Create a condition to determine if the trip count of the loop is greater
+    /// than TC.
+    ///
+    /// If the trip count is statically known to be greater than TC, return
+    /// true. If the trip count is statically known to be not greater than TC,
+    /// return false. Otherwise return nullopt and fill out Cond with the test
+    /// condition.
+    virtual Optional<bool>
+    createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
+                                    SmallVectorImpl<MachineOperand> &Cond) = 0;
+
+    /// Modify the loop such that the trip count is
+    /// OriginalTC + TripCountAdjust.
+    virtual void adjustTripCount(int TripCountAdjust) = 0;
+
+    /// Called when the loop's preheader has been modified to NewPreheader.
+    virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0;
+
+    /// Called when the loop is being removed. Any instructions in the preheader
+    /// should be removed.
+    ///
+    /// Once this function is called, no other functions on this object are
+    /// valid; the loop has been removed.
+    virtual void disposed() = 0;
+  };
+
+  /// Analyze loop L, which must be a single-basic-block loop, and if the
+  /// conditions can be understood enough produce a PipelinerLoopInfo object.
+  virtual std::unique_ptr<PipelinerLoopInfo>
+  analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const {
+    return nullptr;
+  }
+
   /// Analyze the loop code, return true if it cannot be understoo. Upon
   /// success, this function returns false and returns information about the
   /// induction variable and compare instruction used at the end.

llvm/lib/CodeGen/MachinePipeliner.cpp

Lines changed: 1 addition & 1 deletion
@@ -326,7 +326,7 @@ bool MachinePipeliner::canPipelineLoop(MachineLoop &L) {
 
   LI.LoopInductionVar = nullptr;
   LI.LoopCompare = nullptr;
-  if (TII->analyzeLoop(L, LI.LoopInductionVar, LI.LoopCompare)) {
+  if (!TII->analyzeLoopForPipelining(L.getTopBlock())) {
     LLVM_DEBUG(
         dbgs() << "Unable to analyzeLoop, can NOT pipeline current Loop\n");
     NumFailLoop++;
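The check above relies on the hook returning a null pointer by default, so a target opts in to pipelining only by overriding it. A minimal sketch of that opt-in pattern follows; every type here (`ToyLoopInfo`, `ToyBlock`, `ToyInstrInfo`, `ToyHardwareLoopTarget`) is a hypothetical stand-in rather than an LLVM class, and `EndsInHardwareLoop` is an assumed simplification of "the block's terminator is an ENDLOOP instruction".

```cpp
#include <cassert>
#include <memory>

// Hypothetical miniature of the hook pattern; none of these are LLVM types.
struct ToyLoopInfo {
  virtual ~ToyLoopInfo() = default;
};

struct ToyBlock {
  bool EndsInHardwareLoop; // assumed stand-in for an ENDLOOP terminator
};

struct ToyInstrInfo {
  virtual ~ToyInstrInfo() = default;
  // Default behaviour: the target does not understand the loop.
  virtual std::unique_ptr<ToyLoopInfo>
  analyzeLoopForPipelining(const ToyBlock *) const {
    return nullptr;
  }
};

struct ToyHardwareLoopTarget : ToyInstrInfo {
  std::unique_ptr<ToyLoopInfo>
  analyzeLoopForPipelining(const ToyBlock *LoopBB) const override {
    assert(LoopBB && "expected a loop block");
    if (LoopBB->EndsInHardwareLoop)
      return std::make_unique<ToyLoopInfo>();
    return nullptr;
  }
};

// A loop is a pipelining candidate only if the hook produced an object.
bool canPipelineLoop(const ToyInstrInfo &TII, const ToyBlock &BB) {
  return TII.analyzeLoopForPipelining(&BB) != nullptr;
}
```

Returning an owning pointer lets the target keep per-loop state in the object, while a null result doubles as the "cannot analyze" answer that the old boolean return value used to carry.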

llvm/lib/CodeGen/ModuloSchedule.cpp

Lines changed: 17 additions & 23 deletions
@@ -105,6 +105,9 @@ void ModuloScheduleExpander::expand() {
 }
 
 void ModuloScheduleExpander::generatePipelinedLoop() {
+  LoopInfo = TII->analyzeLoopForPipelining(BB);
+  assert(LoopInfo && "Must be able to analyze loop!");
+
   // Create a new basic block for the kernel and add it to the CFG.
   MachineBasicBlock *KernelBB = MF.CreateMachineBasicBlock(BB->getBasicBlock());
 
@@ -847,43 +850,27 @@ void ModuloScheduleExpander::addBranches(MachineBasicBlock &PreheaderBB,
                                          MBBVectorTy &EpilogBBs,
                                          ValueMapTy *VRMap) {
   assert(PrologBBs.size() == EpilogBBs.size() && "Prolog/Epilog mismatch");
-  MachineInstr *IndVar;
-  MachineInstr *Cmp;
-  if (TII->analyzeLoop(*Schedule.getLoop(), IndVar, Cmp))
-    llvm_unreachable("Must be able to analyze loop!");
   MachineBasicBlock *LastPro = KernelBB;
   MachineBasicBlock *LastEpi = KernelBB;
 
   // Start from the blocks connected to the kernel and work "out"
   // to the first prolog and the last epilog blocks.
   SmallVector<MachineInstr *, 4> PrevInsts;
   unsigned MaxIter = PrologBBs.size() - 1;
-  unsigned LC = UINT_MAX;
-  unsigned LCMin = UINT_MAX;
   for (unsigned i = 0, j = MaxIter; i <= MaxIter; ++i, --j) {
     // Add branches to the prolog that go to the corresponding
     // epilog, and the fall-thru prolog/kernel block.
     MachineBasicBlock *Prolog = PrologBBs[j];
     MachineBasicBlock *Epilog = EpilogBBs[i];
-    // We've executed one iteration, so decrement the loop count and check for
-    // the loop end.
-    SmallVector<MachineOperand, 4> Cond;
-    // Check if the LOOP0 has already been removed. If so, then there is no need
-    // to reduce the trip count.
-    if (LC != 0)
-      LC = TII->reduceLoopCount(*Prolog, PreheaderBB, IndVar, *Cmp, Cond,
-                                PrevInsts, j, MaxIter);
-
-    // Record the value of the first trip count, which is used to determine if
-    // branches and blocks can be removed for constant trip counts.
-    if (LCMin == UINT_MAX)
-      LCMin = LC;
 
+    SmallVector<MachineOperand, 4> Cond;
+    Optional<bool> StaticallyGreater =
+        LoopInfo->createTripCountGreaterCondition(j + 1, *Prolog, Cond);
     unsigned numAdded = 0;
-    if (Register::isVirtualRegister(LC)) {
+    if (!StaticallyGreater.hasValue()) {
       Prolog->addSuccessor(Epilog);
       numAdded = TII->insertBranch(*Prolog, Epilog, LastPro, Cond, DebugLoc());
-    } else if (j >= LCMin) {
+    } else if (*StaticallyGreater == false) {
       Prolog->addSuccessor(Epilog);
       Prolog->removeSuccessor(LastPro);
       LastEpi->removeSuccessor(Epilog);
@@ -894,10 +881,12 @@ void ModuloScheduleExpander::addBranches(MachineBasicBlock &PreheaderBB,
         LastEpi->clear();
         LastEpi->eraseFromParent();
       }
+      if (LastPro == KernelBB) {
+        LoopInfo->disposed();
+        NewKernel = nullptr;
+      }
       LastPro->clear();
       LastPro->eraseFromParent();
-      if (LastPro == KernelBB)
-        NewKernel = nullptr;
     } else {
       numAdded = TII->insertBranch(*Prolog, LastPro, nullptr, Cond, DebugLoc());
       removePhis(Epilog, Prolog);
@@ -909,6 +898,11 @@ void ModuloScheduleExpander::addBranches(MachineBasicBlock &PreheaderBB,
          I != E && numAdded > 0; ++I, --numAdded)
       updateInstruction(&*I, false, j, 0, VRMap);
   }
+
+  if (NewKernel) {
+    LoopInfo->setPreheader(PrologBBs[MaxIter]);
+    LoopInfo->adjustTripCount(-(MaxIter + 1));
+  }
 }
 
 /// Return true if we can compute the amount the instruction changes
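The per-prolog decision in addBranches can be traced with a small standalone model. This is a sketch under assumptions, not the pass itself: `planBranches` and the block names it prints are hypothetical, blocks are represented only by name, and the model assumes that the block inward of prolog `j` is prolog `j + 1` (or the kernel for the innermost prolog), which the visible hunks imply but do not show.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the per-prolog decision; block names are made up.
// StaticTripCount is std::nullopt when the count is only known at run time.
std::vector<std::string> planBranches(unsigned NumPrologs,
                                      std::optional<int> StaticTripCount) {
  assert(NumPrologs > 0 && "need at least one prolog");
  std::vector<std::string> Plan;
  unsigned MaxIter = NumPrologs - 1;
  // Walk from the prolog next to the kernel outwards, as addBranches does.
  for (unsigned i = 0, j = MaxIter; i <= MaxIter; ++i, --j) {
    int TC = static_cast<int>(j) + 1; // same "j + 1" query as the hunk above
    std::string Inner = (j == MaxIter)
                            ? std::string("kernel")
                            : "prolog" + std::to_string(j + 1);
    std::string Prefix = "prolog" + std::to_string(j) + ": ";
    if (!StaticTripCount)
      // Unknown count: a conditional branch guards the inner block.
      Plan.push_back(Prefix + "guard " + Inner + " with tripcount > " +
                     std::to_string(TC));
    else if (*StaticTripCount > TC)
      // Statically greater: the inner block always runs.
      Plan.push_back(Prefix + "fall through to " + Inner);
    else
      // Statically not greater: the inner block can never run.
      Plan.push_back(Prefix + "remove " + Inner);
  }
  return Plan;
}
```

For example, `planBranches(3, 2)` reports that the kernel and the innermost prolog are removed while prolog 0 falls through to prolog 1, which matches a loop that runs exactly twice. When the kernel survives, the real code finishes by calling adjustTripCount once with `-(MaxIter + 1)`, instead of decrementing the count once per prolog as reduceLoopCount did.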

llvm/lib/CodeGen/TargetInstrInfo.cpp

Lines changed: 2 additions & 0 deletions
@@ -1257,3 +1257,5 @@ bool TargetInstrInfo::getInsertSubregInputs(
   InsertedReg.SubIdx = (unsigned)MOSubIdx.getImm();
   return true;
 }
+
+TargetInstrInfo::PipelinerLoopInfo::~PipelinerLoopInfo() {}

llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp

Lines changed: 85 additions & 77 deletions
@@ -674,86 +674,94 @@ unsigned HexagonInstrInfo::insertBranch(MachineBasicBlock &MBB,
   return 2;
 }
 
-/// Analyze the loop code to find the loop induction variable and compare used
-/// to compute the number of iterations. Currently, we analyze loop that are
-/// controlled using hardware loops. In this case, the induction variable
-/// instruction is null. For all other cases, this function returns true, which
-/// means we're unable to analyze it.
-bool HexagonInstrInfo::analyzeLoop(MachineLoop &L,
-                                   MachineInstr *&IndVarInst,
-                                   MachineInstr *&CmpInst) const {
-
-  MachineBasicBlock *LoopEnd = L.getBottomBlock();
-  MachineBasicBlock::iterator I = LoopEnd->getFirstTerminator();
-  // We really "analyze" only hardware loops right now.
-  if (I != LoopEnd->end() && isEndLoopN(I->getOpcode())) {
-    IndVarInst = nullptr;
-    CmpInst = &*I;
-    return false;
+class HexagonPipelinerLoopInfo : public TargetInstrInfo::PipelinerLoopInfo {
+  MachineInstr *Loop, *EndLoop;
+  MachineFunction *MF;
+  const HexagonInstrInfo *TII;
+  int64_t TripCount;
+  Register LoopCount;
+  DebugLoc DL;
+
+public:
+  HexagonPipelinerLoopInfo(MachineInstr *Loop, MachineInstr *EndLoop)
+      : Loop(Loop), EndLoop(EndLoop), MF(Loop->getParent()->getParent()),
+        TII(MF->getSubtarget<HexagonSubtarget>().getInstrInfo()),
+        DL(Loop->getDebugLoc()) {
+    // Inspect the Loop instruction up-front, as it may be deleted when we call
+    // createTripCountGreaterCondition.
+    TripCount = Loop->getOpcode() == Hexagon::J2_loop0r
+                    ? -1
+                    : Loop->getOperand(1).getImm();
+    if (TripCount == -1)
+      LoopCount = Loop->getOperand(1).getReg();
   }
-  return true;
-}
 
-/// Generate code to reduce the loop iteration by one and check if the loop is
-/// finished. Return the value/register of the new loop count. this function
-/// assumes the nth iteration is peeled first.
-unsigned HexagonInstrInfo::reduceLoopCount(
-    MachineBasicBlock &MBB, MachineBasicBlock &PreHeader, MachineInstr *IndVar,
-    MachineInstr &Cmp, SmallVectorImpl<MachineOperand> &Cond,
-    SmallVectorImpl<MachineInstr *> &PrevInsts, unsigned Iter,
-    unsigned MaxIter) const {
-  // We expect a hardware loop currently. This means that IndVar is set
-  // to null, and the compare is the ENDLOOP instruction.
-  assert((!IndVar) && isEndLoopN(Cmp.getOpcode())
-                   && "Expecting a hardware loop");
-  MachineFunction *MF = MBB.getParent();
-  DebugLoc DL = Cmp.getDebugLoc();
-  SmallPtrSet<MachineBasicBlock *, 8> VisitedBBs;
-  MachineInstr *Loop = findLoopInstr(&MBB, Cmp.getOpcode(),
-                                     Cmp.getOperand(0).getMBB(), VisitedBBs);
-  if (!Loop)
-    return 0;
-  // If the loop trip count is a compile-time value, then just change the
-  // value.
-  if (Loop->getOpcode() == Hexagon::J2_loop0i ||
-      Loop->getOpcode() == Hexagon::J2_loop1i) {
-    int64_t Offset = Loop->getOperand(1).getImm();
-    if (Offset <= 1)
-      Loop->eraseFromParent();
-    else
-      Loop->getOperand(1).setImm(Offset - 1);
-    return Offset - 1;
+  bool shouldIgnoreForPipelining(const MachineInstr *MI) const override {
+    // Only ignore the terminator.
+    return MI == EndLoop;
   }
-  // The loop trip count is a run-time value. We generate code to subtract
-  // one from the trip count, and update the loop instruction.
-  assert(Loop->getOpcode() == Hexagon::J2_loop0r && "Unexpected instruction");
-  Register LoopCount = Loop->getOperand(1).getReg();
-  // Check if we're done with the loop.
-  unsigned LoopEnd = createVR(MF, MVT::i1);
-  MachineInstr *NewCmp = BuildMI(&MBB, DL, get(Hexagon::C2_cmpgtui), LoopEnd).
-    addReg(LoopCount).addImm(1);
-  unsigned NewLoopCount = createVR(MF, MVT::i32);
-  MachineInstr *NewAdd = BuildMI(&MBB, DL, get(Hexagon::A2_addi), NewLoopCount).
-    addReg(LoopCount).addImm(-1);
-  const HexagonRegisterInfo &HRI = *Subtarget.getRegisterInfo();
-  // Update the previously generated instructions with the new loop counter.
-  for (SmallVectorImpl<MachineInstr *>::iterator I = PrevInsts.begin(),
-         E = PrevInsts.end(); I != E; ++I)
-    (*I)->substituteRegister(LoopCount, NewLoopCount, 0, HRI);
-  PrevInsts.clear();
-  PrevInsts.push_back(NewCmp);
-  PrevInsts.push_back(NewAdd);
-  // Insert the new loop instruction if this is the last time the loop is
-  // decremented.
-  if (Iter == MaxIter)
-    BuildMI(&MBB, DL, get(Hexagon::J2_loop0r)).
-      addMBB(Loop->getOperand(0).getMBB()).addReg(NewLoopCount);
-  // Delete the old loop instruction.
-  if (Iter == 0)
-    Loop->eraseFromParent();
-  Cond.push_back(MachineOperand::CreateImm(Hexagon::J2_jumpf));
-  Cond.push_back(NewCmp->getOperand(0));
-  return NewLoopCount;
+
+  Optional<bool>
+  createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
+                                  SmallVectorImpl<MachineOperand> &Cond) override {
+    if (TripCount == -1) {
+      // Check if we're done with the loop.
+      unsigned Done = TII->createVR(MF, MVT::i1);
+      MachineInstr *NewCmp = BuildMI(&MBB, DL,
+                                     TII->get(Hexagon::C2_cmpgtui), Done)
+                                 .addReg(LoopCount)
+                                 .addImm(TC);
+      Cond.push_back(MachineOperand::CreateImm(Hexagon::J2_jumpf));
+      Cond.push_back(NewCmp->getOperand(0));
+      return {};
+    }
+
+    return TripCount > TC;
+  }
+
+  void setPreheader(MachineBasicBlock *NewPreheader) override {
+    NewPreheader->splice(NewPreheader->getFirstTerminator(), Loop->getParent(),
+                         Loop);
+  }
+
+  void adjustTripCount(int TripCountAdjust) override {
+    // If the loop trip count is a compile-time value, then just change the
+    // value.
+    if (Loop->getOpcode() == Hexagon::J2_loop0i ||
+        Loop->getOpcode() == Hexagon::J2_loop1i) {
+      int64_t TripCount = Loop->getOperand(1).getImm() + TripCountAdjust;
+      assert(TripCount > 0 && "Can't create an empty or negative loop!");
+      Loop->getOperand(1).setImm(TripCount);
+      return;
+    }
+
+    // The loop trip count is a run-time value. We generate code to subtract
+    // one from the trip count, and update the loop instruction.
+    Register LoopCount = Loop->getOperand(1).getReg();
+    Register NewLoopCount = TII->createVR(MF, MVT::i32);
+    BuildMI(*Loop->getParent(), Loop, Loop->getDebugLoc(),
+            TII->get(Hexagon::A2_addi), NewLoopCount)
+        .addReg(LoopCount)
+        .addImm(TripCountAdjust);
+    Loop->getOperand(1).setReg(NewLoopCount);
+  }
+
+  void disposed() override { Loop->eraseFromParent(); }
+};
+
+std::unique_ptr<TargetInstrInfo::PipelinerLoopInfo>
+HexagonInstrInfo::analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const {
+  // We really "analyze" only hardware loops right now.
+  MachineBasicBlock::iterator I = LoopBB->getFirstTerminator();
+
+  if (I != LoopBB->end() && isEndLoopN(I->getOpcode())) {
+    SmallPtrSet<MachineBasicBlock *, 8> VisitedBBs;
+    MachineInstr *LoopInst = findLoopInstr(
+        LoopBB, I->getOpcode(), I->getOperand(0).getMBB(), VisitedBBs);
+    if (LoopInst)
+      return std::make_unique<HexagonPipelinerLoopInfo>(LoopInst, &*I);
+  }
+  return nullptr;
 }
 
 bool HexagonInstrInfo::isProfitableToIfCvt(MachineBasicBlock &MBB,
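The two paths of the Hexagon adjustTripCount above (rewrite an immediate, or emit an add for a register count) can be modelled without any LLVM types. This is a rough sketch, assuming a made-up `ToyLoopSetup` whose count is either an immediate or a register name; the `.adj` suffix and the textual `add(...)` form are illustrative inventions, not Hexagon assembly.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of a hardware-loop setup instruction: the trip count is
// either an immediate or the name of a register. Not Hexagon's real encoding.
struct ToyLoopSetup {
  std::variant<long, std::string> Count;
};

// Applies TripCountAdjust. For an immediate the operand is rewritten in
// place; for a register an add is recorded in Emitted and the loop is
// switched to read the new register.
void adjustTripCount(ToyLoopSetup &Loop, int TripCountAdjust,
                     std::vector<std::string> &Emitted) {
  if (auto *Imm = std::get_if<long>(&Loop.Count)) {
    long NewCount = *Imm + TripCountAdjust;
    assert(NewCount > 0 && "Can't create an empty or negative loop!");
    *Imm = NewCount;
    return;
  }
  std::string Old = std::get<std::string>(Loop.Count);
  std::string New = Old + ".adj"; // made-up name for a fresh virtual register
  Emitted.push_back(New + " = add(" + Old + ", " +
                    std::to_string(TripCountAdjust) + ")");
  Loop.Count = New;
}
```

In both paths the adjustment is whatever value the caller passes, applied in a single step. The retained source comment about subtracting one from the trip count describes the older reduceLoopCount behaviour more closely than the new code, which adds `TripCountAdjust`.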

llvm/lib/Target/Hexagon/HexagonInstrInfo.h

Lines changed: 4 additions & 15 deletions
@@ -129,21 +129,10 @@ class HexagonInstrInfo : public HexagonGenInstrInfo {
                         const DebugLoc &DL,
                         int *BytesAdded = nullptr) const override;
 
-  /// Analyze the loop code, return true if it cannot be understood. Upon
-  /// success, this function returns false and returns information about the
-  /// induction variable and compare instruction used at the end.
-  bool analyzeLoop(MachineLoop &L, MachineInstr *&IndVarInst,
-                   MachineInstr *&CmpInst) const override;
-
-  /// Generate code to reduce the loop iteration by one and check if the loop
-  /// is finished. Return the value/register of the new loop count. We need
-  /// this function when peeling off one or more iterations of a loop. This
-  /// function assumes the nth iteration is peeled first.
-  unsigned reduceLoopCount(MachineBasicBlock &MBB, MachineBasicBlock &PreHeader,
-                           MachineInstr *IndVar, MachineInstr &Cmp,
-                           SmallVectorImpl<MachineOperand> &Cond,
-                           SmallVectorImpl<MachineInstr *> &PrevInsts,
-                           unsigned Iter, unsigned MaxIter) const override;
+  /// Analyze loop L, which must be a single-basic-block loop, and if the
+  /// conditions can be understood enough produce a PipelinerLoopInfo object.
+  std::unique_ptr<PipelinerLoopInfo>
+  analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const override;
 
   /// Return true if it's profitable to predicate
   /// instructions with accumulated instruction latency of "NumCycles"
