Commit c714846

[AArch64] Add an AArch64 pass for loop idiom transformations (#72273)
We have added a new pass that looks for loops such as the following:

```
while (i != max_len)
    if (a[i] != b[i])
        break;

... use index i ...
```

Although similar to a memcmp, this is slightly different because instead of returning the difference between the values of the first non-matching pair of bytes, it returns the index of the first mismatch. As such, we are not able to lower this to a memcmp call.

The new pass can now spot such idioms and transform them into a specialised predicated loop that gives a significant performance improvement for AArch64. It is intended as a stop-gap solution until this can be handled by the vectoriser, which doesn't currently deal with early exits.

This specialised loop makes use of a generic intrinsic that counts the trailing zero elements in a predicate vector. This was added in https://reviews.llvm.org/D159283 and for SVE we end up with brkb & incp instructions.

Although we have added this pass only for AArch64, it was written in a generic way so that in theory it could be used by other targets. Currently the pass requires scalable vector support and needs to know the minimum page size for the target, however it's possible to make it work for fixed-width vectors too. Also, the llvm.experimental.cttz.elts intrinsic used by the pass has generic lowering, but can be made efficient for targets with instructions similar to SVE's brkb, cntp and incp.

Original version of patch was posted on Phabricator: https://reviews.llvm.org/D158291

Patch co-authored by Kerry McLaughlin (@kmclaughlin-arm) and David Sherwood (@david-arm)

See the original discussion on Discourse: https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383
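To make the idiom concrete, the following is a minimal C++ sketch, not the code the pass emits. It shows the scalar loop (with the index increment written out explicitly) next to a chunked variant that mimics the shape of the predicated loop: compare a block of bytes, build a mask of mismatching lanes, then count trailing zero lanes to recover the index of the first mismatch. The function names, the fixed chunk width of 16 and the 32-bit mask are assumptions for illustration only; the real pass works on scalable vectors and uses llvm.experimental.cttz.elts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar form of the idiom: returns the index of the first mismatch in
// [start, maxLen), or maxLen if every byte in that range matches.
std::size_t firstMismatchScalar(const std::uint8_t *a, const std::uint8_t *b,
                                std::size_t start, std::size_t maxLen) {
  assert(start <= maxLen && "start index must not exceed maxLen");
  std::size_t i = start;
  while (i != maxLen) {
    if (a[i] != b[i])
      break;
    ++i;
  }
  return i;
}

// Hypothetical helper: counts trailing zero bits of a non-zero mask. It
// stands in for counting trailing zero elements of a predicate vector.
unsigned countTrailingZeroLanes(std::uint32_t mask) {
  assert(mask != 0 && "mask must have at least one set lane");
  unsigned n = 0;
  while ((mask & 1u) == 0) {
    mask >>= 1;
    ++n;
  }
  return n;
}

// Chunked form: processes up to kLanes bytes per iteration and exits early
// as soon as any lane in the current chunk mismatches.
std::size_t firstMismatchChunked(const std::uint8_t *a, const std::uint8_t *b,
                                 std::size_t start, std::size_t maxLen) {
  assert(start <= maxLen && "start index must not exceed maxLen");
  constexpr std::size_t kLanes = 16; // assumed fixed width for this sketch
  std::size_t i = start;
  while (i != maxLen) {
    // Only lanes below maxLen are active, similar to a loop predicate.
    std::size_t remaining = maxLen - i;
    std::size_t active = remaining < kLanes ? remaining : kLanes;
    std::uint32_t mismatch = 0;
    for (std::size_t lane = 0; lane < active; ++lane)
      if (a[i + lane] != b[i + lane])
        mismatch |= std::uint32_t{1} << lane;
    if (mismatch != 0)
      return i + countTrailingZeroLanes(mismatch);
    i += active;
  }
  return maxLen;
}
```

Both functions return the same index for the same inputs. A memcmp call could not replace either of them, because it reports only the ordering of the first differing bytes, not where they are.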
1 parent 839435c commit c714846

12 files changed: +2877 -0 lines changed

llvm/include/llvm/Analysis/TargetTransformInfo.h

Lines changed: 8 additions & 0 deletions
@@ -1174,6 +1174,9 @@ class TargetTransformInfo {
   /// \return The associativity of the cache level, if available.
   std::optional<unsigned> getCacheAssociativity(CacheLevel Level) const;

+  /// \return The minimum architectural page size for the target.
+  std::optional<unsigned> getMinPageSize() const;
+
   /// \return How much before a load we should place the prefetch
   /// instruction. This is currently measured in number of
   /// instructions.
@@ -1923,6 +1926,7 @@ class TargetTransformInfo::Concept {
   virtual std::optional<unsigned> getCacheSize(CacheLevel Level) const = 0;
   virtual std::optional<unsigned> getCacheAssociativity(CacheLevel Level)
       const = 0;
+  virtual std::optional<unsigned> getMinPageSize() const = 0;

   /// \return How much before a load we should place the prefetch
   /// instruction. This is currently measured in number of
@@ -2520,6 +2524,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
     return Impl.getCacheAssociativity(Level);
   }

+  std::optional<unsigned> getMinPageSize() const override {
+    return Impl.getMinPageSize();
+  }
+
   /// Return the preferred prefetch distance in terms of instructions.
   ///
   unsigned getPrefetchDistance() const override {
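The three hunks above add the same query at three layers: the public interface, the abstract Concept, and the Model that forwards to a concrete implementation. A minimal standalone sketch of that layering follows. It uses no LLVM headers, and the names `TargetInfo`, `BaseImpl` and `ExampleTargetImpl`, as well as the 4096 value, are assumptions for illustration rather than anything taken from this commit.

```cpp
#include <memory>
#include <optional>
#include <utility>

// Default implementation: reports that no minimum page size is known.
struct BaseImpl {
  std::optional<unsigned> getMinPageSize() const { return {}; }
};

// Hypothetical target implementation that knows its minimum page size.
// The value 4096 is an assumed example.
struct ExampleTargetImpl : BaseImpl {
  std::optional<unsigned> getMinPageSize() const { return 4096; }
};

// Public wrapper that hides the concrete implementation type.
class TargetInfo {
  // Abstract interface with one virtual function per query.
  struct Concept {
    virtual ~Concept() = default;
    virtual std::optional<unsigned> getMinPageSize() const = 0;
  };

  // Adapter that forwards each query to the wrapped implementation.
  template <typename T> struct Model final : Concept {
    explicit Model(T I) : Impl(std::move(I)) {}
    std::optional<unsigned> getMinPageSize() const override {
      return Impl.getMinPageSize();
    }
    T Impl;
  };

  std::unique_ptr<Concept> Ptr;

public:
  template <typename T>
  explicit TargetInfo(T I) : Ptr(std::make_unique<Model<T>>(std::move(I))) {}

  std::optional<unsigned> getMinPageSize() const {
    return Ptr->getMinPageSize();
  }
};
```

Constructing `TargetInfo` from `BaseImpl{}` yields an empty optional, while constructing it from `ExampleTargetImpl{}` yields 4096. Callers can therefore tell "unknown" apart from any concrete size.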

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Lines changed: 2 additions & 0 deletions
@@ -501,6 +501,8 @@ class TargetTransformInfoImplBase {
     llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
   }

+  std::optional<unsigned> getMinPageSize() const { return {}; }
+
   unsigned getPrefetchDistance() const { return 0; }
   unsigned getMinPrefetchStride(unsigned NumMemAccesses,
                                 unsigned NumStridedMemAccesses,

llvm/lib/Analysis/TargetTransformInfo.cpp

Lines changed: 9 additions & 0 deletions
@@ -37,6 +37,10 @@ static cl::opt<unsigned> CacheLineSize(
     cl::desc("Use this to override the target cache line size when "
              "specified by the user."));

+static cl::opt<unsigned> MinPageSize(
+    "min-page-size", cl::init(0), cl::Hidden,
+    cl::desc("Use this to override the target's minimum page size."));
+
 static cl::opt<unsigned> PredictableBranchThreshold(
     "predictable-branch-threshold", cl::init(99), cl::Hidden,
     cl::desc(
@@ -762,6 +766,11 @@ TargetTransformInfo::getCacheAssociativity(CacheLevel Level) const {
   return TTIImpl->getCacheAssociativity(Level);
 }

+std::optional<unsigned> TargetTransformInfo::getMinPageSize() const {
+  return MinPageSize.getNumOccurrences() > 0 ? MinPageSize
+                                             : TTIImpl->getMinPageSize();
+}
+
 unsigned TargetTransformInfo::getPrefetchDistance() const {
   return TTIImpl->getPrefetchDistance();
 }
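The new `getMinPageSize()` definition prefers the command-line value whenever the flag was given and otherwise defers to the target. The standalone sketch below restates that precedence rule without LLVM's CommandLine library. `UnsignedOption` and `resolveMinPageSize` are hypothetical stand-ins, and the page sizes used with them are assumed example values.

```cpp
#include <optional>

// Hypothetical stand-in for a command-line option: a value plus a count of
// how many times the flag appeared on the command line.
struct UnsignedOption {
  unsigned Value = 0;     // default value, mirroring cl::init(0)
  int NumOccurrences = 0; // zero means the user never passed the flag

  void set(unsigned V) {
    Value = V;
    ++NumOccurrences;
  }
};

// Returns the user's override if the flag was passed at least once,
// otherwise whatever the target reports (possibly nothing).
std::optional<unsigned> resolveMinPageSize(const UnsignedOption &Flag,
                                           std::optional<unsigned> TargetValue) {
  if (Flag.NumOccurrences > 0)
    return Flag.Value;
  return TargetValue;
}
```

Because the check is on the occurrence count and not on the value, an explicitly passed 0 still counts as an override in this sketch, whereas an unset flag falls through to the target's answer.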

llvm/lib/Target/AArch64/AArch64.h

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ void initializeAArch64DeadRegisterDefinitionsPass(PassRegistry&);
 void initializeAArch64ExpandPseudoPass(PassRegistry &);
 void initializeAArch64GlobalsTaggingPass(PassRegistry &);
 void initializeAArch64LoadStoreOptPass(PassRegistry&);
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
 void initializeAArch64LowerHomogeneousPrologEpilogPass(PassRegistry &);
 void initializeAArch64MIPeepholeOptPass(PassRegistry &);
 void initializeAArch64O0PreLegalizerCombinerPass(PassRegistry &);
