[TTI][AArch64] Add preferFixedIfEqualToScalable hook #95818
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-aarch64

Author: Sjoerd Meijer (sjoerdmeijer)

Changes: This adds a new hook to prefer fixed-width loop vectorization over scalable vectorization. It will be used in the loop vectoriser to generate more NEON code instead of SVE when the cost model assigns equal costs to the fixed and scalable versions of the vectorised loop.

Full diff: https://github.com/llvm/llvm-project/pull/95818.diff

6 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index f55f21c94a85a..84b811818beee 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1674,6 +1674,8 @@ class TargetTransformInfo {
false; ///< If op is an fp min/max, whether NaNs may be present.
};
+ bool preferFixedIfEqualToScalable() const;
+
/// \returns True if the target prefers reductions in loop.
bool preferInLoopReduction(unsigned Opcode, Type *Ty,
ReductionFlags Flags) const;
@@ -2143,6 +2145,7 @@ class TargetTransformInfo::Concept {
virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;
+ virtual bool preferFixedIfEqualToScalable() const = 0;
virtual bool preferInLoopReduction(unsigned Opcode, Type *Ty,
ReductionFlags) const = 0;
virtual bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
@@ -2873,6 +2876,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
VectorType *VecTy) const override {
return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
}
+ bool preferFixedIfEqualToScalable() const override {
+ return Impl.preferFixedIfEqualToScalable();
+ }
bool preferInLoopReduction(unsigned Opcode, Type *Ty,
ReductionFlags Flags) const override {
return Impl.preferInLoopReduction(Opcode, Ty, Flags);
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 7828bdc1f1f43..9679c9ec6bc4e 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -913,6 +913,10 @@ class TargetTransformInfoImplBase {
return VF;
}
+ bool preferFixedIfEqualToScalable() const {
+ return false;
+ }
+
bool preferInLoopReduction(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const {
return false;
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 7e721cbc87f3f..27a7f5b32d3cf 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1282,6 +1282,10 @@ unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
}
+bool TargetTransformInfo::preferFixedIfEqualToScalable() const {
+ return TTIImpl->preferFixedIfEqualToScalable();
+}
+
bool TargetTransformInfo::preferInLoopReduction(unsigned Opcode, Type *Ty,
ReductionFlags Flags) const {
return TTIImpl->preferInLoopReduction(Opcode, Ty, Flags);
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index ffb899a301459..988630769afdb 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -244,6 +244,9 @@ def FeatureExperimentalZeroingPseudos
def FeatureUseScalarIncVL : SubtargetFeature<"use-scalar-inc-vl",
"UseScalarIncVL", "true", "Prefer inc/dec over add+cnt">;
+def FeatureUseFixedIfEqualToScalable : SubtargetFeature<"use-fixed-if-equal-to-scalable",
+ "UseFixedIfEqualToScalable", "true", "Prefer fixed width loop vectorization over scalable if cost-model assigns equal costs">;
+
def FeatureBF16 : Extension<"bf16", "BF16",
"Enable BFloat16 Extension (FEAT_BF16)", [],
"FEAT_BF16", "+bf16", 280>;
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index cc33765307fb4..a0263f0164f4f 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -489,6 +489,7 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2
FeatureALULSLFast,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
+ FeatureUseFixedIfEqualToScalable,
FeaturePredictableSelectIsExpensive]>;
def TuneNeoverseV3 : SubtargetFeature<"neoversev3", "ARMProcFamily", "NeoverseV3",
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index feec1a4289c3a..13e66f9ea1913 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -371,6 +371,8 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
return TailFoldingStyle::DataWithoutLaneMask;
}
+ bool preferFixedIfEqualToScalable() const { return ST->useFixedIfEqualToScalable(); }
+
bool preferPredicateOverEpilogue(TailFoldingInfo *TFI);
bool supportsScalableVectors() const { return ST->hasSVE(); }
✅ With the latest revision this PR passed the C/C++ code formatter.
This adds a new hook to prefer fixed-width loop vectorization over scalable vectorization. This will be used in the loop vectoriser to generate more NEON code instead of SVE if the cost model assigns equal costs to the fixed and scalable versions of the vectorised loop. This is used in llvm#95819.
Force-pushed from 9e3fbe4 to 0a3352a
…erse V2) For the Neoverse V2, prefer fixed-width vectorisation if the cost model assigns an equal cost to fixed and scalable vectorisation. This improves 7 kernels from TSVC-2 by about 2x and does not affect SPEC2017 INT and FP. This tends to benefit small kernels, like the ones in TSVC, for a number of reasons: processing the predicates does not come entirely for free, NEON tends to generate slightly less code, which can have a big impact on these small kernels, and there are second-order effects where SVE codegen is slightly less optimal in some areas. This strategy of generating more NEON is in line with GCC's codegen strategy, which is even more aggressive about generating NEON when no predication is required. We could be smarter and more aggressive about generating more NEON (and improve performance), but this seems a good and straightforward first step. This depends on llvm#95818.
@@ -244,6 +244,9 @@ def FeatureExperimentalZeroingPseudos
def FeatureUseScalarIncVL : SubtargetFeature<"use-scalar-inc-vl",
"UseScalarIncVL", "true", "Prefer inc/dec over add+cnt">;

def FeatureUseFixedIfEqualToScalable : SubtargetFeature<"use-fixed-if-equal-to-scalable",
This could probably do with the word "Cost" in there somewhere. Maybe FeatureUseFixedOverScalableIfEqualCost
Does this need to be a full-blown feature if you intend to just enable it by default for V2 anyway? When we added VScaleForTuning, we just used a boolean in AArch64Subtarget that gets set during initialisation for the particular CPU. See AArch64Subtarget::initializeProperties.
Subtarget features are considered the best way to add tuning features like this. There is even a comment about it:
void AArch64Subtarget::initializeProperties(bool HasMinSize) {
// Initialize CPU specific properties. We should add a tablegen feature for
// this in the future so we can specify it together with the subtarget
// features.
Ah ok. Fair enough!
In that case maybe it's worth us revisiting VScaleForTuning as well and making that a feature for consistency.
Thanks for the naming suggestion, I was struggling with the name; will change it to FeatureUseFixedOverScalableIfEqualCost.
Could you combine this into #95819? They look atomic together and we would probably want them either both in or both out together.

Yep, sure, will do. Makes my life a bit easier too. Cheers.
@sjoerdmeijer Is this PR still relevant? |