Skip to content

[TTI][AArch64] Add preferFixedIfEqualToScalable hook #95818

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sjoerdmeijer
Copy link
Collaborator

@sjoerdmeijer sjoerdmeijer commented Jun 17, 2024

This adds a new hook to prefer fixed width loop vectorization over scalable. This will be used in the loop vectoriser to generate more NEON code instead of SVE if cost-model assigns equal costs to fixed and scalable versions of the vectorised loop.

This is used in #95819.

@llvmbot llvmbot added backend:AArch64 llvm:analysis Includes value tracking, cost tables and constant folding labels Jun 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

This adds a new hook to prefer fixed width loop vectorization over scalable. This will be used in the loop vectoriser to generate more NEON code instead of SVE if cost-model assigns equal costs to fixed and scalable versions of the vectorised loop.


Full diff: https://github.com/llvm/llvm-project/pull/95818.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+6)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+4)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+1)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+2)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index f55f21c94a85a..84b811818beee 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1674,6 +1674,8 @@ class TargetTransformInfo {
         false; ///< If op is an fp min/max, whether NaNs may be present.
   };
 
+  bool preferFixedIfEqualToScalable() const;
+
   /// \returns True if the target prefers reductions in loop.
   bool preferInLoopReduction(unsigned Opcode, Type *Ty,
                              ReductionFlags Flags) const;
@@ -2143,6 +2145,7 @@ class TargetTransformInfo::Concept {
   virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
                                         unsigned ChainSizeInBytes,
                                         VectorType *VecTy) const = 0;
+  virtual bool preferFixedIfEqualToScalable() const = 0;
   virtual bool preferInLoopReduction(unsigned Opcode, Type *Ty,
                                      ReductionFlags) const = 0;
   virtual bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
@@ -2873,6 +2876,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
                                 VectorType *VecTy) const override {
     return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
   }
+  bool preferFixedIfEqualToScalable() const override {
+    return Impl.preferFixedIfEqualToScalable();
+  }
   bool preferInLoopReduction(unsigned Opcode, Type *Ty,
                              ReductionFlags Flags) const override {
     return Impl.preferInLoopReduction(Opcode, Ty, Flags);
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 7828bdc1f1f43..9679c9ec6bc4e 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -913,6 +913,10 @@ class TargetTransformInfoImplBase {
     return VF;
   }
 
+  bool preferFixedIfEqualToScalable() const {
+    return false;
+  }
+
   bool preferInLoopReduction(unsigned Opcode, Type *Ty,
                              TTI::ReductionFlags Flags) const {
     return false;
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 7e721cbc87f3f..27a7f5b32d3cf 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1282,6 +1282,10 @@ unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
   return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
 }
 
+bool TargetTransformInfo::preferFixedIfEqualToScalable() const {
+  return TTIImpl->preferFixedIfEqualToScalable();
+}
+
 bool TargetTransformInfo::preferInLoopReduction(unsigned Opcode, Type *Ty,
                                                 ReductionFlags Flags) const {
   return TTIImpl->preferInLoopReduction(Opcode, Ty, Flags);
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index ffb899a301459..988630769afdb 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -244,6 +244,9 @@ def FeatureExperimentalZeroingPseudos
 def FeatureUseScalarIncVL : SubtargetFeature<"use-scalar-inc-vl",
   "UseScalarIncVL", "true", "Prefer inc/dec over add+cnt">;
 
+def FeatureUseFixedIfEqualToScalable : SubtargetFeature<"use-fixed-if-equal-to-scalable",
+  "UseFixedIfEqualToScalable", "true", "Prefer fixed width loop vectorization over scalable if cost-model assigns equal costs">;
+
 def FeatureBF16 : Extension<"bf16", "BF16",
     "Enable BFloat16 Extension (FEAT_BF16)", [],
     "FEAT_BF16", "+bf16", 280>;
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index cc33765307fb4..a0263f0164f4f 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -489,6 +489,7 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2
                                       FeatureALULSLFast,
                                       FeaturePostRAScheduler,
                                       FeatureEnableSelectOptimize,
+                                      FeatureUseFixedIfEqualToScalable,
                                       FeaturePredictableSelectIsExpensive]>;
 
 def TuneNeoverseV3 : SubtargetFeature<"neoversev3", "ARMProcFamily", "NeoverseV3",
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index feec1a4289c3a..13e66f9ea1913 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -371,6 +371,8 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
     return TailFoldingStyle::DataWithoutLaneMask;
   }
 
+  bool preferFixedIfEqualToScalable() const { return ST->useFixedIfEqualToScalable(); }
+
   bool preferPredicateOverEpilogue(TailFoldingInfo *TFI);
 
   bool supportsScalableVectors() const { return ST->hasSVE(); }

Copy link

github-actions bot commented Jun 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

This adds a new hook to prefer fixed width loop vectorization over
scalable. This will be used in the loop vectoriser to generate more NEON
code instead of SVE if cost-model assigns equal costs to fixed and
scalable versions of the vectorised loop.

This is used in llvm#95819.
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jun 17, 2024
…erse V2)

For the Neoverse V2, prefer fixed width vectorisation If the cost-model
assigns an equal cost to fixed and scalable vectorisation. This improves
7 kernels from TSVC-2 by about 2x, and does not affect SPEC21017 INT
and FP.

This tends to benefit small kernels, like the ones in TSVC, for a
number of reasons: processing the predicates does not come entirely
for free, NEON tends to generate slightly less code which can have a
big impact on these small kernels, and then there are second order
affects that SVE codegen is slightly less optimal in some areas.

This codegen strategy to generate more NEON is inline with GCC's codegen
strategy, which is actually even more aggressive in generating NEON when
no predication is required. We could be smarter and more aggressive too
about generating more NEON (and improve performance), but this seems to
be a first good and straight forward step.

This depends on llvm#95818.
@@ -244,6 +244,9 @@ def FeatureExperimentalZeroingPseudos
def FeatureUseScalarIncVL : SubtargetFeature<"use-scalar-inc-vl",
"UseScalarIncVL", "true", "Prefer inc/dec over add+cnt">;

def FeatureUseFixedIfEqualToScalable : SubtargetFeature<"use-fixed-if-equal-to-scalable",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably do with the word "Cost" in their somewhere. Maybe FeatureUseFixedOverScalableIfEqualCost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a full-blown feature if you intend to just enable it by default for V2 anyway? When we added VScaleForTuning, we just used a boolean in AArch64Subtarget that gets set during initialisation for the particular CPU. See AArch64Subtarget::initializeProperties.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtarget features are considered the best way to add tuning features like this. There is even a comment about it

void AArch64Subtarget::initializeProperties(bool HasMinSize) {
  // Initialize CPU specific properties. We should add a tablegen feature for
  // this in the future so we can specify it together with the subtarget
  // features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. Fair enough!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case maybe it's worth us revisiting VScaleForTuning as well and making that a feature for consistency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the naming suggesting, I was struggling with the name, will change it to FeatureUseFixedOverScalableIfEqualCost.

@davemgreen
Copy link
Collaborator

Could you combine this into #95819? They look atomic together and we would probably want them either both in or both out together.

@sjoerdmeijer
Copy link
Collaborator Author

Could you combine this into #95819? They look atomic together and we would probably want them either both in or both out together.

Yep, sure, will do. Makes my life a bit easier too. Cheers.

@paulwalker-arm
Copy link
Collaborator

@sjoerdmeijer Is this PR still relevant?

@david-arm david-arm removed their request for review October 9, 2024 09:20
@paulwalker-arm paulwalker-arm removed their request for review October 11, 2024 09:37
@sjoerdmeijer sjoerdmeijer deleted the tti-prefer-fixed branch November 14, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:analysis Includes value tracking, cost tables and constant folding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants