[GVN] Load-store forwarding of scalable store to fixed load. #124748
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Lou (iamlouk)

Changes

When storing a scalable vector and the vscale is a compile-time known constant, perform store-to-load forwarding through temporary @llvm.vector.extract calls, even if the loaded vector is fixed-sized instead of scalable. InstCombine then folds the insert/extract pair away.

The usecase is shown in this godbolt link (https://godbolt.org/z/KT3sMrMbd), which shows that clang generates IR that matches this pattern when the "arm_sve_vector_bits" attribute is used:

typedef svfloat32_t svfloat32_fixed_t
__attribute__((arm_sve_vector_bits(512)));
struct svfloat32_wrapped_t {
svfloat32_fixed_t v;
};
static inline svfloat32_wrapped_t
add(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
return {svadd_f32_x(svptrue_b32(), a.v, b.v)};
}
svfloat32_wrapped_t
foo(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
// The IR pattern this patch matches is generated for this return:
return add(a, b);
}
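For that return statement, the pattern is roughly the following IR (a hedged sketch with illustrative value names; it mirrors the first test case added to llvm/test/Transforms/GVN/vscale.ll in the diff below): the scalable result is stored to a stack slot, reloaded as a fixed-size vector, and wrapped back into a scalable vector via @llvm.vector.insert.

; Hedged sketch, not taken verbatim from clang output.
define <vscale x 4 x float> @foo(<vscale x 4 x float> %a.coerce, <vscale x 4 x float> %b.coerce) vscale_range(4,4) {
entry:
  %retval = alloca { <16 x float> }
  %add = fadd <vscale x 4 x float> %a.coerce, %b.coerce
  store <vscale x 4 x float> %add, ptr %retval
  %0 = load <16 x float>, ptr %retval
  %cast.scalable = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float> poison, <16 x float> %0, i64 0)
  ret <vscale x 4 x float> %cast.scalable
}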
Full diff: https://github.com/llvm/llvm-project/pull/124748.diff

4 Files Affected:

diff --git a/llvm/include/llvm/Transforms/Utils/VNCoercion.h b/llvm/include/llvm/Transforms/Utils/VNCoercion.h
index f1ea94bf60fcc6..7a5bf80846cc48 100644
--- a/llvm/include/llvm/Transforms/Utils/VNCoercion.h
+++ b/llvm/include/llvm/Transforms/Utils/VNCoercion.h
@@ -23,6 +23,7 @@
namespace llvm {
class Constant;
+class Function;
class StoreInst;
class LoadInst;
class MemIntrinsic;
@@ -35,7 +36,7 @@ namespace VNCoercion {
/// Return true if CoerceAvailableValueToLoadType would succeed if it was
/// called.
bool canCoerceMustAliasedValueToLoad(Value *StoredVal, Type *LoadTy,
- const DataLayout &DL);
+ Function *F);
/// If we saw a store of a value to memory, and then a load from a must-aliased
/// pointer of a different type, try to coerce the stored value to the loaded
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 21eb7f741d7c82..452dd1ece9e172 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -1291,7 +1291,8 @@ GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
// If MD reported clobber, check it was nested.
if (DepInfo.isClobber() &&
- canCoerceMustAliasedValueToLoad(DepLoad, LoadType, DL)) {
+ canCoerceMustAliasedValueToLoad(DepLoad, LoadType,
+ DepLoad->getFunction())) {
const auto ClobberOff = MD->getClobberOffset(DepLoad);
// GVN has no deal with a negative offset.
Offset = (ClobberOff == std::nullopt || *ClobberOff < 0)
@@ -1343,7 +1344,7 @@ GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
// different types if we have to. If the stored value is convertable to
// the loaded value, we can reuse it.
if (!canCoerceMustAliasedValueToLoad(S->getValueOperand(), Load->getType(),
- DL))
+ S->getFunction()))
return std::nullopt;
// Can't forward from non-atomic to atomic without violating memory model.
@@ -1357,7 +1358,8 @@ GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
// If the types mismatch and we can't handle it, reject reuse of the load.
// If the stored value is larger or equal to the loaded value, we can reuse
// it.
- if (!canCoerceMustAliasedValueToLoad(LD, Load->getType(), DL))
+ if (!canCoerceMustAliasedValueToLoad(LD, Load->getType(),
+ LD->getFunction()))
return std::nullopt;
// Can't forward from non-atomic to atomic without violating memory model.
diff --git a/llvm/lib/Transforms/Utils/VNCoercion.cpp b/llvm/lib/Transforms/Utils/VNCoercion.cpp
index 7a61ab74166389..5949c90676f9fb 100644
--- a/llvm/lib/Transforms/Utils/VNCoercion.cpp
+++ b/llvm/lib/Transforms/Utils/VNCoercion.cpp
@@ -13,32 +13,54 @@ static bool isFirstClassAggregateOrScalableType(Type *Ty) {
return Ty->isStructTy() || Ty->isArrayTy() || isa<ScalableVectorType>(Ty);
}
+static std::optional<unsigned> getKnownVScale(Function *F) {
+ const auto &Attrs = F->getAttributes().getFnAttrs();
+ unsigned MinVScale = Attrs.getVScaleRangeMin();
+ if (Attrs.getVScaleRangeMax() == MinVScale)
+ return MinVScale;
+ return std::nullopt;
+}
+
/// Return true if coerceAvailableValueToLoadType will succeed.
bool canCoerceMustAliasedValueToLoad(Value *StoredVal, Type *LoadTy,
- const DataLayout &DL) {
+ Function *F) {
Type *StoredTy = StoredVal->getType();
-
if (StoredTy == LoadTy)
return true;
+ const DataLayout &DL = F->getDataLayout();
if (isa<ScalableVectorType>(StoredTy) && isa<ScalableVectorType>(LoadTy) &&
DL.getTypeSizeInBits(StoredTy) == DL.getTypeSizeInBits(LoadTy))
return true;
- // If the loaded/stored value is a first class array/struct, or scalable type,
- // don't try to transform them. We need to be able to bitcast to integer.
- if (isFirstClassAggregateOrScalableType(LoadTy) ||
- isFirstClassAggregateOrScalableType(StoredTy))
- return false;
-
- uint64_t StoreSize = DL.getTypeSizeInBits(StoredTy).getFixedValue();
+ // If the loaded/stored value is a first class array/struct, don't try to
+ // transform them. We need to be able to bitcast to integer. For scalable
+ // vectors forwarded to fixed-sized vectors with a compile-time known
+ // vscale, @llvm.vector.extract is used.
+ uint64_t StoreSize, LoadSize;
+ if (isa<ScalableVectorType>(StoredTy) && isa<FixedVectorType>(LoadTy)) {
+ std::optional<unsigned> VScale = getKnownVScale(F);
+ if (!VScale || StoredTy->getScalarType() != LoadTy->getScalarType())
+ return false;
+
+ StoreSize =
+ DL.getTypeSizeInBits(StoredTy).getKnownMinValue() * VScale.value();
+ LoadSize = DL.getTypeSizeInBits(LoadTy).getFixedValue();
+ } else {
+ if (isFirstClassAggregateOrScalableType(LoadTy) ||
+ isFirstClassAggregateOrScalableType(StoredTy))
+ return false;
+
+ StoreSize = DL.getTypeSizeInBits(StoredTy).getFixedValue();
+ LoadSize = DL.getTypeSizeInBits(LoadTy).getFixedValue();
+ }
// The store size must be byte-aligned to support future type casts.
if (llvm::alignTo(StoreSize, 8) != StoreSize)
return false;
// The store has to be at least as big as the load.
- if (StoreSize < DL.getTypeSizeInBits(LoadTy).getFixedValue())
+ if (StoreSize < LoadSize)
return false;
bool StoredNI = DL.isNonIntegralPointerType(StoredTy->getScalarType());
@@ -57,11 +79,10 @@ bool canCoerceMustAliasedValueToLoad(Value *StoredVal, Type *LoadTy,
return false;
}
-
// The implementation below uses inttoptr for vectors of unequal size; we
// can't allow this for non integral pointers. We could teach it to extract
// exact subvectors if desired.
- if (StoredNI && StoreSize != DL.getTypeSizeInBits(LoadTy).getFixedValue())
+ if (StoredNI && StoreSize != LoadSize)
return false;
if (StoredTy->isTargetExtTy() || LoadTy->isTargetExtTy())
@@ -79,7 +100,8 @@ bool canCoerceMustAliasedValueToLoad(Value *StoredVal, Type *LoadTy,
Value *coerceAvailableValueToLoadType(Value *StoredVal, Type *LoadedTy,
IRBuilderBase &Helper,
const DataLayout &DL) {
- assert(canCoerceMustAliasedValueToLoad(StoredVal, LoadedTy, DL) &&
+ assert(canCoerceMustAliasedValueToLoad(
+ StoredVal, LoadedTy, Helper.GetInsertBlock()->getParent()) &&
"precondition violation - materialization can't fail");
if (auto *C = dyn_cast<Constant>(StoredVal))
StoredVal = ConstantFoldConstant(C, DL);
@@ -87,6 +109,14 @@ Value *coerceAvailableValueToLoadType(Value *StoredVal, Type *LoadedTy,
// If this is already the right type, just return it.
Type *StoredValTy = StoredVal->getType();
+ // If this is a scalable vector forwarded to a fixed vector load, create
+ // a @llvm.vector.extract instead of bitcasts.
+ if (isa<ScalableVectorType>(StoredVal->getType()) &&
+ isa<FixedVectorType>(LoadedTy)) {
+ return Helper.CreateIntrinsic(LoadedTy, Intrinsic::vector_extract,
+ {StoredVal, Helper.getInt64(0)});
+ }
+
TypeSize StoredValSize = DL.getTypeSizeInBits(StoredValTy);
TypeSize LoadedValSize = DL.getTypeSizeInBits(LoadedTy);
@@ -220,7 +250,7 @@ int analyzeLoadFromClobberingStore(Type *LoadTy, Value *LoadPtr,
if (isFirstClassAggregateOrScalableType(StoredVal->getType()))
return -1;
- if (!canCoerceMustAliasedValueToLoad(StoredVal, LoadTy, DL))
+ if (!canCoerceMustAliasedValueToLoad(StoredVal, LoadTy, DepSI->getFunction()))
return -1;
Value *StorePtr = DepSI->getPointerOperand();
@@ -235,11 +265,11 @@ int analyzeLoadFromClobberingStore(Type *LoadTy, Value *LoadPtr,
/// the other load can feed into the second load.
int analyzeLoadFromClobberingLoad(Type *LoadTy, Value *LoadPtr, LoadInst *DepLI,
const DataLayout &DL) {
- // Cannot handle reading from store of first-class aggregate yet.
- if (DepLI->getType()->isStructTy() || DepLI->getType()->isArrayTy())
+ // Cannot handle reading from store of first-class aggregate or scalable type.
+ if (isFirstClassAggregateOrScalableType(DepLI->getType()))
return -1;
- if (!canCoerceMustAliasedValueToLoad(DepLI, LoadTy, DL))
+ if (!canCoerceMustAliasedValueToLoad(DepLI, LoadTy, DepLI->getFunction()))
return -1;
Value *DepPtr = DepLI->getPointerOperand();
@@ -315,6 +345,16 @@ static Value *getStoreValueForLoadHelper(Value *SrcVal, unsigned Offset,
return SrcVal;
}
+ // For the case of a scalable vector beeing forwarded to a fixed-sized load,
+ // only equal element types are allowed and a @llvm.vector.extract will be
+ // used instead of bitcasts.
+ if (isa<ScalableVectorType>(SrcVal->getType()) &&
+ isa<FixedVectorType>(LoadTy)) {
+ assert(Offset == 0 &&
+ SrcVal->getType()->getScalarType() == LoadTy->getScalarType());
+ return SrcVal;
+ }
+
uint64_t StoreSize =
(DL.getTypeSizeInBits(SrcVal->getType()).getFixedValue() + 7) / 8;
uint64_t LoadSize = (DL.getTypeSizeInBits(LoadTy).getFixedValue() + 7) / 8;
@@ -348,6 +388,10 @@ Value *getValueForLoad(Value *SrcVal, unsigned Offset, Type *LoadTy,
#ifndef NDEBUG
TypeSize SrcValSize = DL.getTypeStoreSize(SrcVal->getType());
TypeSize LoadSize = DL.getTypeStoreSize(LoadTy);
+ if (SrcValSize.isScalable() && !LoadSize.isScalable())
+ SrcValSize =
+ TypeSize::getFixed(SrcValSize.getKnownMinValue() *
+ getKnownVScale(InsertPt->getFunction()).value());
assert(SrcValSize.isScalable() == LoadSize.isScalable());
assert((SrcValSize.isScalable() || Offset + LoadSize <= SrcValSize) &&
"Expected Offset + LoadSize <= SrcValSize");
diff --git a/llvm/test/Transforms/GVN/vscale.ll b/llvm/test/Transforms/GVN/vscale.ll
index 67cbfc2f05ef84..2a212831513ada 100644
--- a/llvm/test/Transforms/GVN/vscale.ll
+++ b/llvm/test/Transforms/GVN/vscale.ll
@@ -641,3 +641,63 @@ entry:
call void @llvm.lifetime.end.p0(i64 -1, ptr nonnull %ref.tmp)
ret { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } %15
}
+
+define <vscale x 4 x float> @scalable_store_to_fixed_load(<vscale x 4 x float> %.coerce) #1 {
+; CHECK-LABEL: @scalable_store_to_fixed_load(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RETVAL:%.*]] = alloca { <16 x float> }, align 64
+; CHECK-NEXT: [[TMP0:%.*]] = fadd <vscale x 4 x float> [[DOTCOERCE:%.*]], [[DOTCOERCE]]
+; CHECK-NEXT: store <vscale x 4 x float> [[TMP0]], ptr [[RETVAL]], align 16
+; CHECK-NEXT: ret <vscale x 4 x float> [[TMP0]]
+;
+entry:
+ %retval = alloca { <16 x float> }
+ %0 = fadd <vscale x 4 x float> %.coerce, %.coerce
+ store <vscale x 4 x float> %0, ptr %retval
+ %1 = load <16 x float>, ptr %retval
+ %cast.scalable = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float> poison, <16 x float> %1, i64 0)
+ ret <vscale x 4 x float> %cast.scalable
+}
+
+define <vscale x 4 x float> @scalable_store_to_fixed_load_unknon_vscale(<vscale x 4 x float> %.coerce) {
+; CHECK-LABEL: @scalable_store_to_fixed_load_unknon_vscale(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RETVAL:%.*]] = alloca { <16 x float> }, align 64
+; CHECK-NEXT: [[TMP0:%.*]] = fadd <vscale x 4 x float> [[DOTCOERCE:%.*]], [[DOTCOERCE]]
+; CHECK-NEXT: store <vscale x 4 x float> [[TMP0]], ptr [[RETVAL]], align 16
+; CHECK-NEXT: [[TMP1:%.*]] = load <16 x float>, ptr [[RETVAL]], align 64
+; CHECK-NEXT: [[CAST_SCALABLE:%.*]] = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float> poison, <16 x float> [[TMP1]], i64 0)
+; CHECK-NEXT: ret <vscale x 4 x float> [[CAST_SCALABLE]]
+;
+entry:
+ %retval = alloca { <16 x float> }
+ %0 = fadd <vscale x 4 x float> %.coerce, %.coerce
+ store <vscale x 4 x float> %0, ptr %retval
+ %1 = load <16 x float>, ptr %retval
+ %cast.scalable = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float> poison, <16 x float> %1, i64 0)
+ ret <vscale x 4 x float> %cast.scalable
+}
+
+define <vscale x 4 x float> @scalable_store_to_fixed_load_size_missmatch(<vscale x 4 x float> %.coerce) #1 {
+; CHECK-LABEL: @scalable_store_to_fixed_load_size_missmatch(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RETVAL:%.*]] = alloca { <32 x float> }, align 128
+; CHECK-NEXT: [[TMP0:%.*]] = fadd <vscale x 4 x float> [[DOTCOERCE:%.*]], [[DOTCOERCE]]
+; CHECK-NEXT: store <vscale x 4 x float> [[TMP0]], ptr [[RETVAL]], align 16
+; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, ptr [[RETVAL]], align 128
+; CHECK-NEXT: [[CAST_SCALABLE:%.*]] = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v32f32(<vscale x 4 x float> poison, <32 x float> [[TMP1]], i64 0)
+; CHECK-NEXT: ret <vscale x 4 x float> [[CAST_SCALABLE]]
+;
+entry:
+ %retval = alloca { <32 x float> }
+ %0 = fadd <vscale x 4 x float> %.coerce, %.coerce
+ store <vscale x 4 x float> %0, ptr %retval
+ %1 = load <32 x float>, ptr %retval
+ %cast.scalable = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v32f32(<vscale x 4 x float> poison, <32 x float> %1, i64 0)
+ ret <vscale x 4 x float> %cast.scalable
+}
+
+declare <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float>, <16 x float>, i64 immarg)
+declare <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v32f32(<vscale x 4 x float>, <32 x float>, i64 immarg)
+
+attributes #1 = { vscale_range(4,4) }
@davemgreen If you would not mind, could you have a look at this? Thanks in advance!
Looks reasonable from a cursory look.
I think this could be straightforwardly generalized to the case where store >= load rather than store == load (in which case we don't need to know vscale exactly, just the minimum). In that case it's still correct to extract the prefix.
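As a concrete illustration of that generalization (a hedged sketch, not taken from the patch or its tests; the function name is made up): with only a lower bound on vscale, the store is still known to cover a fixed-size load of its prefix.

; Hedged sketch: vscale >= 2 guarantees the store writes at least 8 floats,
; so forwarding the first 8 lanes is correct even though vscale is not exact.
define <8 x float> @store_covers_smaller_fixed_load(<vscale x 4 x float> %v) vscale_range(2,16) {
entry:
  %buf = alloca <vscale x 4 x float>
  store <vscale x 4 x float> %v, ptr %buf
  %l = load <8 x float>, ptr %buf
  ret <8 x float> %l
}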
It would be good to add a negative test where the load has an extra constant offset, to make sure we don't forward in that case.
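A hedged sketch of what such a negative test could look like (illustrative names; the load reads at a non-zero constant offset into the stored value, so forwarding should not fire):

; Hedged sketch, not taken from the patch's tests.
define <16 x float> @scalable_store_to_fixed_load_with_offset(<vscale x 4 x float> %v) vscale_range(4,4) {
entry:
  %buf = alloca { <32 x float> }
  store <vscale x 4 x float> %v, ptr %buf
  ; The constant offset means the load no longer reads a prefix of the store.
  %gep = getelementptr inbounds i8, ptr %buf, i64 8
  %l = load <16 x float>, ptr %gep
  ret <16 x float> %l
}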
Different types might be useful too. For example, a scalable_store_to_fixed_load with <vscale x 4 x float> input but <vscale x 4 x i32> output.
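A hedged sketch of what such a test could look like (illustrative; since the patch requires matching element types on the scalable-to-fixed path, no forwarding would be expected here):

; Hedged sketch, not taken from the patch's tests.
define <vscale x 4 x i32> @scalable_store_to_fixed_load_different_eltty(<vscale x 4 x float> %v) vscale_range(4,4) {
entry:
  %buf = alloca { <16 x float> }
  store <vscale x 4 x float> %v, ptr %buf
  ; Same size in bits, but the float elements are reloaded as i32.
  %l = load <16 x i32>, ptr %buf
  %cast.scalable = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> poison, <16 x i32> %l, i64 0)
  ret <vscale x 4 x i32> %cast.scalable
}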
I added one just now (in the existing test file).
Force-pushed 8397d75 to e399594.
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed e399594 to 7dfc50a.
I just updated the MR and added that functionality.
@davemgreen If you would not mind, could you have a look at this? Thanks in advance!
I just happened to be the last person who touched this, in #123984. One thing that I thought was probably useful was to add NewGVN test coverage, to make sure it doesn't do anything unexpected.
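One hedged way to get that coverage (the existing RUN line in llvm/test/Transforms/GVN/vscale.ll may differ, and the NEWGVN prefix is illustrative) is to run the same file through NewGVN under a separate check prefix:

; Hedged sketch of additional RUN lines for the test file.
; RUN: opt < %s -passes=gvn -S | FileCheck %s
; RUN: opt < %s -passes=newgvn -S | FileCheck %s --check-prefix=NEWGVN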
@@ -315,6 +345,16 @@ static Value *getStoreValueForLoadHelper(Value *SrcVal, unsigned Offset,
     return SrcVal;
   }

  // For the case of a scalable vector beeing forwarded to a fixed-sized load,
-> being
Thanks, sorry, fixed.
  // If the VScale is known at compile-time, use that information to
  // allow for wider loads.
  std::optional<unsigned> VScale = getKnownVScale(F);
This could be generalized to only look at getVScaleRangeMin to determine the minimum store size (without knowing the exact store size), as we only need StoreSize >= LoadSize. (Don't know if we expect that pattern to occur.)
Thanks, I did add this generalization, and therefore removed the getKnownVScale(...) helper and also renamed the calculated size to MinStoreSize.
Could you please also add a test for that case? It should look like what you already have, just without the upper bound.
I just added it.
Force-pushed 7dfc50a to 3f28c1b.
llvm/test/Transforms/GVN/vscale.ll (outdated)
@@ -641,3 +641,117 @@ entry:
   call void @llvm.lifetime.end.p0(i64 -1, ptr nonnull %ref.tmp)
   ret { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } %15
 }

define <vscale x 4 x float> @scalable_store_to_fixed_load(<vscale x 4 x float> %.coerce) #1 {
Nit: I think these tests are easier to read if you inline the vscale range.
-define <vscale x 4 x float> @scalable_store_to_fixed_load(<vscale x 4 x float> %.coerce) #1 {
+define <vscale x 4 x float> @scalable_store_to_fixed_load(<vscale x 4 x float> %.coerce) vscale_range(4,4) {
Thanks, I just did that.
llvm/test/Transforms/GVN/vscale.ll (outdated)

declare <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float>, <16 x float>, i64 immarg)
declare <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32>, <16 x i32>, i64 immarg)
declare <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v32f32(<vscale x 4 x float>, <32 x float>, i64 immarg)
nit: You can drop these declarations, they're not required.
Thanks, I just did that.
Force-pushed 3f28c1b to 7fa3527.
NewGVN does not seem to use the …
When storing a scalable vector and the vscale is a compile-time known constant, perform store-to-load forwarding through temporary @llvm.vector.extract calls, even if the loaded vector is fixed-sized instead of scalable. InstCombine then folds the insert/extract pair away.

The usecase is shown in this [godbolt link](https://godbolt.org/z/KT3sMrMbd), which shows that clang generates IR that matches this pattern when the "arm_sve_vector_bits" attribute is used:

```c
typedef svfloat32_t svfloat32_fixed_t
    __attribute__((arm_sve_vector_bits(512)));

struct svfloat32_wrapped_t {
  svfloat32_fixed_t v;
};

static inline svfloat32_wrapped_t
add(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
  return {svadd_f32_x(svptrue_b32(), a.v, b.v)};
}

svfloat32_wrapped_t
foo(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
  // The IR pattern this patch matches is generated for this return:
  return add(a, b);
}
```
LGTM
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/14919

Here is the relevant piece of the build log for reference.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/12725

Here is the relevant piece of the build log for reference.
When storing a scalable vector and loading a fixed-size vector, where the scalable vector is known (from vscale_range) to be at least as large as the fixed-size load, perform store-to-load forwarding through temporary @llvm.vector.extract calls. InstCombine then folds the insert/extract pair away.
The usecase is shown in this godbolt link (https://godbolt.org/z/KT3sMrMbd), which shows that clang generates IR that matches this pattern when the "arm_sve_vector_bits" attribute is used (see the C example at the top of this conversation).
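For reference, a hedged sketch of the intermediate IR for the first test case above: GVN materializes the forwarded load as a temporary @llvm.vector.extract of the stored value, and the resulting extract/insert round-trip is folded away again (the CHECK lines of that test show the load and the insert/extract pair already gone).

; Hedged sketch of the intermediate state, not the pass's final output.
define <vscale x 4 x float> @scalable_store_to_fixed_load(<vscale x 4 x float> %.coerce) vscale_range(4,4) {
entry:
  %retval = alloca { <16 x float> }
  %0 = fadd <vscale x 4 x float> %.coerce, %.coerce
  store <vscale x 4 x float> %0, ptr %retval
  ; The fixed-size load is forwarded from the scalable store...
  %1 = call <16 x float> @llvm.vector.extract.v16f32.nxv4f32(<vscale x 4 x float> %0, i64 0)
  ; ...and this extract/insert pair then folds back to %0.
  %cast.scalable = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v16f32(<vscale x 4 x float> poison, <16 x float> %1, i64 0)
  ret <vscale x 4 x float> %cast.scalable
}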