[AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts #130756

huntergr-arm · 2025-03-11T11:39:08Z

If a scalable vector uitofp or sitofp effectively extends the size of each element as part of the conversion, the AArch64 backend will need to plant multiple unpacks before converting.

llvmbot · 2025-03-11T11:39:42Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: Graham Hunter (huntergr-arm)

Changes

If a scalable vector uitofp or sitofp effectively extends the size of each element as part of the conversion, the AArch64 backend will need to plant multiple unpacks before converting.

Patch is 28.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130756.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+15)
(added) llvm/test/Analysis/CostModel/AArch64/sve-itofp.ll (+268)

diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 7cec8a17dfaaa..8091fb8f990bf 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3144,6 +3144,21 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
       {ISD::SIGN_EXTEND, MVT::nxv8i32, MVT::nxv8i16, 2},
       {ISD::SIGN_EXTEND, MVT::nxv8i64, MVT::nxv8i16, 6},
       {ISD::SIGN_EXTEND, MVT::nxv4i64, MVT::nxv4i32, 2},
+
+      // Add cost for extending and converting to illegal -too wide- scalable
+      // Extending one size (e.g. i32 -> f64) takes 2 unpacks and 2 fcvts, while
+      // extending twice (e.g. i16 -> f64) takes 6 unpacks and 4 fcvts.
+      {ISD::SINT_TO_FP, MVT::nxv16f16, MVT::nxv16i8, 12},
+      {ISD::SINT_TO_FP, MVT::nxv16f32, MVT::nxv16i8, 22},
+      {ISD::SINT_TO_FP, MVT::nxv8f32, MVT::nxv8i16, 12},
+      {ISD::SINT_TO_FP, MVT::nxv8f64, MVT::nxv8i16, 22},
+      {ISD::SINT_TO_FP, MVT::nxv4f64, MVT::nxv4i32, 12},
+
+      {ISD::UINT_TO_FP, MVT::nxv16f16, MVT::nxv16i8, 12},
+      {ISD::UINT_TO_FP, MVT::nxv16f32, MVT::nxv16i8, 22},
+      {ISD::UINT_TO_FP, MVT::nxv8f32, MVT::nxv8i16, 12},
+      {ISD::UINT_TO_FP, MVT::nxv8f64, MVT::nxv8i16, 22},
+      {ISD::UINT_TO_FP, MVT::nxv4f64, MVT::nxv4i32, 12},
   };
 
   // We have to estimate a cost of fixed length operation upon
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-itofp.ll b/llvm/test/Analysis/CostModel/AArch64/sve-itofp.ll
new file mode 100644
index 0000000000000..12fd6411255f2
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-itofp.ll
@@ -0,0 +1,268 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve -o - -S < %s | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+define void @sve-itofp() {
+; CHECK-LABEL: 'sve-itofp'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si8_to_f16 = sitofp <vscale x 1 x i8> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui8_to_f16 = uitofp <vscale x 1 x i8> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si16_to_f16 = sitofp <vscale x 1 x i16> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui16_to_f16 = uitofp <vscale x 1 x i16> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si32_to_f16 = sitofp <vscale x 1 x i32> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui32_to_f16 = uitofp <vscale x 1 x i32> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si64_to_f16 = sitofp <vscale x 1 x i64> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui64_to_f16 = uitofp <vscale x 1 x i64> undef to <vscale x 1 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si8_to_f32 = sitofp <vscale x 1 x i8> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui8_to_f32 = uitofp <vscale x 1 x i8> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si16_to_f32 = sitofp <vscale x 1 x i16> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui16_to_f32 = uitofp <vscale x 1 x i16> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si32_to_f32 = sitofp <vscale x 1 x i32> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui32_to_f32 = uitofp <vscale x 1 x i32> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si64_to_f32 = sitofp <vscale x 1 x i64> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui64_to_f32 = uitofp <vscale x 1 x i64> undef to <vscale x 1 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si8_to_f64 = sitofp <vscale x 1 x i8> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui8_to_f64 = uitofp <vscale x 1 x i8> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si16_to_f64 = sitofp <vscale x 1 x i16> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui16_to_f64 = uitofp <vscale x 1 x i16> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si32_to_f64 = sitofp <vscale x 1 x i32> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui32_to_f64 = uitofp <vscale x 1 x i32> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1si64_to_f64 = sitofp <vscale x 1 x i64> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv1ui64_to_f64 = uitofp <vscale x 1 x i64> undef to <vscale x 1 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si8_to_f16 = sitofp <vscale x 2 x i8> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui8_to_f16 = uitofp <vscale x 2 x i8> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si16_to_f16 = sitofp <vscale x 2 x i16> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui16_to_f16 = uitofp <vscale x 2 x i16> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si32_to_f16 = sitofp <vscale x 2 x i32> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui32_to_f16 = uitofp <vscale x 2 x i32> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si64_to_f16 = sitofp <vscale x 2 x i64> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui64_to_f16 = uitofp <vscale x 2 x i64> undef to <vscale x 2 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si8_to_f32 = sitofp <vscale x 2 x i8> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui8_to_f32 = uitofp <vscale x 2 x i8> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si16_to_f32 = sitofp <vscale x 2 x i16> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui16_to_f32 = uitofp <vscale x 2 x i16> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si32_to_f32 = sitofp <vscale x 2 x i32> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui32_to_f32 = uitofp <vscale x 2 x i32> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si64_to_f32 = sitofp <vscale x 2 x i64> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui64_to_f32 = uitofp <vscale x 2 x i64> undef to <vscale x 2 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si8_to_f64 = sitofp <vscale x 2 x i8> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui8_to_f64 = uitofp <vscale x 2 x i8> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si16_to_f64 = sitofp <vscale x 2 x i16> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui16_to_f64 = uitofp <vscale x 2 x i16> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si32_to_f64 = sitofp <vscale x 2 x i32> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui32_to_f64 = uitofp <vscale x 2 x i32> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2si64_to_f64 = sitofp <vscale x 2 x i64> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv2ui64_to_f64 = uitofp <vscale x 2 x i64> undef to <vscale x 2 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4si8_to_f16 = sitofp <vscale x 4 x i8> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4ui8_to_f16 = uitofp <vscale x 4 x i8> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4si16_to_f16 = sitofp <vscale x 4 x i16> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4ui16_to_f16 = uitofp <vscale x 4 x i16> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4si32_to_f16 = sitofp <vscale x 4 x i32> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4ui32_to_f16 = uitofp <vscale x 4 x i32> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4si64_to_f16 = sitofp <vscale x 4 x i64> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4ui64_to_f16 = uitofp <vscale x 4 x i64> undef to <vscale x 4 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4si8_to_f32 = sitofp <vscale x 4 x i8> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4ui8_to_f32 = uitofp <vscale x 4 x i8> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4si16_to_f32 = sitofp <vscale x 4 x i16> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4ui16_to_f32 = uitofp <vscale x 4 x i16> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4si32_to_f32 = sitofp <vscale x 4 x i32> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv4ui32_to_f32 = uitofp <vscale x 4 x i32> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4si64_to_f32 = sitofp <vscale x 4 x i64> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4ui64_to_f32 = uitofp <vscale x 4 x i64> undef to <vscale x 4 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4si8_to_f64 = sitofp <vscale x 4 x i8> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4ui8_to_f64 = uitofp <vscale x 4 x i8> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4si16_to_f64 = sitofp <vscale x 4 x i16> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv4ui16_to_f64 = uitofp <vscale x 4 x i16> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv4si32_to_f64 = sitofp <vscale x 4 x i32> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv4ui32_to_f64 = uitofp <vscale x 4 x i32> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nv4si64_to_f64 = sitofp <vscale x 4 x i64> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nv4ui64_to_f64 = uitofp <vscale x 4 x i64> undef to <vscale x 4 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv8si8_to_f16 = sitofp <vscale x 8 x i8> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv8ui8_to_f16 = uitofp <vscale x 8 x i8> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv8si16_to_f16 = sitofp <vscale x 8 x i16> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %nv8ui16_to_f16 = uitofp <vscale x 8 x i16> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv8si32_to_f16 = sitofp <vscale x 8 x i32> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv8ui32_to_f16 = uitofp <vscale x 8 x i32> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %nv8si64_to_f16 = sitofp <vscale x 8 x i64> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %nv8ui64_to_f16 = uitofp <vscale x 8 x i64> undef to <vscale x 8 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv8si8_to_f32 = sitofp <vscale x 8 x i8> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %nv8ui8_to_f32 = uitofp <vscale x 8 x i8> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv8si16_to_f32 = sitofp <vscale x 8 x i16> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv8ui16_to_f32 = uitofp <vscale x 8 x i16> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nv8si32_to_f32 = sitofp <vscale x 8 x i32> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nv8ui32_to_f32 = uitofp <vscale x 8 x i32> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %nv8si64_to_f32 = sitofp <vscale x 8 x i64> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %nv8ui64_to_f32 = uitofp <vscale x 8 x i64> undef to <vscale x 8 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %nv8si8_to_f64 = sitofp <vscale x 8 x i8> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %nv8ui8_to_f64 = uitofp <vscale x 8 x i8> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %nv8si16_to_f64 = sitofp <vscale x 8 x i16> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %nv8ui16_to_f64 = uitofp <vscale x 8 x i16> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %nv8si32_to_f64 = sitofp <vscale x 8 x i32> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %nv8ui32_to_f64 = uitofp <vscale x 8 x i32> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nv8si64_to_f64 = sitofp <vscale x 8 x i64> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nv8ui64_to_f64 = uitofp <vscale x 8 x i64> undef to <vscale x 8 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv16si8_to_f16 = sitofp <vscale x 16 x i8> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv16ui8_to_f16 = uitofp <vscale x 16 x i8> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nv16si16_to_f16 = sitofp <vscale x 16 x i16> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %nv16ui16_to_f16 = uitofp <vscale x 16 x i16> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %nv16si32_to_f16 = sitofp <vscale x 16 x i32> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %nv16ui32_to_f16 = uitofp <vscale x 16 x i32> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %nv16si64_to_f16 = sitofp <vscale x 16 x i64> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %nv16ui64_to_f16 = uitofp <vscale x 16 x i64> undef to <vscale x 16 x half>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %nv16si8_to_f32 = sitofp <vscale x 16 x i8> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: %nv16ui8_to_f32 = uitofp <vscale x 16 x i8> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %nv16si16_to_f32 = sitofp <vscale x 16 x i16> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %nv16ui16_to_f32 = uitofp <vscale x 16 x i16> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nv16si32_to_f32 = sitofp <vscale x 16 x i32> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %nv16ui32_to_f32 = uitofp <vscale x 16 x i32> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv16si64_to_f32 = sitofp <vscale x 16 x i64> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %nv16ui64_to_f32 = uitofp <vscale x 16 x i64> undef to <vscale x 16 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %nv16si8_to_f64 = sitofp <vscale x 16 x i8> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %nv16ui8_to_f64 = uitofp <vscale x 16 x i8> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 44 for instruction: %nv16si16_to_f64 = sitofp <vscale x 16 x i16> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 44 for instruction: %nv16ui16_to_f64 = uitofp <vscale x 16 x i16> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %nv16si32_to_f64 = sitofp <vscale x 16 x i32> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %nv16ui32_to_f64 = uitofp <vscale x 16 x i32> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %nv16si64_to_f64 = sitofp <vscale x 16 x i64> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %nv16ui64_to_f64 = uitofp <vscale x 16 x i64> undef to <vscale x 16 x double>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+
+  %nv1si8_to_f16  = sitofp <vscale x 1 x i8> undef to <vscale x 1 x half>
+  %nv1ui8_to_f16  = uitofp <vscale x 1 x i8> undef to <vscale x 1 x half>
+  %nv1si16_to_f16 = sitofp <vscale x 1 x i16> undef to <vscale x 1 x half>
+  %nv1ui16_to_f16 = uitofp <vscale x 1 x i16> undef to <vscale x 1 x half>
+  %nv1si32_to_f16 = sitofp <vscale x 1 x i32> undef to <vscale x 1 x half>
+  %nv1ui32_to_f16 = uitofp <vscale x 1 x i32> undef to <vscale x 1 x half>
+  %nv1si64_t...
[truncated]

github-actions · 2025-03-11T11:42:44Z

✅ With the latest revision this PR passed the undef deprecator.

sdesmalen-arm · 2025-03-11T11:54:08Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

@@ -3144,6 +3144,21 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
      {ISD::SIGN_EXTEND, MVT::nxv8i32, MVT::nxv8i16, 2},
      {ISD::SIGN_EXTEND, MVT::nxv8i64, MVT::nxv8i16, 6},
      {ISD::SIGN_EXTEND, MVT::nxv4i64, MVT::nxv4i32, 2},
+
+      // Add cost for extending and converting to illegal -too wide- scalable
+      // Extending one size (e.g. i32 -> f64) takes 2 unpacks and 2 fcvts, while


I know that the cost-model is a bit of a guessing game, but is there any rationale behind picking a factor of 3? (i.e. why the cost is 12 instead of 4)

The fcvt instructions seem to have 1/2 to 1/8 the throughput (depending on type) compared to simple arithmetic instructions, e.g. add, so I bumped the cost of those. The numbers may not be the best overall, but don't seem to lead to regressions at present. We may want to try a range of values at some point to see if there's a better estimate.

Should the cost of converts of the 'not too wide' types then also be increased to reflect a higher reciprocal cost?
e.g. I see a cost of 1 for:

CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %nv2si16_to_f64 = sitofp <vscale x 2 x i16> undef to <vscale x 2 x double>

We just discussed this offline, but just sharing my thoughts here: IMO the table should represent the cost of casts of legal types. Illegal types should be handled by generic code that multiplies the cost by the 'type legalization cost'. This is actually what happens for fixed-length types (see the code just below the table), but not (yet) for scalable types. Otherwise, any other illegal types that are not in the table (which includes types that cannot be represented by MVTs because they're "too wide") will get some default cost, which may be far too low.

It also seems that SINT_TO_FP records are missing in the table for scalable vector types (only FP_TO_SINT is handled). This is probably just a historical omission because this table gets updated/botched on an ad-hoc basis when people find that the cost is wrong for some workload, for some type and operation. It would be nice to clean this up.

So I've changed the approach slightly to use the pseudo-legalization from the base getCastInstrCost that NEON uses (note the code below the table is about using SVE for fixed-length, so doesn't always apply).

Using this approach, we'll still get some illegal types (e.g. mapping nxv2i16 -> nxv2f64, the input would be promoted to nxv2i64 but that's not done in the current code for NEON), but I'm covering the cases where the destination type is legal.

I've decided to back away from increasing the cost of direct fcvts here – even though they have lower throughput than add, the NEON values are not written with that in mind, so we might incorrectly decide to favour NEON (or scalar) code.

I'll rerun some benchmarking with these adjusted values to see whether there's any regression from doing this.

davemgreen

The numbers look OK to me for the most part.

Some of the Neon numbers might need to increase after #130665 due to the double-rounding. I believe SVE has native instructions for a lot of the problem cases, so it should hopefully not be so much of an issue there.

(I don't remember if we use undef for a reason that is different to poison or we just didn't change them yet. I usually just ignore that bot).

davemgreen · 2025-03-13T08:59:32Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+      {ISD::UINT_TO_FP, MVT::nxv4f16, MVT::nxv4i32, 1},
+
+      // SVE: to nxv8f16
+      {ISD::SINT_TO_FP, MVT::nxv8f16, MVT::nxv8i8, 3},


Any reason for 3 as opposed to 2 here? (I'm not sure if the sxtb would disappear if the input was not from an arg too, but it is probably OK to include it.)

define <vscale x 8 x half> @test(<vscale x 8 x i8> %a) {
  %r = sitofp <vscale x 8 x i8> %a to <vscale x 8 x half>
  ret <vscale x 8 x half> %r
}

  ptrue p0.h
  sxtb z0.h, p0/m, z0.h
  scvtf z0.h, p0/m, z0.h
  ret

davemgreen · 2025-03-13T09:21:00Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+      // SVE: to nxv2f64
+      {ISD::SINT_TO_FP, MVT::nxv2f64, MVT::nxv2i8, 7},
+      {ISD::SINT_TO_FP, MVT::nxv2f64, MVT::nxv2i16, 5},
+      {ISD::SINT_TO_FP, MVT::nxv2f64, MVT::nxv2i32, 3},


Could this be 1, considering there is a native instruction for it?

IR:

define <vscale x 2 x double> @test(<vscale x 2 x i32> %a) {
  %r = sitofp <vscale x 2 x i32> %a to <vscale x 2 x double>
  ret <vscale x 2 x double> %r
}

  ptrue p0.d
  scvtf z0.d, p0/m, z0.s
  ret

I based the entries on how the corresponding NEON entries are used. <vscale x 2 x i32> is not a legal type, and here it's treated as a packed vector – so rather than interleaving lanes the data is in the first half of the vector. The cost represents the required unpack operations on top of the fcvts themselves.

This isn't great, but it does mostly work with the multiply-by-legalization-factor approach discussed above with Sander.

This work addresses a particular regression in SPEC that occurs when max vector bandwidth is enabled: a VPlan with VF vscale x 8 is considered cheaper than one with a fixed VF of 8 because the scalable converts are costed too low.

In the NEON case, a v8i16 is converted to a v8f64. TTI reaches this function and hits the call to BasicTTIImplBase::getCastInstrCost at the bottom. It then retries with v4i16 to v4f64, calls the base again, and finally finds a match when called with v2i16 (an illegal type) to v2f64. That cost (4) is then doubled for each of the 2 rounds of splitting to give 16, and an extra penalty of 3 is added on top, giving a score of 19.

For SVE, it was costed as 1 * 4 (for 2 rounds of splitting) + 3, giving 7. But NEON was able to use 4 tbl instructions, whereas SVE currently uses 6 unpack instructions. So now the line with a cost of 5 for nxv2i16 to nxv2f64 gives us a total cost of 23 (5 * 4 + 3), and we pick the fixed-length VF instead, preventing the regression.

The same applies to the nxv2i32 to nxv2f64 case – we're expecting this to come from a conversion of nxv4i32 to nxv4f64, so the cost of the unpacks is bundled in the same way it is for NEON.

I don't particularly like it, but I don't want to overhaul all the existing NEON code here – some of the numbers in the table date back to when the backend was first merged upstream, and I'm not sure how they were derived.

One possible alternative for this current patch would be to have an SVE-specific helper which calculates legalization separately (and comes up with the cost of the unpacks separately from the cost of the fcvt, which would allow us to change that cost in future if we decide to use tbl instructions for SVE as well), then only asks for the cost of a fully legalized fcvt to multiply by the number of registers required. I initially decided against that because I didn't want to reimplement a bunch of logic for legalizing the types, but it would be more accurate.

Would that be preferable?

Okay, so if I understand you correctly, the cost you've encoded here includes the cost of possibly (or likely) extracting a sub-vector as part of type legalisation (for example, extracting a <vscale x 4 x i16> from a <vscale x 8 x i16> source operand when it has to split a <vscale x 8 x float> uitofp <vscale x 8 x i16> %in). Ideally the code in BaseT::getCastInstrCost would add some cost for extracting a subvector when calculating the legalisation cost, but this doesn't really happen anywhere at the moment.

If you want to reflect these costs in this table, can you decompose the cost into a "cost of convert" + "cost of type-legalisation/extract-subvector" in that case?

I've updated the code to include two kinds of entry in the table. In the first, the destination type is legal, and these entries have minimal costs. In the second, the destination type is illegal (too wide) but the source type is either legal or narrower than legal and requires splitting; I've added a penalty cost to these entries.

I've done that via symbolic constants, so we can change the numbers later without trying to figure out what we were modeling.

The target-independent code still gets involved in some of the cases and seems to work OK (e.g. nxv16i8 -> nxv16f64, since there's currently no MVT for the latter type and it has to be split at the EVT level before we can look it up in the table).

david-arm · 2025-03-17T14:05:37Z

llvm/test/Analysis/CostModel/AArch64/sve-cast.ll

-  %r207 = sitofp <4 x i32> undef to <4 x double>
-  %r208 = uitofp <4 x i64> undef to <4 x double>
-  %r209 = sitofp <4 x i64> undef to <4 x double>
+  %r200 = uitofp <4 x i1> poison to <4 x double>


When making cost model changes I've typically avoided converting undef to poison to avoid the github pre-commit errors to make patches easier to review. If you do want to update the cost model tests to use poison I think that's best done as a NFC patch separately?

I've reverted the undef -> poison change for the existing test file, but the new file will use poison. I have left the changes to the duplicated lines though (we had two sets of i16 -> f64 tests and skipped over i32 -> f64; I assumed that wasn't intended.)


davemgreen

LGTM, if there are no other comments.

davemgreen · 2025-03-20T08:31:01Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+  // FIXME: Use tbl instructions for SVE as well, at least in cases where the
+  //        conversion is done in a loop.


I would remove this FIXME - mostly as it just doesn't feel like the right place for it and I think if we supported tbl extensions that would only apply to loops, and the cost model should probably be the worst-case of with and without tbl. (There is a chance we want to change that in the future to have something that can cost "invariant" vs "fixed" costs, similar for constants, but for the moment it is probably OK to stick with the higher).

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Co-authored-by: Benjamin Maxwell <[email protected]>

[AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts

8f23b43

If a scalable vector uitofp or sitofp effectively extends the size of each element as part of the conversion, the AArch64 backend will need to plant multiple unpacks before converting.

huntergr-arm requested review from davemgreen, paulwalker-arm and sdesmalen-arm March 11, 2025 11:39

llvmbot added backend:AArch64 llvm:analysis Includes value tracking, cost tables and constant folding labels Mar 11, 2025

sdesmalen-arm reviewed Mar 11, 2025


huntergr-arm added 2 commits March 12, 2025 12:10

Use pseudo-legalization from base getCastInstrCost

55541fb

Use poison instead of undef for tests

dc0174d

davemgreen reviewed Mar 13, 2025


david-arm reviewed Mar 17, 2025


Revert poison change for existing tests, introduce symbolic constants…

8d0fef2

…, add wider entries to table

davemgreen approved these changes Mar 20, 2025


MacDue reviewed Mar 24, 2025


llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Outdated

Update llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

29c7fff

Co-authored-by: Benjamin Maxwell <[email protected]>

huntergr-arm merged commit f737df7 into llvm:main Mar 25, 2025
11 checks passed

huntergr-arm deleted the sve-wide-int-to-fp-cost branch March 26, 2025 11:36
