[TLI] replace-with-veclib works with FRem Instruction. #76166

Merged: 6 commits into main from users/paschalis-mpeis/tli-veclib-frem, Jan 8, 2024

Conversation

paschalis-mpeis (Member)

Updated SLEEF and ArmPL tests with Fixed-Width and Scalable cases for frem.
Those are mapped to fmod/fmodf.
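
For illustration, here is a minimal sketch (not code from this patch) of how the frem-to-fmod name resolution works: the pass asks TargetLibraryInfo for the libcall a scalar frem of the element type lowers to, and that scalar name ("fmod" or "fmodf") is then looked up in the selected vector library. The helper function below is hypothetical; only the getLibFunc/getName calls mirror the ones used in the patch.

```cpp
#include <cassert>

#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper (illustration only): recover the scalar libm name that a
// vector frem corresponds to, e.g. "fmod" for double and "fmodf" for float
// element types. Returns an empty StringRef when TLI has no such mapping.
static StringRef getScalarNameForFRem(const TargetLibraryInfo &TLI,
                                      const Instruction &I) {
  assert(I.getOpcode() == Instruction::FRem && "expected an frem instruction");
  LibFunc Func;
  if (!TLI.getLibFunc(I.getOpcode(), I.getType()->getScalarType(), Func))
    return StringRef();
  return TLI.getName(Func);
}
```

The resulting scalar name is then matched against the vector-library mappings for the exact vector width of the operands, exactly as already done for vectorized intrinsics.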

@llvmbot (Member) commented Dec 21, 2023

@llvm/pr-subscribers-backend-aarch64

Author: Paschalis Mpeis (paschalis-mpeis)

Changes

Updated SLEEF and ArmPL tests with Fixed-Width and Scalable cases for frem.
Those are mapped to fmod/fmodf.


Full diff: https://github.com/llvm/llvm-project/pull/76166.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/ReplaceWithVeclib.cpp (+70-52)
  • (modified) llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll (+41-1)
  • (modified) llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll (+19-1)
  • (modified) llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef.ll (+19-1)
diff --git a/llvm/lib/CodeGen/ReplaceWithVeclib.cpp b/llvm/lib/CodeGen/ReplaceWithVeclib.cpp
index 893aa4a91828d3..e3ba9e3c0c3fa3 100644
--- a/llvm/lib/CodeGen/ReplaceWithVeclib.cpp
+++ b/llvm/lib/CodeGen/ReplaceWithVeclib.cpp
@@ -69,52 +69,57 @@ Function *getTLIFunction(Module *M, FunctionType *VectorFTy,
   return TLIFunc;
 }
 
-/// Replace the call to the vector intrinsic ( \p CalltoReplace ) with a call to
-/// the corresponding function from the vector library ( \p TLIVecFunc ).
-static void replaceWithTLIFunction(CallInst &CalltoReplace, VFInfo &Info,
+/// Replace the Instruction \p I, that may be a vector intrinsic CallInst or
+/// the frem instruction,  with a call to the corresponding function from the
+/// vector library ( \p TLIVecFunc ).
+static void replaceWithTLIFunction(Instruction &I, VFInfo &Info,
                                    Function *TLIVecFunc) {
-  IRBuilder<> IRBuilder(&CalltoReplace);
-  SmallVector<Value *> Args(CalltoReplace.args());
+  IRBuilder<> IRBuilder(&I);
+  auto *CI = dyn_cast<CallInst>(&I);
+  SmallVector<Value *> Args(CI ? CI->args() : I.operands());
   if (auto OptMaskpos = Info.getParamIndexForOptionalMask()) {
-    auto *MaskTy = VectorType::get(Type::getInt1Ty(CalltoReplace.getContext()),
-                                   Info.Shape.VF);
+    auto *MaskTy =
+        VectorType::get(Type::getInt1Ty(I.getContext()), Info.Shape.VF);
     Args.insert(Args.begin() + OptMaskpos.value(),
                 Constant::getAllOnesValue(MaskTy));
   }
 
-  // Preserve the operand bundles.
+  // Preserve the operand bundles for CallInsts.
   SmallVector<OperandBundleDef, 1> OpBundles;
-  CalltoReplace.getOperandBundlesAsDefs(OpBundles);
+  if (CI)
+    CI->getOperandBundlesAsDefs(OpBundles);
+
   CallInst *Replacement = IRBuilder.CreateCall(TLIVecFunc, Args, OpBundles);
-  CalltoReplace.replaceAllUsesWith(Replacement);
+  I.replaceAllUsesWith(Replacement);
   // Preserve fast math flags for FP math.
   if (isa<FPMathOperator>(Replacement))
-    Replacement->copyFastMathFlags(&CalltoReplace);
+    Replacement->copyFastMathFlags(&I);
 }
 
-/// Returns true when successfully replaced \p CallToReplace with a suitable
-/// function taking vector arguments, based on available mappings in the \p TLI.
-/// Currently only works when \p CallToReplace is a call to vectorized
-/// intrinsic.
+/// Returns true when successfully replaced \p I with a suitable function taking
+/// vector arguments, based on available mappings in the \p TLI. Currently only
+/// works when \p I is a call to vectorized intrinsic or the FRem Instruction.
 static bool replaceWithCallToVeclib(const TargetLibraryInfo &TLI,
-                                    CallInst &CallToReplace) {
-  if (!CallToReplace.getCalledFunction())
-    return false;
-
-  auto IntrinsicID = CallToReplace.getCalledFunction()->getIntrinsicID();
-  // Replacement is only performed for intrinsic functions.
-  if (IntrinsicID == Intrinsic::not_intrinsic)
-    return false;
-
+                                    Instruction &I) {
+  CallInst *CI = dyn_cast<CallInst>(&I);
+  Intrinsic::ID IID = Intrinsic::not_intrinsic;
+  if (CI)
+    IID = CI->getCalledFunction()->getIntrinsicID();
   // Compute arguments types of the corresponding scalar call. Additionally
   // checks if in the vector call, all vector operands have the same EC.
   ElementCount VF = ElementCount::getFixed(0);
-  SmallVector<Type *> ScalarArgTypes;
-  for (auto Arg : enumerate(CallToReplace.args())) {
+  SmallVector<Type *, 8> ScalarArgTypes;
+  for (auto Arg : enumerate(CI ? CI->args() : I.operands())) {
     auto *ArgTy = Arg.value()->getType();
-    if (isVectorIntrinsicWithScalarOpAtArg(IntrinsicID, Arg.index())) {
+    if (CI && isVectorIntrinsicWithScalarOpAtArg(IID, Arg.index())) {
       ScalarArgTypes.push_back(ArgTy);
-    } else if (auto *VectorArgTy = dyn_cast<VectorType>(ArgTy)) {
+    } else {
+      auto *VectorArgTy = dyn_cast<VectorType>(ArgTy);
+      // We are expecting only VectorTypes, as:
+      // - with a CallInst, scalar operands are handled earlier
+      // - with the FRem Instruction, both operands must be vectors.
+      if (!VectorArgTy)
+        return false;
       ScalarArgTypes.push_back(ArgTy->getScalarType());
       // Disallow vector arguments with different VFs. When processing the first
       // vector argument, store it's VF, and for the rest ensure that they match
@@ -123,18 +128,22 @@ static bool replaceWithCallToVeclib(const TargetLibraryInfo &TLI,
         VF = VectorArgTy->getElementCount();
       else if (VF != VectorArgTy->getElementCount())
         return false;
-    } else
-      // Exit when it is supposed to be a vector argument but it isn't.
-      return false;
+    }
   }
 
-  // Try to reconstruct the name for the scalar version of this intrinsic using
-  // the intrinsic ID and the argument types converted to scalar above.
-  std::string ScalarName =
-      (Intrinsic::isOverloaded(IntrinsicID)
-           ? Intrinsic::getName(IntrinsicID, ScalarArgTypes,
-                                CallToReplace.getModule())
-           : Intrinsic::getName(IntrinsicID).str());
+  // Try to reconstruct the name for the scalar version of the instruction.
+  std::string ScalarName;
+  if (CI) {
+    // For intrinsics, use scalar argument types
+    ScalarName = Intrinsic::isOverloaded(IID)
+                     ? Intrinsic::getName(IID, ScalarArgTypes, I.getModule())
+                     : Intrinsic::getName(IID).str();
+  } else {
+    LibFunc Func;
+    if (!TLI.getLibFunc(I.getOpcode(), I.getType()->getScalarType(), Func))
+      return false;
+    ScalarName = TLI.getName(Func);
+  }
 
   // Try to find the mapping for the scalar version of this intrinsic and the
   // exact vector width of the call operands in the TargetLibraryInfo. First,
@@ -150,7 +159,7 @@ static bool replaceWithCallToVeclib(const TargetLibraryInfo &TLI,
 
   // Replace the call to the intrinsic with a call to the vector library
   // function.
-  Type *ScalarRetTy = CallToReplace.getType()->getScalarType();
+  Type *ScalarRetTy = I.getType()->getScalarType();
   FunctionType *ScalarFTy =
       FunctionType::get(ScalarRetTy, ScalarArgTypes, /*isVarArg*/ false);
   const std::string MangledName = VD->getVectorFunctionABIVariantString();
@@ -162,27 +171,36 @@ static bool replaceWithCallToVeclib(const TargetLibraryInfo &TLI,
   if (!VectorFTy)
     return false;
 
-  Function *FuncToReplace = CallToReplace.getCalledFunction();
-  Function *TLIFunc = getTLIFunction(CallToReplace.getModule(), VectorFTy,
+  Function *FuncToReplace = CI ? CI->getCalledFunction() : nullptr;
+  Function *TLIFunc = getTLIFunction(I.getModule(), VectorFTy,
                                      VD->getVectorFnName(), FuncToReplace);
-  replaceWithTLIFunction(CallToReplace, *OptInfo, TLIFunc);
-
-  LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Replaced call to `"
-                    << FuncToReplace->getName() << "` with call to `"
-                    << TLIFunc->getName() << "`.\n");
+  replaceWithTLIFunction(I, *OptInfo, TLIFunc);
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Replaced call to `" << ScalarName
+                    << "` with call to `" << TLIFunc->getName() << "`.\n");
   ++NumCallsReplaced;
   return true;
 }
 
+/// Supported Instructions \p I are either FRem or CallInsts to Intrinsics.
+static bool isSupportedInstruction(Instruction *I) {
+  if (auto *CI = dyn_cast<CallInst>(I)) {
+    if (!CI->getCalledFunction())
+      return false;
+    if (CI->getCalledFunction()->getIntrinsicID() == Intrinsic::not_intrinsic)
+      return false;
+  } else if (I->getOpcode() != Instruction::FRem)
+    return false;
+
+  return true;
+}
+
 static bool runImpl(const TargetLibraryInfo &TLI, Function &F) {
   bool Changed = false;
-  SmallVector<CallInst *> ReplacedCalls;
+  SmallVector<Instruction *> ReplacedCalls;
   for (auto &I : instructions(F)) {
-    if (auto *CI = dyn_cast<CallInst>(&I)) {
-      if (replaceWithCallToVeclib(TLI, *CI)) {
-        ReplacedCalls.push_back(CI);
-        Changed = true;
-      }
+    if (isSupportedInstruction(&I) && replaceWithCallToVeclib(TLI, I)) {
+      ReplacedCalls.push_back(&I);
+      Changed = true;
     }
   }
   // Erase the calls to the intrinsics that have been replaced
diff --git a/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll b/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
index d41870ec6e7915..4480a90a2728d3 100644
--- a/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
+++ b/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
@@ -15,7 +15,7 @@ declare <vscale x 2 x double> @llvm.cos.nxv2f64(<vscale x 2 x double>)
 declare <vscale x 4 x float> @llvm.cos.nxv4f32(<vscale x 4 x float>)
 
 ;.
-; CHECK: @llvm.compiler.used = appending global [32 x ptr] [ptr @armpl_vcosq_f64, ptr @armpl_vcosq_f32, ptr @armpl_svcos_f64_x, ptr @armpl_svcos_f32_x, ptr @armpl_vsinq_f64, ptr @armpl_vsinq_f32, ptr @armpl_svsin_f64_x, ptr @armpl_svsin_f32_x, ptr @armpl_vexpq_f64, ptr @armpl_vexpq_f32, ptr @armpl_svexp_f64_x, ptr @armpl_svexp_f32_x, ptr @armpl_vexp2q_f64, ptr @armpl_vexp2q_f32, ptr @armpl_svexp2_f64_x, ptr @armpl_svexp2_f32_x, ptr @armpl_vexp10q_f64, ptr @armpl_vexp10q_f32, ptr @armpl_svexp10_f64_x, ptr @armpl_svexp10_f32_x, ptr @armpl_vlogq_f64, ptr @armpl_vlogq_f32, ptr @armpl_svlog_f64_x, ptr @armpl_svlog_f32_x, ptr @armpl_vlog2q_f64, ptr @armpl_vlog2q_f32, ptr @armpl_svlog2_f64_x, ptr @armpl_svlog2_f32_x, ptr @armpl_vlog10q_f64, ptr @armpl_vlog10q_f32, ptr @armpl_svlog10_f64_x, ptr @armpl_svlog10_f32_x], section "llvm.metadata"
+; CHECK: @llvm.compiler.used = appending global [36 x ptr] [ptr @armpl_vcosq_f64, ptr @armpl_vcosq_f32, ptr @armpl_svcos_f64_x, ptr @armpl_svcos_f32_x, ptr @armpl_vsinq_f64, ptr @armpl_vsinq_f32, ptr @armpl_svsin_f64_x, ptr @armpl_svsin_f32_x, ptr @armpl_vexpq_f64, ptr @armpl_vexpq_f32, ptr @armpl_svexp_f64_x, ptr @armpl_svexp_f32_x, ptr @armpl_vexp2q_f64, ptr @armpl_vexp2q_f32, ptr @armpl_svexp2_f64_x, ptr @armpl_svexp2_f32_x, ptr @armpl_vexp10q_f64, ptr @armpl_vexp10q_f32, ptr @armpl_svexp10_f64_x, ptr @armpl_svexp10_f32_x, ptr @armpl_vlogq_f64, ptr @armpl_vlogq_f32, ptr @armpl_svlog_f64_x, ptr @armpl_svlog_f32_x, ptr @armpl_vlog2q_f64, ptr @armpl_vlog2q_f32, ptr @armpl_svlog2_f64_x, ptr @armpl_svlog2_f32_x, ptr @armpl_vlog10q_f64, ptr @armpl_vlog10q_f32, ptr @armpl_svlog10_f64_x, ptr @armpl_svlog10_f32_x, ptr @armpl_vfmodq_f64, ptr @armpl_vfmodq_f32, ptr @armpl_svfmod_f64_x, ptr @armpl_svfmod_f32_x], section "llvm.metadata"
 ;.
 define <2 x double> @llvm_cos_f64(<2 x double> %in) {
 ; CHECK-LABEL: define <2 x double> @llvm_cos_f64
@@ -424,6 +424,46 @@ define <vscale x 4 x float> @llvm_pow_vscale_f32(<vscale x 4 x float> %in, <vsca
   ret <vscale x 4 x float> %1
 }
 
+define <2 x double> @frem_f64(<2 x double> %in) {
+; CHECK-LABEL: define <2 x double> @frem_f64
+; CHECK-SAME: (<2 x double> [[IN:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x double> @armpl_vfmodq_f64(<2 x double> [[IN]], <2 x double> [[IN]])
+; CHECK-NEXT:    ret <2 x double> [[TMP1]]
+;
+  %1= frem <2 x double> %in, %in
+  ret <2 x double> %1
+}
+
+define <4 x float> @frem_f32(<4 x float> %in) {
+; CHECK-LABEL: define <4 x float> @frem_f32
+; CHECK-SAME: (<4 x float> [[IN:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x float> @armpl_vfmodq_f32(<4 x float> [[IN]], <4 x float> [[IN]])
+; CHECK-NEXT:    ret <4 x float> [[TMP1]]
+;
+  %1= frem <4 x float> %in, %in
+  ret <4 x float> %1
+}
+
+define <vscale x 2 x double> @frem_vscale_f64(<vscale x 2 x double> %in) #0 {
+; CHECK-LABEL: define <vscale x 2 x double> @frem_vscale_f64
+; CHECK-SAME: (<vscale x 2 x double> [[IN:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <vscale x 2 x double> @armpl_svfmod_f64_x(<vscale x 2 x double> [[IN]], <vscale x 2 x double> [[IN]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer))
+; CHECK-NEXT:    ret <vscale x 2 x double> [[TMP1]]
+;
+  %1= frem <vscale x 2 x double> %in, %in
+  ret <vscale x 2 x double> %1
+}
+
+define <vscale x 4 x float> @frem_vscale_f32(<vscale x 4 x float> %in) #0 {
+; CHECK-LABEL: define <vscale x 4 x float> @frem_vscale_f32
+; CHECK-SAME: (<vscale x 4 x float> [[IN:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <vscale x 4 x float> @armpl_svfmod_f32_x(<vscale x 4 x float> [[IN]], <vscale x 4 x float> [[IN]], <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
+; CHECK-NEXT:    ret <vscale x 4 x float> [[TMP1]]
+;
+  %1= frem <vscale x 4 x float> %in, %in
+  ret <vscale x 4 x float> %1
+}
+
 attributes #0 = { "target-features"="+sve" }
 ;.
 ; CHECK: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
diff --git a/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll b/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
index c2ff6014bc6944..590dd9effac0ea 100644
--- a/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
+++ b/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
@@ -4,7 +4,7 @@
 target triple = "aarch64-unknown-linux-gnu"
 
 ;.
-; CHECK: @llvm.compiler.used = appending global [16 x ptr] [ptr @_ZGVsMxv_cos, ptr @_ZGVsMxv_cosf, ptr @_ZGVsMxv_exp, ptr @_ZGVsMxv_expf, ptr @_ZGVsMxv_exp2, ptr @_ZGVsMxv_exp2f, ptr @_ZGVsMxv_exp10, ptr @_ZGVsMxv_exp10f, ptr @_ZGVsMxv_log, ptr @_ZGVsMxv_logf, ptr @_ZGVsMxv_log10, ptr @_ZGVsMxv_log10f, ptr @_ZGVsMxv_log2, ptr @_ZGVsMxv_log2f, ptr @_ZGVsMxv_sin, ptr @_ZGVsMxv_sinf], section "llvm.metadata"
+; CHECK: @llvm.compiler.used = appending global [18 x ptr] [ptr @_ZGVsMxv_cos, ptr @_ZGVsMxv_cosf, ptr @_ZGVsMxv_exp, ptr @_ZGVsMxv_expf, ptr @_ZGVsMxv_exp2, ptr @_ZGVsMxv_exp2f, ptr @_ZGVsMxv_exp10, ptr @_ZGVsMxv_exp10f, ptr @_ZGVsMxv_log, ptr @_ZGVsMxv_logf, ptr @_ZGVsMxv_log10, ptr @_ZGVsMxv_log10f, ptr @_ZGVsMxv_log2, ptr @_ZGVsMxv_log2f, ptr @_ZGVsMxv_sin, ptr @_ZGVsMxv_sinf, ptr @_ZGVsMxvv_fmod, ptr @_ZGVsMxvv_fmodf], section "llvm.metadata"
 ;.
 define <vscale x 2 x double> @llvm_ceil_vscale_f64(<vscale x 2 x double> %in) {
 ; CHECK-LABEL: @llvm_ceil_vscale_f64(
@@ -384,6 +384,24 @@ define <vscale x 4 x float> @llvm_trunc_vscale_f32(<vscale x 4 x float> %in) {
   ret <vscale x 4 x float> %1
 }
 
+define <vscale x 2 x double> @frem_f64(<vscale x 2 x double> %in) {
+; CHECK-LABEL: @frem_f64(
+; CHECK-NEXT:    [[TMP1:%.*]] = call <vscale x 2 x double> @_ZGVsMxvv_fmod(<vscale x 2 x double> [[IN:%.*]], <vscale x 2 x double> [[IN]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer))
+; CHECK-NEXT:    ret <vscale x 2 x double> [[TMP1]]
+;
+  %1= frem <vscale x 2 x double> %in, %in
+  ret <vscale x 2 x double> %1
+}
+
+define <vscale x 4 x float> @frem_f32(<vscale x 4 x float> %in) {
+; CHECK-LABEL: @frem_f32(
+; CHECK-NEXT:    [[TMP1:%.*]] = call <vscale x 4 x float> @_ZGVsMxvv_fmodf(<vscale x 4 x float> [[IN:%.*]], <vscale x 4 x float> [[IN]], <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
+; CHECK-NEXT:    ret <vscale x 4 x float> [[TMP1]]
+;
+  %1= frem <vscale x 4 x float> %in, %in
+  ret <vscale x 4 x float> %1
+}
+
 declare <vscale x 2 x double> @llvm.ceil.nxv2f64(<vscale x 2 x double>)
 declare <vscale x 4 x float> @llvm.ceil.nxv4f32(<vscale x 4 x float>)
 declare <vscale x 2 x double> @llvm.copysign.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>)
diff --git a/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef.ll b/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef.ll
index be247de368056e..865a46009b205f 100644
--- a/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef.ll
+++ b/llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef.ll
@@ -4,7 +4,7 @@
 target triple = "aarch64-unknown-linux-gnu"
 
 ;.
-; CHECK: @llvm.compiler.used = appending global [16 x ptr] [ptr @_ZGVnN2v_cos, ptr @_ZGVnN4v_cosf, ptr @_ZGVnN2v_exp, ptr @_ZGVnN4v_expf, ptr @_ZGVnN2v_exp2, ptr @_ZGVnN4v_exp2f, ptr @_ZGVnN2v_exp10, ptr @_ZGVnN4v_exp10f, ptr @_ZGVnN2v_log, ptr @_ZGVnN4v_logf, ptr @_ZGVnN2v_log10, ptr @_ZGVnN4v_log10f, ptr @_ZGVnN2v_log2, ptr @_ZGVnN4v_log2f, ptr @_ZGVnN2v_sin, ptr @_ZGVnN4v_sinf], section "llvm.metadata"
+; CHECK: @llvm.compiler.used = appending global [18 x ptr] [ptr @_ZGVnN2v_cos, ptr @_ZGVnN4v_cosf, ptr @_ZGVnN2v_exp, ptr @_ZGVnN4v_expf, ptr @_ZGVnN2v_exp2, ptr @_ZGVnN4v_exp2f, ptr @_ZGVnN2v_exp10, ptr @_ZGVnN4v_exp10f, ptr @_ZGVnN2v_log, ptr @_ZGVnN4v_logf, ptr @_ZGVnN2v_log10, ptr @_ZGVnN4v_log10f, ptr @_ZGVnN2v_log2, ptr @_ZGVnN4v_log2f, ptr @_ZGVnN2v_sin, ptr @_ZGVnN4v_sinf, ptr @_ZGVnN2vv_fmod, ptr @_ZGVnN4vv_fmodf], section "llvm.metadata"
 ;.
 define <2 x double> @llvm_ceil_f64(<2 x double> %in) {
 ; CHECK-LABEL: @llvm_ceil_f64(
@@ -384,6 +384,24 @@ define <4 x float> @llvm_trunc_f32(<4 x float> %in) {
   ret <4 x float> %1
 }
 
+define <2 x double> @frem_f64(<2 x double> %in) {
+; CHECK-LABEL: @frem_f64(
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x double> @_ZGVnN2vv_fmod(<2 x double> [[IN:%.*]], <2 x double> [[IN]])
+; CHECK-NEXT:    ret <2 x double> [[TMP1]]
+;
+  %1= frem <2 x double> %in, %in
+  ret <2 x double> %1
+}
+
+define <4 x float> @frem_f32(<4 x float> %in) {
+; CHECK-LABEL: @frem_f32(
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x float> @_ZGVnN4vv_fmodf(<4 x float> [[IN:%.*]], <4 x float> [[IN]])
+; CHECK-NEXT:    ret <4 x float> [[TMP1]]
+;
+  %1= frem <4 x float> %in, %in
+  ret <4 x float> %1
+}
+
 declare <2 x double> @llvm.ceil.v2f64(<2 x double>)
 declare <4 x float> @llvm.ceil.v4f32(<4 x float>)
 declare <2 x double> @llvm.copysign.v2f64(<2 x double>, <2 x double>)

Review comment on llvm/lib/CodeGen/ReplaceWithVeclib.cpp:

                                    Instruction &I) {
  std::string ScalarName;
  ElementCount EC = ElementCount::getFixed(0);
  CallInst *CI = dyn_cast<CallInst>(&I);

Collaborator:

Is it not preferable to dyn_cast to IntrinsicInst, like in isSupportedInstruction?

paschalis-mpeis (Member, Author):

Indeed, all tests for this pass are using an IntrinsicInst and not a CallInst.

If there are no valid cases where a CallInst is ever preferred, this could change.
Maybe there are other places (related to the VFABI and its users) that could change as well.

For now, I would suggest keeping CallInst in all places as it was, and upon further investigation address this in a future PR.
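
For reference, a hedged sketch of what the IntrinsicInst-based variant discussed above might look like (hypothetical, not part of this PR). Since dyn_cast to IntrinsicInst only succeeds for direct calls whose callee is an intrinsic, the separate getCalledFunction()/not_intrinsic checks would collapse:

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Hypothetical alternative to isSupportedInstruction (not part of this PR):
// an IntrinsicInst is by construction a direct call to an intrinsic, so no
// explicit getCalledFunction()/not_intrinsic checks are needed.
static bool isSupportedInstructionAlt(Instruction *I) {
  if (isa<IntrinsicInst>(I))
    return true;
  return I->getOpcode() == Instruction::FRem;
}
```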

@labrinea (Collaborator) left a comment:

Looks good now. We can reconsider replacing CallInst with IntrinsicInst later.

Commit messages in this PR:

  • Updated SLEEF and ArmPL tests with Fixed-Width and Scalable cases for frem. Those are mapped to fmod/fmodf.
  • One handles CallInst and the other the frem instruction.

@paschalis-mpeis force-pushed the users/paschalis-mpeis/tli-veclib-frem branch from c226cb5 to 4b6ed67 on January 5, 2024 12:14
@paschalis-mpeis (Member, Author):

Force pushed to rebase to main. Ensured no breaking changes with newly merged VFABI changes.

@mgabka (Contributor) left a comment:

thanks for all the changes, it LGTM now!

@paschalis-mpeis merged commit 68a1583 into main on Jan 8, 2024
@paschalis-mpeis deleted the users/paschalis-mpeis/tli-veclib-frem branch on January 8, 2024 08:53
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Updated SLEEF and ArmPL tests with Fixed-Width and Scalable cases for
frem. Those are mapped to fmod/fmodf.