
[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate #127043


Merged

Lukacma merged 1 commit into llvm:main from Lukacma:bitcasts_neon on Apr 1, 2025

Conversation

Lukacma
Contributor

@Lukacma Lukacma commented Feb 13, 2025

Currently, arm_neon.h emits C-style casts to perform vector type casts. This relies on implicit conversion between vector types being enabled, which is deprecated behaviour that will soon be removed. To ensure NEON code keeps working afterwards, this patch changes all of these vector type casts into bitcasts.

Co-authored-by: Momchil Velikov [email protected]
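
For illustration only, here is a minimal sketch (not the generated header itself) of the two casting styles the description contrasts, assuming a recent Clang whose __builtin_bit_cast is usable from C; the helper names are invented for this example:

  #include <arm_neon.h>

  // Old style: a C-style cast between distinct 64-bit NEON vector types.
  // Per the description above, the generated header's use of such casts
  // relied on implicit conversion between vector types being enabled.
  static inline int32x2_t as_s32_cast(float32x2_t v) {
    return (int32x2_t)v;
  }

  // New style: an explicit bit-level reinterpretation that keeps the same
  // bit pattern and does not depend on implicit vector conversions.
  static inline int32x2_t as_s32_bitcast(float32x2_t v) {
    return __builtin_bit_cast(int32x2_t, v);
  }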

@llvmbot llvmbot added the clang (Clang issues not falling into any other category), backend:AArch64, clang:frontend (Language frontend issues, e.g. anything involving "Sema") and clang:codegen (IR generation bugs: mangling, exceptions, etc.) labels on Feb 13, 2025
@llvmbot
Member

llvmbot commented Feb 13, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-aarch64

Author: None (Lukacma)

Changes

Currently, arm_neon.h emits C-style casts to perform vector type casts. This relies on implicit conversion between vector types being enabled, which is deprecated behaviour that will soon be removed. To ensure NEON code keeps working afterwards, this patch changes all of these vector type casts into bitcasts.

Co-authored-by: Momchil Velikov <[email protected]>


Patch is 6.96 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127043.diff

48 Files Affected:

  • (modified) clang/include/clang/Basic/TargetBuiltins.h (+4)
  • (modified) clang/include/clang/Basic/arm_neon.td (+34-34)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+66-36)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+4-4)
  • (modified) clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c (+236-148)
  • (modified) clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c (+17-13)
  • (modified) clang/test/CodeGen/AArch64/bf16-reinterpret-intrinsics.c (+266-186)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_cvt.c (+30-14)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_fdot.c (+50-34)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_fmla.c (+50-34)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_reinterpret.c (+96-62)
  • (modified) clang/test/CodeGen/AArch64/neon-2velem.c (+1232-594)
  • (modified) clang/test/CodeGen/AArch64/neon-extract.c (+228-145)
  • (modified) clang/test/CodeGen/AArch64/neon-fma.c (+87-59)
  • (modified) clang/test/CodeGen/AArch64/neon-fp16fml.c (+593-833)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c (+1409-453)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics.c (+16202-10053)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one-rcpc3.c (+23-17)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one.c (+3870-4665)
  • (modified) clang/test/CodeGen/AArch64/neon-misc-constrained.c (+78-33)
  • (modified) clang/test/CodeGen/AArch64/neon-misc.c (+2734-1396)
  • (modified) clang/test/CodeGen/AArch64/neon-perm.c (+1670-1207)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c (+219-89)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem.c (+401-252)
  • (modified) clang/test/CodeGen/AArch64/neon-vcmla.c (+889-425)
  • (modified) clang/test/CodeGen/AArch64/poly-add.c (+1-1)
  • (modified) clang/test/CodeGen/AArch64/poly128.c (+28-28)
  • (modified) clang/test/CodeGen/AArch64/poly64.c (+443-338)
  • (modified) clang/test/CodeGen/AArch64/v8.1a-neon-intrinsics.c (+81-17)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c (+669-233)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-generic.c (+154-134)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics.c (+773-411)
  • (modified) clang/test/CodeGen/AArch64/v8.5a-neon-frint3264-intrinsic.c (+202-49)
  • (modified) clang/test/CodeGen/AArch64/v8.6a-neon-intrinsics.c (+145-87)
  • (modified) clang/test/CodeGen/arm-bf16-dotprod-intrinsics.c (+237-149)
  • (modified) clang/test/CodeGen/arm-bf16-getset-intrinsics.c (+18-14)
  • (modified) clang/test/CodeGen/arm-neon-directed-rounding.c (+285-62)
  • (modified) clang/test/CodeGen/arm-neon-fma.c (+45-21)
  • (modified) clang/test/CodeGen/arm-neon-numeric-maxmin.c (+43-19)
  • (modified) clang/test/CodeGen/arm-neon-vcvtX.c (+73-41)
  • (modified) clang/test/CodeGen/arm-neon-vst.c (+2443-1695)
  • (modified) clang/test/CodeGen/arm64-vrnd-constrained.c (+193-26)
  • (modified) clang/test/CodeGen/arm64-vrnd.c (+115-6)
  • (modified) clang/test/CodeGen/arm64_vcreate.c (+18-3)
  • (modified) clang/test/CodeGen/arm64_vdupq_n_f64.c (+58-38)
  • (modified) clang/test/CodeGen/arm_neon_intrinsics.c (+19524-12225)
  • (modified) clang/utils/TableGen/NeonEmitter.cpp (+17-11)
  • (added) llvm/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.ll (+276)
diff --git a/clang/include/clang/Basic/TargetBuiltins.h b/clang/include/clang/Basic/TargetBuiltins.h
index 95eb110bb9c24..6178aded91e2a 100644
--- a/clang/include/clang/Basic/TargetBuiltins.h
+++ b/clang/include/clang/Basic/TargetBuiltins.h
@@ -225,6 +225,10 @@ namespace clang {
       EltType ET = getEltType();
       return ET == Poly8 || ET == Poly16 || ET == Poly64;
     }
+    bool isFloatingPoint() const {
+      EltType ET = getEltType();
+      return ET == Float16 || ET == Float32 || ET == Float64 || ET == BFloat16;
+    }
     bool isUnsigned() const { return (Flags & UnsignedFlag) != 0; }
     bool isQuad() const { return (Flags & QuadFlag) != 0; }
     unsigned getEltSizeInBits() const {
diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index 3e73dd054933f..ab0051efe5159 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -31,8 +31,8 @@ def OP_MLAL     : Op<(op "+", $p0, (call "vmull", $p1, $p2))>;
 def OP_MULLHi   : Op<(call "vmull", (call "vget_high", $p0),
                                     (call "vget_high", $p1))>;
 def OP_MULLHi_P64 : Op<(call "vmull",
-                         (cast "poly64_t", (call "vget_high", $p0)),
-                         (cast "poly64_t", (call "vget_high", $p1)))>;
+                         (bitcast "poly64_t", (call "vget_high", $p0)),
+                         (bitcast "poly64_t", (call "vget_high", $p1)))>;
 def OP_MULLHi_N : Op<(call "vmull_n", (call "vget_high", $p0), $p1)>;
 def OP_MLALHi   : Op<(call "vmlal", $p0, (call "vget_high", $p1),
                                          (call "vget_high", $p2))>;
@@ -95,11 +95,11 @@ def OP_TRN2     : Op<(shuffle $p0, $p1, (interleave
 def OP_ZIP2     : Op<(shuffle $p0, $p1, (highhalf (interleave mask0, mask1)))>;
 def OP_UZP2     : Op<(shuffle $p0, $p1, (add (decimate (rotl mask0, 1), 2),
                                              (decimate (rotl mask1, 1), 2)))>;
-def OP_EQ       : Op<(cast "R", (op "==", $p0, $p1))>;
-def OP_GE       : Op<(cast "R", (op ">=", $p0, $p1))>;
-def OP_LE       : Op<(cast "R", (op "<=", $p0, $p1))>;
-def OP_GT       : Op<(cast "R", (op ">", $p0, $p1))>;
-def OP_LT       : Op<(cast "R", (op "<", $p0, $p1))>;
+def OP_EQ       : Op<(bitcast "R", (op "==", $p0, $p1))>;
+def OP_GE       : Op<(bitcast "R", (op ">=", $p0, $p1))>;
+def OP_LE       : Op<(bitcast "R", (op "<=", $p0, $p1))>;
+def OP_GT       : Op<(bitcast "R", (op ">", $p0, $p1))>;
+def OP_LT       : Op<(bitcast "R", (op "<", $p0, $p1))>;
 def OP_NEG      : Op<(op "-", $p0)>;
 def OP_NOT      : Op<(op "~", $p0)>;
 def OP_AND      : Op<(op "&", $p0, $p1)>;
@@ -108,20 +108,20 @@ def OP_XOR      : Op<(op "^", $p0, $p1)>;
 def OP_ANDN     : Op<(op "&", $p0, (op "~", $p1))>;
 def OP_ORN      : Op<(op "|", $p0, (op "~", $p1))>;
 def OP_CAST     : LOp<[(save_temp $promote, $p0),
-                       (cast "R", $promote)]>;
+                       (bitcast "R", $promote)]>;
 def OP_HI       : Op<(shuffle $p0, $p0, (highhalf mask0))>;
 def OP_LO       : Op<(shuffle $p0, $p0, (lowhalf mask0))>;
 def OP_CONC     : Op<(shuffle $p0, $p1, (add mask0, mask1))>;
 def OP_DUP      : Op<(dup $p0)>;
 def OP_DUP_LN   : Op<(call_mangled "splat_lane", $p0, $p1)>;
-def OP_SEL      : Op<(cast "R", (op "|",
-                                    (op "&", $p0, (cast $p0, $p1)),
-                                    (op "&", (op "~", $p0), (cast $p0, $p2))))>;
+def OP_SEL      : Op<(bitcast "R", (op "|",
+                                    (op "&", $p0, (bitcast $p0, $p1)),
+                                    (op "&", (op "~", $p0), (bitcast $p0, $p2))))>;
 def OP_REV16    : Op<(shuffle $p0, $p0, (rev 16, mask0))>;
 def OP_REV32    : Op<(shuffle $p0, $p0, (rev 32, mask0))>;
 def OP_REV64    : Op<(shuffle $p0, $p0, (rev 64, mask0))>;
 def OP_XTN      : Op<(call "vcombine", $p0, (call "vmovn", $p1))>;
-def OP_SQXTUN   : Op<(call "vcombine", (cast $p0, "U", $p0),
+def OP_SQXTUN   : Op<(call "vcombine", (bitcast $p0, "U", $p0),
                                        (call "vqmovun", $p1))>;
 def OP_QXTN     : Op<(call "vcombine", $p0, (call "vqmovn", $p1))>;
 def OP_VCVT_NA_HI_F16 : Op<(call "vcombine", $p0, (call "vcvt_f16_f32", $p1))>;
@@ -129,12 +129,12 @@ def OP_VCVT_NA_HI_F32 : Op<(call "vcombine", $p0, (call "vcvt_f32_f64", $p1))>;
 def OP_VCVT_EX_HI_F32 : Op<(call "vcvt_f32_f16", (call "vget_high", $p0))>;
 def OP_VCVT_EX_HI_F64 : Op<(call "vcvt_f64_f32", (call "vget_high", $p0))>;
 def OP_VCVTX_HI : Op<(call "vcombine", $p0, (call "vcvtx_f32", $p1))>;
-def OP_REINT    : Op<(cast "R", $p0)>;
+def OP_REINT    : Op<(bitcast "R", $p0)>;
 def OP_ADDHNHi  : Op<(call "vcombine", $p0, (call "vaddhn", $p1, $p2))>;
 def OP_RADDHNHi : Op<(call "vcombine", $p0, (call "vraddhn", $p1, $p2))>;
 def OP_SUBHNHi  : Op<(call "vcombine", $p0, (call "vsubhn", $p1, $p2))>;
 def OP_RSUBHNHi : Op<(call "vcombine", $p0, (call "vrsubhn", $p1, $p2))>;
-def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
+def OP_ABDL     : Op<(bitcast "R", (call "vmovl", (bitcast $p0, "U",
                                                      (call "vabd", $p0, $p1))))>;
 def OP_ABDLHi   : Op<(call "vabdl", (call "vget_high", $p0),
                                     (call "vget_high", $p1))>;
@@ -152,15 +152,15 @@ def OP_QDMLSLHi : Op<(call "vqdmlsl", $p0, (call "vget_high", $p1),
                                            (call "vget_high", $p2))>;
 def OP_QDMLSLHi_N : Op<(call "vqdmlsl_n", $p0, (call "vget_high", $p1), $p2)>;
 def OP_DIV  : Op<(op "/", $p0, $p1)>;
-def OP_LONG_HI : Op<(cast "R", (call (name_replace "_high_", "_"),
+def OP_LONG_HI : Op<(bitcast "R", (call (name_replace "_high_", "_"),
                                                 (call "vget_high", $p0), $p1))>;
-def OP_NARROW_HI : Op<(cast "R", (call "vcombine",
-                                       (cast "R", "H", $p0),
-                                       (cast "R", "H",
+def OP_NARROW_HI : Op<(bitcast "R", (call "vcombine",
+                                       (bitcast "R", "H", $p0),
+                                       (bitcast "R", "H",
                                            (call (name_replace "_high_", "_"),
                                                  $p1, $p2))))>;
 def OP_MOVL_HI  : LOp<[(save_temp $a1, (call "vget_high", $p0)),
-                       (cast "R",
+                       (bitcast "R",
                             (call "vshll_n", $a1, (literal "int32_t", "0")))]>;
 def OP_COPY_LN : Op<(call "vset_lane", (call "vget_lane", $p2, $p3), $p0, $p1)>;
 def OP_SCALAR_MUL_LN : Op<(op "*", $p0, (call "vget_lane", $p1, $p2))>;
@@ -221,18 +221,18 @@ def OP_FMLSL_LN_Hi  : Op<(call "vfmlsl_high", $p0, $p1,
 
 def OP_USDOT_LN
     : Op<(call "vusdot", $p0, $p1,
-          (cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
+          (bitcast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
 def OP_USDOT_LNQ
     : Op<(call "vusdot", $p0, $p1,
-          (cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;
+          (bitcast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;
 
 // sudot splats the second vector and then calls vusdot
 def OP_SUDOT_LN
     : Op<(call "vusdot", $p0,
-          (cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
+          (bitcast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
 def OP_SUDOT_LNQ
     : Op<(call "vusdot", $p0,
-          (cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;
+          (bitcast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;
 
 def OP_BFDOT_LN
     : Op<(call "vbfdot", $p0, $p1,
@@ -263,7 +263,7 @@ def OP_VCVT_BF16_F32_A32
     : Op<(call "__a32_vcvt_bf16", $p0)>;
 
 def OP_VCVT_BF16_F32_LO_A32
-    : Op<(call "vcombine", (cast "bfloat16x4_t", (literal "uint64_t", "0ULL")),
+    : Op<(call "vcombine", (bitcast "bfloat16x4_t", (literal "uint64_t", "0ULL")),
                            (call "__a32_vcvt_bf16", $p0))>;
 def OP_VCVT_BF16_F32_HI_A32
     : Op<(call "vcombine", (call "__a32_vcvt_bf16", $p1),
@@ -924,12 +924,12 @@ def CFMLE  : SOpInst<"vcle", "U..", "lUldQdQlQUl", OP_LE>;
 def CFMGT  : SOpInst<"vcgt", "U..", "lUldQdQlQUl", OP_GT>;
 def CFMLT  : SOpInst<"vclt", "U..", "lUldQdQlQUl", OP_LT>;
 
-def CMEQ  : SInst<"vceqz", "U.",
+def CMEQ  : SInst<"vceqz", "U(.!)",
                   "csilfUcUsUiUlPcPlQcQsQiQlQfQUcQUsQUiQUlQPcdQdQPl">;
-def CMGE  : SInst<"vcgez", "U.", "csilfdQcQsQiQlQfQd">;
-def CMLE  : SInst<"vclez", "U.", "csilfdQcQsQiQlQfQd">;
-def CMGT  : SInst<"vcgtz", "U.", "csilfdQcQsQiQlQfQd">;
-def CMLT  : SInst<"vcltz", "U.", "csilfdQcQsQiQlQfQd">;
+def CMGE  : SInst<"vcgez", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMLE  : SInst<"vclez", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMGT  : SInst<"vcgtz", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMLT  : SInst<"vcltz", "U(.!)", "csilfdQcQsQiQlQfQd">;
 
 ////////////////////////////////////////////////////////////////////////////////
 // Max/Min Integer
@@ -1667,11 +1667,11 @@ let TargetGuard = "fullfp16,neon" in {
   // ARMv8.2-A FP16 one-operand vector intrinsics.
 
   // Comparison
-  def CMEQH    : SInst<"vceqz", "U.", "hQh">;
-  def CMGEH    : SInst<"vcgez", "U.", "hQh">;
-  def CMGTH    : SInst<"vcgtz", "U.", "hQh">;
-  def CMLEH    : SInst<"vclez", "U.", "hQh">;
-  def CMLTH    : SInst<"vcltz", "U.", "hQh">;
+  def CMEQH    : SInst<"vceqz", "U(.!)", "hQh">;
+  def CMGEH    : SInst<"vcgez", "U(.!)", "hQh">;
+  def CMGTH    : SInst<"vcgtz", "U(.!)", "hQh">;
+  def CMLEH    : SInst<"vclez", "U(.!)", "hQh">;
+  def CMLTH    : SInst<"vcltz", "U(.!)", "hQh">;
 
   // Vector conversion
   def VCVT_F16     : SInst<"vcvt_f16", "F(.!)",  "sUsQsQUs">;
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 7ec9d59bfed5c..9a5413a964679 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -8065,8 +8065,9 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
 
   // Determine the type of this overloaded NEON intrinsic.
   NeonTypeFlags Type(NeonTypeConst->getZExtValue());
-  bool Usgn = Type.isUnsigned();
-  bool Quad = Type.isQuad();
+  const bool Usgn = Type.isUnsigned();
+  const bool Quad = Type.isQuad();
+  const bool Floating = Type.isFloatingPoint();
   const bool HasLegalHalfType = getTarget().hasLegalHalfType();
   const bool AllowBFloatArgsAndRet =
       getTargetHooks().getABIInfo().allowBFloatArgsAndRet();
@@ -8167,24 +8168,28 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
   }
   case NEON::BI__builtin_neon_vceqz_v:
   case NEON::BI__builtin_neon_vceqzq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OEQ,
-                                         ICmpInst::ICMP_EQ, "vceqz");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OEQ : ICmpInst::ICMP_EQ, "vceqz");
   case NEON::BI__builtin_neon_vcgez_v:
   case NEON::BI__builtin_neon_vcgezq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OGE,
-                                         ICmpInst::ICMP_SGE, "vcgez");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OGE : ICmpInst::ICMP_SGE,
+        "vcgez");
   case NEON::BI__builtin_neon_vclez_v:
   case NEON::BI__builtin_neon_vclezq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OLE,
-                                         ICmpInst::ICMP_SLE, "vclez");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OLE : ICmpInst::ICMP_SLE,
+        "vclez");
   case NEON::BI__builtin_neon_vcgtz_v:
   case NEON::BI__builtin_neon_vcgtzq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OGT,
-                                         ICmpInst::ICMP_SGT, "vcgtz");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OGT : ICmpInst::ICMP_SGT,
+        "vcgtz");
   case NEON::BI__builtin_neon_vcltz_v:
   case NEON::BI__builtin_neon_vcltzq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OLT,
-                                         ICmpInst::ICMP_SLT, "vcltz");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OLT : ICmpInst::ICMP_SLT,
+        "vcltz");
   case NEON::BI__builtin_neon_vclz_v:
   case NEON::BI__builtin_neon_vclzq_v:
     // We generate target-independent intrinsic, which needs a second argument
@@ -8747,28 +8752,32 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
   return Builder.CreateBitCast(Result, ResultType, NameHint);
 }
 
-Value *CodeGenFunction::EmitAArch64CompareBuiltinExpr(
-    Value *Op, llvm::Type *Ty, const CmpInst::Predicate Fp,
-    const CmpInst::Predicate Ip, const Twine &Name) {
-  llvm::Type *OTy = Op->getType();
-
-  // FIXME: this is utterly horrific. We should not be looking at previous
-  // codegen context to find out what needs doing. Unfortunately TableGen
-  // currently gives us exactly the same calls for vceqz_f32 and vceqz_s32
-  // (etc).
-  if (BitCastInst *BI = dyn_cast<BitCastInst>(Op))
-    OTy = BI->getOperand(0)->getType();
-
-  Op = Builder.CreateBitCast(Op, OTy);
-  if (OTy->getScalarType()->isFloatingPointTy()) {
-    if (Fp == CmpInst::FCMP_OEQ)
-      Op = Builder.CreateFCmp(Fp, Op, Constant::getNullValue(OTy));
+Value *
+CodeGenFunction::EmitAArch64CompareBuiltinExpr(Value *Op, llvm::Type *Ty,
+                                               const CmpInst::Predicate Pred,
+                                               const Twine &Name) {
+
+  if (isa<FixedVectorType>(Ty)) {
+    // Vector types are cast to i8 vectors. Recover original type.
+    Op = Builder.CreateBitCast(Op, Ty);
+  }
+
+  if (CmpInst::isFPPredicate(Pred)) {
+    if (Pred == CmpInst::FCMP_OEQ)
+      Op = Builder.CreateFCmp(Pred, Op, Constant::getNullValue(Op->getType()));
     else
-      Op = Builder.CreateFCmpS(Fp, Op, Constant::getNullValue(OTy));
+      Op = Builder.CreateFCmpS(Pred, Op, Constant::getNullValue(Op->getType()));
   } else {
-    Op = Builder.CreateICmp(Ip, Op, Constant::getNullValue(OTy));
+    Op = Builder.CreateICmp(Pred, Op, Constant::getNullValue(Op->getType()));
   }
-  return Builder.CreateSExt(Op, Ty, Name);
+
+  llvm::Type *ResTy = Ty;
+  if (auto *VTy = dyn_cast<FixedVectorType>(Ty))
+    ResTy = FixedVectorType::get(
+        IntegerType::get(getLLVMContext(), VTy->getScalarSizeInBits()),
+        VTy->getNumElements());
+
+  return Builder.CreateSExt(Op, ResTy, Name);
 }
 
 static Value *packTBLDVectorList(CodeGenFunction &CGF, ArrayRef<Value *> Ops,
@@ -12276,45 +12285,66 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateFAdd(Op0, Op1, "vpaddd");
   }
   case NEON::BI__builtin_neon_vceqzd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_EQ, "vceqz");
   case NEON::BI__builtin_neon_vceqzd_f64:
   case NEON::BI__builtin_neon_vceqzs_f32:
   case NEON::BI__builtin_neon_vceqzh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OEQ, ICmpInst::ICMP_EQ, "vceqz");
+        ICmpInst::FCMP_OEQ, "vceqz");
   case NEON::BI__builtin_neon_vcgezd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SGE, "vcgez");
   case NEON::BI__builtin_neon_vcgezd_f64:
   case NEON::BI__builtin_neon_vcgezs_f32:
   case NEON::BI__builtin_neon_vcgezh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OGE, ICmpInst::ICMP_SGE, "vcgez");
+        ICmpInst::FCMP_OGE, "vcgez");
   case NEON::BI__builtin_neon_vclezd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SLE, "vclez");
   case NEON::BI__builtin_neon_vclezd_f64:
   case NEON::BI__builtin_neon_vclezs_f32:
   case NEON::BI__builtin_neon_vclezh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OLE, ICmpInst::ICMP_SLE, "vclez");
+        ICmpInst::FCMP_OLE, "vclez");
   case NEON::BI__builtin_neon_vcgtzd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SGT, "vcgtz");
   case NEON::BI__builtin_neon_vcgtzd_f64:
   case NEON::BI__builtin_neon_vcgtzs_f32:
   case NEON::BI__builtin_neon_vcgtzh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OGT, ICmpInst::ICMP_SGT, "vcgtz");
+        ICmpInst::FCMP_OGT, "vcgtz");
   case NEON::BI__builtin_neon_vcltzd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SLT, "vcltz");
+
   case NEON::BI__builtin_neon_vcltzd_f64:
   case NEON::BI__builtin_neon_vcltzs_f32:
   case NEON::BI__builtin_neon_vcltzh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OLT, ICmpInst::ICMP_SLT, "vcltz");
+        ICmpInst::FCMP_OLT, "vcltz");
 
   case NEON::BI__builtin_neon_vceqzd_u64: {
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index e978cad433623..95be50a7fd436 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4671,10 +4671,10 @@ class CodeGenFunction : public CodeGenTypeCache {
   llvm::Value *EmitTargetBuiltinExpr(unsigned BuiltinID, const CallExpr *E,
                                      ReturnValueSlot ReturnValue);
 
-  llvm::Value *EmitAArch64CompareBuiltinExpr(llvm::Value *Op, llvm::Type *Ty,
-                                             const llvm::CmpInst::Predicate Fp,
-                                             const llvm::CmpInst::Predicate Ip,
-                                             const llvm::Twine &Name = "");
+  llvm::Value *
+  EmitAArch64CompareBuiltinExpr(llvm::Value *Op, llvm::Type *Ty,
+                                const llvm::CmpInst::Predicate Pred,
+                                const llvm::Twine &Name = "");
   llvm::Value *EmitARMBuiltinExpr(unsigned BuiltinID, const CallExpr *E,
                                   ReturnValueSlot ReturnValue,
                                   llvm::Triple::ArchType Arch);
diff --git a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
index 877d83c0fa395..2097495b3baee 100644
--- a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
+++ b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
@@ -1,6 +1,6 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // RUN: %clang_cc1 -triple aarch64 -target-feature +neon -target-feature +bf16 \
-// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg | FileCheck %s
+// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg,sroa | FileCheck %s
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
@@ -8,10 +8,16 @@
 
 // CHECK-LABEL: @test_vbfdot_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot....
[truncated]

@llvmbot
Member

llvmbot commented Feb 13, 2025

@llvm/pr-subscribers-clang-codegen


This comment was marked as off-topic.

Contributor

@jthackray jthackray left a comment


LGTM. Glad to see the "utterly horrific" code gone :)

Contributor

@CarolineConcatto CarolineConcatto left a comment


Hi Marian,
I am seeing some errors from the CI, but they go away if I re-run update_cc_tests.
But like the other files, it also adds some extra check lines.

  }
-  return Builder.CreateSExt(Op, Ty, Name);
+
+  llvm::Type *ResTy = Ty;
Contributor


Why do we need that? I removed it and could not see any failing tests.

Collaborator


I wrote that. IIRC, the reason was that Ty could be a floating-point vector type, but the result of the compare is always an integer vector with the same number of elements.

Contributor Author


I believe this code is necessary because LLVM comparison instructions return i1 vectors, which need to be explicitly extended to i<elem_size> for correct behavior. As for why this part could be removed without immediate failures, I’m not entirely sure — LLVM doesn’t support implicit type casts, so the issue might be masked. It’s possible we’re missing tests that cover the full lowering path from C to assembly, which would otherwise reveal that the generated LLVM IR doesn’t map to any valid instruction.
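
As a concrete, hedged illustration of that point (the function name below is invented): even when the compared value is a floating-point vector, the zero-compare intrinsics return an integer vector with the same number of lanes, which is exactly the type the sign-extended i1 compare result has to end up in.

  #include <arm_neon.h>

  // The input is a <2 x float>, but the result is uint32x2_t: each lane is
  // all-ones or all-zeros. In IR terms, the <2 x i1> produced by the fcmp is
  // sign-extended into this integer vector type, not into the floating-point
  // operand type.
  uint32x2_t is_zero(float32x2_t a) {
    return vceqz_f32(a);
  }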

Contributor

@CarolineConcatto CarolineConcatto left a comment


Thank you Marian,
The patch looks good

@Lukacma Lukacma merged commit 6c3adaa into llvm:main Apr 1, 2025
7 of 11 checks passed
@Lukacma Lukacma deleted the bitcasts_neon branch April 1, 2025 08:45
@llvm-ci
Collaborator

llvm-ci commented Apr 1, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/15469

Here is the relevant piece of the build log, for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: CodeGen/arm-neon-directed-rounding-constrained.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/lib/clang/21/include -nostdsysteminc -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57      -ffreestanding -disable-O0-optnone -emit-llvm /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c -o - |      /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/opt -S -passes=mem2reg | /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefixes=COMMON,COMMONIR,UNCONSTRAINED /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c # RUN: at line 1
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/lib/clang/21/include -nostdsysteminc -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57 -ffreestanding -disable-O0-optnone -emit-llvm /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c -o -
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefixes=COMMON,COMMONIR,UNCONSTRAINED /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/opt -S -passes=mem2reg
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c:40:14: error: COMMONIR: expected string not found in input
// COMMONIR: [[TMP0:%.*]] = bitcast <2 x float> %a to <8 x i8>
             ^
<stdin>:7:45: note: scanning from here
define dso_local <2 x float> @test_vrndi_f32(<2 x float> noundef %a) #0 {
                                            ^
<stdin>:13:8: note: possible intended match here
 %vrndi_v.i = bitcast <8 x i8> %0 to <2 x float>
       ^
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c:52:14: error: COMMONIR: expected string not found in input
// COMMONIR: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>
             ^
<stdin>:21:46: note: scanning from here
define dso_local <4 x float> @test_vrndiq_f32(<4 x float> noundef %a) #0 {
                                             ^
<stdin>:27:9: note: possible intended match here
 %vrndiq_v.i = bitcast <16 x i8> %0 to <4 x float>
        ^

Input file: <stdin>
Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = '<stdin>'
            2: source_filename = "/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c"
            3: target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
            4: target triple = "thumbv8-unknown-linux-gnueabihf"
            5:
            6: ; Function Attrs: noinline nounwind
            7: define dso_local <2 x float> @test_vrndi_f32(<2 x float> noundef %a) #0 {
label:39'0                                   ^~~~~~~~~~~~~~
label:39'1                                   ^~~~~~~~~~~~~~
check:40'0                                                 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            8: entry:
check:40'0     ~~~~~~~
            9:  %__p0.addr.i = alloca <2 x float>, align 8
check:40'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10:  %ref.tmp.i = alloca <8 x i8>, align 8
...

Lukacma added a commit to Lukacma/llvm-project that referenced this pull request Apr 1, 2025
Lukacma added a commit that referenced this pull request Apr 1, 2025
@llvm-ci
Collaborator

llvm-ci commented Apr 1, 2025

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building clang,llvm at step 6 "test-build-unified-tree-check-clang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/23567

Here is the relevant piece of the build log, for reference:
Step 6 (test-build-unified-tree-check-clang) failure: test (failure)
******************** TEST 'Clang :: CodeGen/arm-bf16-convert-intrinsics.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/llvm-x86_64-debian-dylib/build/bin/clang -cc1 -internal-isystem /b/1/llvm-x86_64-debian-dylib/build/lib/clang/21/include -nostdsysteminc    -triple aarch64 -target-feature +neon -target-feature +bf16    -disable-O0-optnone -emit-llvm -o - /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c    | /b/1/llvm-x86_64-debian-dylib/build/bin/opt -S -passes=mem2reg    | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck --check-prefixes=CHECK,CHECK-A64 /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c # RUN: at line 2
+ /b/1/llvm-x86_64-debian-dylib/build/bin/opt -S -passes=mem2reg
+ /b/1/llvm-x86_64-debian-dylib/build/bin/clang -cc1 -internal-isystem /b/1/llvm-x86_64-debian-dylib/build/lib/clang/21/include -nostdsysteminc -triple aarch64 -target-feature +neon -target-feature +bf16 -disable-O0-optnone -emit-llvm -o - /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck --check-prefixes=CHECK,CHECK-A64 /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:29:20: error: CHECK-A64-NEXT: is not on the line after the previous match
// CHECK-A64-NEXT: store <4 x bfloat> [[A:%.*]], ptr [[__REINT_808_I]], align 8
                   ^
<stdin>:13:2: note: 'next' match was here
 store <4 x bfloat> %a, ptr %__p0_808.addr.i, align 8
 ^
<stdin>:10:41: note: previous match ended here
 %ref.tmp.i = alloca <4 x i32>, align 16
                                        ^
<stdin>:11:1: note: non-matching line after previous match is here
 %__s0.i = alloca <4 x i16>, align 8
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:81:20: error: CHECK-A64-NEXT: is not on the line after the previous match
// CHECK-A64-NEXT: [[SHUFFLE_I:%.*]] = shufflevector <8 x bfloat> [[A:%.*]], <8 x bfloat> [[A]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
                   ^
<stdin>:34:2: note: 'next' match was here
 %shuffle.i = shufflevector <8 x bfloat> %a, <8 x bfloat> %a, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ^
<stdin>:31:43: note: previous match ended here
 %ref.tmp.i.i = alloca <4 x i32>, align 16
                                          ^
<stdin>:32:1: note: non-matching line after previous match is here
 %__s0.i.i = alloca <4 x i16>, align 8
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:154:20: error: CHECK-A64-NEXT: is not on the line after the previous match
// CHECK-A64-NEXT: [[SHUFFLE_I:%.*]] = shufflevector <8 x bfloat> [[A:%.*]], <8 x bfloat> [[A]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
                   ^
<stdin>:56:2: note: 'next' match was here
 %shuffle.i = shufflevector <8 x bfloat> %a, <8 x bfloat> %a, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
 ^
<stdin>:53:43: note: previous match ended here
 %ref.tmp.i.i = alloca <4 x i32>, align 16
                                          ^
<stdin>:54:1: note: non-matching line after previous match is here
 %__s0.i.i = alloca <4 x i16>, align 8
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:225:20: error: CHECK-A64-NEXT: expected string not found in input
// CHECK-A64-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[A:%.*]] to <16 x i8>
                   ^
<stdin>:73:7: note: scanning from here
entry:
...

Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request Apr 2, 2025
…ate (llvm#127043)

Currently, arm_neon.h emits C-style casts to perform vector type casts. This
relies on implicit conversion between vector types being enabled, which is
deprecated behaviour that will soon be removed. To ensure NEON code keeps
working afterwards, this patch changes all of these vector type casts into
bitcasts.


Co-authored-by: Momchil Velikov <[email protected]>
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request Apr 2, 2025
Labels

  • backend:AArch64
  • clang:codegen (IR generation bugs: mangling, exceptions, etc.)
  • clang:frontend (Language frontend issues, e.g. anything involving "Sema")
  • clang (Clang issues not falling into any other category)
6 participants