
[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate #127043


Merged

Lukacma merged 1 commit into llvm:main from Lukacma:bitcasts_neon on Apr 1, 2025

Conversation

Lukacma
Contributor

@Lukacma Lukacma commented Feb 13, 2025

Currently, arm_neon.h emits C-style casts to perform vector type casts. This relies on implicit conversion between vector types being enabled, which is deprecated behaviour that will soon be removed. To ensure NEON code keeps working afterwards, this patch changes all of these vector type casts into bitcasts.

Co-authored-by: Momchil Velikov [email protected]
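
For illustration only, here is a minimal sketch (not the generated header itself) of the two casting styles the description contrasts, assuming a recent Clang whose __builtin_bit_cast is usable from C; the helper names are invented for this example:

  #include <arm_neon.h>

  // Old style: a C-style cast between distinct 64-bit NEON vector types.
  // Per the description above, the generated header's use of such casts
  // relied on implicit conversion between vector types being enabled.
  static inline int32x2_t as_s32_cast(float32x2_t v) {
    return (int32x2_t)v;
  }

  // New style: an explicit bit-level reinterpretation that keeps the same
  // bit pattern and does not depend on implicit vector conversions.
  static inline int32x2_t as_s32_bitcast(float32x2_t v) {
    return __builtin_bit_cast(int32x2_t, v);
  }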

@llvmbot llvmbot added the clang (Clang issues not falling into any other category), backend:AArch64, clang:frontend (Language frontend issues, e.g. anything involving "Sema") and clang:codegen (IR generation bugs: mangling, exceptions, etc.) labels on Feb 13, 2025
@llvmbot
Member

llvmbot commented Feb 13, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-aarch64

Author: None (Lukacma)

Changes

Currently, arm_neon.h emits C-style casts to perform vector type casts. This relies on implicit conversion between vector types being enabled, which is deprecated behaviour that will soon be removed. To ensure NEON code keeps working afterwards, this patch changes all of these vector type casts into bitcasts.

Co-authored-by: Momchil Velikov <[email protected]>


Patch is 6.96 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127043.diff

48 Files Affected:

  • (modified) clang/include/clang/Basic/TargetBuiltins.h (+4)
  • (modified) clang/include/clang/Basic/arm_neon.td (+34-34)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+66-36)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+4-4)
  • (modified) clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c (+236-148)
  • (modified) clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c (+17-13)
  • (modified) clang/test/CodeGen/AArch64/bf16-reinterpret-intrinsics.c (+266-186)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_cvt.c (+30-14)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_fdot.c (+50-34)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_fmla.c (+50-34)
  • (modified) clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_reinterpret.c (+96-62)
  • (modified) clang/test/CodeGen/AArch64/neon-2velem.c (+1232-594)
  • (modified) clang/test/CodeGen/AArch64/neon-extract.c (+228-145)
  • (modified) clang/test/CodeGen/AArch64/neon-fma.c (+87-59)
  • (modified) clang/test/CodeGen/AArch64/neon-fp16fml.c (+593-833)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c (+1409-453)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics.c (+16202-10053)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one-rcpc3.c (+23-17)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one.c (+3870-4665)
  • (modified) clang/test/CodeGen/AArch64/neon-misc-constrained.c (+78-33)
  • (modified) clang/test/CodeGen/AArch64/neon-misc.c (+2734-1396)
  • (modified) clang/test/CodeGen/AArch64/neon-perm.c (+1670-1207)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c (+219-89)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem.c (+401-252)
  • (modified) clang/test/CodeGen/AArch64/neon-vcmla.c (+889-425)
  • (modified) clang/test/CodeGen/AArch64/poly-add.c (+1-1)
  • (modified) clang/test/CodeGen/AArch64/poly128.c (+28-28)
  • (modified) clang/test/CodeGen/AArch64/poly64.c (+443-338)
  • (modified) clang/test/CodeGen/AArch64/v8.1a-neon-intrinsics.c (+81-17)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c (+669-233)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-generic.c (+154-134)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics.c (+773-411)
  • (modified) clang/test/CodeGen/AArch64/v8.5a-neon-frint3264-intrinsic.c (+202-49)
  • (modified) clang/test/CodeGen/AArch64/v8.6a-neon-intrinsics.c (+145-87)
  • (modified) clang/test/CodeGen/arm-bf16-dotprod-intrinsics.c (+237-149)
  • (modified) clang/test/CodeGen/arm-bf16-getset-intrinsics.c (+18-14)
  • (modified) clang/test/CodeGen/arm-neon-directed-rounding.c (+285-62)
  • (modified) clang/test/CodeGen/arm-neon-fma.c (+45-21)
  • (modified) clang/test/CodeGen/arm-neon-numeric-maxmin.c (+43-19)
  • (modified) clang/test/CodeGen/arm-neon-vcvtX.c (+73-41)
  • (modified) clang/test/CodeGen/arm-neon-vst.c (+2443-1695)
  • (modified) clang/test/CodeGen/arm64-vrnd-constrained.c (+193-26)
  • (modified) clang/test/CodeGen/arm64-vrnd.c (+115-6)
  • (modified) clang/test/CodeGen/arm64_vcreate.c (+18-3)
  • (modified) clang/test/CodeGen/arm64_vdupq_n_f64.c (+58-38)
  • (modified) clang/test/CodeGen/arm_neon_intrinsics.c (+19524-12225)
  • (modified) clang/utils/TableGen/NeonEmitter.cpp (+17-11)
  • (added) llvm/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.ll (+276)
diff --git a/clang/include/clang/Basic/TargetBuiltins.h b/clang/include/clang/Basic/TargetBuiltins.h
index 95eb110bb9c24..6178aded91e2a 100644
--- a/clang/include/clang/Basic/TargetBuiltins.h
+++ b/clang/include/clang/Basic/TargetBuiltins.h
@@ -225,6 +225,10 @@ namespace clang {
       EltType ET = getEltType();
       return ET == Poly8 || ET == Poly16 || ET == Poly64;
     }
+    bool isFloatingPoint() const {
+      EltType ET = getEltType();
+      return ET == Float16 || ET == Float32 || ET == Float64 || ET == BFloat16;
+    }
     bool isUnsigned() const { return (Flags & UnsignedFlag) != 0; }
     bool isQuad() const { return (Flags & QuadFlag) != 0; }
     unsigned getEltSizeInBits() const {
diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index 3e73dd054933f..ab0051efe5159 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -31,8 +31,8 @@ def OP_MLAL     : Op<(op "+", $p0, (call "vmull", $p1, $p2))>;
 def OP_MULLHi   : Op<(call "vmull", (call "vget_high", $p0),
                                     (call "vget_high", $p1))>;
 def OP_MULLHi_P64 : Op<(call "vmull",
-                         (cast "poly64_t", (call "vget_high", $p0)),
-                         (cast "poly64_t", (call "vget_high", $p1)))>;
+                         (bitcast "poly64_t", (call "vget_high", $p0)),
+                         (bitcast "poly64_t", (call "vget_high", $p1)))>;
 def OP_MULLHi_N : Op<(call "vmull_n", (call "vget_high", $p0), $p1)>;
 def OP_MLALHi   : Op<(call "vmlal", $p0, (call "vget_high", $p1),
                                          (call "vget_high", $p2))>;
@@ -95,11 +95,11 @@ def OP_TRN2     : Op<(shuffle $p0, $p1, (interleave
 def OP_ZIP2     : Op<(shuffle $p0, $p1, (highhalf (interleave mask0, mask1)))>;
 def OP_UZP2     : Op<(shuffle $p0, $p1, (add (decimate (rotl mask0, 1), 2),
                                              (decimate (rotl mask1, 1), 2)))>;
-def OP_EQ       : Op<(cast "R", (op "==", $p0, $p1))>;
-def OP_GE       : Op<(cast "R", (op ">=", $p0, $p1))>;
-def OP_LE       : Op<(cast "R", (op "<=", $p0, $p1))>;
-def OP_GT       : Op<(cast "R", (op ">", $p0, $p1))>;
-def OP_LT       : Op<(cast "R", (op "<", $p0, $p1))>;
+def OP_EQ       : Op<(bitcast "R", (op "==", $p0, $p1))>;
+def OP_GE       : Op<(bitcast "R", (op ">=", $p0, $p1))>;
+def OP_LE       : Op<(bitcast "R", (op "<=", $p0, $p1))>;
+def OP_GT       : Op<(bitcast "R", (op ">", $p0, $p1))>;
+def OP_LT       : Op<(bitcast "R", (op "<", $p0, $p1))>;
 def OP_NEG      : Op<(op "-", $p0)>;
 def OP_NOT      : Op<(op "~", $p0)>;
 def OP_AND      : Op<(op "&", $p0, $p1)>;
@@ -108,20 +108,20 @@ def OP_XOR      : Op<(op "^", $p0, $p1)>;
 def OP_ANDN     : Op<(op "&", $p0, (op "~", $p1))>;
 def OP_ORN      : Op<(op "|", $p0, (op "~", $p1))>;
 def OP_CAST     : LOp<[(save_temp $promote, $p0),
-                       (cast "R", $promote)]>;
+                       (bitcast "R", $promote)]>;
 def OP_HI       : Op<(shuffle $p0, $p0, (highhalf mask0))>;
 def OP_LO       : Op<(shuffle $p0, $p0, (lowhalf mask0))>;
 def OP_CONC     : Op<(shuffle $p0, $p1, (add mask0, mask1))>;
 def OP_DUP      : Op<(dup $p0)>;
 def OP_DUP_LN   : Op<(call_mangled "splat_lane", $p0, $p1)>;
-def OP_SEL      : Op<(cast "R", (op "|",
-                                    (op "&", $p0, (cast $p0, $p1)),
-                                    (op "&", (op "~", $p0), (cast $p0, $p2))))>;
+def OP_SEL      : Op<(bitcast "R", (op "|",
+                                    (op "&", $p0, (bitcast $p0, $p1)),
+                                    (op "&", (op "~", $p0), (bitcast $p0, $p2))))>;
 def OP_REV16    : Op<(shuffle $p0, $p0, (rev 16, mask0))>;
 def OP_REV32    : Op<(shuffle $p0, $p0, (rev 32, mask0))>;
 def OP_REV64    : Op<(shuffle $p0, $p0, (rev 64, mask0))>;
 def OP_XTN      : Op<(call "vcombine", $p0, (call "vmovn", $p1))>;
-def OP_SQXTUN   : Op<(call "vcombine", (cast $p0, "U", $p0),
+def OP_SQXTUN   : Op<(call "vcombine", (bitcast $p0, "U", $p0),
                                        (call "vqmovun", $p1))>;
 def OP_QXTN     : Op<(call "vcombine", $p0, (call "vqmovn", $p1))>;
 def OP_VCVT_NA_HI_F16 : Op<(call "vcombine", $p0, (call "vcvt_f16_f32", $p1))>;
@@ -129,12 +129,12 @@ def OP_VCVT_NA_HI_F32 : Op<(call "vcombine", $p0, (call "vcvt_f32_f64", $p1))>;
 def OP_VCVT_EX_HI_F32 : Op<(call "vcvt_f32_f16", (call "vget_high", $p0))>;
 def OP_VCVT_EX_HI_F64 : Op<(call "vcvt_f64_f32", (call "vget_high", $p0))>;
 def OP_VCVTX_HI : Op<(call "vcombine", $p0, (call "vcvtx_f32", $p1))>;
-def OP_REINT    : Op<(cast "R", $p0)>;
+def OP_REINT    : Op<(bitcast "R", $p0)>;
 def OP_ADDHNHi  : Op<(call "vcombine", $p0, (call "vaddhn", $p1, $p2))>;
 def OP_RADDHNHi : Op<(call "vcombine", $p0, (call "vraddhn", $p1, $p2))>;
 def OP_SUBHNHi  : Op<(call "vcombine", $p0, (call "vsubhn", $p1, $p2))>;
 def OP_RSUBHNHi : Op<(call "vcombine", $p0, (call "vrsubhn", $p1, $p2))>;
-def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
+def OP_ABDL     : Op<(bitcast "R", (call "vmovl", (bitcast $p0, "U",
                                                      (call "vabd", $p0, $p1))))>;
 def OP_ABDLHi   : Op<(call "vabdl", (call "vget_high", $p0),
                                     (call "vget_high", $p1))>;
@@ -152,15 +152,15 @@ def OP_QDMLSLHi : Op<(call "vqdmlsl", $p0, (call "vget_high", $p1),
                                            (call "vget_high", $p2))>;
 def OP_QDMLSLHi_N : Op<(call "vqdmlsl_n", $p0, (call "vget_high", $p1), $p2)>;
 def OP_DIV  : Op<(op "/", $p0, $p1)>;
-def OP_LONG_HI : Op<(cast "R", (call (name_replace "_high_", "_"),
+def OP_LONG_HI : Op<(bitcast "R", (call (name_replace "_high_", "_"),
                                                 (call "vget_high", $p0), $p1))>;
-def OP_NARROW_HI : Op<(cast "R", (call "vcombine",
-                                       (cast "R", "H", $p0),
-                                       (cast "R", "H",
+def OP_NARROW_HI : Op<(bitcast "R", (call "vcombine",
+                                       (bitcast "R", "H", $p0),
+                                       (bitcast "R", "H",
                                            (call (name_replace "_high_", "_"),
                                                  $p1, $p2))))>;
 def OP_MOVL_HI  : LOp<[(save_temp $a1, (call "vget_high", $p0)),
-                       (cast "R",
+                       (bitcast "R",
                             (call "vshll_n", $a1, (literal "int32_t", "0")))]>;
 def OP_COPY_LN : Op<(call "vset_lane", (call "vget_lane", $p2, $p3), $p0, $p1)>;
 def OP_SCALAR_MUL_LN : Op<(op "*", $p0, (call "vget_lane", $p1, $p2))>;
@@ -221,18 +221,18 @@ def OP_FMLSL_LN_Hi  : Op<(call "vfmlsl_high", $p0, $p1,
 
 def OP_USDOT_LN
     : Op<(call "vusdot", $p0, $p1,
-          (cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
+          (bitcast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
 def OP_USDOT_LNQ
     : Op<(call "vusdot", $p0, $p1,
-          (cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;
+          (bitcast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;
 
 // sudot splats the second vector and then calls vusdot
 def OP_SUDOT_LN
     : Op<(call "vusdot", $p0,
-          (cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
+          (bitcast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
 def OP_SUDOT_LNQ
     : Op<(call "vusdot", $p0,
-          (cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;
+          (bitcast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;
 
 def OP_BFDOT_LN
     : Op<(call "vbfdot", $p0, $p1,
@@ -263,7 +263,7 @@ def OP_VCVT_BF16_F32_A32
     : Op<(call "__a32_vcvt_bf16", $p0)>;
 
 def OP_VCVT_BF16_F32_LO_A32
-    : Op<(call "vcombine", (cast "bfloat16x4_t", (literal "uint64_t", "0ULL")),
+    : Op<(call "vcombine", (bitcast "bfloat16x4_t", (literal "uint64_t", "0ULL")),
                            (call "__a32_vcvt_bf16", $p0))>;
 def OP_VCVT_BF16_F32_HI_A32
     : Op<(call "vcombine", (call "__a32_vcvt_bf16", $p1),
@@ -924,12 +924,12 @@ def CFMLE  : SOpInst<"vcle", "U..", "lUldQdQlQUl", OP_LE>;
 def CFMGT  : SOpInst<"vcgt", "U..", "lUldQdQlQUl", OP_GT>;
 def CFMLT  : SOpInst<"vclt", "U..", "lUldQdQlQUl", OP_LT>;
 
-def CMEQ  : SInst<"vceqz", "U.",
+def CMEQ  : SInst<"vceqz", "U(.!)",
                   "csilfUcUsUiUlPcPlQcQsQiQlQfQUcQUsQUiQUlQPcdQdQPl">;
-def CMGE  : SInst<"vcgez", "U.", "csilfdQcQsQiQlQfQd">;
-def CMLE  : SInst<"vclez", "U.", "csilfdQcQsQiQlQfQd">;
-def CMGT  : SInst<"vcgtz", "U.", "csilfdQcQsQiQlQfQd">;
-def CMLT  : SInst<"vcltz", "U.", "csilfdQcQsQiQlQfQd">;
+def CMGE  : SInst<"vcgez", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMLE  : SInst<"vclez", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMGT  : SInst<"vcgtz", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMLT  : SInst<"vcltz", "U(.!)", "csilfdQcQsQiQlQfQd">;
 
 ////////////////////////////////////////////////////////////////////////////////
 // Max/Min Integer
@@ -1667,11 +1667,11 @@ let TargetGuard = "fullfp16,neon" in {
   // ARMv8.2-A FP16 one-operand vector intrinsics.
 
   // Comparison
-  def CMEQH    : SInst<"vceqz", "U.", "hQh">;
-  def CMGEH    : SInst<"vcgez", "U.", "hQh">;
-  def CMGTH    : SInst<"vcgtz", "U.", "hQh">;
-  def CMLEH    : SInst<"vclez", "U.", "hQh">;
-  def CMLTH    : SInst<"vcltz", "U.", "hQh">;
+  def CMEQH    : SInst<"vceqz", "U(.!)", "hQh">;
+  def CMGEH    : SInst<"vcgez", "U(.!)", "hQh">;
+  def CMGTH    : SInst<"vcgtz", "U(.!)", "hQh">;
+  def CMLEH    : SInst<"vclez", "U(.!)", "hQh">;
+  def CMLTH    : SInst<"vcltz", "U(.!)", "hQh">;
 
   // Vector conversion
   def VCVT_F16     : SInst<"vcvt_f16", "F(.!)",  "sUsQsQUs">;
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 7ec9d59bfed5c..9a5413a964679 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -8065,8 +8065,9 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
 
   // Determine the type of this overloaded NEON intrinsic.
   NeonTypeFlags Type(NeonTypeConst->getZExtValue());
-  bool Usgn = Type.isUnsigned();
-  bool Quad = Type.isQuad();
+  const bool Usgn = Type.isUnsigned();
+  const bool Quad = Type.isQuad();
+  const bool Floating = Type.isFloatingPoint();
   const bool HasLegalHalfType = getTarget().hasLegalHalfType();
   const bool AllowBFloatArgsAndRet =
       getTargetHooks().getABIInfo().allowBFloatArgsAndRet();
@@ -8167,24 +8168,28 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
   }
   case NEON::BI__builtin_neon_vceqz_v:
   case NEON::BI__builtin_neon_vceqzq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OEQ,
-                                         ICmpInst::ICMP_EQ, "vceqz");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OEQ : ICmpInst::ICMP_EQ, "vceqz");
   case NEON::BI__builtin_neon_vcgez_v:
   case NEON::BI__builtin_neon_vcgezq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OGE,
-                                         ICmpInst::ICMP_SGE, "vcgez");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OGE : ICmpInst::ICMP_SGE,
+        "vcgez");
   case NEON::BI__builtin_neon_vclez_v:
   case NEON::BI__builtin_neon_vclezq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OLE,
-                                         ICmpInst::ICMP_SLE, "vclez");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OLE : ICmpInst::ICMP_SLE,
+        "vclez");
   case NEON::BI__builtin_neon_vcgtz_v:
   case NEON::BI__builtin_neon_vcgtzq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OGT,
-                                         ICmpInst::ICMP_SGT, "vcgtz");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OGT : ICmpInst::ICMP_SGT,
+        "vcgtz");
   case NEON::BI__builtin_neon_vcltz_v:
   case NEON::BI__builtin_neon_vcltzq_v:
-    return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OLT,
-                                         ICmpInst::ICMP_SLT, "vcltz");
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], Ty, Floating ? ICmpInst::FCMP_OLT : ICmpInst::ICMP_SLT,
+        "vcltz");
   case NEON::BI__builtin_neon_vclz_v:
   case NEON::BI__builtin_neon_vclzq_v:
     // We generate target-independent intrinsic, which needs a second argument
@@ -8747,28 +8752,32 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
   return Builder.CreateBitCast(Result, ResultType, NameHint);
 }
 
-Value *CodeGenFunction::EmitAArch64CompareBuiltinExpr(
-    Value *Op, llvm::Type *Ty, const CmpInst::Predicate Fp,
-    const CmpInst::Predicate Ip, const Twine &Name) {
-  llvm::Type *OTy = Op->getType();
-
-  // FIXME: this is utterly horrific. We should not be looking at previous
-  // codegen context to find out what needs doing. Unfortunately TableGen
-  // currently gives us exactly the same calls for vceqz_f32 and vceqz_s32
-  // (etc).
-  if (BitCastInst *BI = dyn_cast<BitCastInst>(Op))
-    OTy = BI->getOperand(0)->getType();
-
-  Op = Builder.CreateBitCast(Op, OTy);
-  if (OTy->getScalarType()->isFloatingPointTy()) {
-    if (Fp == CmpInst::FCMP_OEQ)
-      Op = Builder.CreateFCmp(Fp, Op, Constant::getNullValue(OTy));
+Value *
+CodeGenFunction::EmitAArch64CompareBuiltinExpr(Value *Op, llvm::Type *Ty,
+                                               const CmpInst::Predicate Pred,
+                                               const Twine &Name) {
+
+  if (isa<FixedVectorType>(Ty)) {
+    // Vector types are cast to i8 vectors. Recover original type.
+    Op = Builder.CreateBitCast(Op, Ty);
+  }
+
+  if (CmpInst::isFPPredicate(Pred)) {
+    if (Pred == CmpInst::FCMP_OEQ)
+      Op = Builder.CreateFCmp(Pred, Op, Constant::getNullValue(Op->getType()));
     else
-      Op = Builder.CreateFCmpS(Fp, Op, Constant::getNullValue(OTy));
+      Op = Builder.CreateFCmpS(Pred, Op, Constant::getNullValue(Op->getType()));
   } else {
-    Op = Builder.CreateICmp(Ip, Op, Constant::getNullValue(OTy));
+    Op = Builder.CreateICmp(Pred, Op, Constant::getNullValue(Op->getType()));
   }
-  return Builder.CreateSExt(Op, Ty, Name);
+
+  llvm::Type *ResTy = Ty;
+  if (auto *VTy = dyn_cast<FixedVectorType>(Ty))
+    ResTy = FixedVectorType::get(
+        IntegerType::get(getLLVMContext(), VTy->getScalarSizeInBits()),
+        VTy->getNumElements());
+
+  return Builder.CreateSExt(Op, ResTy, Name);
 }
 
 static Value *packTBLDVectorList(CodeGenFunction &CGF, ArrayRef<Value *> Ops,
@@ -12276,45 +12285,66 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateFAdd(Op0, Op1, "vpaddd");
   }
   case NEON::BI__builtin_neon_vceqzd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_EQ, "vceqz");
   case NEON::BI__builtin_neon_vceqzd_f64:
   case NEON::BI__builtin_neon_vceqzs_f32:
   case NEON::BI__builtin_neon_vceqzh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OEQ, ICmpInst::ICMP_EQ, "vceqz");
+        ICmpInst::FCMP_OEQ, "vceqz");
   case NEON::BI__builtin_neon_vcgezd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SGE, "vcgez");
   case NEON::BI__builtin_neon_vcgezd_f64:
   case NEON::BI__builtin_neon_vcgezs_f32:
   case NEON::BI__builtin_neon_vcgezh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OGE, ICmpInst::ICMP_SGE, "vcgez");
+        ICmpInst::FCMP_OGE, "vcgez");
   case NEON::BI__builtin_neon_vclezd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SLE, "vclez");
   case NEON::BI__builtin_neon_vclezd_f64:
   case NEON::BI__builtin_neon_vclezs_f32:
   case NEON::BI__builtin_neon_vclezh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OLE, ICmpInst::ICMP_SLE, "vclez");
+        ICmpInst::FCMP_OLE, "vclez");
   case NEON::BI__builtin_neon_vcgtzd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SGT, "vcgtz");
   case NEON::BI__builtin_neon_vcgtzd_f64:
   case NEON::BI__builtin_neon_vcgtzs_f32:
   case NEON::BI__builtin_neon_vcgtzh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OGT, ICmpInst::ICMP_SGT, "vcgtz");
+        ICmpInst::FCMP_OGT, "vcgtz");
   case NEON::BI__builtin_neon_vcltzd_s64:
+    Ops.push_back(EmitScalarExpr(E->getArg(0)));
+    return EmitAArch64CompareBuiltinExpr(
+        Ops[0], ConvertType(E->getCallReturnType(getContext())),
+        ICmpInst::ICMP_SLT, "vcltz");
+
   case NEON::BI__builtin_neon_vcltzd_f64:
   case NEON::BI__builtin_neon_vcltzs_f32:
   case NEON::BI__builtin_neon_vcltzh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
     return EmitAArch64CompareBuiltinExpr(
         Ops[0], ConvertType(E->getCallReturnType(getContext())),
-        ICmpInst::FCMP_OLT, ICmpInst::ICMP_SLT, "vcltz");
+        ICmpInst::FCMP_OLT, "vcltz");
 
   case NEON::BI__builtin_neon_vceqzd_u64: {
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index e978cad433623..95be50a7fd436 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4671,10 +4671,10 @@ class CodeGenFunction : public CodeGenTypeCache {
   llvm::Value *EmitTargetBuiltinExpr(unsigned BuiltinID, const CallExpr *E,
                                      ReturnValueSlot ReturnValue);
 
-  llvm::Value *EmitAArch64CompareBuiltinExpr(llvm::Value *Op, llvm::Type *Ty,
-                                             const llvm::CmpInst::Predicate Fp,
-                                             const llvm::CmpInst::Predicate Ip,
-                                             const llvm::Twine &Name = "");
+  llvm::Value *
+  EmitAArch64CompareBuiltinExpr(llvm::Value *Op, llvm::Type *Ty,
+                                const llvm::CmpInst::Predicate Pred,
+                                const llvm::Twine &Name = "");
   llvm::Value *EmitARMBuiltinExpr(unsigned BuiltinID, const CallExpr *E,
                                   ReturnValueSlot ReturnValue,
                                   llvm::Triple::ArchType Arch);
diff --git a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
index 877d83c0fa395..2097495b3baee 100644
--- a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
+++ b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
@@ -1,6 +1,6 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // RUN: %clang_cc1 -triple aarch64 -target-feature +neon -target-feature +bf16 \
-// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg | FileCheck %s
+// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg,sroa | FileCheck %s
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
@@ -8,10 +8,16 @@
 
 // CHECK-LABEL: @test_vbfdot_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot....
[truncated]

@llvmbot
Member

llvmbot commented Feb 13, 2025

@llvm/pr-subscribers-clang-codegen


This comment was marked as off-topic.

Contributor

@jthackray jthackray left a comment


LGTM. Glad to see the "utterly horrific" code gone :)

Contributor

@CarolineConcatto CarolineConcatto left a comment


Hi Marian,
I am seeing some errors from the CI, but they go away if I re-run update_cc_tests.
But like the other files, it also adds some extra check lines.

  }
-  return Builder.CreateSExt(Op, Ty, Name);
+
+  llvm::Type *ResTy = Ty;
Contributor


Why do we need that? I removed it and could not see any failing tests.

Collaborator


I wrote that. IIRC, the reason was that Ty could be a floating-point vector type, but the result of the compare is always an integer vector with the same number of elements.

Contributor Author


I believe this code is necessary because LLVM comparison instructions return i1 vectors, which need to be explicitly extended to i<elem_size> for correct behavior. As for why this part could be removed without immediate failures, I’m not entirely sure — LLVM doesn’t support implicit type casts, so the issue might be masked. It’s possible we’re missing tests that cover the full lowering path from C to assembly, which would otherwise reveal that the generated LLVM IR doesn’t map to any valid instruction.
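
As a concrete, hedged illustration of that point (the function name below is invented): even when the compared value is a floating-point vector, the zero-compare intrinsics return an integer vector with the same number of lanes, which is exactly the type the sign-extended i1 compare result has to end up in.

  #include <arm_neon.h>

  // The input is a <2 x float>, but the result is uint32x2_t: each lane is
  // all-ones or all-zeros. In IR terms, the <2 x i1> produced by the fcmp is
  // sign-extended into this integer vector type, not into the floating-point
  // operand type.
  uint32x2_t is_zero(float32x2_t a) {
    return vceqz_f32(a);
  }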

Contributor

@CarolineConcatto CarolineConcatto left a comment


Thank you Marian,
The patch looks good

@Lukacma Lukacma merged commit 6c3adaa into llvm:main Apr 1, 2025
7 of 11 checks passed
@Lukacma Lukacma deleted the bitcasts_neon branch April 1, 2025 08:45
@llvm-ci
Collaborator

llvm-ci commented Apr 1, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/15469

Here is the relevant piece of the build log, for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: CodeGen/arm-neon-directed-rounding-constrained.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/lib/clang/21/include -nostdsysteminc -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57      -ffreestanding -disable-O0-optnone -emit-llvm /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c -o - |      /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/opt -S -passes=mem2reg | /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefixes=COMMON,COMMONIR,UNCONSTRAINED /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c # RUN: at line 1
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/lib/clang/21/include -nostdsysteminc -triple thumbv8-linux-gnueabihf -target-cpu cortex-a57 -ffreestanding -disable-O0-optnone -emit-llvm /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c -o -
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefixes=COMMON,COMMONIR,UNCONSTRAINED /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/opt -S -passes=mem2reg
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c:40:14: error: COMMONIR: expected string not found in input
// COMMONIR: [[TMP0:%.*]] = bitcast <2 x float> %a to <8 x i8>
             ^
<stdin>:7:45: note: scanning from here
define dso_local <2 x float> @test_vrndi_f32(<2 x float> noundef %a) #0 {
                                            ^
<stdin>:13:8: note: possible intended match here
 %vrndi_v.i = bitcast <8 x i8> %0 to <2 x float>
       ^
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c:52:14: error: COMMONIR: expected string not found in input
// COMMONIR: [[TMP0:%.*]] = bitcast <4 x float> %a to <16 x i8>
             ^
<stdin>:21:46: note: scanning from here
define dso_local <4 x float> @test_vrndiq_f32(<4 x float> noundef %a) #0 {
                                             ^
<stdin>:27:9: note: possible intended match here
 %vrndiq_v.i = bitcast <16 x i8> %0 to <4 x float>
        ^

Input file: <stdin>
Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = '<stdin>'
            2: source_filename = "/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c"
            3: target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
            4: target triple = "thumbv8-unknown-linux-gnueabihf"
            5:
            6: ; Function Attrs: noinline nounwind
            7: define dso_local <2 x float> @test_vrndi_f32(<2 x float> noundef %a) #0 {
label:39'0                                   ^~~~~~~~~~~~~~
label:39'1                                   ^~~~~~~~~~~~~~
check:40'0                                                 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            8: entry:
check:40'0     ~~~~~~~
            9:  %__p0.addr.i = alloca <2 x float>, align 8
check:40'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10:  %ref.tmp.i = alloca <8 x i8>, align 8
...

Lukacma added a commit to Lukacma/llvm-project that referenced this pull request Apr 1, 2025
Lukacma added a commit that referenced this pull request Apr 1, 2025
@llvm-ci
Collaborator

llvm-ci commented Apr 1, 2025

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building clang,llvm at step 6 "test-build-unified-tree-check-clang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/23567

Here is the relevant piece of the build log, for reference:
Step 6 (test-build-unified-tree-check-clang) failure: test (failure)
******************** TEST 'Clang :: CodeGen/arm-bf16-convert-intrinsics.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/llvm-x86_64-debian-dylib/build/bin/clang -cc1 -internal-isystem /b/1/llvm-x86_64-debian-dylib/build/lib/clang/21/include -nostdsysteminc    -triple aarch64 -target-feature +neon -target-feature +bf16    -disable-O0-optnone -emit-llvm -o - /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c    | /b/1/llvm-x86_64-debian-dylib/build/bin/opt -S -passes=mem2reg    | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck --check-prefixes=CHECK,CHECK-A64 /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c # RUN: at line 2
+ /b/1/llvm-x86_64-debian-dylib/build/bin/opt -S -passes=mem2reg
+ /b/1/llvm-x86_64-debian-dylib/build/bin/clang -cc1 -internal-isystem /b/1/llvm-x86_64-debian-dylib/build/lib/clang/21/include -nostdsysteminc -triple aarch64 -target-feature +neon -target-feature +bf16 -disable-O0-optnone -emit-llvm -o - /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck --check-prefixes=CHECK,CHECK-A64 /b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:29:20: error: CHECK-A64-NEXT: is not on the line after the previous match
// CHECK-A64-NEXT: store <4 x bfloat> [[A:%.*]], ptr [[__REINT_808_I]], align 8
                   ^
<stdin>:13:2: note: 'next' match was here
 store <4 x bfloat> %a, ptr %__p0_808.addr.i, align 8
 ^
<stdin>:10:41: note: previous match ended here
 %ref.tmp.i = alloca <4 x i32>, align 16
                                        ^
<stdin>:11:1: note: non-matching line after previous match is here
 %__s0.i = alloca <4 x i16>, align 8
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:81:20: error: CHECK-A64-NEXT: is not on the line after the previous match
// CHECK-A64-NEXT: [[SHUFFLE_I:%.*]] = shufflevector <8 x bfloat> [[A:%.*]], <8 x bfloat> [[A]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
                   ^
<stdin>:34:2: note: 'next' match was here
 %shuffle.i = shufflevector <8 x bfloat> %a, <8 x bfloat> %a, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ^
<stdin>:31:43: note: previous match ended here
 %ref.tmp.i.i = alloca <4 x i32>, align 16
                                          ^
<stdin>:32:1: note: non-matching line after previous match is here
 %__s0.i.i = alloca <4 x i16>, align 8
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:154:20: error: CHECK-A64-NEXT: is not on the line after the previous match
// CHECK-A64-NEXT: [[SHUFFLE_I:%.*]] = shufflevector <8 x bfloat> [[A:%.*]], <8 x bfloat> [[A]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
                   ^
<stdin>:56:2: note: 'next' match was here
 %shuffle.i = shufflevector <8 x bfloat> %a, <8 x bfloat> %a, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
 ^
<stdin>:53:43: note: previous match ended here
 %ref.tmp.i.i = alloca <4 x i32>, align 16
                                          ^
<stdin>:54:1: note: non-matching line after previous match is here
 %__s0.i.i = alloca <4 x i16>, align 8
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/clang/test/CodeGen/arm-bf16-convert-intrinsics.c:225:20: error: CHECK-A64-NEXT: expected string not found in input
// CHECK-A64-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[A:%.*]] to <16 x i8>
                   ^
<stdin>:73:7: note: scanning from here
entry:
...

Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request Apr 2, 2025
…ate (llvm#127043)

Currently, arm_neon.h emits C-style casts to perform vector type casts. This
relies on implicit conversion between vector types being enabled, which is
deprecated behaviour that will soon be removed. To ensure NEON code keeps
working afterwards, this patch changes all of these vector type casts into
bitcasts.


Co-authored-by: Momchil Velikov <[email protected]>
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request Apr 2, 2025
Labels

  • backend:AArch64
  • clang:codegen (IR generation bugs: mangling, exceptions, etc.)
  • clang:frontend (Language frontend issues, e.g. anything involving "Sema")
  • clang (Clang issues not falling into any other category)
6 participants