
[AArch64][GlobalISel] Legalize G_VECREDUCE_{MIN/MAX} #69461


Merged
merged 3 commits into llvm:main on Nov 9, 2023

Conversation

chuongg3 (Contributor)

Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for vecreduce_{min/max}

@llvmbot (Member)

llvmbot commented Oct 18, 2023

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: None (chuongg3)

Changes

Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for vecreduce_{min/max}
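For illustration, here is a minimal IR example, adapted from the updated aarch64-minmaxv.ll test below, of the kind of reduction GlobalISel can now legalize and select directly (the function name is ours):

; With this patch, GlobalISel selects a single across-lanes reduction
; (uminv s0, v0.4s) for this call instead of falling back to SelectionDAG.
declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32>)

define i32 @umin_example(<4 x i32> %v) {
  %r = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %v)
  ret i32 %r
}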


Patch is 70.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69461.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrGISel.td (+5)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+36)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+15)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir (+11-8)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll (+1787-82)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrGISel.td b/llvm/lib/Target/AArch64/AArch64InstrGISel.td
index 27338bd24393325..a3e8b1fff32eee9 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrGISel.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrGISel.td
@@ -274,6 +274,11 @@ def : GINodeEquiv<G_EXTRACT_VECTOR_ELT, vector_extract>;
 
 def : GINodeEquiv<G_PREFETCH, AArch64Prefetch>;
 
+def : GINodeEquiv<G_VECREDUCE_UMIN, vecreduce_umin>;
+def : GINodeEquiv<G_VECREDUCE_UMAX, vecreduce_umax>;
+def : GINodeEquiv<G_VECREDUCE_SMIN, vecreduce_smin>;
+def : GINodeEquiv<G_VECREDUCE_SMAX, vecreduce_smax>;
+
 // These are patterns that we only use for GlobalISel via the importer.
 def : Pat<(f32 (fadd (vector_extract (v2f32 FPR64:$Rn), (i64 0)),
                      (vector_extract (v2f32 FPR64:$Rn), (i64 1)))),
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index df59dc4ad27fadb..d1b23da7dbfac6c 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -6642,6 +6642,42 @@ defm : SIMDAcrossLanesUnsignedIntrinsic<"UMINV", AArch64uminv>;
 def : Pat<(v2i32 (AArch64uminv (v2i32 V64:$Rn))),
           (UMINPv2i32 V64:$Rn, V64:$Rn)>;
 
+// For vecreduce_{opc}
+multiclass SIMDAcrossLanesVecReductionIntrinsic<string baseOpc,
+                                            SDPatternOperator opNode> {
+def : Pat<(i8 (opNode (v8i8 FPR64:$Rn))),
+          (!cast<Instruction>(!strconcat(baseOpc, "v8i8v")) FPR64:$Rn)>;
+
+def : Pat<(i8 (opNode (v16i8 FPR128:$Rn))),
+          (!cast<Instruction>(!strconcat(baseOpc, "v16i8v")) FPR128:$Rn)>;
+
+def : Pat<(i16 (opNode (v4i16 FPR64:$Rn))),
+          (!cast<Instruction>(!strconcat(baseOpc, "v4i16v")) FPR64:$Rn)>;
+
+def : Pat<(i16 (opNode (v8i16 FPR128:$Rn))),
+          (!cast<Instruction>(!strconcat(baseOpc, "v8i16v")) FPR128:$Rn)>;
+
+def : Pat<(i32 (opNode (v4i32 V128:$Rn))), 
+          (!cast<Instruction>(!strconcat(baseOpc, "v4i32v")) V128:$Rn)>;
+
+}
+
+defm : SIMDAcrossLanesVecReductionIntrinsic<"UMINV", vecreduce_umin>;
+def : Pat<(i32 (vecreduce_umin (v2i32 V64:$Rn))), 
+          (i32 (EXTRACT_SUBREG (UMINPv2i32 V64:$Rn, V64:$Rn), ssub))>;
+
+defm : SIMDAcrossLanesVecReductionIntrinsic<"UMAXV", vecreduce_umax>;
+def : Pat<(i32 (vecreduce_umax (v2i32 V64:$Rn))), 
+          (i32 (EXTRACT_SUBREG (UMAXPv2i32 V64:$Rn, V64:$Rn), ssub))>;
+
+defm : SIMDAcrossLanesVecReductionIntrinsic<"SMINV", vecreduce_smin>;
+def : Pat<(i32 (vecreduce_smin (v2i32 V64:$Rn))), 
+          (i32 (EXTRACT_SUBREG (SMINPv2i32 V64:$Rn, V64:$Rn), ssub))>;
+
+defm : SIMDAcrossLanesVecReductionIntrinsic<"SMAXV", vecreduce_smax>;
+def : Pat<(i32 (vecreduce_smax (v2i32 V64:$Rn))), 
+          (i32 (EXTRACT_SUBREG (SMAXPv2i32 V64:$Rn, V64:$Rn), ssub))>;
+
 multiclass SIMDAcrossLanesSignedLongIntrinsic<string baseOpc, Intrinsic intOp> {
   def : Pat<(i32 (intOp (v8i8 V64:$Rn))),
         (i32 (SMOVvi16to32
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index ddc27bebb767693..3c0d61f3ba4213b 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -902,6 +902,21 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .scalarize(1)
       .lower();
 
+  getActionDefinitionsBuilder(
+      {G_VECREDUCE_SMIN, G_VECREDUCE_SMAX, G_VECREDUCE_UMIN, G_VECREDUCE_UMAX})
+      .legalFor({{s8, v8s8},
+                 {s8, v16s8},
+                 {s16, v4s16},
+                 {s16, v8s16},
+                 {s32, v2s32},
+                 {s32, v4s32}})
+      .clampMaxNumElements(1, s64, 2)
+      .clampMaxNumElements(1, s32, 4)
+      .clampMaxNumElements(1, s16, 8)
+      .clampMaxNumElements(1, s8, 16)
+      .scalarize(1)
+      .lower();
+
   getActionDefinitionsBuilder(
       {G_VECREDUCE_OR, G_VECREDUCE_AND, G_VECREDUCE_XOR})
       // Try to break down into smaller vectors as long as they're at least 64
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index 549f36b2afd066f..745ba70140d42d4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -768,17 +768,20 @@
 # DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_VECREDUCE_SMAX (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
+# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_VECREDUCE_SMIN (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_VECREDUCE_UMAX (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
+# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_VECREDUCE_UMIN (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
+# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_SBFX (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
 # DEBUG-NEXT: .. the first uncovered type index: 2, OK
 # DEBUG-NEXT: .. the first uncovered imm index: 0, OK
diff --git a/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll b/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
index f5d7d330b45c449..df35b4ecb3d6623 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-minmaxv.ll
@@ -1,228 +1,1933 @@
-; RUN: llc < %s -mtriple=aarch64-linux--gnu -aarch64-neon-syntax=generic | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64 -global-isel -global-isel-abort=2 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
 
-declare i8 @llvm.vector.reduce.smax.v16i8(<16 x i8>)
-declare i16 @llvm.vector.reduce.smax.v8i16(<8 x i16>)
-declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32>)
-declare i8 @llvm.vector.reduce.umax.v16i8(<16 x i8>)
-declare i16 @llvm.vector.reduce.umax.v8i16(<8 x i16>)
-declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32>)
+; CHECK-GI:         warning: Instruction selection used fallback path for sminv_v3i64
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for smaxv_v3i64
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for uminv_v3i64
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for umaxv_v3i64
 
+declare i8 @llvm.vector.reduce.smin.v2i8(<2 x i8>)
+declare i8 @llvm.vector.reduce.smin.v3i8(<3 x i8>)
+declare i8 @llvm.vector.reduce.smin.v4i8(<4 x i8>)
+declare i8 @llvm.vector.reduce.smin.v8i8(<8 x i8>)
 declare i8 @llvm.vector.reduce.smin.v16i8(<16 x i8>)
+declare i8 @llvm.vector.reduce.smin.v32i8(<32 x i8>)
+declare i16 @llvm.vector.reduce.smin.v2i16(<2 x i16>)
+declare i16 @llvm.vector.reduce.smin.v3i16(<3 x i16>)
+declare i16 @llvm.vector.reduce.smin.v4i16(<4 x i16>)
 declare i16 @llvm.vector.reduce.smin.v8i16(<8 x i16>)
+declare i16 @llvm.vector.reduce.smin.v16i16(<16 x i16>)
+declare i32 @llvm.vector.reduce.smin.v2i32(<2 x i32>)
+declare i32 @llvm.vector.reduce.smin.v3i32(<3 x i32>)
 declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32>)
+declare i32 @llvm.vector.reduce.smin.v8i32(<8 x i32>)
+declare i32 @llvm.vector.reduce.smin.v16i32(<16 x i32>)
+declare i64 @llvm.vector.reduce.smin.v2i64(<2 x i64>)
+declare i64 @llvm.vector.reduce.smin.v3i64(<3 x i64>)
+declare i64 @llvm.vector.reduce.smin.v4i64(<4 x i64>)
+declare i128 @llvm.vector.reduce.smin.v2i128(<2 x i128>)
+declare i8 @llvm.vector.reduce.smax.v2i8(<2 x i8>)
+declare i8 @llvm.vector.reduce.smax.v3i8(<3 x i8>)
+declare i8 @llvm.vector.reduce.smax.v4i8(<4 x i8>)
+declare i8 @llvm.vector.reduce.smax.v8i8(<8 x i8>)
+declare i8 @llvm.vector.reduce.smax.v16i8(<16 x i8>)
+declare i8 @llvm.vector.reduce.smax.v32i8(<32 x i8>)
+declare i16 @llvm.vector.reduce.smax.v2i16(<2 x i16>)
+declare i16 @llvm.vector.reduce.smax.v3i16(<3 x i16>)
+declare i16 @llvm.vector.reduce.smax.v4i16(<4 x i16>)
+declare i16 @llvm.vector.reduce.smax.v8i16(<8 x i16>)
+declare i16 @llvm.vector.reduce.smax.v16i16(<16 x i16>)
+declare i32 @llvm.vector.reduce.smax.v2i32(<2 x i32>)
+declare i32 @llvm.vector.reduce.smax.v3i32(<3 x i32>)
+declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32>)
+declare i32 @llvm.vector.reduce.smax.v8i32(<8 x i32>)
+declare i32 @llvm.vector.reduce.smax.v16i32(<16 x i32>)
+declare i64 @llvm.vector.reduce.smax.v2i64(<2 x i64>)
+declare i64 @llvm.vector.reduce.smax.v3i64(<3 x i64>)
+declare i64 @llvm.vector.reduce.smax.v4i64(<4 x i64>)
+declare i128 @llvm.vector.reduce.smax.v2i128(<2 x i128>)
+declare i8 @llvm.vector.reduce.umin.v2i8(<2 x i8>)
+declare i8 @llvm.vector.reduce.umin.v3i8(<3 x i8>)
+declare i8 @llvm.vector.reduce.umin.v4i8(<4 x i8>)
+declare i8 @llvm.vector.reduce.umin.v8i8(<8 x i8>)
 declare i8 @llvm.vector.reduce.umin.v16i8(<16 x i8>)
+declare i8 @llvm.vector.reduce.umin.v32i8(<32 x i8>)
+declare i16 @llvm.vector.reduce.umin.v2i16(<2 x i16>)
+declare i16 @llvm.vector.reduce.umin.v3i16(<3 x i16>)
+declare i16 @llvm.vector.reduce.umin.v4i16(<4 x i16>)
 declare i16 @llvm.vector.reduce.umin.v8i16(<8 x i16>)
+declare i16 @llvm.vector.reduce.umin.v16i16(<16 x i16>)
+declare i32 @llvm.vector.reduce.umin.v2i32(<2 x i32>)
+declare i32 @llvm.vector.reduce.umin.v3i32(<3 x i32>)
 declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32>)
+declare i32 @llvm.vector.reduce.umin.v8i32(<8 x i32>)
+declare i32 @llvm.vector.reduce.umin.v16i32(<16 x i32>)
+declare i64 @llvm.vector.reduce.umin.v2i64(<2 x i64>)
+declare i64 @llvm.vector.reduce.umin.v3i64(<3 x i64>)
+declare i64 @llvm.vector.reduce.umin.v4i64(<4 x i64>)
+declare i128 @llvm.vector.reduce.umin.v2i128(<2 x i128>)
+declare i8 @llvm.vector.reduce.umax.v2i8(<2 x i8>)
+declare i8 @llvm.vector.reduce.umax.v3i8(<3 x i8>)
+declare i8 @llvm.vector.reduce.umax.v4i8(<4 x i8>)
+declare i8 @llvm.vector.reduce.umax.v8i8(<8 x i8>)
+declare i8 @llvm.vector.reduce.umax.v16i8(<16 x i8>)
+declare i8 @llvm.vector.reduce.umax.v32i8(<32 x i8>)
+declare i16 @llvm.vector.reduce.umax.v2i16(<2 x i16>)
+declare i16 @llvm.vector.reduce.umax.v3i16(<3 x i16>)
+declare i16 @llvm.vector.reduce.umax.v4i16(<4 x i16>)
+declare i16 @llvm.vector.reduce.umax.v8i16(<8 x i16>)
+declare i16 @llvm.vector.reduce.umax.v16i16(<16 x i16>)
+declare i32 @llvm.vector.reduce.umax.v2i32(<2 x i32>)
+declare i32 @llvm.vector.reduce.umax.v3i32(<3 x i32>)
+declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32>)
+declare i32 @llvm.vector.reduce.umax.v8i32(<8 x i32>)
+declare i32 @llvm.vector.reduce.umax.v16i32(<16 x i32>)
+declare i64 @llvm.vector.reduce.umax.v2i64(<2 x i64>)
+declare i64 @llvm.vector.reduce.umax.v3i64(<3 x i64>)
+declare i64 @llvm.vector.reduce.umax.v4i64(<4 x i64>)
+declare i128 @llvm.vector.reduce.umax.v2i128(<2 x i128>)
 
 declare float @llvm.vector.reduce.fmax.v4f32(<4 x float>)
 declare float @llvm.vector.reduce.fmin.v4f32(<4 x float>)
 
-; CHECK-LABEL: smax_B
-; CHECK: smaxv {{b[0-9]+}}, {{v[0-9]+}}.16b
 define i8 @smax_B(ptr nocapture readonly %arr)  {
+; CHECK-LABEL: smax_B:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    smaxv b0, v0.16b
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <16 x i8>, ptr %arr
   %r = call i8 @llvm.vector.reduce.smax.v16i8(<16 x i8> %arr.load)
   ret i8 %r
 }
 
-; CHECK-LABEL: smax_H
-; CHECK: smaxv {{h[0-9]+}}, {{v[0-9]+}}.8h
 define i16 @smax_H(ptr nocapture readonly %arr) {
+; CHECK-LABEL: smax_H:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    smaxv h0, v0.8h
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <8 x i16>, ptr %arr
   %r = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> %arr.load)
   ret i16 %r
 }
 
-; CHECK-LABEL: smax_S
-; CHECK: smaxv {{s[0-9]+}}, {{v[0-9]+}}.4s
 define i32 @smax_S(ptr nocapture readonly %arr)  {
+; CHECK-LABEL: smax_S:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    smaxv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <4 x i32>, ptr %arr
   %r = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %arr.load)
   ret i32 %r
 }
 
-; CHECK-LABEL: umax_B
-; CHECK: umaxv {{b[0-9]+}}, {{v[0-9]+}}.16b
 define i8 @umax_B(ptr nocapture readonly %arr)  {
+; CHECK-LABEL: umax_B:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    umaxv b0, v0.16b
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <16 x i8>, ptr %arr
   %r = call i8 @llvm.vector.reduce.umax.v16i8(<16 x i8> %arr.load)
   ret i8 %r
 }
 
-; CHECK-LABEL: umax_H
-; CHECK: umaxv {{h[0-9]+}}, {{v[0-9]+}}.8h
 define i16 @umax_H(ptr nocapture readonly %arr)  {
+; CHECK-LABEL: umax_H:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    umaxv h0, v0.8h
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <8 x i16>, ptr %arr
   %r = call i16 @llvm.vector.reduce.umax.v8i16(<8 x i16> %arr.load)
   ret i16 %r
 }
 
-; CHECK-LABEL: umax_S
-; CHECK: umaxv {{s[0-9]+}}, {{v[0-9]+}}.4s
 define i32 @umax_S(ptr nocapture readonly %arr) {
+; CHECK-LABEL: umax_S:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    umaxv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <4 x i32>, ptr %arr
   %r = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %arr.load)
   ret i32 %r
 }
 
-; CHECK-LABEL: smin_B
-; CHECK: sminv {{b[0-9]+}}, {{v[0-9]+}}.16b
 define i8 @smin_B(ptr nocapture readonly %arr) {
+; CHECK-LABEL: smin_B:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    sminv b0, v0.16b
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <16 x i8>, ptr %arr
   %r = call i8 @llvm.vector.reduce.smin.v16i8(<16 x i8> %arr.load)
   ret i8 %r
 }
 
-; CHECK-LABEL: smin_H
-; CHECK: sminv {{h[0-9]+}}, {{v[0-9]+}}.8h
 define i16 @smin_H(ptr nocapture readonly %arr) {
+; CHECK-LABEL: smin_H:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    sminv h0, v0.8h
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <8 x i16>, ptr %arr
   %r = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> %arr.load)
   ret i16 %r
 }
 
-; CHECK-LABEL: smin_S
-; CHECK: sminv {{s[0-9]+}}, {{v[0-9]+}}.4s
 define i32 @smin_S(ptr nocapture readonly %arr) {
+; CHECK-LABEL: smin_S:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    sminv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <4 x i32>, ptr %arr
   %r = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %arr.load)
   ret i32 %r
 }
 
-; CHECK-LABEL: umin_B
-; CHECK: uminv {{b[0-9]+}}, {{v[0-9]+}}.16b
 define i8 @umin_B(ptr nocapture readonly %arr)  {
+; CHECK-LABEL: umin_B:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    uminv b0, v0.16b
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <16 x i8>, ptr %arr
   %r = call i8 @llvm.vector.reduce.umin.v16i8(<16 x i8> %arr.load)
   ret i8 %r
 }
 
-; CHECK-LABEL: umin_H
-; CHECK: uminv {{h[0-9]+}}, {{v[0-9]+}}.8h
 define i16 @umin_H(ptr nocapture readonly %arr)  {
+; CHECK-LABEL: umin_H:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    uminv h0, v0.8h
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <8 x i16>, ptr %arr
   %r = call i16 @llvm.vector.reduce.umin.v8i16(<8 x i16> %arr.load)
   ret i16 %r
 }
 
-; CHECK-LABEL: umin_S
-; CHECK: uminv {{s[0-9]+}}, {{v[0-9]+}}.4s
 define i32 @umin_S(ptr nocapture readonly %arr) {
+; CHECK-LABEL: umin_S:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    uminv s0, v0.4s
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    ret
   %arr.load = load <4 x i32>, ptr %arr
   %r = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %arr.load)
   ret i32 %r
 }
 
-; CHECK-LABEL: fmaxnm_S
-; CHECK: fmaxnmv
 define float @fmaxnm_S(ptr nocapture readonly %arr) {
+; CHECK-LABEL: fmaxnm_S:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    fmaxnmv s0, v0.4s
+; CHECK-NEXT:    ret
   %arr.load  = load <4 x float>, ptr %arr
   %r = call nnan float @llvm.vector.reduce.fmax.v4f32(<4 x float> %arr.load)
   ret float %r
 }
 
-; CHECK-LABEL: fminnm_S
-; CHECK: fminnmv
 define float @fminnm_S(ptr nocapture readonly %arr) {
+; CHECK-LABEL: fminnm_S:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    fminnmv s0, v0.4s
+; CHECK-NEXT:    ret
   %arr.load  = load <4 x float>, ptr %arr
   %r = call nnan float @llvm.vector.reduce.fmin.v4f32(<4 x float> %arr.load)
   ret float %r
 }
 
-declare i16 @llvm.vector.reduce.umax.v16i16(<16 x i16>)
-
 define i16 @oversized_umax_256(ptr nocapture readonly %arr)  {
-; CHECK-LABEL: oversized_umax_256
-; CHECK: umax [[V0:v[0-9]+]].8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
-; CHECK: umaxv {{h[0-9]+}}, [[V0]]
+; CHECK-SD-LABEL: oversized_umax_256:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    ldp q1, q0, [x0]
+; CHECK-SD-NEXT:    umax v0.8h, v1.8h, v0.8h
+; CHECK-SD-NEXT:    umaxv h0, v0.8h
+; CHECK-SD-NEXT:    fmov w0, s0
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: oversized_umax_256:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    ldp q0, q1, [x0]
+; CHECK-GI-NEXT:    umax v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:    umaxv h0, v0.8h
+; CHECK-GI-NEXT:    fmov w0, s0
+; CHECK-GI-NEXT:    ret
   %arr.load = load <16 x i16>, ptr %arr
   %r = call i16 @llvm.vector.reduce.umax.v16i16(<16 x i16> %arr.load)
   ret i16 %r
 }
 
-declare i32 @llvm.vector.reduce.umax.v16i32(<16 x i32>)
-
 define i32 @oversized_umax_512(ptr nocapture readonly %arr)  {
-; CHECK-LABEL: oversized_umax_512
-; CHECK: umax v
-; CHECK-NEXT: umax v
-; CHECK-NEXT: umax [[V0:v[0-9]+]].4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
-; CHECK-NEXT: umaxv {{s[0-9]+}}, [[V0]]
+; CHECK-SD-LABEL: oversized_umax_512:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    ldp q0, q1, [x0, #32]
+; CHECK-SD-NEXT:    ldp q2, q3, [x0]
+; CHECK-SD-NEXT:    umax v1.4s, v3.4s, v1.4s
+; CHECK-SD-NEXT:    umax v0.4s, v2.4s, v0.4s
+; CHECK-SD-NEXT:    umax v0.4s, v0.4s, v1.4s
+; CHECK-SD-NEXT:    umaxv s0, v0.4s
+; CHECK-SD-NEXT:    fmov w0, s0
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: oversized_umax_512:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    ldp q0, q1, [x0]
+; CHECK-GI-NEXT:    ldp q2, q3, [x0, #32]
+; CHECK-GI-NEXT:    umax v0.4s, v0.4s, v1.4s
+; CHECK-GI-NEXT:    umax v1.4s, v2.4s, v3.4s
+; CHECK-GI-NEXT:    umax v0.4s, v0.4s, v1.4s
+; CHECK-GI-NEXT:    umaxv s0, v0.4s
+; CHECK-GI-NEXT:    fmov w0, s0
+; CHECK-GI-NEXT:    ret
   %arr.load = load <16 x i32>, ptr %arr
   %r = call i32 @llvm.vector.reduce.umax.v16i32(<16 x i32> %arr.load)
   ret i32 %r
 }
 
-declare i16 @llvm.vector.reduce.umin.v16i16(<16 x i16>)
-
 define i16 @oversized_umin_256(ptr nocapture readonly %arr)  {
-; CHECK-LABEL: oversized_umin_256
-; CHECK: umin [[V0:v[0-9]+]].8h, {{v[...
[truncated]

Comment on lines +1859 to +1880
define i64 @umaxv_v3i64(<3 x i64> %a) {
; CHECK-LABEL: umaxv_v3i64:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
; CHECK-NEXT: mov v3.16b, v0.16b
; CHECK-NEXT: mov v4.16b, v2.16b
; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
; CHECK-NEXT: mov v3.d[1], v1.d[0]
; CHECK-NEXT: mov v4.d[1], xzr
; CHECK-NEXT: cmhi v3.2d, v3.2d, v4.2d
; CHECK-NEXT: ext v4.16b, v3.16b, v3.16b, #8
; CHECK-NEXT: bif v0.8b, v2.8b, v3.8b
; CHECK-NEXT: and v1.8b, v1.8b, v4.8b
; CHECK-NEXT: cmhi d2, d0, d1
; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
; CHECK-NEXT: fmov x0, d0
; CHECK-NEXT: ret
entry:
%arg1 = call i64 @llvm.vector.reduce.umax.v3i64(<3 x i64> %a)
ret i64 %arg1
}
Contributor

Is this case falling back? I guess we have a gap in our legalization for non-power-of-2 types?

Contributor Author

The FewerElements action for vector reductions returns UnableToLegalize when it encounters non-power-of-2 types.
Non-power-of-2 reductions are not well supported yet and are something we can improve upon in future patches.
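
To make the gap concrete, here is a rough sketch in generic MIR of how the FewerElements path splits an oversized power-of-two reduction (our illustration of the strategy, not literal legalizer output); a <3 x i64> input has no such even split, hence the fallback:

  ; G_VECREDUCE_UMAX on <8 x s32> is halved into a pairwise G_UMAX
  ; followed by a legal <4 x s32> reduction:
  %lo:_(<4 x s32>), %hi:_(<4 x s32>) = G_UNMERGE_VALUES %v:_(<8 x s32>)
  %m:_(<4 x s32>) = G_UMAX %lo, %hi
  %r:_(s32) = G_VECREDUCE_UMAX %m:_(<4 x s32>)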

Collaborator

I believe this is true for float reductions too. We don't yet handle adding identity elements to pad reductions out to legal lengths.
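
To make that concrete, here is a hypothetical IR-level padding (an assumption about a possible future approach, not something this patch implements): widen the <3 x i64> input with the reduction's identity element (0 for umax) in the extra lane, then reduce the power-of-two vector instead:

  ; Hypothetical: pad lane 3 with the umax identity element (zero), then
  ; let the usual <4 x i64> legalization take over.
  %pad = shufflevector <3 x i64> %a, <3 x i64> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %r = call i64 @llvm.vector.reduce.umax.v4i64(<4 x i64> %pad)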

@dzhidzhoev (Member)

Can we run llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll on GlobalISel with this commit?
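
For reference, a RUN line of the form this patch already uses in aarch64-minmaxv.ll (see above) would exercise that file under GlobalISel:

  ; RUN: llc -mtriple=aarch64 -global-isel -global-isel-abort=2 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI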

@chuongg3 force-pushed the GlobalISel_VECREDUCE_MINMAX branch from 514a160 to 98d4c39 on October 30, 2023 at 10:51
@davemgreen (Collaborator) left a comment

Thanks. I think this LGTM

; CHECK-GI-NEXT: fmov x8, d0
; CHECK-GI-NEXT: fmov x9, d1
; CHECK-GI-NEXT: cmp x8, x9
; CHECK-GI-NEXT: fcsel d0, d0, d1, lt
Collaborator

This would be better as a csel if the operands are already in GPRs. Something to improve in regbankselect, perhaps. And we may want to generate code similar to SDAG in places, by using further steps in vector regs, but I think this looks good for now.
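
For comparison, a sketch of the GPR-only sequence being suggested (illustrative; this is not what the patch currently emits):

  // Keep the compare-and-select in GPRs instead of moving back for fcsel.
  fmov x8, d0
  fmov x9, d1
  cmp x8, x9
  csel x0, x8, x9, lt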

@davemgreen (Collaborator) left a comment

Thanks. LGTM

@chuongg3 force-pushed the GlobalISel_VECREDUCE_MINMAX branch from 69fd4dc to 05cf492 on November 9, 2023 at 16:10
@chuongg3 merged commit 451bc3e into llvm:main on Nov 9, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for
vecreduce_{min/max}