Commit b087699

[AArch64][GlobalISel] Clean up CTLZ vector type legalization. (#131514)
As with other operations, vectors with s8, s16 and s32 elements are clamped to legal vector sizes, while vectors with s64 elements are scalarized so that they use the GPR instructions. This allows oversized vector types to be split rather than scalarized.
1 parent b37be0e commit b087699

3 files changed, +77 -332 lines changed

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

Lines changed: 1 addition & 0 deletions
@@ -6139,6 +6139,7 @@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
   case TargetOpcode::G_FCANONICALIZE:
   case TargetOpcode::G_SEXT_INREG:
   case TargetOpcode::G_ABS:
+  case TargetOpcode::G_CTLZ:
     if (TypeIdx != 0)
       return UnableToLegalize;
     Observer.changingInstr(MI);
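
Adding G_CTLZ to this case group lets the generic helper widen a vector G_CTLZ to a type with more elements. That is safe because count-leading-zeros is computed independently per lane, so padding lanes cannot change the results for the original lanes. The following is a minimal standalone sketch of that idea, not LLVM code: `ctlz32` and `paddedCtlz` are hypothetical names, and zero is used here as a stand-in for the undefined padding lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count leading zeros of a 32-bit value. Returns 32 for zero, matching
// G_CTLZ (which, unlike G_CTLZ_ZERO_UNDEF, is defined for a zero input).
unsigned ctlz32(uint32_t X) {
  unsigned N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Hypothetical model of a "more elements" legalization step: pad the input
// to WideLanes lanes, run the operation on the wide vector, then drop the
// padding lanes from the result.
std::vector<unsigned> paddedCtlz(const std::vector<uint32_t> &In,
                                 std::size_t WideLanes) {
  assert(WideLanes >= In.size() && "can only widen");
  std::vector<uint32_t> Wide(In);
  Wide.resize(WideLanes, 0); // padding value is arbitrary in this model
  std::vector<unsigned> Out;
  for (uint32_t Lane : Wide)
    Out.push_back(ctlz32(Lane));
  Out.resize(In.size()); // keep only the original lanes
  return Out;
}
```

For example, a three-lane input widened to four lanes yields the same three results as a lane-by-lane computation, which mirrors a v3s32 operation being handled as v4s32.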

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

Lines changed: 14 additions & 3 deletions
@@ -326,12 +326,23 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .maxScalarEltSameAsIf(always, 1, 0);
 
   getActionDefinitionsBuilder(G_CTLZ)
-      .legalForCartesianProduct(
-          {s32, s64, v8s8, v16s8, v4s16, v8s16, v2s32, v4s32})
-      .scalarize(1)
+      .legalFor({{s32, s32},
+                 {s64, s64},
+                 {v8s8, v8s8},
+                 {v16s8, v16s8},
+                 {v4s16, v4s16},
+                 {v8s16, v8s16},
+                 {v2s32, v2s32},
+                 {v4s32, v4s32}})
       .widenScalarToNextPow2(1, /*Min=*/32)
       .clampScalar(1, s32, s64)
+      .clampNumElements(0, v8s8, v16s8)
+      .clampNumElements(0, v4s16, v8s16)
+      .clampNumElements(0, v2s32, v4s32)
+      .moreElementsToNextPow2(0)
+      .scalarizeIf(scalarOrEltWiderThan(0, 32), 0)
       .scalarSameSizeAs(0, 1);
+
   getActionDefinitionsBuilder(G_CTLZ_ZERO_UNDEF).lower();
 
   getActionDefinitionsBuilder(G_CTTZ)
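
To make the effect of the new vector rules easier to follow, here is a rough standalone model of how the G_CTLZ rule chain might treat a vector destination type. It is a sketch under stated assumptions, not LLVM's API: `VecTy`, `Action`, `Step` and `firstMatchingRule` are hypothetical names, the scalar rules (`widenScalarToNextPow2`, `clampScalar`, `scalarSameSizeAs`) are ignored, and element sizes other than 8, 16, 32 and wider than 32 bits are reported as unsupported.

```cpp
#include <cassert>

// Toy stand-in for a vector LLT: NumElts lanes of EltBits bits each.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

enum class Action { Legal, MoreElements, FewerElements, Scalarize, Unsupported };

struct Step {
  Action Act;
  VecTy Ty; // type after the action (unchanged for Legal and Unsupported)
};

static bool isPow2(unsigned N) { return N != 0 && (N & (N - 1)) == 0; }

static unsigned nextPow2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Returns the first rule in the modelled chain that applies to Ty.
Step firstMatchingRule(VecTy Ty) {
  assert(Ty.NumElts > 1 && "model covers vector types only");
  if (Ty.EltBits == 8 || Ty.EltBits == 16 || Ty.EltBits == 32) {
    const unsigned Min = 64 / Ty.EltBits;  // 64-bit vector, e.g. v8s8
    const unsigned Max = 128 / Ty.EltBits; // 128-bit vector, e.g. v16s8
    // legalFor: only the 64-bit and 128-bit vector types are legal.
    if (Ty.NumElts == Min || Ty.NumElts == Max)
      return {Action::Legal, Ty};
    // clampNumElements: pad small vectors, split oversized ones.
    if (Ty.NumElts < Min)
      return {Action::MoreElements, {Min, Ty.EltBits}};
    if (Ty.NumElts > Max)
      return {Action::FewerElements, {Max, Ty.EltBits}};
  }
  // moreElementsToNextPow2: round odd lane counts up.
  if (!isPow2(Ty.NumElts))
    return {Action::MoreElements, {nextPow2(Ty.NumElts), Ty.EltBits}};
  // scalarizeIf(scalarOrEltWiderThan(0, 32)): s64 lanes become scalars.
  if (Ty.EltBits > 32)
    return {Action::Scalarize, {1, Ty.EltBits}};
  return {Action::Unsupported, Ty};
}
```

In this model v32s8 is split towards v16s8 and v3s32 is padded to v4s32, whereas v2s64 is scalarized into s64 operations, which the `{s64, s64}` entry in `legalFor` marks as legal.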
