Commit 895dd64

Add an all-in-one histogram intrinsic, along with lowering for AArch64
Current interface is:

  llvm.experimental.vector.histogram.op(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

Where op is the update operation (currently limited to 'add'). The integer
type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

These operations can obviously be scalarized on platforms without the
relevant instructions.
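The scalarized fallback mentioned in the commit message can be pictured with a short C model. This is only a sketch, not code from the patch: the function name `histogram_add_scalarized`, the fixed `int32_t` bucket type, and the representation of the mask as an array of `bool` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 'add' histogram update (not LLVM code).
 * Each active lane performs its own load, add and store, so lanes that
 * alias the same bucket accumulate naturally. */
void histogram_add_scalarized(int32_t *const *ptrs, int32_t inc,
                              const bool *mask, size_t lanes) {
  assert(ptrs != NULL && mask != NULL);
  for (size_t i = 0; i < lanes; ++i) {
    if (mask[i])
      *ptrs[i] += inc; /* inactive lanes never touch memory */
  }
}
```

Because every lane is processed in turn, two active lanes pointing at the same bucket add `inc` twice, which is the behaviour the gather, histogram, multiply, add and scatter sequence has to reproduce.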
1 parent 2125080 commit 895dd64

12 files changed, with 310 additions and 0 deletions


llvm/docs/LangRef.rst

Lines changed: 59 additions & 0 deletions
@@ -19030,6 +19030,65 @@ will be on any later loop iteration.
 This intrinsic will only return 0 if the input count is also 0. A non-zero input
 count will produce a non-zero result.
 
+'``llvm.experimental.vector.histogram.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is an overloaded intrinsic.
+
+These intrinsics represent histogram-like operations; that is, updating values
+in memory that may not be contiguous, and where multiple elements within a
+single vector may be updating the same value in memory.
+
+The update operation must be specified as part of the intrinsic name. For a
+simple histogram like the following, the ``add`` operation would be used.
+
+.. code-block:: c
+
+    void simple_histogram(int *restrict buckets, unsigned *indices, int N, int inc) {
+      for (int i = 0; i < N; ++i)
+        buckets[indices[i]] += inc;
+    }
+
+More update operation types may be added in the future.
+
+::
+
+    declare void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr> %ptrs, i32 %inc, <8 x i1> %mask)
+    declare void @llvm.experimental.vector.histogram.add.nxv2p0.i64(<vscale x 2 x ptr> %ptrs, i64 %inc, <vscale x 2 x i1> %mask)
+
+Arguments:
+""""""""""
+
+The first argument is a vector of pointers to the memory locations to be
+updated. The second argument is a scalar used to update the value from
+memory; it must match the type of value to be updated. The final argument
+is a mask; inactive lanes are excluded from the update and from any
+cross-lane calculations that determine the final value for each location.
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vector.histogram``' intrinsics are used to perform
+updates on potentially overlapping values in memory. The intrinsics represent
+the following sequence of operations:
+
+1. Gather load from the ``ptrs`` operand, with element type matching that of
+   the ``inc`` operand.
+2. Update of the values loaded from memory. In the case of the ``add``
+   update operation, this means:
+
+   1. Perform a cross-vector histogram operation on the ``ptrs`` operand,
+      or a set of index values if it can be decomposed into a base pointer
+      with smaller indices matching the type of ``inc``.
+   2. Multiply the result by the ``inc`` operand.
+   3. Add the result to the values loaded from memory.
+3. Scatter the result of the update operation to the memory locations from
+   the ``ptrs`` operand.
+
+The ``mask`` operand will apply to at least the gather and scatter operations,
+and potentially the update if supported.
+
 Matrix Intrinsics
 -----------------
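The three-step sequence in the Semantics text can be modelled directly in C. The sketch below is illustrative only: `histogram_add_model`, the fixed `LANES` width and the `int64_t` bucket type are hypothetical, and the cross-lane step is assumed to give every active lane the total number of active lanes that alias its address, which is one possible reading of the "cross-vector histogram operation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4 /* illustrative fixed vector width */

/* Hypothetical model of the gather / update / scatter sequence. */
void histogram_add_model(int64_t *ptrs[LANES], int64_t inc,
                         const bool mask[LANES]) {
  int64_t loaded[LANES] = {0};
  int64_t count[LANES] = {0};

  /* Step 1: masked gather of the current bucket values. */
  for (int i = 0; i < LANES; ++i)
    if (mask[i])
      loaded[i] = *ptrs[i];

  /* Step 2.1: cross-lane histogram. For each active lane, count the
   * active lanes (including itself) that point at the same address. */
  for (int i = 0; i < LANES; ++i) {
    if (!mask[i])
      continue;
    for (int j = 0; j < LANES; ++j)
      if (mask[j] && ptrs[j] == ptrs[i])
        ++count[i];
  }

  /* Steps 2.2, 2.3 and 3: multiply by inc, add to the loaded value,
   * then scatter. Aliasing lanes all store the same final value. */
  for (int i = 0; i < LANES; ++i)
    if (mask[i])
      *ptrs[i] = loaded[i] + count[i] * inc;
}
```

With this counting scheme the order of the scatter stores does not matter, because every lane that targets a given bucket computes the same result.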

llvm/include/llvm/CodeGen/ISDOpcodes.h

Lines changed: 5 additions & 0 deletions
@@ -1401,6 +1401,11 @@ enum NodeType {
   // which is later translated to an implicit use in the MIR.
   CONVERGENCECTRL_GLUE,
 
+  // Experimental vector histogram intrinsic
+  // Operands: Input Chain, Inc, Mask, Base, Index, Scale, ID
+  // Output: Output Chain
+  EXPERIMENTAL_HISTOGRAM,
+
   /// BUILTIN_OP_END - This must be the last enum value in this list.
   /// The target-specific pre-isel opcode values start here.
   BUILTIN_OP_END

llvm/include/llvm/CodeGen/SelectionDAG.h

Lines changed: 3 additions & 0 deletions
@@ -1526,6 +1526,9 @@ class SelectionDAG {
                            ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
                            ISD::MemIndexType IndexType,
                            bool IsTruncating = false);
+  SDValue getMaskedHistogram(SDVTList VTs, EVT MemVT, const SDLoc &dl,
+                             ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
+                             ISD::MemIndexType IndexType);
 
   SDValue getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr, EVT MemVT,
                       MachineMemOperand *MMO);

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

Lines changed: 31 additions & 0 deletions
@@ -542,6 +542,7 @@ BEGIN_TWO_BYTE_PACK()
     friend class MaskedLoadStoreSDNode;
     friend class MaskedGatherScatterSDNode;
     friend class VPGatherScatterSDNode;
+    friend class MaskedHistogramSDNode;
 
     uint16_t : NumMemSDNodeBits;
 
@@ -564,6 +565,7 @@ BEGIN_TWO_BYTE_PACK()
     friend class MaskedLoadSDNode;
     friend class MaskedGatherSDNode;
     friend class VPGatherSDNode;
+    friend class MaskedHistogramSDNode;
 
     uint16_t : NumLSBaseSDNodeBits;
 
@@ -1413,6 +1415,7 @@ class MemSDNode : public SDNode {
       return getOperand(2);
     case ISD::MGATHER:
     case ISD::MSCATTER:
+    case ISD::EXPERIMENTAL_HISTOGRAM:
       return getOperand(3);
     default:
       return getOperand(1);
@@ -1461,6 +1464,7 @@ class MemSDNode : public SDNode {
     case ISD::EXPERIMENTAL_VP_STRIDED_STORE:
     case ISD::GET_FPENV_MEM:
     case ISD::SET_FPENV_MEM:
+    case ISD::EXPERIMENTAL_HISTOGRAM:
       return true;
     default:
       return N->isMemIntrinsic() || N->isTargetMemoryOpcode();
@@ -2946,6 +2950,33 @@ class MaskedScatterSDNode : public MaskedGatherScatterSDNode {
   }
 };
 
+class MaskedHistogramSDNode : public MemSDNode {
+public:
+  friend class SelectionDAG;
+
+  MaskedHistogramSDNode(unsigned Order, const DebugLoc &DL, SDVTList VTs,
+                        EVT MemVT, MachineMemOperand *MMO,
+                        ISD::MemIndexType IndexType)
+      : MemSDNode(ISD::EXPERIMENTAL_HISTOGRAM, Order, DL, VTs, MemVT, MMO) {
+    LSBaseSDNodeBits.AddressingMode = IndexType;
+  }
+
+  ISD::MemIndexType getIndexType() const {
+    return static_cast<ISD::MemIndexType>(LSBaseSDNodeBits.AddressingMode);
+  }
+
+  const SDValue &getBasePtr() const { return getOperand(3); }
+  const SDValue &getIndex() const { return getOperand(4); }
+  const SDValue &getMask() const { return getOperand(2); }
+  const SDValue &getScale() const { return getOperand(5); }
+  const SDValue &getInc() const { return getOperand(1); }
+  const SDValue &getIntID() const { return getOperand(6); }
+
+  static bool classof(const SDNode *N) {
+    return N->getOpcode() == ISD::EXPERIMENTAL_HISTOGRAM;
+  }
+};
+
 class FPStateAccessSDNode : public MemSDNode {
 public:
   friend class SelectionDAG;

llvm/include/llvm/IR/Intrinsics.td

Lines changed: 7 additions & 0 deletions
@@ -1850,6 +1850,13 @@ def int_experimental_vp_strided_load : DefaultAttrsIntrinsic<[llvm_anyvector_ty
                                llvm_i32_ty],
                              [ NoCapture<ArgIndex<0>>, IntrNoSync, IntrReadMem, IntrWillReturn, IntrArgMemOnly ]>;
 
+// Experimental histogram
+def int_experimental_vector_histogram_add : DefaultAttrsIntrinsic<[],
+                             [ llvm_anyvector_ty, // Vector of pointers
+                               llvm_anyint_ty,    // Increment
+                               LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>], // Mask
+                             []>;
+
 // Operators
 let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
   // Integer arithmetic

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Lines changed: 38 additions & 0 deletions
@@ -9576,6 +9576,44 @@ SDValue SelectionDAG::getMaskedScatter(SDVTList VTs, EVT MemVT, const SDLoc &dl,
   return V;
 }
 
+SDValue SelectionDAG::getMaskedHistogram(SDVTList VTs, EVT MemVT,
+                                         const SDLoc &dl, ArrayRef<SDValue> Ops,
+                                         MachineMemOperand *MMO,
+                                         ISD::MemIndexType IndexType) {
+  assert(Ops.size() == 7 && "Incompatible number of operands");
+
+  FoldingSetNodeID ID;
+  AddNodeIDNode(ID, ISD::EXPERIMENTAL_HISTOGRAM, VTs, Ops);
+  ID.AddInteger(MemVT.getRawBits());
+  ID.AddInteger(getSyntheticNodeSubclassData<MaskedHistogramSDNode>(
+      dl.getIROrder(), VTs, MemVT, MMO, IndexType));
+  ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
+  ID.AddInteger(MMO->getFlags());
+  void *IP = nullptr;
+  if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
+    cast<MaskedHistogramSDNode>(E)->refineAlignment(MMO);
+    return SDValue(E, 0);
+  }
+
+  auto *N = newSDNode<MaskedHistogramSDNode>(dl.getIROrder(), dl.getDebugLoc(),
+                                             VTs, MemVT, MMO, IndexType);
+  createOperands(N, Ops);
+
+  assert(N->getMask().getValueType().getVectorElementCount() ==
+             N->getIndex().getValueType().getVectorElementCount() &&
+         "Vector width mismatch between mask and data");
+  assert(isa<ConstantSDNode>(N->getScale()) &&
+         N->getScale()->getAsAPIntVal().isPowerOf2() &&
+         "Scale should be a constant power of 2");
+  assert(N->getInc().getValueType().isInteger() && "Non integer update value");
+
+  CSEMap.InsertNode(N, IP);
+  InsertNode(N);
+  SDValue V(N, 0);
+  NewSDValueDbgMsg(V, "Creating new node: ", this);
+  return V;
+}
+
 SDValue SelectionDAG::getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr,
                                   EVT MemVT, MachineMemOperand *MMO) {
   assert(Chain.getValueType() == MVT::Other && "Invalid chain type");

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Lines changed: 63 additions & 0 deletions
@@ -6281,6 +6281,64 @@ void SelectionDAGBuilder::visitConvergenceControl(const CallInst &I,
   }
 }
 
+void SelectionDAGBuilder::visitVectorHistogram(const CallInst &I,
+                                               unsigned IntrinsicID) {
+  // For now, we're only lowering an 'add' histogram.
+  // We can add others later, e.g. saturating adds, min/max.
+  assert(IntrinsicID == Intrinsic::experimental_vector_histogram_add &&
+         "Tried to lower unsupported histogram type");
+  SDLoc sdl = getCurSDLoc();
+  Value *Ptr = I.getOperand(0);
+  SDValue Inc = getValue(I.getOperand(1));
+  SDValue Mask = getValue(I.getOperand(2));
+
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  DataLayout TargetDL = DAG.getDataLayout();
+  EVT VT = Inc.getValueType();
+  Align Alignment = DAG.getEVTAlign(VT);
+
+  const MDNode *Ranges = getRangeMetadata(I);
+
+  SDValue Root = DAG.getRoot();
+  SDValue Base;
+  SDValue Index;
+  ISD::MemIndexType IndexType;
+  SDValue Scale;
+  bool UniformBase = getUniformBase(Ptr, Base, Index, IndexType, Scale, this,
+                                    I.getParent(), VT.getScalarStoreSize());
+
+  unsigned AS = Ptr->getType()->getScalarType()->getPointerAddressSpace();
+
+  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
+      MachinePointerInfo(AS),
+      MachineMemOperand::MOLoad | MachineMemOperand::MOStore,
+      MemoryLocation::UnknownSize, Alignment, I.getAAMetadata(), Ranges);
+
+  if (!UniformBase) {
+    Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
+    Index = getValue(Ptr);
+    IndexType = ISD::SIGNED_SCALED;
+    Scale =
+        DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
+  }
+
+  EVT IdxVT = Index.getValueType();
+  EVT EltTy = IdxVT.getVectorElementType();
+  if (TLI.shouldExtendGSIndex(IdxVT, EltTy)) {
+    EVT NewIdxVT = IdxVT.changeVectorElementType(EltTy);
+    Index = DAG.getNode(ISD::SIGN_EXTEND, sdl, NewIdxVT, Index);
+  }
+
+  SDValue ID = DAG.getTargetConstant(IntrinsicID, sdl, MVT::i32);
+
+  SDValue Ops[] = {Root, Inc, Mask, Base, Index, Scale, ID};
+  SDValue Histogram = DAG.getMaskedHistogram(DAG.getVTList(MVT::Other), VT, sdl,
+                                             Ops, MMO, IndexType);
+
+  setValue(&I, Histogram);
+  DAG.setRoot(Histogram);
+}
+
 /// Lower the call to the specified intrinsic function.
 void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
                                              unsigned Intrinsic) {
@@ -7949,6 +8007,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   case Intrinsic::experimental_convergence_entry:
   case Intrinsic::experimental_convergence_loop:
     visitConvergenceControl(I, Intrinsic);
+    return;
+  case Intrinsic::experimental_vector_histogram_add: {
+    visitVectorHistogram(I, Intrinsic);
+    return;
+  }
   }
 }
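The `Base`, `Index` and `Scale` operands built in `visitVectorHistogram` describe each lane's address as base plus index times scale. The C sketch below illustrates that arithmetic for the two cases in the code: a uniform base with per-lane indices, and the fallback where the base is 0, the scale is 1 and each index is the full pointer value. The helper name `lane_address` is hypothetical, and using the element size as the scale in the uniform-base case is an assumption made for the example rather than something the patch guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address of one lane given base, index and scale.
 * Mirrors the "Scale should be a constant power of 2" requirement. */
uintptr_t lane_address(uintptr_t base, int64_t index, uint64_t scale) {
  assert(scale != 0 && (scale & (scale - 1)) == 0);
  return base + (uintptr_t)index * (uintptr_t)scale;
}
```

For a bucket array `int32_t buckets[8]`, the uniform-base form would compute `lane_address((uintptr_t)buckets, 3, sizeof(int32_t))`, while the fallback form would compute `lane_address(0, (int64_t)(uintptr_t)&buckets[3], 1)`; both name the same element.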

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Lines changed: 1 addition & 0 deletions
@@ -624,6 +624,7 @@ class SelectionDAGBuilder {
   void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);
   void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);
   void visitConvergenceControl(const CallInst &I, unsigned Intrinsic);
+  void visitVectorHistogram(const CallInst &I, unsigned IntrinsicID);
   void visitVPLoad(const VPIntrinsic &VPIntrin, EVT VT,
                    const SmallVectorImpl<SDValue> &OpValues);
   void visitVPStore(const VPIntrinsic &VPIntrin,

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Lines changed: 3 additions & 0 deletions
@@ -529,6 +529,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
   case ISD::PATCHPOINT:
     return "patchpoint";
 
+  case ISD::EXPERIMENTAL_HISTOGRAM:
+    return "histogram";
+
   // Vector Predication
 #define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
   case ISD::SDID: \

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Lines changed: 62 additions & 0 deletions
@@ -1603,6 +1603,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);
   }
 
+  // Histcnt is SVE2 only
+  if (Subtarget->hasSVE2() && Subtarget->isSVEAvailable())
+    setOperationAction(ISD::EXPERIMENTAL_HISTOGRAM, MVT::Other, Custom);
+
   // NOTE: Currently this has to happen after computeRegisterProperties rather
   // than the preferred option of combining it with the addRegisterClass call.
   if (Subtarget->useSVEForFixedLengthVectors()) {
@@ -6643,6 +6647,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
     return LowerFunnelShift(Op, DAG);
   case ISD::FLDEXP:
     return LowerFLDEXP(Op, DAG);
+  case ISD::EXPERIMENTAL_HISTOGRAM:
+    return LowerVECTOR_HISTOGRAM(Op, DAG);
   }
 }
 
@@ -27182,6 +27188,62 @@ SDValue AArch64TargetLowering::LowerVECTOR_INTERLEAVE(SDValue Op,
   return DAG.getMergeValues({Lo, Hi}, DL);
 }
 
+SDValue AArch64TargetLowering::LowerVECTOR_HISTOGRAM(SDValue Op,
+                                                     SelectionDAG &DAG) const {
+  // FIXME: Maybe share some code with LowerMGather/Scatter?
+  MaskedHistogramSDNode *HG = cast<MaskedHistogramSDNode>(Op);
+  SDLoc DL(HG);
+  SDValue Chain = HG->getOperand(0);
+  SDValue Inc = HG->getInc();
+  SDValue Mask = HG->getMask();
+  SDValue Ptr = HG->getBasePtr();
+  SDValue Index = HG->getIndex();
+  SDValue Scale = HG->getScale();
+  SDValue IntID = HG->getIntID();
+
+  // The Intrinsic ID determines the type of update operation.
+  ConstantSDNode *CID = cast<ConstantSDNode>(IntID.getNode());
+  // Right now, we only support 'add' as an update.
+  assert(CID->getZExtValue() == Intrinsic::experimental_vector_histogram_add &&
+         "Unexpected histogram update operation");
+
+  EVT IncVT = Inc.getValueType();
+  EVT IndexVT = Index.getValueType();
+  EVT MemVT = EVT::getVectorVT(*DAG.getContext(), IncVT,
+                               IndexVT.getVectorElementCount());
+  SDValue Zero = DAG.getConstant(0, DL, MVT::i64);
+  SDValue PassThru = DAG.getSplatVector(MemVT, DL, Zero);
+  SDValue IncSplat = DAG.getSplatVector(MemVT, DL, Inc);
+  SDValue Ops[] = {Chain, PassThru, Mask, Ptr, Index, Scale};
+
+  // Set the MMO to load only, rather than load|store.
+  MachineMemOperand *GMMO = HG->getMemOperand();
+  GMMO->setFlags(MachineMemOperand::MOLoad);
+  ISD::MemIndexType IndexType = HG->getIndexType();
+  SDValue Gather =
+      DAG.getMaskedGather(DAG.getVTList(MemVT, MVT::Other), MemVT, DL, Ops,
+                          HG->getMemOperand(), IndexType, ISD::NON_EXTLOAD);
+
+  SDValue GChain = Gather.getValue(1);
+
+  // Perform the histcnt, multiply by inc, add to bucket data.
+  SDValue ID = DAG.getTargetConstant(Intrinsic::aarch64_sve_histcnt, DL, IncVT);
+  SDValue HistCnt =
+      DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, IndexVT, ID, Mask, Index, Index);
+  SDValue Mul = DAG.getNode(ISD::MUL, DL, MemVT, HistCnt, IncSplat);
+  SDValue Add = DAG.getNode(ISD::ADD, DL, MemVT, Gather, Mul);
+
+  // Create a new MMO for the scatter.
+  MachineMemOperand *SMMO = DAG.getMachineFunction().getMachineMemOperand(
+      GMMO->getPointerInfo(), MachineMemOperand::MOStore, GMMO->getSize(),
+      GMMO->getAlign(), GMMO->getAAInfo());
+
+  SDValue ScatterOps[] = {GChain, Add, Mask, Ptr, Index, Scale};
+  SDValue Scatter = DAG.getMaskedScatter(DAG.getVTList(MVT::Other), MemVT, DL,
+                                         ScatterOps, SMMO, IndexType, false);
+  return Scatter;
+}
+
 SDValue
 AArch64TargetLowering::LowerFixedLengthFPToIntToSVE(SDValue Op,
                                                     SelectionDAG &DAG) const {
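The gather, histcnt, multiply, add and scatter chain built by `LowerVECTOR_HISTOGRAM` can be modelled in plain C to see why it yields the same buckets as a scalar loop. This is a sketch under stated assumptions, not a description of the generated code: it assumes the SVE2 count step gives each active lane the number of matching active lanes at positions less than or equal to its own, and that the scatter commits lanes in increasing lane order so the last aliasing lane wins. The names `histcnt_model`, `lowered_histogram_add` and `LANES` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4 /* illustrative fixed vector width */

/* Assumed counting behaviour: for each active lane i, count active lanes
 * j <= i holding the same index. Inactive lanes produce 0. */
void histcnt_model(const uint64_t idx[LANES], const bool mask[LANES],
                   uint64_t out[LANES]) {
  for (int i = 0; i < LANES; ++i) {
    out[i] = 0;
    if (!mask[i])
      continue;
    for (int j = 0; j <= i; ++j)
      if (mask[j] && idx[j] == idx[i])
        ++out[i];
  }
}

/* Model of the lowered sequence on a bucket array addressed by index. */
void lowered_histogram_add(int64_t *base, const uint64_t idx[LANES],
                           int64_t inc, const bool mask[LANES]) {
  int64_t gathered[LANES] = {0}; /* zero pass-through for inactive lanes */
  uint64_t cnt[LANES];

  for (int i = 0; i < LANES; ++i) /* masked gather */
    if (mask[i])
      gathered[i] = base[idx[i]];

  histcnt_model(idx, mask, cnt); /* prefix counts per lane */

  for (int i = 0; i < LANES; ++i) /* mul, add, in-order masked scatter */
    if (mask[i])
      base[idx[i]] = gathered[i] + (int64_t)cnt[i] * inc;
}
```

Under these assumptions the highest-numbered lane for each bucket carries the full count, so its store, being the last one applied, leaves the bucket with the correct total.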

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Lines changed: 1 addition & 0 deletions
@@ -1143,6 +1143,7 @@ class AArch64TargetLowering : public TargetLowering {
   SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECTOR_DEINTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECTOR_INTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerVECTOR_HISTOGRAM(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;
