Commit 07812db

Add an all-in-one histogram intrinsic, along with lowering for AArch64
Current interface is:

llvm.experimental.vector.histogram.op(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

Where op is the update operation (currently limited to 'add').

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
* Gather load
* histogram on the elements of 'ptrs'
* multiply the histogram results by 'inc_amount'
* add the result of the multiply to the values loaded by the gather
* scatter store the results of the add

These operations can obviously be scalarized on platforms without the relevant instructions.
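The five operations listed in the commit message can be modelled in plain C++. The following is only a sketch under stated assumptions: `histogramAddModel` and `histogramAddScalar` are hypothetical names, bucket indices stand in for the intrinsic's vector of pointers, and no mask is modelled. It is meant to show why the gather / histogram / multiply / add / scatter decomposition gives the same result as a scalar loop, not how any target lowers the intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model of one histogram.add call. `lanes` holds one
// bucket index per vector lane (a stand-in for the vector of pointers).
void histogramAddModel(std::vector<int32_t> &buckets,
                       const std::vector<std::size_t> &lanes, int32_t inc) {
  const std::size_t n = lanes.size();

  // 1. Gather load: read the current bucket value for every lane.
  std::vector<int32_t> loaded(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(lanes[i] < buckets.size() && "lane addresses a missing bucket");
    loaded[i] = buckets[lanes[i]];
  }

  // 2. Histogram: for each lane, count the lanes that hit the same bucket.
  std::vector<int32_t> count(n, 0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      if (lanes[i] == lanes[j])
        ++count[i];

  // 3. Multiply the counts by the increment, and
  // 4. add the product to the gathered values.
  std::vector<int32_t> updated(n);
  for (std::size_t i = 0; i < n; ++i)
    updated[i] = loaded[i] + count[i] * inc;

  // 5. Scatter store: aliasing lanes all hold the same value, so the order
  //    of the stores between them does not matter.
  for (std::size_t i = 0; i < n; ++i)
    buckets[lanes[i]] = updated[i];
}

// The scalarized form of the same update.
void histogramAddScalar(std::vector<int32_t> &buckets,
                        const std::vector<std::size_t> &lanes, int32_t inc) {
  for (std::size_t idx : lanes)
    buckets[idx] += inc;
}
```

Both functions are expected to leave `buckets` in the same state for any in-range `lanes`, including when several lanes name the same bucket.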
1 parent 5f2f390 commit 07812db

17 files changed (+523, -0 lines)

llvm/docs/LangRef.rst

Lines changed: 54 additions & 0 deletions
@@ -19068,6 +19068,60 @@ will be on any later loop iteration.
 This intrinsic will only return 0 if the input count is also 0. A non-zero input
 count will produce a non-zero result.
 
+'``llvm.experimental.vector.histogram.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These intrinsics are overloaded.
+
+These intrinsics represent histogram-like operations; that is, updating values
+in memory that may not be contiguous, and where multiple elements within a
+single vector may be updating the same value in memory.
+
+The update operation must be specified as part of the intrinsic name. For a
+simple histogram like the following, the ``add`` operation would be used.
+
+.. code-block:: c
+
+    void simple_histogram(int *restrict buckets, unsigned *indices, int N, int inc) {
+      for (int i = 0; i < N; ++i)
+        buckets[indices[i]] += inc;
+    }
+
+More update operation types may be added in the future.
+
+::
+
+    declare void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr> %ptrs, i32 %inc, <8 x i1> %mask)
+    declare void @llvm.experimental.vector.histogram.add.nxv2p0.i64(<vscale x 2 x ptr> %ptrs, i64 %inc, <vscale x 2 x i1> %mask)
+
+Arguments:
+""""""""""
+
+The first argument is a vector of pointers to the memory locations to be
+updated. The second argument is a scalar used to update the value from
+memory; it must match the type of value to be updated. The final argument
+is a mask value to exclude locations from being modified.
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vector.histogram.*``' intrinsics are used to perform
+updates on potentially overlapping values in memory. The intrinsics represent
+the following sequence of operations:
+
+1. Gather load from the ``ptrs`` operand, with element type matching that of
+   the ``inc`` operand.
+2. Update of the values loaded from memory. In the case of the ``add``
+   update operation, this means:
+
+   1. Perform a cross-vector histogram operation on the ``ptrs`` operand.
+   2. Multiply the result by the ``inc`` operand.
+   3. Add the result to the values loaded from memory.
+3. Scatter the result of the update operation to the memory locations from
+   the ``ptrs`` operand.
+
+The ``mask`` operand will apply to at least the gather and scatter operations.
+
 Matrix Intrinsics
 -----------------
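The LangRef text above stresses that several lanes of one vector may update the same memory location. The sketch below illustrates why a plain gather, add, scatter sequence is not enough in that case, and how a lane-by-lane scalarization behaves instead. All names here (`kLanes`, `naiveGatherAddScatter`, `scalarizedHistogramAdd`) are hypothetical, and the sketch assumes that a true mask bit marks an active lane, as with the masked gather and scatter intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8; // mirrors the <8 x ptr> overload

// Naive vectorization: gather, add, scatter, with no conflict handling.
// Lanes that alias all store "old + inc", so repeated buckets lose updates.
void naiveGatherAddScatter(const std::array<int32_t *, kLanes> &ptrs,
                           int32_t inc, const std::array<bool, kLanes> &mask) {
  std::array<int32_t, kLanes> loaded{};
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i]) {
      assert(ptrs[i] != nullptr && "active lane needs a valid pointer");
      loaded[i] = *ptrs[i];
    }
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      *ptrs[i] = loaded[i] + inc;
}

// Lane-by-lane scalarization: every active lane performs its own
// read-modify-write, so aliasing lanes accumulate. Inactive lanes are never
// dereferenced.
void scalarizedHistogramAdd(const std::array<int32_t *, kLanes> &ptrs,
                            int32_t inc, const std::array<bool, kLanes> &mask) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i]) {
      assert(ptrs[i] != nullptr && "active lane needs a valid pointer");
      *ptrs[i] += inc;
    }
}
```

With three active lanes pointing at one bucket, the naive version adds `inc` once while the scalarized version adds it three times, which is the behaviour the intrinsic's histogram step is there to preserve.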

llvm/include/llvm/Analysis/TargetTransformInfo.h

Lines changed: 7 additions & 0 deletions
@@ -797,6 +797,9 @@ class TargetTransformInfo {
   /// Return true if the target supports strided load.
   bool isLegalStridedLoadStore(Type *DataType, Align Alignment) const;
 
+  // Return true if the target supports masked vector histograms.
+  bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const;
+
   /// Return true if this is an alternating opcode pattern that can be lowered
   /// to a single instruction on the target. In X86 this is for the addsub
   /// instruction which corresponds to a Shuffle + Fadd + FSub pattern in IR.
@@ -1883,6 +1886,7 @@ class TargetTransformInfo::Concept {
   virtual bool isLegalMaskedCompressStore(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalMaskedExpandLoad(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalStridedLoadStore(Type *DataType, Align Alignment) = 0;
+  virtual bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) = 0;
   virtual bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0,
                                unsigned Opcode1,
                                const SmallBitVector &OpcodeMask) const = 0;
@@ -2386,6 +2390,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   bool isLegalStridedLoadStore(Type *DataType, Align Alignment) override {
     return Impl.isLegalStridedLoadStore(DataType, Alignment);
   }
+  bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) override {
+    return Impl.isLegalMaskedVectorHistogram(AddrType, DataType);
+  }
   bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
                        const SmallBitVector &OpcodeMask) const override {
     return Impl.isLegalAltInstr(VecTy, Opcode0, Opcode1, OpcodeMask);

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Lines changed: 4 additions & 0 deletions
@@ -315,6 +315,10 @@ class TargetTransformInfoImplBase {
     return false;
   }
 
+  bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const {
+    return false;
+  }
+
   bool enableOrderedReductions() const { return false; }
 
   bool hasDivRemOp(Type *DataType, bool IsSigned) const { return false; }
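The new hook is threaded through the same Concept/Model type-erasure layers as the other legality queries, with a conservative `false` default in the base implementation. The sketch below reproduces that plumbing with stand-in types so it compiles without LLVM. `FakeType`, `FakeTTI`, `DefaultImpl` and `HypotheticalTargetImpl` are all invented for illustration, and the legality rule in `HypotheticalTargetImpl` is an assumption, not the rule any real target uses.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for llvm::Type; only the fields this sketch needs.
struct FakeType {
  unsigned ElementBits;
  bool Scalable;
};

// Mirrors the base implementation: report "not legal" unless overridden.
struct DefaultImpl {
  bool isLegalMaskedVectorHistogram(const FakeType *, const FakeType *) const {
    return false;
  }
};

// Hypothetical target: accepts scalable vectors of 32- or 64-bit buckets.
struct HypotheticalTargetImpl {
  bool isLegalMaskedVectorHistogram(const FakeType *Addr,
                                    const FakeType *Data) const {
    assert(Addr && Data && "types must be non-null");
    return Addr->Scalable &&
           (Data->ElementBits == 32 || Data->ElementBits == 64);
  }
};

class FakeTTI {
  struct Concept {
    virtual ~Concept() = default;
    virtual bool isLegalMaskedVectorHistogram(const FakeType *,
                                              const FakeType *) = 0;
  };
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool isLegalMaskedVectorHistogram(const FakeType *A,
                                      const FakeType *D) override {
      return Impl.isLegalMaskedVectorHistogram(A, D);
    }
  };
  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit FakeTTI(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Public query: forwards to whichever implementation was wrapped.
  bool isLegalMaskedVectorHistogram(const FakeType *A,
                                    const FakeType *D) const {
    return TTIImpl->isLegalMaskedVectorHistogram(A, D);
  }
};
```

A caller only ever sees `FakeTTI`; targets that do not override the hook fall back to the `false` default, so existing targets are unaffected by the new query.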

llvm/include/llvm/CodeGen/ISDOpcodes.h

Lines changed: 5 additions & 0 deletions
@@ -1402,6 +1402,11 @@ enum NodeType {
   // which is later translated to an implicit use in the MIR.
   CONVERGENCECTRL_GLUE,
 
+  // Experimental vector histogram intrinsic
+  // Operands: Input Chain, Inc, Mask, Base, Index, Scale, ID
+  // Output: Output Chain
+  EXPERIMENTAL_VECTOR_HISTOGRAM,
+
   /// BUILTIN_OP_END - This must be the last enum value in this list.
   /// The target-specific pre-isel opcode values start here.
   BUILTIN_OP_END
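The operand order documented for `EXPERIMENTAL_VECTOR_HISTOGRAM` is relied on in three places in this commit: the comment above, the accessors on `MaskedHistogramSDNode`, and the `Ops[]` array built in `visitVectorHistogram`. The following sketch writes that layout down once as an enum so the correspondence is easy to check. `HistogramOperand` and `FakeHistogramNode` are hypothetical stand-ins, with strings in place of `SDValue`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Operand positions as listed in the ISDOpcodes.h comment:
// Input Chain, Inc, Mask, Base, Index, Scale, ID.
enum HistogramOperand : std::size_t {
  OpChain = 0,
  OpInc,
  OpMask,
  OpBase,
  OpIndex,
  OpScale,
  OpIntID,
  NumHistogramOperands
};

static_assert(NumHistogramOperands == 7,
              "the node is created with exactly seven operands");

// Hypothetical stand-in for MaskedHistogramSDNode; operands are plain strings.
struct FakeHistogramNode {
  std::array<std::string, NumHistogramOperands> Ops;

  const std::string &getOperand(std::size_t I) const {
    assert(I < Ops.size() && "operand index out of range");
    return Ops[I];
  }
  const std::string &getInc() const { return getOperand(OpInc); }
  const std::string &getMask() const { return getOperand(OpMask); }
  const std::string &getBasePtr() const { return getOperand(OpBase); }
  const std::string &getIndex() const { return getOperand(OpIndex); }
  const std::string &getScale() const { return getOperand(OpScale); }
  const std::string &getIntID() const { return getOperand(OpIntID); }
};
```

Filling `Ops` in the order `{Root, Inc, Mask, Base, Index, Scale, ID}` makes each accessor return the operand of the same name, and the base pointer lands at operand 3, the position the node shares with `MGATHER` and `MSCATTER` in the `MemSDNode` switch.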

llvm/include/llvm/CodeGen/SelectionDAG.h

Lines changed: 3 additions & 0 deletions
@@ -1526,6 +1526,9 @@ class SelectionDAG {
                            ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
                            ISD::MemIndexType IndexType,
                            bool IsTruncating = false);
+  SDValue getMaskedHistogram(SDVTList VTs, EVT MemVT, const SDLoc &dl,
+                             ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
+                             ISD::MemIndexType IndexType);
 
   SDValue getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr, EVT MemVT,
                       MachineMemOperand *MMO);

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

Lines changed: 33 additions & 0 deletions
@@ -542,6 +542,7 @@ BEGIN_TWO_BYTE_PACK()
     friend class MaskedLoadStoreSDNode;
     friend class MaskedGatherScatterSDNode;
     friend class VPGatherScatterSDNode;
+    friend class MaskedHistogramSDNode;
 
     uint16_t : NumMemSDNodeBits;
 
@@ -552,6 +553,7 @@ BEGIN_TWO_BYTE_PACK()
     // MaskedLoadStoreBaseSDNode => enum ISD::MemIndexedMode
     // VPGatherScatterSDNode => enum ISD::MemIndexType
     // MaskedGatherScatterSDNode => enum ISD::MemIndexType
+    // MaskedHistogramSDNode => enum ISD::MemIndexType
     uint16_t AddressingMode : 3;
   };
   enum { NumLSBaseSDNodeBits = NumMemSDNodeBits + 3 };
@@ -564,6 +566,7 @@ BEGIN_TWO_BYTE_PACK()
     friend class MaskedLoadSDNode;
     friend class MaskedGatherSDNode;
     friend class VPGatherSDNode;
+    friend class MaskedHistogramSDNode;
 
     uint16_t : NumLSBaseSDNodeBits;
 
@@ -1420,6 +1423,7 @@ class MemSDNode : public SDNode {
       return getOperand(2);
     case ISD::MGATHER:
     case ISD::MSCATTER:
+    case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
       return getOperand(3);
     default:
       return getOperand(1);
@@ -1468,6 +1472,7 @@ class MemSDNode : public SDNode {
     case ISD::EXPERIMENTAL_VP_STRIDED_STORE:
     case ISD::GET_FPENV_MEM:
     case ISD::SET_FPENV_MEM:
+    case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
       return true;
     default:
       return N->isMemIntrinsic() || N->isTargetMemoryOpcode();
@@ -2953,6 +2958,34 @@ class MaskedScatterSDNode : public MaskedGatherScatterSDNode {
   }
 };
 
+class MaskedHistogramSDNode : public MemSDNode {
+public:
+  friend class SelectionDAG;
+
+  MaskedHistogramSDNode(unsigned Order, const DebugLoc &DL, SDVTList VTs,
+                        EVT MemVT, MachineMemOperand *MMO,
+                        ISD::MemIndexType IndexType)
+      : MemSDNode(ISD::EXPERIMENTAL_VECTOR_HISTOGRAM, Order, DL, VTs, MemVT,
+                  MMO) {
+    LSBaseSDNodeBits.AddressingMode = IndexType;
+  }
+
+  ISD::MemIndexType getIndexType() const {
+    return static_cast<ISD::MemIndexType>(LSBaseSDNodeBits.AddressingMode);
+  }
+
+  const SDValue &getBasePtr() const { return getOperand(3); }
+  const SDValue &getIndex() const { return getOperand(4); }
+  const SDValue &getMask() const { return getOperand(2); }
+  const SDValue &getScale() const { return getOperand(5); }
+  const SDValue &getInc() const { return getOperand(1); }
+  const SDValue &getIntID() const { return getOperand(6); }
+
+  static bool classof(const SDNode *N) {
+    return N->getOpcode() == ISD::EXPERIMENTAL_VECTOR_HISTOGRAM;
+  }
+};
+
 class FPStateAccessSDNode : public MemSDNode {
 public:
   friend class SelectionDAG;

llvm/include/llvm/IR/Intrinsics.td

Lines changed: 7 additions & 0 deletions
@@ -1855,6 +1855,13 @@ def int_experimental_vp_strided_load : DefaultAttrsIntrinsic<[llvm_anyvector_ty
                                llvm_i32_ty],
                              [ NoCapture<ArgIndex<0>>, IntrNoSync, IntrReadMem, IntrWillReturn, IntrArgMemOnly ]>;
 
+// Experimental histogram
+def int_experimental_vector_histogram_add : DefaultAttrsIntrinsic<[],
+                             [ llvm_anyvector_ty, // Vector of pointers
+                               llvm_anyint_ty,    // Increment
+                               LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>], // Mask
+                             [ IntrArgMemOnly ]>;
+
 // Operators
 let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
   // Integer arithmetic

llvm/lib/Analysis/TargetTransformInfo.cpp

Lines changed: 5 additions & 0 deletions
@@ -513,6 +513,11 @@ bool TargetTransformInfo::isLegalStridedLoadStore(Type *DataType,
   return TTIImpl->isLegalStridedLoadStore(DataType, Alignment);
 }
 
+bool TargetTransformInfo::isLegalMaskedVectorHistogram(Type *AddrType,
+                                                       Type *DataType) const {
+  return TTIImpl->isLegalMaskedVectorHistogram(AddrType, DataType);
+}
+
 bool TargetTransformInfo::enableOrderedReductions() const {
   return TTIImpl->enableOrderedReductions();
 }

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Lines changed: 38 additions & 0 deletions
@@ -9614,6 +9614,44 @@ SDValue SelectionDAG::getMaskedScatter(SDVTList VTs, EVT MemVT, const SDLoc &dl,
   return V;
 }
 
+SDValue SelectionDAG::getMaskedHistogram(SDVTList VTs, EVT MemVT,
+                                         const SDLoc &dl, ArrayRef<SDValue> Ops,
+                                         MachineMemOperand *MMO,
+                                         ISD::MemIndexType IndexType) {
+  assert(Ops.size() == 7 && "Incompatible number of operands");
+
+  FoldingSetNodeID ID;
+  AddNodeIDNode(ID, ISD::EXPERIMENTAL_VECTOR_HISTOGRAM, VTs, Ops);
+  ID.AddInteger(MemVT.getRawBits());
+  ID.AddInteger(getSyntheticNodeSubclassData<MaskedHistogramSDNode>(
+      dl.getIROrder(), VTs, MemVT, MMO, IndexType));
+  ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
+  ID.AddInteger(MMO->getFlags());
+  void *IP = nullptr;
+  if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
+    cast<MaskedHistogramSDNode>(E)->refineAlignment(MMO);
+    return SDValue(E, 0);
+  }
+
+  auto *N = newSDNode<MaskedHistogramSDNode>(dl.getIROrder(), dl.getDebugLoc(),
+                                             VTs, MemVT, MMO, IndexType);
+  createOperands(N, Ops);
+
+  assert(N->getMask().getValueType().getVectorElementCount() ==
+             N->getIndex().getValueType().getVectorElementCount() &&
+         "Vector width mismatch between mask and data");
+  assert(isa<ConstantSDNode>(N->getScale()) &&
+         N->getScale()->getAsAPIntVal().isPowerOf2() &&
+         "Scale should be a constant power of 2");
+  assert(N->getInc().getValueType().isInteger() && "Non integer update value");
+
+  CSEMap.InsertNode(N, IP);
+  InsertNode(N);
+  SDValue V(N, 0);
+  NewSDValueDbgMsg(V, "Creating new node: ", this);
+  return V;
+}
+
 SDValue SelectionDAG::getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr,
                                   EVT MemVT, MachineMemOperand *MMO) {
   assert(Chain.getValueType() == MVT::Other && "Invalid chain type");

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Lines changed: 63 additions & 0 deletions
@@ -6281,6 +6281,64 @@ void SelectionDAGBuilder::visitConvergenceControl(const CallInst &I,
   }
 }
 
+void SelectionDAGBuilder::visitVectorHistogram(const CallInst &I,
+                                               unsigned IntrinsicID) {
+  // For now, we're only lowering an 'add' histogram.
+  // We can add others later, e.g. saturating adds, min/max.
+  assert(IntrinsicID == Intrinsic::experimental_vector_histogram_add &&
+         "Tried to lower unsupported histogram type");
+  SDLoc sdl = getCurSDLoc();
+  Value *Ptr = I.getOperand(0);
+  SDValue Inc = getValue(I.getOperand(1));
+  SDValue Mask = getValue(I.getOperand(2));
+
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  DataLayout TargetDL = DAG.getDataLayout();
+  EVT VT = Inc.getValueType();
+  Align Alignment = DAG.getEVTAlign(VT);
+
+  const MDNode *Ranges = getRangeMetadata(I);
+
+  SDValue Root = DAG.getRoot();
+  SDValue Base;
+  SDValue Index;
+  ISD::MemIndexType IndexType;
+  SDValue Scale;
+  bool UniformBase = getUniformBase(Ptr, Base, Index, IndexType, Scale, this,
+                                    I.getParent(), VT.getScalarStoreSize());
+
+  unsigned AS = Ptr->getType()->getScalarType()->getPointerAddressSpace();
+
+  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
+      MachinePointerInfo(AS),
+      MachineMemOperand::MOLoad | MachineMemOperand::MOStore,
+      MemoryLocation::UnknownSize, Alignment, I.getAAMetadata(), Ranges);
+
+  if (!UniformBase) {
+    Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
+    Index = getValue(Ptr);
+    IndexType = ISD::SIGNED_SCALED;
+    Scale =
+        DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
+  }
+
+  EVT IdxVT = Index.getValueType();
+  EVT EltTy = IdxVT.getVectorElementType();
+  if (TLI.shouldExtendGSIndex(IdxVT, EltTy)) {
+    EVT NewIdxVT = IdxVT.changeVectorElementType(EltTy);
+    Index = DAG.getNode(ISD::SIGN_EXTEND, sdl, NewIdxVT, Index);
+  }
+
+  SDValue ID = DAG.getTargetConstant(IntrinsicID, sdl, MVT::i32);
+
+  SDValue Ops[] = {Root, Inc, Mask, Base, Index, Scale, ID};
+  SDValue Histogram = DAG.getMaskedHistogram(DAG.getVTList(MVT::Other), VT, sdl,
+                                             Ops, MMO, IndexType);
+
+  setValue(&I, Histogram);
+  DAG.setRoot(Histogram);
+}
+
 /// Lower the call to the specified intrinsic function.
 void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
                                              unsigned Intrinsic) {
@@ -7949,6 +8007,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   case Intrinsic::experimental_convergence_entry:
   case Intrinsic::experimental_convergence_loop:
     visitConvergenceControl(I, Intrinsic);
+    return;
+  case Intrinsic::experimental_vector_histogram_add: {
+    visitVectorHistogram(I, Intrinsic);
+    return;
+  }
   }
 }
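`visitVectorHistogram` describes each lane's address as a (Base, Index, Scale) triple, and when `getUniformBase` finds no common base it falls back to Base = 0, Index = the full pointers, Scale = 1. The sketch below shows that the two forms name the same addresses. It assumes the usual `Base + Index * Scale` addressing used by masked gather/scatter nodes; `laneAddresses` is a hypothetical helper, and plain integers stand in for `SDValue`s.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-lane effective address under the assumed Base + Index * Scale model.
std::vector<std::uintptr_t> laneAddresses(std::uintptr_t base,
                                          const std::vector<std::int64_t> &index,
                                          std::uint64_t scale) {
  // Same precondition the DAG node asserts: scale is a power of two.
  assert(scale != 0 && (scale & (scale - 1)) == 0 &&
         "scale must be a power of 2");
  std::vector<std::uintptr_t> out;
  out.reserve(index.size());
  for (std::int64_t idx : index)
    // Unsigned wrap-around gives two's-complement address arithmetic, so
    // negative indices behave like signed offsets.
    out.push_back(base + static_cast<std::uintptr_t>(idx) * scale);
  return out;
}
```

For a uniform base, `laneAddresses(base, indices, elementSize)` gives the addresses directly; in the fallback form the same addresses are passed as the index vector with `base == 0` and `scale == 1`, so later lowering code only needs to understand one representation.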

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Lines changed: 1 addition & 0 deletions
@@ -624,6 +624,7 @@ class SelectionDAGBuilder {
   void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);
   void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);
   void visitConvergenceControl(const CallInst &I, unsigned Intrinsic);
+  void visitVectorHistogram(const CallInst &I, unsigned IntrinsicID);
   void visitVPLoad(const VPIntrinsic &VPIntrin, EVT VT,
                    const SmallVectorImpl<SDValue> &OpValues);
   void visitVPStore(const VPIntrinsic &VPIntrin,

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Lines changed: 3 additions & 0 deletions
@@ -529,6 +529,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
   case ISD::PATCHPOINT:
     return "patchpoint";
 
+  case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
+    return "histogram";
+
   // Vector Predication
 #define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
   case ISD::SDID: \
