Add an all-in-one histogram intrinsic, along with lowering for AArch64 #88106

Merged
merged 1 commit into from
May 13, 2024
54 changes: 54 additions & 0 deletions llvm/docs/LangRef.rst
@@ -19068,6 +19068,60 @@ will be on any later loop iteration.
This intrinsic will only return 0 if the input count is also 0. A non-zero input
count will produce a non-zero result.

'``llvm.experimental.vector.histogram.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Review comment on the heading above (Contributor): nit: looks like the singular form of "Intrinsics" is used everywhere else.

These intrinsics are overloaded.

These intrinsics represent histogram-like operations; that is, updating values
in memory that may not be contiguous, and where multiple elements within a
single vector may be updating the same value in memory.
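To see why overlapping lanes need special handling, the following self-contained sketch contrasts a naive gather/add/scatter with a plain scalar loop. It is an illustration only, not LLVM code; the function names `naive_gather_add_scatter` and `scalar_histogram` are made up for this example, and buckets are addressed by index rather than by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive vector-style update: load every lane first, then add and store every
// lane. Lanes that alias the same bucket overwrite each other, so all but one
// of the aliasing updates are lost.
inline void naive_gather_add_scatter(std::vector<int> &buckets,
                                     const std::vector<std::size_t> &indices,
                                     int inc) {
  std::vector<int> loaded(indices.size());
  for (std::size_t lane = 0; lane < indices.size(); ++lane) {
    assert(indices[lane] < buckets.size() && "index out of range");
    loaded[lane] = buckets[indices[lane]]; // gather
  }
  for (std::size_t lane = 0; lane < indices.size(); ++lane)
    buckets[indices[lane]] = loaded[lane] + inc; // add, then scatter
}

// Scalar loop: every lane's update lands, even when lanes alias.
inline void scalar_histogram(std::vector<int> &buckets,
                             const std::vector<std::size_t> &indices,
                             int inc) {
  for (std::size_t idx : indices) {
    assert(idx < buckets.size() && "index out of range");
    buckets[idx] += inc;
  }
}
```

With indices `{1, 1, 3, 1}` and an increment of 2, the naive version leaves bucket 1 at 2, while the scalar loop leaves it at 6. The histogram intrinsics are meant to give the second behaviour in vector form.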

The update operation must be specified as part of the intrinsic name. For a
simple histogram like the following, the ``add`` operation would be used.

.. code-block:: c

    void simple_histogram(int *restrict buckets, unsigned *indices, int N, int inc) {
      for (int i = 0; i < N; ++i)
        buckets[indices[i]] += inc;
    }

More update operation types may be added in the future.

::

      declare void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr> %ptrs, i32 %inc, <8 x i1> %mask)
      declare void @llvm.experimental.vector.histogram.add.nxv2p0.i64(<vscale x 2 x ptr> %ptrs, i64 %inc, <vscale x 2 x i1> %mask)

Arguments:
""""""""""

The first argument is a vector of pointers to the memory locations to be
updated. The second argument is a scalar used to update the value from
memory; it must match the type of value to be updated. The final argument
is a mask value to exclude locations from being modified.

Semantics:
""""""""""

The '``llvm.experimental.vector.histogram.*``' intrinsics are used to perform
updates on potentially overlapping values in memory. The intrinsics represent
the following sequence of operations:

1. Gather load from the ``ptrs`` operand, with element type matching that of
   the ``inc`` operand.
2. Update of the values loaded from memory. In the case of the ``add``
   update operation, this means:

   1. Perform a cross-vector histogram operation on the ``ptrs`` operand.
   2. Multiply the result by the ``inc`` operand.
   3. Add the result to the values loaded from memory.
3. Scatter the result of the update operation to the memory locations from
   the ``ptrs`` operand.

The ``mask`` operand will apply to at least the gather and scatter operations.
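The sequence above can be sketched as a scalar reference model. This is a hypothetical illustration, not code from the patch: the name `histogram_add_model` is invented, lanes are modelled as a `std::vector` of `int32_t` pointers, and the model assumes the mask also excludes inactive lanes from the cross-vector count, which the text above leaves open ("at least the gather and scatter").

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model of the 'add' histogram; not LLVM code.
// `ptrs` holds one pointer per lane and `mask[lane]` enables that lane.
inline void histogram_add_model(const std::vector<std::int32_t *> &ptrs,
                                std::int32_t inc,
                                const std::vector<bool> &mask) {
  assert(ptrs.size() == mask.size() && "one mask bit per lane");
  const std::size_t lanes = ptrs.size();

  // Step 1: gather the current values of all active lanes.
  std::vector<std::int32_t> loaded(lanes, 0);
  for (std::size_t i = 0; i < lanes; ++i)
    if (mask[i])
      loaded[i] = *ptrs[i];

  // Step 2.1: cross-vector histogram. For each active lane, count the active
  // lanes (including itself) that point at the same address.
  // Assumption: inactive lanes do not contribute to the count.
  std::vector<std::int32_t> count(lanes, 0);
  for (std::size_t i = 0; i < lanes; ++i)
    for (std::size_t j = 0; mask[i] && j < lanes; ++j)
      if (mask[j] && ptrs[j] == ptrs[i])
        ++count[i];

  // Steps 2.2, 2.3 and 3: scale by inc, add to the loaded value, scatter.
  // Aliasing lanes all store the same total, so store order does not matter.
  for (std::size_t i = 0; i < lanes; ++i)
    if (mask[i])
      *ptrs[i] = loaded[i] + count[i] * inc;
}
```

Because every aliasing lane computes the same final value, the scatter in step 3 is safe even though several lanes write to one location.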

Matrix Intrinsics
-----------------

7 changes: 7 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -797,6 +797,9 @@ class TargetTransformInfo {
/// Return true if the target supports strided load.
bool isLegalStridedLoadStore(Type *DataType, Align Alignment) const;

/// Return true if the target supports masked vector histograms.
bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const;

/// Return true if this is an alternating opcode pattern that can be lowered
/// to a single instruction on the target. In X86 this is for the addsub
/// instruction which corresponds to a Shuffle + Fadd + FSub pattern in IR.
@@ -1883,6 +1886,7 @@ class TargetTransformInfo::Concept {
virtual bool isLegalMaskedCompressStore(Type *DataType, Align Alignment) = 0;
virtual bool isLegalMaskedExpandLoad(Type *DataType, Align Alignment) = 0;
virtual bool isLegalStridedLoadStore(Type *DataType, Align Alignment) = 0;
virtual bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) = 0;
virtual bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0,
unsigned Opcode1,
const SmallBitVector &OpcodeMask) const = 0;
@@ -2386,6 +2390,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
bool isLegalStridedLoadStore(Type *DataType, Align Alignment) override {
return Impl.isLegalStridedLoadStore(DataType, Alignment);
}
bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) override {
return Impl.isLegalMaskedVectorHistogram(AddrType, DataType);
}
bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
const SmallBitVector &OpcodeMask) const override {
return Impl.isLegalAltInstr(VecTy, Opcode0, Opcode1, OpcodeMask);
4 changes: 4 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -315,6 +315,10 @@ class TargetTransformInfoImplBase {
return false;
}

bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const {
return false;
}
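The new hook follows the usual pattern for these queries: a public wrapper forwards to a type-erased Concept, a Model template forwards to the concrete implementation, and the base implementation answers `false`. The sketch below reproduces that shape in miniature so it compiles without LLVM. `TypeStub`, `DefaultImpl`, `HistogramTargetImpl` and `TTIFacade` are hypothetical names, and the legality rule in `HistogramTargetImpl` is a placeholder rather than any real target's logic.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-in; the real hook takes llvm::Type* arguments.
struct TypeStub {};

// Default implementation: targets that do not override the hook report false.
struct DefaultImpl {
  bool isLegalMaskedVectorHistogram(TypeStub *, TypeStub *) const {
    return false;
  }
};

// Hypothetical target that opts in (placeholder rule, for illustration only).
struct HistogramTargetImpl {
  bool isLegalMaskedVectorHistogram(TypeStub *AddrType,
                                    TypeStub *DataType) const {
    return AddrType != nullptr && DataType != nullptr;
  }
};

class TTIFacade {
  // Type-erased interface, analogous to TargetTransformInfo::Concept.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool isLegalMaskedVectorHistogram(TypeStub *, TypeStub *) = 0;
  };
  // Forwards each query to the wrapped implementation, like ::Model.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool isLegalMaskedVectorHistogram(TypeStub *A, TypeStub *D) override {
      return Impl.isLegalMaskedVectorHistogram(A, D);
    }
  };
  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit TTIFacade(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  bool isLegalMaskedVectorHistogram(TypeStub *A, TypeStub *D) const {
    assert(TTIImpl && "facade must wrap an implementation");
    return TTIImpl->isLegalMaskedVectorHistogram(A, D);
  }
};
```

A caller only sees the facade, so a target gains histogram support by overriding one method while every other target keeps the conservative default.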

bool enableOrderedReductions() const { return false; }

bool hasDivRemOp(Type *DataType, bool IsSigned) const { return false; }
5 changes: 5 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1402,6 +1402,11 @@ enum NodeType {
// which is later translated to an implicit use in the MIR.
CONVERGENCECTRL_GLUE,

// Experimental vector histogram intrinsic
// Operands: Input Chain, Inc, Mask, Base, Index, Scale, ID
// Output: Output Chain
EXPERIMENTAL_VECTOR_HISTOGRAM,

/// BUILTIN_OP_END - This must be the last enum value in this list.
/// The target-specific pre-isel opcode values start here.
BUILTIN_OP_END
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1526,6 +1526,9 @@ class SelectionDAG {
ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
ISD::MemIndexType IndexType,
bool IsTruncating = false);
SDValue getMaskedHistogram(SDVTList VTs, EVT MemVT, const SDLoc &dl,
ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
ISD::MemIndexType IndexType);

SDValue getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr, EVT MemVT,
MachineMemOperand *MMO);
33 changes: 33 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -542,6 +542,7 @@ BEGIN_TWO_BYTE_PACK()
friend class MaskedLoadStoreSDNode;
friend class MaskedGatherScatterSDNode;
friend class VPGatherScatterSDNode;
friend class MaskedHistogramSDNode;

uint16_t : NumMemSDNodeBits;

@@ -552,6 +553,7 @@ BEGIN_TWO_BYTE_PACK()
// MaskedLoadStoreBaseSDNode => enum ISD::MemIndexedMode
// VPGatherScatterSDNode => enum ISD::MemIndexType
// MaskedGatherScatterSDNode => enum ISD::MemIndexType
// MaskedHistogramSDNode => enum ISD::MemIndexType
uint16_t AddressingMode : 3;
};
enum { NumLSBaseSDNodeBits = NumMemSDNodeBits + 3 };
@@ -564,6 +566,7 @@ BEGIN_TWO_BYTE_PACK()
friend class MaskedLoadSDNode;
friend class MaskedGatherSDNode;
friend class VPGatherSDNode;
friend class MaskedHistogramSDNode;

uint16_t : NumLSBaseSDNodeBits;

@@ -1420,6 +1423,7 @@ class MemSDNode : public SDNode {
return getOperand(2);
case ISD::MGATHER:
case ISD::MSCATTER:
case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return getOperand(3);
default:
return getOperand(1);
@@ -1468,6 +1472,7 @@ class MemSDNode : public SDNode {
case ISD::EXPERIMENTAL_VP_STRIDED_STORE:
case ISD::GET_FPENV_MEM:
case ISD::SET_FPENV_MEM:
case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return true;
default:
return N->isMemIntrinsic() || N->isTargetMemoryOpcode();
@@ -2953,6 +2958,34 @@ class MaskedScatterSDNode : public MaskedGatherScatterSDNode {
}
};

class MaskedHistogramSDNode : public MemSDNode {
public:
  friend class SelectionDAG;

  MaskedHistogramSDNode(unsigned Order, const DebugLoc &DL, SDVTList VTs,
                        EVT MemVT, MachineMemOperand *MMO,
                        ISD::MemIndexType IndexType)
      : MemSDNode(ISD::EXPERIMENTAL_VECTOR_HISTOGRAM, Order, DL, VTs, MemVT,
                  MMO) {
    LSBaseSDNodeBits.AddressingMode = IndexType;
  }

  ISD::MemIndexType getIndexType() const {
    return static_cast<ISD::MemIndexType>(LSBaseSDNodeBits.AddressingMode);
  }

  const SDValue &getBasePtr() const { return getOperand(3); }
  const SDValue &getIndex() const { return getOperand(4); }
  const SDValue &getMask() const { return getOperand(2); }
  const SDValue &getScale() const { return getOperand(5); }
  const SDValue &getInc() const { return getOperand(1); }
  const SDValue &getIntID() const { return getOperand(6); }

  static bool classof(const SDNode *N) {
    return N->getOpcode() == ISD::EXPERIMENTAL_VECTOR_HISTOGRAM;
  }
};
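The accessors above rely on a fixed operand order (Chain, Inc, Mask, Base, Index, Scale, ID). A toy version with named positions may make that layout easier to check at a glance. Everything here is hypothetical: `HistogramOperand` and `ToyHistogramNode` do not exist in LLVM, and operands are plain strings rather than `SDValue`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Operand positions, following the "Operands:" comment on the opcode.
enum HistogramOperand : std::size_t {
  OpChain = 0,
  OpInc = 1,
  OpMask = 2,
  OpBase = 3,
  OpIndex = 4,
  OpScale = 5,
  OpIntID = 6,
  NumHistogramOperands = 7
};

// Toy stand-in for the node: operands are strings rather than SDValue.
class ToyHistogramNode {
  std::array<std::string, NumHistogramOperands> Ops;

public:
  explicit ToyHistogramNode(std::array<std::string, NumHistogramOperands> O)
      : Ops(std::move(O)) {}

  const std::string &getOperand(std::size_t I) const {
    assert(I < Ops.size() && "operand index out of range");
    return Ops[I];
  }
  const std::string &getInc() const { return getOperand(OpInc); }
  const std::string &getMask() const { return getOperand(OpMask); }
  const std::string &getBasePtr() const { return getOperand(OpBase); }
  const std::string &getIndex() const { return getOperand(OpIndex); }
  const std::string &getScale() const { return getOperand(OpScale); }
  const std::string &getIntID() const { return getOperand(OpIntID); }
};
```

Naming the positions once keeps the accessors and the code that builds the operand array from drifting apart, at the cost of one more enum.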

class FPStateAccessSDNode : public MemSDNode {
public:
friend class SelectionDAG;
7 changes: 7 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -1855,6 +1855,13 @@ def int_experimental_vp_strided_load : DefaultAttrsIntrinsic<[llvm_anyvector_ty
llvm_i32_ty],
[ NoCapture<ArgIndex<0>>, IntrNoSync, IntrReadMem, IntrWillReturn, IntrArgMemOnly ]>;

// Experimental histogram
def int_experimental_vector_histogram_add : DefaultAttrsIntrinsic<[],
[ llvm_anyvector_ty, // Vector of pointers
llvm_anyint_ty, // Increment
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>], // Mask
[ IntrArgMemOnly ]>;

// Operators
let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
// Integer arithmetic
5 changes: 5 additions & 0 deletions llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -513,6 +513,11 @@ bool TargetTransformInfo::isLegalStridedLoadStore(Type *DataType,
return TTIImpl->isLegalStridedLoadStore(DataType, Alignment);
}

bool TargetTransformInfo::isLegalMaskedVectorHistogram(Type *AddrType,
Type *DataType) const {
return TTIImpl->isLegalMaskedVectorHistogram(AddrType, DataType);
}

bool TargetTransformInfo::enableOrderedReductions() const {
return TTIImpl->enableOrderedReductions();
}
38 changes: 38 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -9614,6 +9614,44 @@ SDValue SelectionDAG::getMaskedScatter(SDVTList VTs, EVT MemVT, const SDLoc &dl,
return V;
}

SDValue SelectionDAG::getMaskedHistogram(SDVTList VTs, EVT MemVT,
                                         const SDLoc &dl, ArrayRef<SDValue> Ops,
                                         MachineMemOperand *MMO,
                                         ISD::MemIndexType IndexType) {
  assert(Ops.size() == 7 && "Incompatible number of operands");

  FoldingSetNodeID ID;
  AddNodeIDNode(ID, ISD::EXPERIMENTAL_VECTOR_HISTOGRAM, VTs, Ops);
  ID.AddInteger(MemVT.getRawBits());
  ID.AddInteger(getSyntheticNodeSubclassData<MaskedHistogramSDNode>(
      dl.getIROrder(), VTs, MemVT, MMO, IndexType));
  ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
  ID.AddInteger(MMO->getFlags());
  void *IP = nullptr;
  if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
    // The existing node is a histogram node, so cast to that class rather
    // than to MaskedGatherSDNode.
    cast<MaskedHistogramSDNode>(E)->refineAlignment(MMO);
    return SDValue(E, 0);
  }

  auto *N = newSDNode<MaskedHistogramSDNode>(dl.getIROrder(), dl.getDebugLoc(),
                                             VTs, MemVT, MMO, IndexType);
  createOperands(N, Ops);

  assert(N->getMask().getValueType().getVectorElementCount() ==
             N->getIndex().getValueType().getVectorElementCount() &&
         "Vector width mismatch between mask and data");
  assert(isa<ConstantSDNode>(N->getScale()) &&
         N->getScale()->getAsAPIntVal().isPowerOf2() &&
         "Scale should be a constant power of 2");
  assert(N->getInc().getValueType().isInteger() && "Non integer update value");

  CSEMap.InsertNode(N, IP);
  InsertNode(N);
  SDValue V(N, 0);
  NewSDValueDbgMsg(V, "Creating new node: ", this);
  return V;
}
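The function above follows a find-or-insert pattern: build a key from the node's identity, return the existing node if one matches, and otherwise create and register a new one. The toy model below illustrates that pattern with standard containers. It is a sketch under stated assumptions, not how `FoldingSet` is implemented: `ToyNode` and `ToyDAG` are invented names, the key is reduced to operands, memory type width and address space, and alignment refinement is assumed to keep the larger of the two alignments.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>

// Toy node: operands are integer ids; Align models the memory operand.
struct ToyNode {
  std::vector<int> Ops;
  std::uint64_t Align;
};

class ToyDAG {
  // Key: operands, memory type width in bits, address space.
  using Key = std::tuple<std::vector<int>, unsigned, unsigned>;
  std::map<Key, std::unique_ptr<ToyNode>> CSEMap;

public:
  ToyNode *getHistogram(const std::vector<int> &Ops, unsigned MemVTBits,
                        unsigned AddrSpace, std::uint64_t Align) {
    assert(Ops.size() == 7 && "Incompatible number of operands");
    Key K{Ops, MemVTBits, AddrSpace};

    auto It = CSEMap.find(K);
    if (It != CSEMap.end()) {
      // Existing node: assumed refinement rule, keep the larger alignment.
      if (Align > It->second->Align)
        It->second->Align = Align;
      return It->second.get();
    }

    // No match: create the node and register it under its key.
    auto N = std::make_unique<ToyNode>(ToyNode{Ops, Align});
    ToyNode *Raw = N.get();
    CSEMap.emplace(std::move(K), std::move(N));
    return Raw;
  }
};
```

Requesting the same operands twice returns the same node, while changing any part of the key (for example the memory type width) yields a distinct node.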

SDValue SelectionDAG::getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr,
EVT MemVT, MachineMemOperand *MMO) {
assert(Chain.getValueType() == MVT::Other && "Invalid chain type");
63 changes: 63 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6281,6 +6281,64 @@ void SelectionDAGBuilder::visitConvergenceControl(const CallInst &I,
}
}

void SelectionDAGBuilder::visitVectorHistogram(const CallInst &I,
                                               unsigned IntrinsicID) {
  // For now, we're only lowering an 'add' histogram.
  // We can add others later, e.g. saturating adds, min/max.
  assert(IntrinsicID == Intrinsic::experimental_vector_histogram_add &&
         "Tried to lower unsupported histogram type");
  SDLoc sdl = getCurSDLoc();
  Value *Ptr = I.getOperand(0);
  SDValue Inc = getValue(I.getOperand(1));
  SDValue Mask = getValue(I.getOperand(2));

  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
  DataLayout TargetDL = DAG.getDataLayout();
  EVT VT = Inc.getValueType();
  Align Alignment = DAG.getEVTAlign(VT);

  const MDNode *Ranges = getRangeMetadata(I);

  SDValue Root = DAG.getRoot();
  SDValue Base;
  SDValue Index;
  ISD::MemIndexType IndexType;
  SDValue Scale;
  bool UniformBase = getUniformBase(Ptr, Base, Index, IndexType, Scale, this,
                                    I.getParent(), VT.getScalarStoreSize());

  unsigned AS = Ptr->getType()->getScalarType()->getPointerAddressSpace();

  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
      MachinePointerInfo(AS),
      MachineMemOperand::MOLoad | MachineMemOperand::MOStore,
      MemoryLocation::UnknownSize, Alignment, I.getAAMetadata(), Ranges);

  if (!UniformBase) {
    Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
    Index = getValue(Ptr);
    IndexType = ISD::SIGNED_SCALED;
    Scale =
        DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
  }

  EVT IdxVT = Index.getValueType();
  EVT EltTy = IdxVT.getVectorElementType();
  if (TLI.shouldExtendGSIndex(IdxVT, EltTy)) {
    EVT NewIdxVT = IdxVT.changeVectorElementType(EltTy);
    Index = DAG.getNode(ISD::SIGN_EXTEND, sdl, NewIdxVT, Index);
  }

  SDValue ID = DAG.getTargetConstant(IntrinsicID, sdl, MVT::i32);

  SDValue Ops[] = {Root, Inc, Mask, Base, Index, Scale, ID};
  SDValue Histogram = DAG.getMaskedHistogram(DAG.getVTList(MVT::Other), VT, sdl,
                                             Ops, MMO, IndexType);

  setValue(&I, Histogram);
  DAG.setRoot(Histogram);
}
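The lowering above describes each lane's address as a base plus a scaled index, and falls back to a zero base, the raw pointers as indices, and a scale of 1 when no uniform base is found. The sketch below models that arithmetic with plain integers to show that both forms name the same addresses. It is illustrative only: `ToyAddressing`, `effectiveAddresses` and `fallbackForm` are hypothetical names, and addresses are treated as 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: each lane's address is Base + Index[lane] * Scale.
struct ToyAddressing {
  std::uint64_t Base;
  std::vector<std::int64_t> Index;
  std::uint64_t Scale;
};

inline std::vector<std::uint64_t> effectiveAddresses(const ToyAddressing &A) {
  assert(A.Scale != 0 && (A.Scale & (A.Scale - 1)) == 0 &&
         "Scale should be a power of 2");
  std::vector<std::uint64_t> Out;
  Out.reserve(A.Index.size());
  for (std::int64_t I : A.Index)
    Out.push_back(A.Base + static_cast<std::uint64_t>(I) * A.Scale);
  return Out;
}

// Fallback when no uniform base is found: Base = 0, the raw pointers become
// the indices, and Scale = 1.
inline ToyAddressing fallbackForm(const std::vector<std::uint64_t> &Ptrs) {
  ToyAddressing A{0, {}, 1};
  for (std::uint64_t P : Ptrs)
    A.Index.push_back(static_cast<std::int64_t>(P));
  return A;
}
```

For a base of `0x1000`, indices `{0, 3, 3, 7}` and a scale of 4, lane 1 resolves to `0x100C`; feeding the resulting addresses through `fallbackForm` reproduces the same list, which is why the fallback is always a valid (if less compact) encoding.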

/// Lower the call to the specified intrinsic function.
void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
unsigned Intrinsic) {
@@ -7949,6 +8007,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::experimental_convergence_entry:
case Intrinsic::experimental_convergence_loop:
visitConvergenceControl(I, Intrinsic);
return;
case Intrinsic::experimental_vector_histogram_add: {
visitVectorHistogram(I, Intrinsic);
return;
}
}
}

1 change: 1 addition & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -624,6 +624,7 @@ class SelectionDAGBuilder {
void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);
void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);
void visitConvergenceControl(const CallInst &I, unsigned Intrinsic);
void visitVectorHistogram(const CallInst &I, unsigned IntrinsicID);
void visitVPLoad(const VPIntrinsic &VPIntrin, EVT VT,
const SmallVectorImpl<SDValue> &OpValues);
void visitVPStore(const VPIntrinsic &VPIntrin,
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -529,6 +529,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::PATCHPOINT:
return "patchpoint";

case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return "histogram";

// Vector Predication
#define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
case ISD::SDID: \