[Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes #117007

Open — SamTebbs33 wants to merge 25 commits into main.

Commits:
129b7b5
[Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes
SamTebbs33 Nov 15, 2024
7323981
Rework lowering location
SamTebbs33 Jan 10, 2025
d98f300
Fix ISD node name string and remove shouldExpand function
SamTebbs33 Jan 15, 2025
ad85f4b
Format
SamTebbs33 Jan 16, 2025
5dff164
Move promote case
SamTebbs33 Jan 27, 2025
60177a7
Fix tablegen comment
SamTebbs33 Jan 27, 2025
60e9a1f
Remove DAGTypeLegalizer::
SamTebbs33 Jan 27, 2025
eef2bf0
Use getConstantOperandVal
SamTebbs33 Jan 27, 2025
161c027
Remove isPredicateCCSettingOp case
SamTebbs33 Jan 29, 2025
5841a79
Remove overloads for pointer and element size parameters
SamTebbs33 Jan 30, 2025
7243ccc
Clarify elementSize and writeAfterRead = 0
SamTebbs33 Jan 30, 2025
1a264e8
Add i=0 to VF-1
SamTebbs33 Jan 30, 2025
dab5d3e
Rename to get.nonalias.lane.mask
SamTebbs33 Jan 30, 2025
561f2d3
Fix pointer types in example
SamTebbs33 Jan 30, 2025
e3d6ce7
Remove shouldExpandGetAliasLaneMask
SamTebbs33 Jan 30, 2025
22687ff
Lower to ISD node rather than intrinsic
SamTebbs33 Jan 30, 2025
836c34c
Rename to noalias
SamTebbs33 Jan 31, 2025
e6d9909
Rename to loop.dependence.raw/war.mask
SamTebbs33 Feb 26, 2025
baae4c6
Rename in langref
SamTebbs33 Mar 10, 2025
6ecb0d3
Reword argument description
SamTebbs33 Mar 21, 2025
8a295fd
Fixup langref
SamTebbs33 May 20, 2025
b9616cb
IsWriteAfterRead -> IsReadAfterWrite and avoid using ops vector
SamTebbs33 May 20, 2025
360d723
Extend vXi1 setcc to account for intrinsic VT promotion
SamTebbs33 May 20, 2025
d4d8d8c
Remove experimental from intrinsic name
SamTebbs33 May 21, 2025
64a9714
Clean up vector type creation
SamTebbs33 May 21, 2025
121 changes: 121 additions & 0 deletions llvm/docs/LangRef.rst
@@ -23970,6 +23970,127 @@ Examples:
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)


.. _int_loop_dependence_war_mask:

'``llvm.loop.dependence.war.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <8 x i1> @llvm.loop.dependence.war.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <16 x i1> @llvm.loop.dependence.war.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <vscale x 16 x i1> @llvm.loop.dependence.war.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)


Overview:
"""""""""

Given a scalar load from %ptrA, followed by a scalar store to %ptrB, this
intrinsic generates a mask in which an active lane indicates that there is no
write-after-read hazard for that lane.

A write-after-read hazard occurs when a write-after-read sequence for a given
lane in a vector ends up being executed as a read-after-write sequence due to
the aliasing of pointers.
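
For instance (an illustrative example, not part of this patch), vectorizing the
following loop is only safe per lane when the stores to ``dst`` cannot clobber
``src`` elements that later lanes of the same vector iteration still need to load:

.. code-block:: c++

// Each scalar iteration reads src[i] before writing dst[i]. If dst aliases
// src at a small positive byte offset, a VF-wide vector iteration would
// store over elements that higher lanes still have to load, turning the
// write-after-read into a read-after-write.
void copyForward(int *dst, const int *src, long n) {
  for (long i = 0; i < n; ++i)
    dst[i] = src[i];
}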

Arguments:
""""""""""

The first two arguments are pointers and the last argument is an immediate.
The result is a vector with the i1 element type.

Semantics:
""""""""""

``%elementSize`` is the size of the accessed elements in bytes.
The intrinsic returns ``poison`` if the distance between ``%ptrA`` and ``%ptrB``
is smaller than ``VF * %elementSize`` and either ``%ptrA + VF * %elementSize``
or ``%ptrB + VF * %elementSize`` wraps.
An element of the result mask is active when no write-after-read hazard occurs
for that lane, meaning that (see also the sketch after this list):

* (ptrB - ptrA) <= 0 (guarantees that all lanes are loaded before any stores are
committed), or
* (ptrB - ptrA) >= elementSize * lane (guarantees that this lane is loaded
before the store to the same address is committed)

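As a rough reference model (a sketch only, not part of this patch; the helper
name ``warMaskReference`` and the explicit ``VF`` parameter are assumptions made
for illustration), the per-lane computation can be written as:

.. code-block:: c++

#include <cstdint>
#include <vector>

// Sketch of the per-lane semantics of llvm.loop.dependence.war.mask.
// Lane i is active when the scalar load from ptrA + i*elementSize is
// guaranteed to execute before the store to ptrB + i*elementSize can
// overwrite the data it reads.
std::vector<bool> warMaskReference(intptr_t ptrA, intptr_t ptrB,
                                   int64_t elementSize, unsigned VF) {
  std::vector<bool> mask(VF);
  int64_t diff = ptrB - ptrA;
  for (unsigned lane = 0; lane < VF; ++lane)
    mask[lane] = diff <= 0 || diff >= elementSize * int64_t(lane);
  return mask;
}
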
Examples:
"""""""""

.. code-block:: llvm

%loop.dependence.mask = call <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)
[...]
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask)

.. _int_loop_dependence_raw_mask:

'``llvm.loop.dependence.raw.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <8 x i1> @llvm.loop.dependence.raw.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <16 x i1> @llvm.loop.dependence.raw.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <vscale x 16 x i1> @llvm.loop.dependence.raw.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)


Overview:
"""""""""

Given a scalar store to %ptrA, followed by a scalar load from %ptrB, this
intrinsic generates a mask in which an active lane indicates that there is no
read-after-write hazard for that lane and that the lane does not introduce any
new store-to-load forwarding hazard.

A read-after-write hazard occurs when a read-after-write sequence for a given
lane in a vector ends up being executed as a write-after-read sequence due to
the aliasing of pointers.


Arguments:
""""""""""

The first two arguments are pointers and the last argument is an immediate.
The result is a vector with the i1 element type.

Semantics:
""""""""""

``%elementSize`` is the size of the accessed elements in bytes.
The intrinsic returns ``poison`` if the distance between ``%ptrA`` and ``%ptrB``
is smaller than ``VF * %elementSize`` and either ``%ptrA + VF * %elementSize``
or ``%ptrB + VF * %elementSize`` wraps.
An element of the result mask is active when no read-after-write hazard occurs for that lane, meaning that:

abs(ptrB - ptrA) >= elementSize * lane (guarantees that the store of this lane
is committed before loading from this address)

Note that the case where (ptrB - ptrA) < 0 does not result in any
read-after-write hazards, but may introduce new store-to-load forwarding stalls
where the store and the load partially access the same addresses.

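For comparison, a corresponding scalar sketch (again illustrative only and not
part of this patch; ``rawMaskReference`` is a hypothetical helper) of the
read-after-write case:

.. code-block:: c++

#include <cstdint>
#include <cstdlib>
#include <vector>

// Sketch of the per-lane semantics of llvm.loop.dependence.raw.mask.
// Lane i is active when the store to ptrA + i*elementSize is committed
// before the load from ptrB + i*elementSize reads from that address.
std::vector<bool> rawMaskReference(intptr_t ptrA, intptr_t ptrB,
                                   int64_t elementSize, unsigned VF) {
  std::vector<bool> mask(VF);
  int64_t dist = std::llabs(int64_t(ptrB - ptrA)); // abs(ptrB - ptrA)
  for (unsigned lane = 0; lane < VF; ++lane)
    mask[lane] = dist >= elementSize * int64_t(lane);
  return mask;
}
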
Examples:
"""""""""

.. code-block:: llvm

%loop.dependence.mask = call <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask)
[...]
%vecB = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)

.. _int_experimental_vp_splice:

'``llvm.experimental.vp.splice``' Intrinsic
6 changes: 6 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1556,6 +1556,12 @@ enum NodeType {
// bits conform to getBooleanContents similar to the SETCC operator.
GET_ACTIVE_LANE_MASK,

// The `llvm.loop.dependence.{war, raw}.mask` intrinsics
// Operands: Load pointer, Store pointer, Element size
// Output: Mask
LOOP_DEPENDENCE_WAR_MASK,
LOOP_DEPENDENCE_RAW_MASK,

// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Address, End Address
// Outputs: Output Chain
10 changes: 10 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -2399,6 +2399,16 @@ let IntrProperties = [IntrNoMem, ImmArg<ArgIndex<1>>] in {
llvm_i32_ty]>;
}

def int_loop_dependence_raw_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>]>;

def int_loop_dependence_war_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>]>;

def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],
22 changes: 22 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -322,6 +322,11 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VP_REDUCE(N);
break;

case ISD::LOOP_DEPENDENCE_WAR_MASK:
case ISD::LOOP_DEPENDENCE_RAW_MASK:
Res = PromoteIntRes_LOOP_DEPENDENCE_MASK(N);
break;

case ISD::FREEZE:
Res = PromoteIntRes_FREEZE(N);
break;
@@ -369,6 +374,12 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
return GetPromotedInteger(Op);
}

SDValue DAGTypeLegalizer::PromoteIntRes_LOOP_DEPENDENCE_MASK(SDNode *N) {
EVT VT = N->getValueType(0);
EVT NewVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
return DAG.getNode(N->getOpcode(), SDLoc(N), NewVT, N->ops());
}

SDValue DAGTypeLegalizer::PromoteIntRes_AssertSext(SDNode *N) {
// Sign-extend the new bits, and continue the assertion.
SDValue Op = SExtPromotedInteger(N->getOperand(0));
@@ -2095,6 +2106,10 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::PARTIAL_REDUCE_SMLA:
Res = PromoteIntOp_PARTIAL_REDUCE_MLA(N);
break;
case ISD::LOOP_DEPENDENCE_RAW_MASK:
case ISD::LOOP_DEPENDENCE_WAR_MASK:
Res = PromoteIntOp_LOOP_DEPENDENCE_MASK(N, OpNo);
break;
}

// If the result is null, the sub-method took care of registering results etc.
@@ -2896,6 +2911,13 @@ SDValue DAGTypeLegalizer::PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N) {
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}

SDValue DAGTypeLegalizer::PromoteIntOp_LOOP_DEPENDENCE_MASK(SDNode *N,
unsigned OpNo) {
SmallVector<SDValue, 4> NewOps(N->ops());
NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}

//===----------------------------------------------------------------------===//
// Integer Result Expansion
//===----------------------------------------------------------------------===//
2 changes: 2 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -381,6 +381,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
SDValue PromoteIntRes_GET_ACTIVE_LANE_MASK(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
SDValue PromoteIntRes_LOOP_DEPENDENCE_MASK(SDNode *N);

// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
@@ -434,6 +435,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_VECTOR_FIND_LAST_ACTIVE(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_GET_ACTIVE_LANE_MASK(SDNode *N);
SDValue PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N);
SDValue PromoteIntOp_LOOP_DEPENDENCE_MASK(SDNode *N, unsigned OpNo);

void SExtOrZExtPromotedOperands(SDValue &LHS, SDValue &RHS);
void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
51 changes: 51 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -138,6 +138,7 @@ class VectorLegalizer {
SDValue ExpandVP_FNEG(SDNode *Node);
SDValue ExpandVP_FABS(SDNode *Node);
SDValue ExpandVP_FCOPYSIGN(SDNode *Node);
SDValue ExpandLOOP_DEPENDENCE_MASK(SDNode *N);
SDValue ExpandSELECT(SDNode *Node);
std::pair<SDValue, SDValue> ExpandLoad(SDNode *N);
SDValue ExpandStore(SDNode *N);
@@ -469,6 +470,8 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::VECTOR_COMPRESS:
case ISD::SCMP:
case ISD::UCMP:
case ISD::LOOP_DEPENDENCE_WAR_MASK:
case ISD::LOOP_DEPENDENCE_RAW_MASK:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;
case ISD::SMULFIX:
@@ -1262,6 +1265,10 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
case ISD::UCMP:
Results.push_back(TLI.expandCMP(Node, DAG));
return;
case ISD::LOOP_DEPENDENCE_WAR_MASK:
case ISD::LOOP_DEPENDENCE_RAW_MASK:
Results.push_back(ExpandLOOP_DEPENDENCE_MASK(Node));
return;

case ISD::FADD:
case ISD::FMUL:
@@ -1767,6 +1774,50 @@ SDValue VectorLegalizer::ExpandVP_FCOPYSIGN(SDNode *Node) {
return DAG.getNode(ISD::BITCAST, DL, VT, CopiedSign);
}

SDValue VectorLegalizer::ExpandLOOP_DEPENDENCE_MASK(SDNode *N) {
SDLoc DL(N);
SDValue SourceValue = N->getOperand(0);
SDValue SinkValue = N->getOperand(1);
SDValue EltSize = N->getOperand(2);

bool IsReadAfterWrite = N->getOpcode() == ISD::LOOP_DEPENDENCE_RAW_MASK;
auto VT = N->getValueType(0);
auto PtrVT = SourceValue->getValueType(0);

SDValue Diff = DAG.getNode(ISD::SUB, DL, PtrVT, SinkValue, SourceValue);
if (IsReadAfterWrite)
Diff = DAG.getNode(ISD::ABS, DL, PtrVT, Diff);

Diff = DAG.getNode(ISD::SDIV, DL, PtrVT, Diff, EltSize);

// If the difference is positive then some elements may alias
EVT CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());
SDValue Zero = DAG.getTargetConstant(0, DL, PtrVT);
SDValue Cmp = DAG.getSetCC(DL, CmpVT, Diff, Zero,
IsReadAfterWrite ? ISD::SETEQ : ISD::SETLE);

// Create the lane mask
EVT SplatTY = VT.changeElementType(PtrVT);
SDValue DiffSplat = DAG.getSplat(SplatTY, DL, Diff);
SDValue VectorStep = DAG.getStepVector(DL, SplatTY);
EVT MaskVT = VT.changeElementType(MVT::i1);
SDValue DiffMask =
DAG.getSetCC(DL, MaskVT, VectorStep, DiffSplat, ISD::CondCode::SETULT);

EVT VTElementTy = VT.getVectorElementType();
// Extend the diff setcc in case the intrinsic has been promoted to a vector
// type with elements larger than i1
if (VTElementTy.getScalarSizeInBits() > MaskVT.getScalarSizeInBits())
DiffMask = DAG.getNode(ISD::ANY_EXTEND, DL, VT, DiffMask);

// Splat the compare result then OR it with the lane mask
if (CmpVT.getScalarSizeInBits() < VTElementTy.getScalarSizeInBits())
Cmp = DAG.getNode(ISD::ZERO_EXTEND, DL, VTElementTy, Cmp);
SDValue Splat = DAG.getSplat(VT, DL, Cmp);
return DAG.getNode(ISD::OR, DL, VT, DiffMask, Splat);
}

void VectorLegalizer::ExpandFP_TO_UINT(SDNode *Node,
SmallVectorImpl<SDValue> &Results) {
// Attempt to expand using TargetLowering.
10 changes: 10 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8244,6 +8244,16 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitVectorExtractLastActive(I, Intrinsic);
return;
}
case Intrinsic::loop_dependence_war_mask:
case Intrinsic::loop_dependence_raw_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
unsigned ID = Intrinsic == Intrinsic::loop_dependence_war_mask
? ISD::LOOP_DEPENDENCE_WAR_MASK
: ISD::LOOP_DEPENDENCE_RAW_MASK;
setValue(&I,
DAG.getNode(ID, sdl, IntrinsicVT, getValue(I.getOperand(0)),
getValue(I.getOperand(1)), getValue(I.getOperand(2))));
}
}
}

4 changes: 4 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -585,6 +585,10 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
return "partial_reduce_umla";
case ISD::PARTIAL_REDUCE_SMLA:
return "partial_reduce_smla";
case ISD::LOOP_DEPENDENCE_WAR_MASK:
return "loop_dep_war";
case ISD::LOOP_DEPENDENCE_RAW_MASK:
return "loop_dep_raw";

// Vector Predication
#define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
4 changes: 4 additions & 0 deletions llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -839,6 +839,10 @@ void TargetLoweringBase::initActions() {
// Masked vector extracts default to expand.
setOperationAction(ISD::VECTOR_FIND_LAST_ACTIVE, VT, Expand);

// Lane masks with non-aliasing lanes enabled default to expand
setOperationAction(ISD::LOOP_DEPENDENCE_RAW_MASK, VT, Expand);
setOperationAction(ISD::LOOP_DEPENDENCE_WAR_MASK, VT, Expand);

// FP environment operations default to expand.
setOperationAction(ISD::GET_FPENV, VT, Expand);
setOperationAction(ISD::SET_FPENV, VT, Expand);