
Commit cba9bd5

[DirectX] Implement the resource.load.rawbuffer intrinsic (llvm#121012)
This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the buffer load docs under DirectX/DXILResources. This resolves the "load" parts of llvm#106188.
1 parent 8312876 commit cba9bd5

9 files changed

+519
-30
lines changed

llvm/docs/DirectX/DXILResources.rst

Lines changed: 128 additions & 29 deletions
Original file line number | Diff line number | Diff line change
@@ -318,39 +318,43 @@ Examples:
318318
%ptr = call ptr @llvm.dx.resource.getpointer.p0.tdx.TypedBuffer_v4f32_0_0_0t(
319319
target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
320320
321-
16-byte Loads, Samples, and Gathers
322-
-----------------------------------
323-
324-
*relevant types: TypedBuffer, CBuffer, and Textures*
325-
326-
TypedBuffer, CBuffer, and Texture loads, as well as samples and gathers, can
327-
return 1 to 4 elements from the given resource, to a maximum of 16 bytes of
328-
data. DXIL's modeling of this is influenced by DirectX and DXBC's history and
329-
it generally treats these operations as returning 4 32-bit values. For 16-bit
330-
elements the values are 16-bit values, and for 64-bit values the operations
331-
return 4 32-bit integers and emit further code to construct the double.
332-
333-
In DXIL, these operations return `ResRet`_ and `CBufRet`_ values, are structs
334-
containing 4 elements of the same type, and in the case of `ResRet` a 5th
335-
element that is used by the `CheckAccessFullyMapped`_ operation.
336-
337-
In LLVM IR the intrinsics will return the contained type of the resource
338-
instead. That is, ``llvm.dx.resource.load.typedbuffer`` from a
339-
``Buffer<float>`` would return a single float, from ``Buffer<float4>`` a vector
340-
of 4 floats, and from ``Buffer<double2>`` a vector of two doubles, etc. The
341-
operations are then expanded out to match DXIL's format during lowering.
342-
343-
In order to support ``CheckAccessFullyMapped``, we need these intrinsics to
344-
return an anonymous struct with element-0 being the contained type, and
345-
element-1 being the ``i1`` result of a ``CheckAccessFullyMapped`` call. We
346-
don't have a separate call to ``CheckAccessFullyMapped`` at all, since that's
347-
the only operation that can possibly be done on this value. In practice this
348-
may mean we insert a DXIL operation for the check when this was missing in the
349-
HLSL source, but this actually matches DXC's behaviour in practice.
321+
Loads, Samples, and Gathers
322+
---------------------------
323+
324+
*relevant types: Buffers, CBuffers, and Textures*
325+
326+
All load, sample, and gather operations in DXIL return a `ResRet`_ type, and
327+
CBuffer loads return a similar `CBufRet`_ type. These types are structs
328+
containing 4 elements of some basic type, and in the case of `ResRet` a 5th
329+
element that is used by the `CheckAccessFullyMapped`_ operation. Some of these
330+
operations, like `RawBufferLoad`_, include a mask and/or alignment that tell us
331+
some information about how to interpret those four values.
332+
333+
In the LLVM IR representations of these operations we instead return scalars or
334+
vectors, but we keep the requirement that we only return up to 4 elements of a
335+
basic type. This avoids some unnecessary casting and structure manipulation in
336+
the intermediate format while also keeping lowering to DXIL straightforward.
337+
338+
LLVM intrinsics that map to operations returning `ResRet` return an anonymous
339+
struct with element-0 being the scalar or vector type, and element-1 being the
340+
``i1`` result of a ``CheckAccessFullyMapped`` call. We don't have a separate
341+
call to ``CheckAccessFullyMapped`` at all, since that's the only operation that
342+
can possibly be done on this value. In practice this may mean we insert a DXIL
343+
operation for the check when this was missing in the HLSL source, but this
344+
actually matches DXC's behaviour in practice.
345+
346+
For TypedBuffer and Texture, we map directly from the contained type of the
347+
resource to the return value of the intrinsic. Since these resources are
348+
constrained to contain only scalars and vectors of up to 4 elements, the
349+
lowering to DXIL ops is generally straightforward. The one exception we have
350+
here is that ``double`` types in the elements are special: these are allowed in
351+
the LLVM intrinsics, but are lowered to pairs of ``i32`` followed by
352+
``MakeDouble`` operations for DXIL.
350353

351354
.. _ResRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#resource-operation-return-types
352355
.. _CBufRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#cbufferloadlegacy
353356
.. _CheckAccessFullyMapped: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/checkaccessfullymapped
357+
.. _RawBufferLoad: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#rawbufferload
354358

355359
.. list-table:: ``@llvm.dx.resource.load.typedbuffer``
356360
:header-rows: 1
@@ -392,6 +396,101 @@ Examples:
392396
@llvm.dx.resource.load.typedbuffer.v2f64.tdx.TypedBuffer_v2f64_0_0t(
393397
target("dx.TypedBuffer", <2 x double>, 0, 0, 0) %buffer, i32 %index)
394398
399+
For RawBuffer, an HLSL load operation may return an arbitrarily sized result,
400+
but we still constrain the LLVM intrinsic to return only up to 4 elements of a
401+
basic type. This means that larger loads are represented as a series of loads,
402+
which matches DXIL. Unlike in the `RawBufferLoad`_ operation, we do not need
403+
arguments for the mask/type size and alignment, since we can calculate these
404+
from the return type of the load during lowering.
405+
406+
.. _RawBufferLoad: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#rawbufferload
407+
408+
.. list-table:: ``@llvm.dx.resource.load.rawbuffer``
409+
:header-rows: 1
410+
411+
* - Argument
412+
-
413+
- Type
414+
- Description
415+
* - Return value
416+
-
417+
- A structure of a scalar or vector and the check bit
418+
- The data loaded from the buffer and the check bit
419+
* - ``%buffer``
420+
- 0
421+
- ``target(dx.RawBuffer, ...)``
422+
- The buffer to load from
423+
* - ``%index``
424+
- 1
425+
- ``i32``
426+
- Index into the buffer
427+
* - ``%offset``
428+
- 2
429+
- ``i32``
430+
- Offset into the structure at the given index
431+
432+
Examples:
433+
434+
.. code-block:: llvm
435+
436+
; float
437+
%ret = call {float, i1}
438+
@llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_f32_0_0_0t(
439+
target("dx.RawBuffer", float, 0, 0, 0) %buffer,
440+
i32 %index,
441+
i32 0)
442+
%ret = call {float, i1}
443+
@llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_i8_0_0_0t(
444+
target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
445+
i32 %byte_offset,
446+
i32 0)
447+
448+
; float4
449+
%ret = call {<4 x float>, i1}
450+
@llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_v4f32_0_0_0t(
451+
target("dx.RawBuffer", <4 x float>, 0, 0, 0) %buffer,
452+
i32 %index,
453+
i32 0)
454+
%ret = call {<4 x float>, i1}
455+
@llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_i8_0_0_0t(
456+
target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
457+
i32 %byte_offset,
458+
i32 0)
459+
460+
; struct S0 { float4 f; int4 i; };
461+
%ret = call {<4 x float>, i1}
462+
@llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_sl_v4f32v4i32s_0_0_0t(
463+
target("dx.RawBuffer", {<4 x float>, <4 x i32>}, 0, 0, 0) %buffer,
464+
i32 %index,
465+
i32 0)
466+
%ret = call {<4 x i32>, i1}
467+
@llvm.dx.resource.load.rawbuffer.v4i32.tdx.RawBuffer_sl_v4f32v4i32s_0_0_0t(
468+
target("dx.RawBuffer", {<4 x float>, <4 x i32>}, 0, 0, 0) %buffer,
469+
i32 %index,
470+
i32 16)
471+
472+
; struct Q { float4 f; int3 i; }
473+
; struct R { int z; Q x; }
474+
%ret = call {i32, i1}
475+
@llvm.dx.resource.load.rawbuffer.i32(
476+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0, 0)
477+
%buffer, i32 %index, i32 0)
478+
%ret = call {<4 x float>, i1}
479+
@llvm.dx.resource.load.rawbuffer.v4f32(
480+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0, 0)
481+
%buffer, i32 %index, i32 4)
482+
%ret = call {<3 x i32>, i1}
483+
@llvm.dx.resource.load.rawbuffer.v3i32(
484+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0, 0)
485+
%buffer, i32 %index, i32 20)
486+
487+
; byteaddressbuf.Load<int64_t4>
488+
%ret = call {<4 x i64>, i1}
489+
@llvm.dx.resource.load.rawbuffer.v4i64.tdx.RawBuffer_i8_0_0_0t(
490+
target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
491+
i32 %byte_offset,
492+
i32 0)
493+
395494
Texture and Typed Buffer Stores
396495
-------------------------------
397496
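The nested-struct example in the documentation hunk above passes 0, 4, and 20 as the ``%offset`` argument. As a rough illustration of where those numbers come from, here is a minimal C++ sketch that computes byte offsets for a flattened field list. The `Field` type and `packedOffsets` helper are hypothetical names invented for this note, not part of LLVM, and the sketch assumes fields are tightly packed with no padding, which is what the example values suggest rather than a general statement about HLSL layout rules.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of LLVM): one field that is a scalar or
// a short vector, following the "up to 4 elements of a basic type" rule.
struct Field {
  uint32_t ElementBytes; // size of the basic type in bytes
  uint32_t NumElements;  // 1 to 4
};

// Returns the byte offset of each field, ASSUMING tight packing with no
// padding. Each offset is what would be passed as the %offset argument of
// one load, with one load emitted per field.
std::vector<uint32_t> packedOffsets(const std::vector<Field> &Fields) {
  std::vector<uint32_t> Offsets;
  uint32_t Offset = 0;
  for (const Field &F : Fields) {
    assert(F.NumElements >= 1 && F.NumElements <= 4);
    Offsets.push_back(Offset);
    Offset += F.ElementBytes * F.NumElements;
  }
  return Offsets;
}
```

For ``{i32, {<4 x float>, <3 x i32>}}`` flattened to an ``i32``, a ``<4 x float>``, and a ``<3 x i32>``, calling `packedOffsets({{4, 1}, {4, 4}, {4, 3}})` gives one offset per load in the series.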

llvm/include/llvm/IR/IntrinsicsDirectX.td

Lines changed: 4 additions & 0 deletions
@@ -36,6 +36,10 @@ def int_dx_resource_load_typedbuffer
3636
def int_dx_resource_store_typedbuffer
3737
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty],
3838
[IntrWriteMem]>;
39+
def int_dx_resource_load_rawbuffer
40+
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
41+
[llvm_any_ty, llvm_i32_ty, llvm_i32_ty],
42+
[IntrReadMem]>;
3943

4044
def int_dx_resource_updatecounter
4145
: DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_any_ty, llvm_i8_ty],

llvm/lib/Target/DirectX/DXIL.td

Lines changed: 19 additions & 0 deletions
@@ -42,8 +42,10 @@ def FloatTy : DXILOpParamType;
4242
def DoubleTy : DXILOpParamType;
4343
def ResRetHalfTy : DXILOpParamType;
4444
def ResRetFloatTy : DXILOpParamType;
45+
def ResRetDoubleTy : DXILOpParamType;
4546
def ResRetInt16Ty : DXILOpParamType;
4647
def ResRetInt32Ty : DXILOpParamType;
48+
def ResRetInt64Ty : DXILOpParamType;
4749
def HandleTy : DXILOpParamType;
4850
def ResBindTy : DXILOpParamType;
4951
def ResPropsTy : DXILOpParamType;
@@ -890,6 +892,23 @@ def SplitDouble : DXILOp<102, splitDouble> {
890892
let attributes = [Attributes<DXIL1_0, [ReadNone]>];
891893
}
892894

895+
def RawBufferLoad : DXILOp<139, rawBufferLoad> {
896+
let Doc = "reads from a raw buffer and structured buffer";
897+
// Handle, Coord0, Coord1, Mask, Alignment
898+
let arguments = [HandleTy, Int32Ty, Int32Ty, Int8Ty, Int32Ty];
899+
let result = OverloadTy;
900+
let overloads = [
901+
Overloads<DXIL1_2,
902+
[ResRetHalfTy, ResRetFloatTy, ResRetInt16Ty, ResRetInt32Ty]>,
903+
Overloads<DXIL1_3,
904+
[
905+
ResRetHalfTy, ResRetFloatTy, ResRetDoubleTy, ResRetInt16Ty,
906+
ResRetInt32Ty, ResRetInt64Ty
907+
]>
908+
];
909+
let stages = [Stages<DXIL1_2, [all_stages]>];
910+
}
911+
893912
def Dot4AddI8Packed : DXILOp<163, dot4AddPacked> {
894913
let Doc = "signed dot product of 4 x i8 vectors packed into i32, with "
895914
"accumulate to i32";

llvm/lib/Target/DirectX/DXILOpBuilder.cpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -263,10 +263,14 @@ static Type *getTypeFromOpParamType(OpParamType Kind, LLVMContext &Ctx,
263263
return getResRetType(Type::getHalfTy(Ctx));
264264
case OpParamType::ResRetFloatTy:
265265
return getResRetType(Type::getFloatTy(Ctx));
266+
case OpParamType::ResRetDoubleTy:
267+
return getResRetType(Type::getDoubleTy(Ctx));
266268
case OpParamType::ResRetInt16Ty:
267269
return getResRetType(Type::getInt16Ty(Ctx));
268270
case OpParamType::ResRetInt32Ty:
269271
return getResRetType(Type::getInt32Ty(Ctx));
272+
case OpParamType::ResRetInt64Ty:
273+
return getResRetType(Type::getInt64Ty(Ctx));
270274
case OpParamType::HandleTy:
271275
return getHandleType(Ctx);
272276
case OpParamType::ResBindTy:

llvm/lib/Target/DirectX/DXILOpLowering.cpp

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -542,6 +542,48 @@ class OpLowerer {
542542
});
543543
}
544544

545+
[[nodiscard]] bool lowerRawBufferLoad(Function &F) {
546+
Triple TT(Triple(M.getTargetTriple()));
547+
VersionTuple DXILVersion = TT.getDXILVersion();
548+
const DataLayout &DL = F.getDataLayout();
549+
IRBuilder<> &IRB = OpBuilder.getIRB();
550+
Type *Int8Ty = IRB.getInt8Ty();
551+
Type *Int32Ty = IRB.getInt32Ty();
552+
553+
return replaceFunction(F, [&](CallInst *CI) -> Error {
554+
IRB.SetInsertPoint(CI);
555+
556+
Type *OldTy = cast<StructType>(CI->getType())->getElementType(0);
557+
Type *ScalarTy = OldTy->getScalarType();
558+
Type *NewRetTy = OpBuilder.getResRetType(ScalarTy);
559+
560+
Value *Handle =
561+
createTmpHandleCast(CI->getArgOperand(0), OpBuilder.getHandleType());
562+
Value *Index0 = CI->getArgOperand(1);
563+
Value *Index1 = CI->getArgOperand(2);
564+
uint64_t NumElements =
565+
DL.getTypeSizeInBits(OldTy) / DL.getTypeSizeInBits(ScalarTy);
566+
Value *Mask = ConstantInt::get(Int8Ty, ~(~0U << NumElements));
567+
Value *Align =
568+
ConstantInt::get(Int32Ty, DL.getPrefTypeAlign(ScalarTy).value());
569+
570+
Expected<CallInst *> OpCall =
571+
DXILVersion >= VersionTuple(1, 2)
572+
? OpBuilder.tryCreateOp(OpCode::RawBufferLoad,
573+
{Handle, Index0, Index1, Mask, Align},
574+
CI->getName(), NewRetTy)
575+
: OpBuilder.tryCreateOp(OpCode::BufferLoad,
576+
{Handle, Index0, Index1}, CI->getName(),
577+
NewRetTy);
578+
if (Error E = OpCall.takeError())
579+
return E;
580+
if (Error E = replaceResRetUses(CI, *OpCall, /*HasCheckBit=*/true))
581+
return E;
582+
583+
return Error::success();
584+
});
585+
}
586+
545587
[[nodiscard]] bool lowerUpdateCounter(Function &F) {
546588
IRBuilder<> &IRB = OpBuilder.getIRB();
547589
Type *Int32Ty = IRB.getInt32Ty();
@@ -736,6 +778,9 @@ class OpLowerer {
736778
case Intrinsic::dx_resource_store_typedbuffer:
737779
HasErrors |= lowerTypedBufferStore(F);
738780
break;
781+
case Intrinsic::dx_resource_load_rawbuffer:
782+
HasErrors |= lowerRawBufferLoad(F);
783+
break;
739784
case Intrinsic::dx_resource_updatecounter:
740785
HasErrors |= lowerUpdateCounter(F);
741786
break;
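
`lowerRawBufferLoad` derives the `Mask` operand from the intrinsic's return type instead of taking it as an argument. The standalone C++ sketch below reproduces only that arithmetic so it can be checked in isolation. `elementCount` and `elementMask` are hypothetical names for this note, and the plain integers stand in for the `DataLayout` size queries used in the real code.

```cpp
#include <cassert>
#include <cstdint>

// Number of elements = total size / scalar size, mirroring the division of
// DL.getTypeSizeInBits(OldTy) by DL.getTypeSizeInBits(ScalarTy).
unsigned elementCount(unsigned TotalBits, unsigned ScalarBits) {
  assert(ScalarBits != 0 && TotalBits % ScalarBits == 0);
  return TotalBits / ScalarBits;
}

// One low bit per loaded element, using the same expression as the lowering:
// ~(~0U << NumElements).
uint8_t elementMask(unsigned NumElements) {
  assert(NumElements >= 1 && NumElements <= 4);
  return static_cast<uint8_t>(~(~0U << NumElements));
}
```

With these helpers, a `<3 x i32>` load (96 bits of 32-bit scalars) has three elements and so sets the three low mask bits, while a scalar load sets only the lowest bit.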
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
; RUN: opt -S -dxil-op-lower %s | FileCheck %s
2+
; Before SM6.2, ByteAddressBuffer and StructuredBuffer loads lower to bufferLoad.
3+
4+
target triple = "dxil-pc-shadermodel6.1-compute"
5+
6+
; CHECK-LABEL: define void @loadf32_struct
7+
define void @loadf32_struct(i32 %index) {
8+
%buffer = call target("dx.RawBuffer", float, 0, 0, 0)
9+
@llvm.dx.resource.handlefrombinding.tdx.RawBuffer_f32_0_0_0(
10+
i32 0, i32 0, i32 1, i32 0, i1 false)
11+
12+
; CHECK: [[DATA:%.*]] = call %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32 68, %dx.types.Handle %{{.*}}, i32 %index, i32 0)
13+
%load = call {float, i1}
14+
@llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_f32_0_0_0t(
15+
target("dx.RawBuffer", float, 0, 0, 0) %buffer,
16+
i32 %index,
17+
i32 0)
18+
19+
ret void
20+
}
21+
22+
; CHECK-LABEL: define void @loadv4f32_byte
23+
define void @loadv4f32_byte(i32 %offset) {
24+
%buffer = call target("dx.RawBuffer", i8, 0, 0, 0)
25+
@llvm.dx.resource.handlefrombinding.tdx.RawBuffer_i8_0_0_0(
26+
i32 0, i32 0, i32 1, i32 0, i1 false)
27+
28+
; CHECK: [[DATA:%.*]] = call %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32 68, %dx.types.Handle %{{.*}}, i32 %offset, i32 0)
29+
%load = call {<4 x float>, i1}
30+
@llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_i8_0_0_0t(
31+
target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
32+
i32 %offset,
33+
i32 0)
34+
35+
ret void
36+
}
37+
38+
; CHECK-LABEL: define void @loadnested
39+
define void @loadnested(i32 %index) {
40+
%buffer = call
41+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x half>}}, 0, 0, 0)
42+
@llvm.dx.resource.handlefrombinding(i32 0, i32 0, i32 1, i32 0, i1 false)
43+
44+
; CHECK: [[DATAI32:%.*]] = call %dx.types.ResRet.i32 @dx.op.bufferLoad.i32(i32 68, %dx.types.Handle %{{.*}}, i32 %index, i32 0)
45+
%loadi32 = call {i32, i1} @llvm.dx.resource.load.rawbuffer.i32(
46+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x half>}}, 0, 0, 0) %buffer,
47+
i32 %index, i32 0)
48+
49+
; CHECK: [[DATAF32:%.*]] = call %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32 68, %dx.types.Handle %{{.*}}, i32 %index, i32 4)
50+
%loadf32 = call {<4 x float>, i1} @llvm.dx.resource.load.rawbuffer.v4f32(
51+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x half>}}, 0, 0, 0) %buffer,
52+
i32 %index, i32 4)
53+
54+
; CHECK: [[DATAF16:%.*]] = call %dx.types.ResRet.f16 @dx.op.bufferLoad.f16(i32 68, %dx.types.Handle %{{.*}}, i32 %index, i32 20)
55+
%loadf16 = call {<3 x half>, i1} @llvm.dx.resource.load.rawbuffer.v3f16(
56+
target("dx.RawBuffer", {i32, {<4 x float>, <3 x half>}}, 0, 0, 0) %buffer,
57+
i32 %index, i32 20)
58+
59+
ret void
60+
}
Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
; We use llc for this test so that we don't abort after the first error.
2+
; RUN: not llc %s -o /dev/null 2>&1 | FileCheck %s
3+
4+
target triple = "dxil-pc-shadermodel6.2-compute"
5+
6+
declare void @v4f64_user(<4 x double>)
7+
8+
; Can't load 64-bit types directly until SM6.3 (byteaddressbuf.Load<double4>)
9+
; CHECK: error:
10+
; CHECK-SAME: in function loadv4f64_byte
11+
; CHECK-SAME: Cannot create RawBufferLoad operation: Invalid overload type
12+
define void @loadv4f64_byte(i32 %offset) "hlsl.export" {
13+
%buffer = call target("dx.RawBuffer", i8, 0, 0, 0)
14+
@llvm.dx.resource.handlefrombinding.tdx.RawBuffer_i8_0_0_0(
15+
i32 0, i32 0, i32 1, i32 0, i1 false)
16+
17+
%load = call {<4 x double>, i1} @llvm.dx.resource.load.rawbuffer.v4f64(
18+
target("dx.RawBuffer", i8, 0, 0, 0) %buffer, i32 %offset, i32 0)
19+
%data = extractvalue {<4 x double>, i1} %load, 0
20+
21+
call void @v4f64_user(<4 x double> %data)
22+
23+
ret void
24+
}
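
Taken together, the `RawBufferLoad` definition in DXIL.td, the version check in `lowerRawBufferLoad`, and the two tests imply a small decision table: before DXIL 1.2 the load lowers to `bufferLoad`, from 1.2 it lowers to `rawBufferLoad`, and the 64-bit overloads are only accepted from 1.3. The C++ sketch below restates that table as a reading aid. `DxilVersion`, `rawLoadOpName`, and `rawBufferLoadAcceptsBits` are hypothetical stand-ins written for this note; the real code uses `VersionTuple` and the generated op tables, and checks overload types rather than bare bit widths.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::VersionTuple, for illustration only.
struct DxilVersion {
  unsigned Major;
  unsigned Minor;
};

bool atLeast(DxilVersion V, unsigned Major, unsigned Minor) {
  return V.Major > Major || (V.Major == Major && V.Minor >= Minor);
}

// Which DXIL op the lowering selects for a raw buffer load.
std::string rawLoadOpName(DxilVersion V) {
  return atLeast(V, 1, 2) ? "rawBufferLoad" : "bufferLoad";
}

// Whether rawBufferLoad accepts an element of the given width, following the
// Overloads lists in DXIL.td: 16- and 32-bit types from DXIL 1.2, with
// 64-bit types added in DXIL 1.3.
bool rawBufferLoadAcceptsBits(DxilVersion V, unsigned Bits) {
  if (!atLeast(V, 1, 2))
    return false; // the op itself is not available before DXIL 1.2
  if (Bits == 64)
    return atLeast(V, 1, 3);
  return Bits == 16 || Bits == 32;
}
```

Under this reading, the `shadermodel6.2` error test corresponds to `rawBufferLoadAcceptsBits({1, 2}, 64)` being false, which matches the "Invalid overload type" diagnostic it checks for.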
