Commit 3f22756

[DirectX] Lower @llvm.dx.typedBufferLoad to DXIL ops
The `@llvm.dx.typedBufferLoad` intrinsic is lowered to `@dx.op.bufferLoad`. There's some complexity here in translating to scalarized IR, which I've abstracted out into a function that should be useful for samples, gathers, and CBuffer loads.

I've also updated the DXILResources.rst docs to match what I'm doing here and the proposal in llvm/wg-hlsl#59. I've removed the content about stores and raw buffers for now, with the expectation that it will be added back along with the work that implements them.

Note that this change includes a bit of a hack in how it deals with `getOverloadKind` for the `dx.ResRet` types: we need to adjust how we deal with operation overloads to generate a table directly rather than proxy through the OverloadKind enum, but that's left for a later change.

Part of #91367

Pull Request: #104252
1 parent 985600d commit 3f22756
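To make "translating to scalarized IR" concrete, here is a minimal host-side sketch, assuming a `%dx.types.ResRet` value can be modeled as four lanes plus an `i32` status word. `ResRet` and `extractContained` are hypothetical names, not LLVM API and not the helper the commit adds; the sketch only shows which lanes survive when a load of a contained type with N elements is expanded from the 4-lane result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a %dx.types.ResRet.<T> value: four lanes of the
// element type followed by the i32 status word that CheckAccessFullyMapped
// consumes. This is a host-side stand-in, not LLVM IR.
template <typename T> struct ResRet {
  std::array<T, 4> Lanes;
  std::int32_t Status;
};

// Expand a ResRet into a contained type with N elements: N == 1 models
// Buffer<float>, N == 4 models Buffer<float4>. The first N lanes are kept;
// the remaining lanes and the status word are dropped.
template <typename T, std::size_t N>
std::array<T, N> extractContained(const ResRet<T> &R) {
  static_assert(N >= 1 && N <= 4, "a 16-byte load yields 1 to 4 elements");
  std::array<T, N> Out{};
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = R.Lanes[I];
  return Out;
}
```

The real lowering performs the equivalent extraction on IR values during the DXIL op lowering pass rather than on host data.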

8 files changed (+387 / -129 lines)

llvm/docs/DirectX/DXILResources.rst

Lines changed: 55 additions & 120 deletions
@@ -267,45 +267,38 @@ Examples:
       @llvm.dx.handle.fromHeap.tdx.RawBuffer_v4f32_1_0(
           i32 2, i1 false)
 
-Buffer Loads and Stores
------------------------
-
-*relevant types: Buffers*
-
-We need to treat buffer loads and stores from "dx.TypedBuffer" and
-"dx.RawBuffer" separately. For TypedBuffer, we have ``llvm.dx.typedBufferLoad``
-and ``llvm.dx.typedBufferStore``, which load and store 16-byte "rows" of data
-via a simple index. For RawBuffer, we have ``llvm.dx.rawBufferPtr``, which
-return a pointer that can be indexed, loaded, and stored to as needed.
-
-The typed load and store operations always operate on exactly 16 bytes of data,
-so there are only a few valid overloads. For types that are 32-bits or smaller,
-we operate on 4-element vectors, such as ``<4 x i32>``, ``<4 x float>``, or
-``<4 x half>``. Note that in 16-bit cases each 16-bit value occupies 32-bits of
-storage. For 64-bit types we operate on 2-element vectors - ``<2 x double>`` or
-``<2 x i64>``. When a type like `Buffer<float>` is used at the HLSL level, it
-is expected that this will operate on a single float in each 16 byte row - that
-is, a load would use the ``<4 x float>`` variant and then extract the first
-element.
-
-.. note:: In DXC, trying to operate on a ``Buffer<double4>`` crashes the
-   compiler. We should probably just reject this in the frontend.
-
-The TypedBuffer intrinsics are lowered to the `bufferLoad`_ and `bufferStore`_
-operations, and the operations on the memory accessed by RawBufferPtr are
-lowered to `rawBufferLoad`_ and `rawBufferStore`_. Note that if we want to
-support DXIL versions prior to 1.2 we'll need to lower the RawBuffer loads and
-stores to the non-raw operations as well.
-
-.. note:: TODO: We need to account for `CheckAccessFullyMapped`_ here.
-
-In DXIL the load operations always return an ``i32`` status value, but this
-isn't very ergonomic when it isn't used. We can (1) bite the bullet and have
-the loads return `{%ret_type, %i32}` all the time, (2) create a variant or
-update the signature iff the status is used, or (3) hide this in a sideband
-channel somewhere. I'm leaning towards (2), but could probably be convinced
-that the ugliness of (1) is worth the simplicity.
-
+16-byte Loads, Samples, and Gathers
+-----------------------------------
+
+*relevant types: TypedBuffer, CBuffer, and Textures*
+
+TypedBuffer, CBuffer, and Texture loads, as well as samples and gathers, can
+return 1 to 4 elements from the given resource, to a maximum of 16 bytes of
+data. DXIL's modeling of this is influenced by DirectX and DXBC's history, and
+it generally treats these operations as returning four 32-bit values. For 16-bit
+elements the values are 16-bit values, and for 64-bit elements the operations
+return four 32-bit integers and emit further code to construct the doubles.
+
+In DXIL, these operations return `ResRet`_ and `CBufRet`_ values, which are
+structs containing 4 elements of the same type and, in the case of `ResRet`, a
+5th element that is used by the `CheckAccessFullyMapped`_ operation.
+
+In LLVM IR the intrinsics will return the contained type of the resource
+instead. That is, ``llvm.dx.typedBufferLoad`` from a ``Buffer<float>`` would
+return a single float, from ``Buffer<float4>`` a vector of 4 floats, and from
+``Buffer<double2>`` a vector of two doubles, etc. The operations are then
+expanded out to match DXIL's format during lowering.
+
+In cases where we need ``CheckAccessFullyMapped``, we have a second intrinsic
+that returns an anonymous struct with element-0 being the contained type, and
+element-1 being the ``i1`` result of a ``CheckAccessFullyMapped`` call. We
+don't have a separate call to ``CheckAccessFullyMapped`` at all, since that's
+the only operation that can possibly be done on this value. This may mean we
+insert a DXIL operation for the check when it was missing in the HLSL source,
+but this matches DXC's behaviour in practice.
+
+.. _ResRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#resource-operation-return-types
+.. _CBufRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#cbufferloadlegacy
 .. _CheckAccessFullyMapped: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/checkaccessfullymapped
 
 .. list-table:: ``@llvm.dx.typedBufferLoad``
@@ -317,7 +310,7 @@ stores to the non-raw operations as well.
      - Description
    * - Return value
      -
-     - A 4- or 2-element vector of the type of the buffer
+     - The contained type of the buffer
      - The data loaded from the buffer
    * - ``%buffer``
      - 0
@@ -332,16 +325,23 @@ Examples:
 
 .. code-block:: llvm
 
-   %ret = call <4 x float> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f32_0_0t(
-       target("dx.TypedBuffer", f32, 0, 0) %buffer, i32 %index)
-   %ret = call <4 x i32> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_i32_0_0t(
-       target("dx.TypedBuffer", i32, 0, 0) %buffer, i32 %index)
-   %ret = call <4 x half> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f16_0_0t(
-       target("dx.TypedBuffer", f16, 0, 0) %buffer, i32 %index)
-   %ret = call <2 x double> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f64_0_0t(
-       target("dx.TypedBuffer", double, 0, 0) %buffer, i32 %index)
-
-.. list-table:: ``@llvm.dx.typedBufferStore``
+   %ret = call <4 x float>
+       @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
+           target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
+   %ret = call float
+       @llvm.dx.typedBufferLoad.f32.tdx.TypedBuffer_f32_0_0_0t(
+           target("dx.TypedBuffer", float, 0, 0, 0) %buffer, i32 %index)
+   %ret = call <4 x i32>
+       @llvm.dx.typedBufferLoad.v4i32.tdx.TypedBuffer_v4i32_0_0_0t(
+           target("dx.TypedBuffer", <4 x i32>, 0, 0, 0) %buffer, i32 %index)
+   %ret = call <4 x half>
+       @llvm.dx.typedBufferLoad.v4f16.tdx.TypedBuffer_v4f16_0_0_0t(
+           target("dx.TypedBuffer", <4 x half>, 0, 0, 0) %buffer, i32 %index)
+   %ret = call <2 x double>
+       @llvm.dx.typedBufferLoad.v2f64.tdx.TypedBuffer_v2f64_0_0_0t(
+           target("dx.TypedBuffer", <2 x double>, 0, 0, 0) %buffer, i32 %index)
+
+.. list-table:: ``@llvm.dx.typedBufferLoad.checkbit``
    :header-rows: 1
 
    * - Argument
@@ -350,46 +350,11 @@ Examples:
      - Description
    * - Return value
      -
-     - ``void``
-     -
+     - A structure of the contained type and the check bit
+     - The data loaded from the buffer and the check bit
    * - ``%buffer``
      - 0
      - ``target(dx.TypedBuffer, ...)``
-     - The buffer to store into
-   * - ``%index``
-     - 1
-     - ``i32``
-     - Index into the buffer
-   * - ``%data``
-     - 2
-     - A 4- or 2-element vector of the type of the buffer
-     - The data to store
-
-Examples:
-
-.. code-block:: llvm
-
-   call void @llvm.dx.bufferStore.tdx.Buffer_f32_1_0t(
-       target("dx.TypedBuffer", f32, 1, 0) %buf, i32 %index, <4 x f32> %data)
-   call void @llvm.dx.bufferStore.tdx.Buffer_f16_1_0t(
-       target("dx.TypedBuffer", f16, 1, 0) %buf, i32 %index, <4 x f16> %data)
-   call void @llvm.dx.bufferStore.tdx.Buffer_f64_1_0t(
-       target("dx.TypedBuffer", f64, 1, 0) %buf, i32 %index, <2 x f64> %data)
-
-.. list-table:: ``@llvm.dx.rawBufferPtr``
-   :header-rows: 1
-
-   * - Argument
-     -
-     - Type
-     - Description
-   * - Return value
-     -
-     - ``ptr``
-     - Pointer to an element of the buffer
-   * - ``%buffer``
-     - 0
-     - ``target(dx.RawBuffer, ...)``
      - The buffer to load from
    * - ``%index``
      - 1
@@ -400,37 +365,7 @@ Examples:
 
 .. code-block:: llvm
 
-   ; Load a float4 from a buffer
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_v4f32_0_0t(
-       target("dx.RawBuffer", <4 x f32>, 0, 0) %buffer, i32 %index)
-   %val = load <4 x float>, ptr %buf, align 16
-
-   ; Load the double from a struct containing an int, a float, and a double
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_sl_i32f32f64s_0_0t(
-       target("dx.RawBuffer", {i32, f32, f64}, 0, 0) %buffer, i32 %index)
-   %val = getelementptr inbounds {i32, f32, f64}, ptr %buf, i32 0, i32 2
-   %d = load double, ptr %val, align 8
-
-   ; Load a float from a byte address buffer
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_i8_0_0t(
-       target("dx.RawBuffer", i8, 0, 0) %buffer, i32 %index)
-   %val = getelementptr inbounds float, ptr %buf, i64 0
-   %f = load float, ptr %val, align 4
-
-   ; Store to a buffer containing float4
-   %addr = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_v4f32_0_0t(
-       target("dx.RawBuffer", <4 x f32>, 0, 0) %buffer, i32 %index)
-   store <4 x float> %val, ptr %addr
-
-   ; Store the double in a struct containing an int, a float, and a double
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_sl_i32f32f64s_0_0t(
-       target("dx.RawBuffer", {i32, f32, f64}, 0, 0) %buffer, i32 %index)
-   %addr = getelementptr inbounds {i32, f32, f64}, ptr %buf, i32 0, i32 2
-   store double %d, ptr %addr
-
-   ; Store a float into a byte address buffer
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_i8_0_0t(
-       target("dx.RawBuffer", i8, 0, 0) %buffer, i32 %index)
-   %addr = getelementptr inbounds float, ptr %buf, i64 0
-   store float %f, ptr %val
+   %ret = call {<4 x float>, i1}
+       @llvm.dx.typedBufferLoad.checkbit.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
+           target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)

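The documentation change above says that for 64-bit elements the DXIL operations return four 32-bit integers and further code constructs the doubles. A standalone sketch of that reassembly, assuming each double is built from a (low, high) pair of adjacent lanes in the same order DXIL's `MakeDouble` operation takes its operands; `combineToDouble` and `expandToDouble2` are hypothetical names, not part of the commit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");

// Rebuild a double from its low and high 32-bit words (assumed lane order).
double combineToDouble(std::uint32_t Lo, std::uint32_t Hi) {
  std::uint64_t Bits = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  double D;
  std::memcpy(&D, &Bits, sizeof(D)); // bit-cast without aliasing UB
  return D;
}

// Expand the four i32 lanes of a 64-bit load into the two doubles of a
// Buffer<double2> element, pairing lanes (0, 1) and (2, 3).
std::array<double, 2>
expandToDouble2(const std::array<std::uint32_t, 4> &Lanes) {
  return {combineToDouble(Lanes[0], Lanes[1]),
          combineToDouble(Lanes[2], Lanes[3])};
}
```

This pairing is also why a 16-byte load can hold at most two 64-bit elements.
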
llvm/include/llvm/IR/IntrinsicsDirectX.td

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ def int_dx_handle_fromBinding
           [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty],
           [IntrNoMem]>;
 
+def int_dx_typedBufferLoad
+    : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>;
+
 // Cast between target extension handle types and dxil-style opaque handles
 def int_dx_cast_handle : Intrinsic<[llvm_any_ty], [llvm_any_ty]>;

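Because `int_dx_typedBufferLoad` is overloaded on both its return type and its handle type (both `llvm_any_ty`), each call site's function name carries a mangled suffix for each. The sketch below is a simplified model of LLVM's intrinsic type-name mangling that covers only the scalar, fixed-vector and target-extension cases used in the documentation examples; the `mangle*` helpers are hypothetical, and the real logic handles many more types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Fixed-width vector types mangle as "v" + element count + element type,
// e.g. <4 x float> -> "v4f32". Scalars are passed in already mangled
// ("f16", "f32", "f64", "i32", ...).
std::string mangleVector(unsigned Count, const std::string &Elt) {
  return "v" + std::to_string(Count) + Elt;
}

// Target extension types mangle as "t" + name, then "_" + each type
// parameter, then "_" + each integer parameter, then a closing "t".
std::string mangleTargetExt(const std::string &Name,
                            const std::vector<std::string> &TypeParams,
                            const std::vector<unsigned> &IntParams) {
  std::string S = "t" + Name;
  for (const std::string &T : TypeParams)
    S += "_" + T;
  for (unsigned I : IntParams)
    S += "_" + std::to_string(I);
  return S + "t";
}

// An overloaded intrinsic's name is its base name followed by "." and the
// mangled form of each overloaded type, in order.
std::string mangleIntrinsic(const std::string &Base,
                            const std::vector<std::string> &OverloadTys) {
  std::string S = Base;
  for (const std::string &T : OverloadTys)
    S += "." + T;
  return S;
}
```

Under this model a `<2 x double>` buffer with three integer parameters mangles to `tdx.TypedBuffer_v2f64_0_0_0t`.
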
llvm/lib/Target/DirectX/DXIL.td

Lines changed: 15 additions & 1 deletion
@@ -40,7 +40,10 @@ def Int64Ty : DXILOpParamType;
 def HalfTy : DXILOpParamType;
 def FloatTy : DXILOpParamType;
 def DoubleTy : DXILOpParamType;
-def ResRetTy : DXILOpParamType;
+def ResRetHalfTy : DXILOpParamType;
+def ResRetFloatTy : DXILOpParamType;
+def ResRetInt16Ty : DXILOpParamType;
+def ResRetInt32Ty : DXILOpParamType;
 def HandleTy : DXILOpParamType;
 def ResBindTy : DXILOpParamType;
 def ResPropsTy : DXILOpParamType;
@@ -693,6 +696,17 @@ def CreateHandle : DXILOp<57, createHandle> {
   let stages = [Stages<DXIL1_0, [all_stages]>, Stages<DXIL1_6, [removed]>];
 }
 
+def BufferLoad : DXILOp<68, bufferLoad> {
+  let Doc = "reads from a TypedBuffer";
+  // Handle, Coord0, Coord1
+  let arguments = [HandleTy, Int32Ty, Int32Ty];
+  let result = OverloadTy;
+  let overloads =
+      [Overloads<DXIL1_0,
+                 [ResRetHalfTy, ResRetFloatTy, ResRetInt16Ty, ResRetInt32Ty]>];
+  let stages = [Stages<DXIL1_0, [all_stages]>];
+}
+
 def ThreadId : DXILOp<93, threadId> {
   let Doc = "Reads the thread ID";
   let LLVMIntrinsic = int_dx_thread_id;

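`BufferLoad` above declares only four result overloads (`ResRetHalfTy`, `ResRetFloatTy`, `ResRetInt16Ty`, `ResRetInt32Ty`), so every contained element type has to land on one of them. A sketch of that selection, assuming the `dx.types.ResRet.<overload>` naming and, following the documentation in this commit, that 64-bit elements are loaded through 32-bit integer lanes; `EltKind`, `resRetTypeName`, `lanesPerElement` and `maxElements` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical element kinds for the contained type of a TypedBuffer.
enum class EltKind { Half, Float, Double, Int16, Int32, Int64 };

// Pick the %dx.types.ResRet overload for an element kind. Only f16, f32,
// i16 and i32 overloads exist, so 64-bit elements are loaded as i32 lanes
// and recombined after the load.
std::string resRetTypeName(EltKind K) {
  switch (K) {
  case EltKind::Half:
    return "dx.types.ResRet.f16";
  case EltKind::Float:
    return "dx.types.ResRet.f32";
  case EltKind::Int16:
    return "dx.types.ResRet.i16";
  case EltKind::Int32:
  case EltKind::Double:
  case EltKind::Int64:
    return "dx.types.ResRet.i32";
  }
  return ""; // not reached for valid enumerators
}

// Number of result lanes one element occupies (two for 64-bit elements).
unsigned lanesPerElement(EltKind K) {
  return (K == EltKind::Double || K == EltKind::Int64) ? 2 : 1;
}

// Largest element count a single four-lane load can return for this kind.
unsigned maxElements(EltKind K) { return 4 / lanesPerElement(K); }
```
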
llvm/lib/Target/DirectX/DXILOpBuilder.cpp

Lines changed: 23 additions & 7 deletions
@@ -120,8 +120,12 @@ static OverloadKind getOverloadKind(Type *Ty) {
   }
   case Type::PointerTyID:
     return OverloadKind::UserDefineType;
-  case Type::StructTyID:
-    return OverloadKind::ObjectType;
+  case Type::StructTyID: {
+    // TODO: This is a hack. As described in DXILEmitter.cpp, we need to rework
+    // how we're handling overloads and remove the `OverloadKind` proxy enum.
+    StructType *ST = cast<StructType>(Ty);
+    return getOverloadKind(ST->getElementType(0));
+  }
   default:
     return OverloadKind::UNDEFINED;
   }
@@ -194,10 +198,11 @@ static StructType *getOrCreateStructType(StringRef Name,
   return StructType::create(Ctx, EltTys, Name);
 }
 
-static StructType *getResRetType(Type *OverloadTy, LLVMContext &Ctx) {
-  OverloadKind Kind = getOverloadKind(OverloadTy);
+static StructType *getResRetType(Type *ElementTy) {
+  LLVMContext &Ctx = ElementTy->getContext();
+  OverloadKind Kind = getOverloadKind(ElementTy);
   std::string TypeName = constructOverloadTypeName(Kind, "dx.types.ResRet.");
-  Type *FieldTypes[5] = {OverloadTy, OverloadTy, OverloadTy, OverloadTy,
+  Type *FieldTypes[5] = {ElementTy, ElementTy, ElementTy, ElementTy,
                          Type::getInt32Ty(Ctx)};
   return getOrCreateStructType(TypeName, FieldTypes, Ctx);
 }
@@ -247,8 +252,14 @@ static Type *getTypeFromOpParamType(OpParamType Kind, LLVMContext &Ctx,
     return Type::getInt64Ty(Ctx);
   case OpParamType::OverloadTy:
     return OverloadTy;
-  case OpParamType::ResRetTy:
-    return getResRetType(OverloadTy, Ctx);
+  case OpParamType::ResRetHalfTy:
+    return getResRetType(Type::getHalfTy(Ctx));
+  case OpParamType::ResRetFloatTy:
+    return getResRetType(Type::getFloatTy(Ctx));
+  case OpParamType::ResRetInt16Ty:
+    return getResRetType(Type::getInt16Ty(Ctx));
+  case OpParamType::ResRetInt32Ty:
+    return getResRetType(Type::getInt32Ty(Ctx));
   case OpParamType::HandleTy:
     return getHandleType(Ctx);
   case OpParamType::ResBindTy:
@@ -390,6 +401,7 @@ Expected<CallInst *> DXILOpBuilder::tryCreateOp(dxil::OpCode OpCode,
       return makeOpError(OpCode, "Wrong number of arguments");
     OverloadTy = Args[ArgIndex]->getType();
   }
+
   FunctionType *DXILOpFT =
       getDXILOpFunctionType(OpCode, M.getContext(), OverloadTy);

@@ -450,6 +462,10 @@ CallInst *DXILOpBuilder::createOp(dxil::OpCode OpCode, ArrayRef<Value *> Args,
   return *Result;
 }
 
+StructType *DXILOpBuilder::getResRetType(Type *ElementTy) {
+  return ::getResRetType(ElementTy);
+}
+
 StructType *DXILOpBuilder::getHandleType() {
   return ::getHandleType(IRB.getContext());
 }

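The `StructTyID` change to `getOverloadKind` in this file makes a struct type report the overload of its first element, which is how a `%dx.types.ResRet.f32` result can select the `f32` overload of an operation. A standalone model of that rule, using a hypothetical `ModelType` instead of `llvm::Type`, returning overload names rather than the `OverloadKind` enum, and assuming the `dx.op.<class>.<overload>` function-naming pattern:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for an IR type (hypothetical, not llvm::Type): a scalar
// has a mangled name such as "f32"; a struct has a list of element types.
struct ModelType {
  std::string ScalarName;          // empty for struct types
  std::vector<ModelType> Elements; // non-empty only for struct types
  bool isStruct() const { return !Elements.empty(); }
};

// Mirrors the StructTyID rule: a struct takes the overload of its first
// element, so a ResRet of f32 maps to "f32".
std::string overloadName(const ModelType &T) {
  if (T.isStruct())
    return overloadName(T.Elements.front());
  return T.ScalarName;
}

// Build a ResRet-shaped struct: four elements of the given type, then i32.
ModelType makeResRet(const std::string &Elt) {
  ModelType E{Elt, {}};
  return ModelType{"", {E, E, E, E, ModelType{"i32", {}}}};
}

// Assumed naming pattern: "dx.op." + operation class + "." + overload.
std::string dxilOpFunctionName(const std::string &OpClass,
                               const ModelType &RetTy) {
  return "dx.op." + OpClass + "." + overloadName(RetTy);
}
```

The shortcut only works because every ResRet struct is homogeneous in its first four fields, which is part of why the commit flags it as a hack to be replaced by a direct overload table.
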
llvm/lib/Target/DirectX/DXILOpBuilder.h

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,8 @@ class DXILOpBuilder {
   Expected<CallInst *> tryCreateOp(dxil::OpCode Op, ArrayRef<Value *> Args,
                                    Type *RetTy = nullptr);
 
+  /// Get a `%dx.types.ResRet` type with the given element type.
+  StructType *getResRetType(Type *ElementTy);
   /// Get the `%dx.types.Handle` type.
   StructType *getHandleType();
