@@ -318,39 +318,43 @@ Examples:
   %ptr = call ptr @llvm.dx.resource.getpointer.p0.tdx.TypedBuffer_v4f32_0_0_0t(
       target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)

- 16-byte Loads, Samples, and Gathers
- -----------------------------------
-
- *relevant types: TypedBuffer, CBuffer, and Textures*
-
- TypedBuffer, CBuffer, and Texture loads, as well as samples and gathers, can
- return 1 to 4 elements from the given resource, to a maximum of 16 bytes of
- data. DXIL's modeling of this is influenced by DirectX and DXBC's history and
- it generally treats these operations as returning 4 32-bit values. For 16-bit
- elements the values are 16-bit values, and for 64-bit values the operations
- return 4 32-bit integers and emit further code to construct the double.
-
- In DXIL, these operations return `ResRet`_ and `CBufRet`_ values, are structs
- containing 4 elements of the same type, and in the case of `ResRet` a 5th
- element that is used by the `CheckAccessFullyMapped`_ operation.
-
- In LLVM IR the intrinsics will return the contained type of the resource
- instead. That is, ``llvm.dx.resource.load.typedbuffer`` from a
- ``Buffer<float>`` would return a single float, from ``Buffer<float4>`` a vector
- of 4 floats, and from ``Buffer<double2>`` a vector of two doubles, etc. The
- operations are then expanded out to match DXIL's format during lowering.
-
- In order to support ``CheckAccessFullyMapped``, we need these intrinsics to
- return an anonymous struct with element-0 being the contained type, and
- element-1 being the ``i1`` result of a ``CheckAccessFullyMapped`` call. We
- don't have a separate call to ``CheckAccessFullyMapped`` at all, since that's
- the only operation that can possibly be done on this value. In practice this
- may mean we insert a DXIL operation for the check when this was missing in the
- HLSL source, but this actually matches DXC's behaviour in practice.
+ Loads, Samples, and Gathers
+ ---------------------------
+
+ *relevant types: Buffers, CBuffers, and Textures*
+
+ All load, sample, and gather operations in DXIL return a `ResRet`_ type, and
+ CBuffer loads return a similar `CBufRet`_ type. These types are structs
+ containing 4 elements of some basic type, and in the case of `ResRet` a 5th
+ element that is used by the `CheckAccessFullyMapped`_ operation. Some of these
+ operations, like `RawBufferLoad`_, include a mask and/or alignment that tell us
+ how to interpret those four values.
+
+ In the LLVM IR representations of these operations we instead return scalars or
+ vectors, but we keep the requirement that we only return up to 4 elements of a
+ basic type. This avoids some unnecessary casting and structure manipulation in
+ the intermediate format while also keeping lowering to DXIL straightforward.
+
+ LLVM intrinsics that map to operations returning `ResRet` return an anonymous
+ struct with element-0 being the scalar or vector type, and element-1 being the
+ ``i1`` result of a ``CheckAccessFullyMapped`` call. We don't have a separate
+ call to ``CheckAccessFullyMapped`` at all, since that's the only operation that
+ can possibly be done on this value. This may mean we insert a DXIL operation
+ for the check when it was missing in the HLSL source, but this matches DXC's
+ behaviour in practice.
+
+ For TypedBuffer and Texture, we map directly from the contained type of the
+ resource to the return value of the intrinsic. Since these resources are
+ constrained to contain only scalars and vectors of up to 4 elements, the
+ lowering to DXIL ops is generally straightforward. The one exception is
+ ``double`` elements: these are allowed in the LLVM intrinsics, but are lowered
+ to pairs of ``i32`` followed by ``MakeDouble`` operations for DXIL.
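As a rough illustration of the ``double`` lowering described above, the following C++ sketch (not LLVM or DXC code) recombines 32-bit halves into doubles. The helper names ``makeDouble`` and ``expandDouble2`` are hypothetical, and the low-then-high ordering of the halves is an assumption about how the pairs of ``i32`` are laid out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: rebuild one double from its low and high 32-bit
// halves, mirroring what a MakeDouble(lo, hi) operation is assumed to do.
double makeDouble(std::uint32_t lo, std::uint32_t hi) {
  std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
  double result;
  std::memcpy(&result, &bits, sizeof(result)); // bit-preserving copy
  return result;
}

// A <2 x double> element is assumed to arrive as four 32-bit values in the
// order {lo0, hi0, lo1, hi1}; each adjacent pair becomes one double.
void expandDouble2(const std::uint32_t raw[4], double out[2]) {
  out[0] = makeDouble(raw[0], raw[1]);
  out[1] = makeDouble(raw[2], raw[3]);
}
```

Under these assumptions, a four-element return value is enough to carry at most two doubles, which is why the intrinsic can expose ``<2 x double>`` while the lowered form still deals in 32-bit values.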

.. _ResRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#resource-operation-return-types
.. _CBufRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#cbufferloadlegacy
.. _CheckAccessFullyMapped: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/checkaccessfullymapped
+ .. _RawBufferLoad: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#rawbufferload

.. list-table:: ``@llvm.dx.resource.load.typedbuffer``
   :header-rows: 1
@@ -392,6 +396,101 @@ Examples:
       @llvm.dx.resource.load.typedbuffer.v2f64.tdx.TypedBuffer_v2f64_0_0_0t(
           target("dx.TypedBuffer", <2 x double>, 0, 0, 0) %buffer, i32 %index)

+ For RawBuffer, an HLSL load operation may return an arbitrarily sized result,
+ but we still constrain the LLVM intrinsic to return only up to 4 elements of a
+ basic type. This means that larger loads are represented as a series of loads,
+ which matches DXIL. Unlike in the `RawBufferLoad`_ operation, we do not need
+ arguments for the mask/type size and alignment, since we can calculate these
+ from the return type of the load during lowering.
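To make the "series of loads" idea concrete, here is a minimal C++ sketch (not LLVM code) that splits a value of N elements into pieces of at most 4 elements and derives a per-piece component mask from the element count. The ``LoadChunk`` type and ``splitLoad`` function are hypothetical, and the mask encoding (one low bit per loaded component) is an assumption about what lowering would compute from the return type.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One piece of a larger load (hypothetical type, for illustration only).
struct LoadChunk {
  std::uint32_t byteOffset;  // offset of this piece from the start of the value
  std::uint32_t numElements; // between 1 and 4 elements
  std::uint8_t mask;         // assumed encoding: low numElements bits set
};

// Split a load of totalElements elements, each elementBytes wide, into
// pieces that each return at most 4 elements.
std::vector<LoadChunk> splitLoad(std::uint32_t totalElements,
                                 std::uint32_t elementBytes) {
  std::vector<LoadChunk> chunks;
  for (std::uint32_t first = 0; first < totalElements; first += 4) {
    std::uint32_t count = std::min<std::uint32_t>(4, totalElements - first);
    chunks.push_back({first * elementBytes, count,
                      static_cast<std::uint8_t>((1u << count) - 1u)});
  }
  return chunks;
}
```

For example, under these assumptions a load of six 4-byte elements becomes one 4-element piece at byte offset 0 and one 2-element piece at byte offset 16.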
405
+
406
+ .. _RawBufferLoad : https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#rawbufferload
407
+
+ .. list-table:: ``@llvm.dx.resource.load.rawbuffer``
+    :header-rows: 1
+
+    * - Argument
+      -
+      - Type
+      - Description
+    * - Return value
+      -
+      - A structure of a scalar or vector and the check bit
+      - The data loaded from the buffer and the check bit
+    * - ``%buffer``
+      - 0
+      - ``target(dx.RawBuffer, ...)``
+      - The buffer to load from
+    * - ``%index``
+      - 1
+      - ``i32``
+      - Index into the buffer
+    * - ``%offset``
+      - 2
+      - ``i32``
+      - Offset into the structure at the given index
+
+ Examples:
+
+ .. code-block:: llvm
+
+    ; float
+    %ret = call {float, i1}
+        @llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_f32_0_0_0t(
+            target("dx.RawBuffer", float, 0, 0, 0) %buffer,
+            i32 %index,
+            i32 0)
+    %ret = call {float, i1}
+        @llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_i8_0_0_0t(
+            target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
+            i32 %byte_offset,
+            i32 0)
+
+    ; float4
+    %ret = call {<4 x float>, i1}
+        @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_v4f32_0_0_0t(
+            target("dx.RawBuffer", <4 x float>, 0, 0, 0) %buffer,
+            i32 %index,
+            i32 0)
+    %ret = call {<4 x float>, i1}
+        @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_i8_0_0_0t(
+            target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
+            i32 %byte_offset,
+            i32 0)
+
+    ; struct S0 { float4 f; int4 i; };
+    %ret = call {<4 x float>, i1}
+        @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_sl_v4f32v4i32s_0_0_0t(
+            target("dx.RawBuffer", {<4 x float>, <4 x i32>}, 0, 0, 0) %buffer,
+            i32 %index,
+            i32 0)
+    %ret = call {<4 x i32>, i1}
+        @llvm.dx.resource.load.rawbuffer.v4i32.tdx.RawBuffer_sl_v4f32v4i32s_0_0_0t(
+            target("dx.RawBuffer", {<4 x float>, <4 x i32>}, 0, 0, 0) %buffer,
+            i32 %index,
+            i32 16)
+
+    ; struct Q { float4 f; int3 i; };
+    ; struct R { int z; Q x; };
+    %ret = call {i32, i1}
+        @llvm.dx.resource.load.rawbuffer.i32(
+            target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0, 0)
+                %buffer, i32 %index, i32 0)
+    %ret = call {<4 x float>, i1}
+        @llvm.dx.resource.load.rawbuffer.v4f32(
+            target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0, 0)
+                %buffer, i32 %index, i32 4)
+    %ret = call {<3 x i32>, i1}
+        @llvm.dx.resource.load.rawbuffer.v3i32(
+            target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0, 0)
+                %buffer, i32 %index, i32 20)
+
+    ; byteaddressbuf.Load<int64_t4>
+    %ret = call {<4 x i64>, i1}
+        @llvm.dx.resource.load.rawbuffer.v4i64.tdx.RawBuffer_i8_0_0_0t(
+            target("dx.RawBuffer", i8, 0, 0, 0) %buffer,
+            i32 %byte_offset,
+            i32 0)
+
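The offsets 0, 4, and 20 in the ``struct R`` example above are consistent with byte offsets into a tightly packed layout. The following C++ sketch (not LLVM code) shows that arithmetic; the ``Field`` type and ``fieldOffsets`` function are hypothetical, and the assumption of no padding between fields is mine rather than something the text states.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one flattened field: how many elements it has
// and how wide each element is in bytes.
struct Field {
  std::uint32_t numElements;
  std::uint32_t elementBytes;
};

// Compute the byte offset of each field, assuming the fields are laid out
// back to back with no padding.
std::vector<std::uint32_t> fieldOffsets(const std::vector<Field> &fields) {
  std::vector<std::uint32_t> offsets;
  std::uint32_t offset = 0;
  for (const Field &f : fields) {
    offsets.push_back(offset);
    offset += f.numElements * f.elementBytes;
  }
  return offsets;
}
```

Flattening ``R`` into ``int z``, ``float4 f``, and ``int3 i`` gives fields of 1, 4, and 3 four-byte elements, which this sketch places at byte offsets 0, 4, and 20.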
Texture and Typed Buffer Stores
-------------------------------