@@ -267,45 +267,38 @@ Examples:
                @llvm.dx.handle.fromHeap.tdx.RawBuffer_v4f32_1_0(
                    i32 2, i1 false)
 
-Buffer Loads and Stores
------------------------
-
-*relevant types: Buffers*
-
-We need to treat buffer loads and stores from "dx.TypedBuffer" and
-"dx.RawBuffer" separately. For TypedBuffer, we have ``llvm.dx.typedBufferLoad``
-and ``llvm.dx.typedBufferStore``, which load and store 16-byte "rows" of data
-via a simple index. For RawBuffer, we have ``llvm.dx.rawBufferPtr``, which
-return a pointer that can be indexed, loaded, and stored to as needed.
-
-The typed load and store operations always operate on exactly 16 bytes of data,
-so there are only a few valid overloads. For types that are 32-bits or smaller,
-we operate on 4-element vectors, such as ``<4 x i32>``, ``<4 x float>``, or
-``<4 x half>``. Note that in 16-bit cases each 16-bit value occupies 32-bits of
-storage. For 64-bit types we operate on 2-element vectors - ``<2 x double>`` or
-``<2 x i64>``. When a type like `Buffer<float>` is used at the HLSL level, it
-is expected that this will operate on a single float in each 16 byte row - that
-is, a load would use the ``<4 x float>`` variant and then extract the first
-element.
-
-.. note:: In DXC, trying to operate on a ``Buffer<double4>`` crashes the
-   compiler. We should probably just reject this in the frontend.
-
-The TypedBuffer intrinsics are lowered to the `bufferLoad`_ and `bufferStore`_
-operations, and the operations on the memory accessed by RawBufferPtr are
-lowered to `rawBufferLoad`_ and `rawBufferStore`_. Note that if we want to
-support DXIL versions prior to 1.2 we'll need to lower the RawBuffer loads and
-stores to the non-raw operations as well.
-
-.. note:: TODO: We need to account for `CheckAccessFullyMapped`_ here.
-
-In DXIL the load operations always return an ``i32`` status value, but this
-isn't very ergonomic when it isn't used. We can (1) bite the bullet and have
-the loads return `{%ret_type, %i32}` all the time, (2) create a variant or
-update the signature iff the status is used, or (3) hide this in a sideband
-channel somewhere. I'm leaning towards (2), but could probably be convinced
-that the ugliness of (1) is worth the simplicity.
-
+16-byte Loads, Samples, and Gathers
+-----------------------------------
+
+*relevant types: TypedBuffer, CBuffer, and Textures*
+
+TypedBuffer, CBuffer, and Texture loads, as well as samples and gathers, can
+return 1 to 4 elements from the given resource, to a maximum of 16 bytes of
+data. DXIL's modeling of these operations is shaped by the history of DirectX
+and DXBC: it generally treats them as returning four 32-bit values. For 16-bit
+elements the four values are 16-bit instead, and for 64-bit elements the
+operations return four 32-bit integers, and extra code pairs them into doubles.
+
+In DXIL, these operations return `ResRet`_ and `CBufRet`_ values, which are
+structs containing 4 elements of the same type plus, in the case of `ResRet`,
+a 5th element that is used by the `CheckAccessFullyMapped`_ operation.
+
+In LLVM IR the intrinsics will return the contained type of the resource
+instead. That is, ``llvm.dx.typedBufferLoad`` from a ``Buffer<float>`` would
+return a single float, from ``Buffer<float4>`` a vector of 4 floats, and from
+``Buffer<double2>`` a vector of two doubles, etc. The operations are then
+expanded out to match DXIL's format during lowering.
+
+In cases where we need ``CheckAccessFullyMapped``, we have a second intrinsic
+that returns an anonymous struct with element 0 being the contained type and
+element 1 being the ``i1`` result of a ``CheckAccessFullyMapped`` call. We
+don't have a separate intrinsic for ``CheckAccessFullyMapped`` at all, since
+the check is the only operation that can be done on the status value. This
+may mean we insert a DXIL operation for the check even when the HLSL source
+never calls it, but this matches DXC's behaviour in practice.
+
+.. _ResRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#resource-operation-return-types
+.. _CBufRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#cbufferloadlegacy
 .. _CheckAccessFullyMapped: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/checkaccessfullymapped
 
 .. list-table:: ``@llvm.dx.typedBufferLoad``
@@ -317,7 +310,7 @@ stores to the non-raw operations as well.
      - Description
    * - Return value
      -
-     - A 4- or 2-element vector of the type of the buffer
+     - The contained type of the buffer
      - The data loaded from the buffer
    * - ``%buffer``
      - 0
@@ -332,16 +325,23 @@ Examples:
 
 .. code-block:: llvm
 
-   %ret = call <4 x float> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f32_0_0t(
-       target("dx.TypedBuffer", f32, 0, 0) %buffer, i32 %index)
-   %ret = call <4 x i32> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_i32_0_0t(
-       target("dx.TypedBuffer", i32, 0, 0) %buffer, i32 %index)
-   %ret = call <4 x half> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f16_0_0t(
-       target("dx.TypedBuffer", f16, 0, 0) %buffer, i32 %index)
-   %ret = call <2 x double> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f64_0_0t(
-       target("dx.TypedBuffer", double, 0, 0) %buffer, i32 %index)
-
-.. list-table:: ``@llvm.dx.typedBufferStore``
+   %ret = call <4 x float>
+       @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
+           target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
+   %ret = call float
+       @llvm.dx.typedBufferLoad.f32.tdx.TypedBuffer_f32_0_0_0t(
+           target("dx.TypedBuffer", float, 0, 0, 0) %buffer, i32 %index)
+   %ret = call <4 x i32>
+       @llvm.dx.typedBufferLoad.v4i32.tdx.TypedBuffer_v4i32_0_0_0t(
+           target("dx.TypedBuffer", <4 x i32>, 0, 0, 0) %buffer, i32 %index)
+   %ret = call <4 x half>
+       @llvm.dx.typedBufferLoad.v4f16.tdx.TypedBuffer_v4f16_0_0_0t(
+           target("dx.TypedBuffer", <4 x half>, 0, 0, 0) %buffer, i32 %index)
+   %ret = call <2 x double>
+       @llvm.dx.typedBufferLoad.v2f64.tdx.TypedBuffer_v2f64_0_0_0t(
+           target("dx.TypedBuffer", <2 x double>, 0, 0, 0) %buffer, i32 %index)
+
+.. list-table:: ``@llvm.dx.typedBufferLoad.checkbit``
    :header-rows: 1
 
    * - Argument
@@ -350,46 +350,11 @@ Examples:
      - Description
    * - Return value
      -
-     - ``void``
-     -
+     - A structure of the contained type and the check bit
+     - The data loaded from the buffer and the check bit
    * - ``%buffer``
      - 0
      - ``target(dx.TypedBuffer, ...)``
-     - The buffer to store into
-   * - ``%index``
-     - 1
-     - ``i32``
-     - Index into the buffer
-   * - ``%data``
-     - 2
-     - A 4- or 2-element vector of the type of the buffer
-     - The data to store
-
-Examples:
-
-.. code-block:: llvm
-
-   call void @llvm.dx.bufferStore.tdx.Buffer_f32_1_0t(
-       target("dx.TypedBuffer", f32, 1, 0) %buf, i32 %index, <4 x f32> %data)
-   call void @llvm.dx.bufferStore.tdx.Buffer_f16_1_0t(
-       target("dx.TypedBuffer", f16, 1, 0) %buf, i32 %index, <4 x f16> %data)
-   call void @llvm.dx.bufferStore.tdx.Buffer_f64_1_0t(
-       target("dx.TypedBuffer", f64, 1, 0) %buf, i32 %index, <2 x f64> %data)
-
-.. list-table:: ``@llvm.dx.rawBufferPtr``
-   :header-rows: 1
-
-   * - Argument
-     -
-     - Type
-     - Description
-   * - Return value
-     -
-     - ``ptr``
-     - Pointer to an element of the buffer
-   * - ``%buffer``
-     - 0
-     - ``target(dx.RawBuffer, ...)``
      - The buffer to load from
    * - ``%index``
      - 1
@@ -400,37 +365,7 @@ Examples:
 
 .. code-block:: llvm
 
-   ; Load a float4 from a buffer
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_v4f32_0_0t(
-       target("dx.RawBuffer", <4 x f32>, 0, 0) %buffer, i32 %index)
-   %val = load <4 x float>, ptr %buf, align 16
-
-   ; Load the double from a struct containing an int, a float, and a double
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_sl_i32f32f64s_0_0t(
-       target("dx.RawBuffer", {i32, f32, f64}, 0, 0) %buffer, i32 %index)
-   %val = getelementptr inbounds {i32, f32, f64}, ptr %buf, i32 0, i32 2
-   %d = load double, ptr %val, align 8
-
-   ; Load a float from a byte address buffer
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_i8_0_0t(
-       target("dx.RawBuffer", i8, 0, 0) %buffer, i32 %index)
-   %val = getelementptr inbounds float, ptr %buf, i64 0
-   %f = load float, ptr %val, align 4
-
-   ; Store to a buffer containing float4
-   %addr = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_v4f32_0_0t(
-       target("dx.RawBuffer", <4 x f32>, 0, 0) %buffer, i32 %index)
-   store <4 x float> %val, ptr %addr
-
-   ; Store the double in a struct containing an int, a float, and a double
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_sl_i32f32f64s_0_0t(
-       target("dx.RawBuffer", {i32, f32, f64}, 0, 0) %buffer, i32 %index)
-   %addr = getelementptr inbounds {i32, f32, f64}, ptr %buf, i32 0, i32 2
-   store double %d, ptr %addr
-
-   ; Store a float into a byte address buffer
-   %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_i8_0_0t(
-       target("dx.RawBuffer", i8, 0, 0) %buffer, i32 %index)
-   %addr = getelementptr inbounds float, ptr %buf, i64 0
-   store float %f, ptr %val
+   %ret = call {<4 x float>, i1}
+       @llvm.dx.typedBufferLoad.checkbit.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
+           target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
 
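For the 64-bit case described in the new section, where a load's `ResRet` carries four 32-bit integers and lowering has to rebuild the doubles from them, the bit-level step can be sketched in Python. This is a minimal model under stated assumptions, not LLVM or DXC code: the helpers `make_double`, `split_double`, and `expand_double2` are hypothetical, and the sketch assumes each double is stored as a low word followed by a high word, which is our reading of the `lo, hi` operand order of DXIL's `MakeDouble` operation.

```python
import struct
from typing import List, Tuple


def make_double(lo: int, hi: int) -> float:
    """Rebuild an IEEE-754 double from its low and high 32-bit words."""
    bits = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)
    # Reinterpret the 64-bit pattern as a double (a bitcast, not a conversion).
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def split_double(value: float) -> Tuple[int, int]:
    """Inverse of make_double: return the (lo, hi) 32-bit words of a double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits & 0xFFFFFFFF, bits >> 32


def expand_double2(words: List[int]) -> Tuple[float, float]:
    """Turn the four i32 data elements of a ResRet into a double2.

    Hypothetical lowering model (assumed word order): words 0/1 form the
    first double and words 2/3 the second, each pair ordered (lo, hi).
    """
    if len(words) != 4:
        raise ValueError("expected the four data elements of a ResRet")
    return make_double(words[0], words[1]), make_double(words[2], words[3])


# Usage: fake the ResRet payload of a Buffer<double2> load holding (1.5, -2.25).
payload = [*split_double(1.5), *split_double(-2.25)]
print(expand_double2(payload))
```

The words are reinterpreted with `struct` rather than combined arithmetically because each `i32` is a raw half of the double's bit pattern, not a numeric value in its own right.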
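The rule in the new section that the intrinsics return the contained type and are only expanded to DXIL's four-slot shape during lowering can be modelled as a small lookup. The Python sketch below is hypothetical: the name `resret_layout` and the scalar spellings are illustrative, not an LLVM API, and it only covers `ResRet`-returning loads, not `CBufRet`. It encodes the constraints the section states (1 to 4 elements, at most 16 bytes, one slot per 16- or 32-bit element, and a pair of `i32` slots per 64-bit element, with the pair order assumed to be lo then hi).

```python
from typing import List, Tuple

# Bit widths of the scalar element kinds considered here (illustrative names).
SCALAR_BITS = {"i16": 16, "f16": 16, "i32": 32, "f32": 32, "i64": 64, "f64": 64}


def resret_layout(scalar: str, count: int) -> Tuple[str, List[Tuple[int, ...]]]:
    """Map a contained type (scalar kind, element count) to a ResRet shape.

    Returns the ResRet element type and, for each contained element, the
    ResRet data slot(s) it is read from.
    """
    if scalar not in SCALAR_BITS:
        raise ValueError(f"unsupported scalar kind: {scalar}")
    if not 1 <= count <= 4:
        raise ValueError("a contained type has 1 to 4 elements")
    bits = SCALAR_BITS[scalar]
    if bits * count > 128:
        raise ValueError("a contained type is at most 16 bytes")
    if bits == 64:
        # 64-bit elements come back as pairs of i32 slots, assumed (lo, hi).
        return "i32", [(2 * i, 2 * i + 1) for i in range(count)]
    # 16- and 32-bit elements occupy one slot each; unused slots are ignored.
    return scalar, [(i,) for i in range(count)]


print(resret_layout("f32", 1))  # Buffer<float>
print(resret_layout("f64", 2))  # Buffer<double2>
```

Under these rules `Buffer<float>` reads only slot 0, while a `Buffer<double4>` (32 bytes) has no valid layout and the sketch raises `ValueError`.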