:bf16_capability_token: 6437
:capability_prefetch_name: CooperativeMatrixPrefetchINTEL
:capability_prefetch_token: 6411
+ :capability_checked_name: CooperativeMatrixCheckedInstructionsINTEL
+ :capability_checked_token: 6192
:OpCooperativeMatrixGetElementCoordINTEL_token: 6440
:OpCooperativeMatrixApplyFunctionINTEL_token: 6448
:OpCooperativeMatrixPrefetchINTEL_token: 6449
+ :OpCooperativeMatrixLoadCheckedINTEL_token: 6193
+ :OpCooperativeMatrixStoreCheckedINTEL_token: 6194
+ :OpCooperativeMatrixConstructCheckedINTEL_token: 6195
+

:DPCPP_URL: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
:bfloat16_conv_url: http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_bfloat16_conversion.html
@@ -67,7 +73,7 @@ please let us know!
[width="40%",cols="25,25"]
|========================================
| Last Modified Date | 2023-11-06
- | Revision | 15
+ | Revision | 16
|========================================

== Dependencies
@@ -116,6 +122,7 @@ This extension introduces new capabilities:
{invocation_capability_name}
{tf32_capability_name}
{bf16_capability_name}
+ {capability_checked_name}
{capability_prefetch_name}
----
@@ -137,6 +144,15 @@ OpCooperativeMatrixPrefetchINTEL
----

+ Instructions added under the *{capability_checked_name}* capability:
+
+ ----
+
+ OpCooperativeMatrixLoadCheckedINTEL
+ OpCooperativeMatrixStoreCheckedINTEL
+ OpCooperativeMatrixConstructCheckedINTEL
+
+ ----

== Token Number Assignments
@@ -149,9 +165,13 @@ OpCooperativeMatrixPrefetchINTEL
|*{tf32_capability_name}* | {tf32_capability_token}
|*{bf16_capability_name}* | {bf16_capability_token}
|*{capability_prefetch_name}* | {capability_prefetch_token}
+ |*{capability_checked_name}* | {capability_checked_token}
|*OpCooperativeMatrixGetElementCoordINTEL* | {OpCooperativeMatrixGetElementCoordINTEL_token}
|*OpCooperativeMatrixApplyFunctionINTEL* | {OpCooperativeMatrixApplyFunctionINTEL_token}
|*OpCooperativeMatrixPrefetchINTEL* | {OpCooperativeMatrixPrefetchINTEL_token}
+ |*OpCooperativeMatrixLoadCheckedINTEL* | {OpCooperativeMatrixLoadCheckedINTEL_token}
+ |*OpCooperativeMatrixStoreCheckedINTEL* | {OpCooperativeMatrixStoreCheckedINTEL_token}
+ |*OpCooperativeMatrixConstructCheckedINTEL* | {OpCooperativeMatrixConstructCheckedINTEL_token}
|====

== Modifications to the SPIR-V Specification, Version 1.6 and SPV_KHR_cooperative_matrix, Revision 3
@@ -231,6 +251,13 @@ Uses *BFloat16* in 3.X, Cooperative Matrix Operands +
Uses *OpCooperativeMatrixPrefetchINTEL* instructions. +
+
| *{main_capability_name}* +
+ | {capability_checked_token} | *{capability_checked_name}* +
+ +
+ Uses *OpCooperativeMatrixLoadCheckedINTEL*, *OpCooperativeMatrixStoreCheckedINTEL* and
+ *OpCooperativeMatrixConstructCheckedINTEL* instructions. +
+ +
+ | *{main_capability_name}* +
+
|====
--
@@ -259,13 +286,11 @@ whose 'Type' operand is a scalar or vector type. If the *Shader* capability was
declared, 'Pointer' must point into an array and any *ArrayStride* decoration on
'Pointer' is ignored. +
+
- 'X offset' must be a constant instruction with scalar 32-bit integer type.
- It specifies offset in bytes along X axis from the 'Pointer' where prefetched
- memory region starts from. +
+ 'X offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the X axis from 'Pointer' at which the prefetched memory region starts. +
+
- 'Y offset' must be a constant instruction with scalar 32-bit integer type.
- It specifies offset in bytes along Y axis from the 'Pointer' where prefetched
- memory region starts from. +
+ 'Y offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the Y axis from 'Pointer' at which the prefetched memory region starts. +
+
'Rows' must be a constant instruction with scalar 32-bit integer type. +
+
@@ -297,6 +322,169 @@ scalar 'integer type' and its exact semantics depend on 'MemoryLayout'. +
'Stride' |
|=====

+ [cols="1,1,10*3",width="100%"]
+ |=====
+ 11+|[[OpCooperativeMatrixLoadCheckedINTEL]]*OpCooperativeMatrixLoadCheckedINTEL* +
+ +
+ Load a cooperative matrix through a pointer. The size of the global matrix might not be a multiple
+ of the size of the two-dimensional region being loaded; in that case, the out-of-bounds elements
+ are set to 0. +
+ +
+ 'Result Type' is the type of the loaded object. It must be a cooperative matrix
+ type. +
+ +
+ 'Pointer' is a pointer. Its type must be an *OpTypePointer* whose 'Type' operand
+ is a scalar or vector type. If the *Shader* capability was declared, 'Pointer'
+ must point into an array and any *ArrayStride* decoration on 'Pointer' is ignored. +
+ +
+ 'X offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the X axis from 'Pointer' at which the loaded memory region starts. +
+ +
+ 'Y offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the Y axis from 'Pointer' at which the loaded memory region starts. +
+ +
+ 'MemoryLayout' specifies how matrix elements are laid out in memory. It must come
+ from a 32-bit integer 'constant instruction' whose value corresponds to a
+ 'Cooperative Matrix Layout'. See the _Cooperative Matrix Layout_ table for
+ a description of the layouts and detailed layout-specific rules. +
+ +
+ 'Height' is the height (number of rows of the big matrix) of the two-dimensional
+ region the matrix is loaded from. It must be a scalar 'integer type'. +
+ +
+ 'Width' is the width (number of columns of the big matrix) of the two-dimensional
+ region the matrix is loaded from. It must be a scalar 'integer type'. +
+ +
+ 'Stride' further qualifies how matrix elements are laid out in memory. It must be a
+ scalar 'integer type' and its exact semantics depend on 'MemoryLayout'. +
+ +
+ 'Memory Operand' must be a +Memory Operand+ literal. If not present, it is the
+ same as specifying *None*. +
+ +
+ For a given dynamic instance of this instruction, all operands of this
+ instruction must be the same for all invocations in a given scope instance
+ (where the scope is the scope the cooperative matrix type was created with).
+ All invocations in a given scope instance must be active or all must be
+ inactive. +
+ +
+ Note: To specify the cache level for *OpCooperativeMatrixLoadCheckedINTEL*, use the
+ *CacheControlLoadINTEL* decoration from the {cache_control_url}[SPV_INTEL_cache_controls extension]. +
+ +
+ 1+|Capability: +
+ *{capability_checked_name}*
+ 1+| 9 + variable | {OpCooperativeMatrixLoadCheckedINTEL_token} | '<id>' +
+ 'Result Type' |'Result <id>' | '<id>' +
+ 'Pointer' | '<id>' +
+ 'X offset' | '<id>' +
+ 'Y offset' | '<id>' +
+ 'MemoryLayout' | '<id>' +
+ 'Height' | '<id>' +
+ 'Width' | Optional '<id>' +
+ 'Stride' | Optional +
+ 'Memory Operand' |
+ |=====
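One way to read the out-of-bounds rule of *OpCooperativeMatrixLoadCheckedINTEL* is as the small Python model below. It is a sketch under assumptions that are not part of the specification: a row-major 'MemoryLayout' where 'Stride' is the number of elements between the starts of consecutive rows, 'Height' and 'Width' bounding the big matrix, and, following the pairing of 'X offset' with 'Height' in the *OpCooperativeMatrixConstructCheckedINTEL* description, 'X offset' taken as the starting row and 'Y offset' as the starting column of the loaded tile. The function `load_checked` and its parameter names are hypothetical.

```python
from typing import List, Sequence


def load_checked(memory: Sequence[float], row_offset: int, col_offset: int,
                 height: int, width: int, stride: int,
                 rows: int, cols: int) -> List[List[float]]:
    """Hypothetical model of a bounds-checked, row-major tile load.

    Assumptions (not from the specification): row_offset plays the role of
    'X offset', col_offset the role of 'Y offset', and the height x width big
    matrix is stored row-major with `stride` elements per row.
    """
    tile = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            global_row = row_offset + r
            global_col = col_offset + c
            # Elements outside the big matrix are never read and stay 0.
            if global_row < height and global_col < width:
                tile[r][c] = memory[global_row * stride + global_col]
    return tile


# A 3 x 3 big matrix holding 1..9; a 2 x 2 tile starting at row 2, column 2
# overlaps only the bottom-right element.
big = [float(v) for v in range(1, 10)]
print(load_checked(big, 2, 2, height=3, width=3, stride=3, rows=2, cols=2))
# prints [[9.0, 0.0], [0.0, 0.0]]
```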
+
+ [cols="1,1,9*3",width="100%"]
+ |=====
+ 10+|[[OpCooperativeMatrixStoreCheckedINTEL]]*OpCooperativeMatrixStoreCheckedINTEL* +
+ +
+ Store a cooperative matrix through a pointer. The size of the global matrix might not be a multiple
+ of the size of the region the matrix is stored to; in that case, the out-of-bounds elements are
+ dropped. +
+ +
+ 'Pointer' is a pointer. Its type must be an *OpTypePointer* whose 'Type' operand
+ is a scalar or vector type. If the *Shader* capability was declared, 'Pointer'
+ must point into an array and any *ArrayStride* decoration on 'Pointer' is ignored. +
+ +
+ 'X offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the X axis from 'Pointer' at which the stored memory region starts. +
+ +
+ 'Y offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the Y axis from 'Pointer' at which the stored memory region starts. +
+ +
+ 'Object' is the object to store. Its type must be a _cooperative matrix_. +
+ +
+ 'MemoryLayout' specifies how matrix elements are laid out in memory. It must come
+ from a 32-bit integer 'constant instruction' whose value corresponds to a
+ 'Cooperative Matrix Layout'. See the _Cooperative Matrix Layout_ table for
+ a description of the layouts and detailed layout-specific rules. +
+ +
+ 'Height' is the height (number of rows of the big matrix) of the two-dimensional
+ region the matrix is stored to. It must be a scalar 'integer type'. +
+ +
+ 'Width' is the width (number of columns of the big matrix) of the two-dimensional
+ region the matrix is stored to. It must be a scalar 'integer type'. +
+ +
+ 'Stride' further qualifies how matrix elements are laid out in memory. It must be a
+ scalar 'integer type' and its exact semantics depend on 'MemoryLayout'. +
+ +
+ 'Memory Operand' must be a +Memory Operand+ literal. If not present, it is the
+ same as specifying *None*. +
+ +
+ For a given dynamic instance of this instruction, all operands of this
+ instruction must be the same for all invocations in a given scope instance
+ (where the scope is the scope the cooperative matrix type was created with).
+ All invocations in a given scope instance must be active or all must be
+ inactive. +
+ +
+ Note: To specify the cache level for *OpCooperativeMatrixStoreCheckedINTEL*, use the
+ *CacheControlStoreINTEL* decoration from the {cache_control_url}[SPV_INTEL_cache_controls extension]. +
+ +
+ 1+|Capability: +
+ *{capability_checked_name}*
+ 1+| 8 + variable | {OpCooperativeMatrixStoreCheckedINTEL_token} | '<id>' +
+ 'Pointer' | '<id>' +
+ 'X offset' | '<id>' +
+ 'Y offset' | '<id>' +
+ 'Object' | '<id>' +
+ 'MemoryLayout' | '<id>' +
+ 'Height' | '<id>' +
+ 'Width' | Optional '<id>' +
+ 'Stride' | Optional +
+ 'Memory Operand' |
+ |=====
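The dropping rule of *OpCooperativeMatrixStoreCheckedINTEL* can be modelled in a few lines of Python. This is a sketch under assumptions that are not part of the specification: a row-major layout with 'Stride' elements per row, 'Height' and 'Width' bounding the big matrix, 'X offset' taken as the starting row and 'Y offset' as the starting column of the stored tile. The function `store_checked` is hypothetical.

```python
from typing import MutableSequence, Sequence


def store_checked(memory: MutableSequence[float], tile: Sequence[Sequence[float]],
                  row_offset: int, col_offset: int,
                  height: int, width: int, stride: int) -> None:
    """Hypothetical model of a bounds-checked, row-major tile store.

    Assumptions (not from the specification): row_offset acts as 'X offset',
    col_offset as 'Y offset', and the height x width big matrix is stored
    row-major with `stride` elements per row.
    """
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            global_row = row_offset + r
            global_col = col_offset + c
            # Out-of-bounds elements are dropped instead of being written.
            if global_row < height and global_col < width:
                memory[global_row * stride + global_col] = value


buffer = [0.0] * 9
store_checked(buffer, [[1.0, 2.0], [3.0, 4.0]], 2, 2, height=3, width=3, stride=3)
print(buffer)
# prints [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

In this model, when 'Stride' equals 'Width', an unchecked column past the right edge would land on the first element of the next row; the bounds check prevents such writes.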
+
+ [cols="1,1,7*3",width="100%"]
+ |=====
+ 8+|[[OpCooperativeMatrixConstructCheckedINTEL]]*OpCooperativeMatrixConstructCheckedINTEL* +
+ +
+ Construct a new _cooperative matrix_. It assigns 'Value' to the elements in the range from
+ 'X offset' to 'Height' and from 'Y offset' to 'Width', and sets the remaining elements to zero. +
+ +
+ 'Result Type' is the type of the constructed object. It must be a cooperative matrix
+ type. +
+ +
+ 'X offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the X axis of the initialized two-dimensional region. +
+ +
+ 'Y offset' must be a scalar 32-bit integer. It specifies the offset, in number of elements,
+ along the Y axis of the initialized two-dimensional region. +
+ +
+ 'Height' is the height (number of rows of the big matrix) of the initialized two-dimensional region.
+ It must be a scalar 'integer type'. +
+ +
+ 'Width' is the width (number of columns of the big matrix) of the initialized two-dimensional region.
+ It must be a scalar 'integer type'. +
+ +
+ 'Value' is the initializer value for the constructed object. Its type must be the same as
+ the component type of 'Result Type'. +
+ +
+ For a given dynamic instance of this instruction, all operands of this
+ instruction must be the same for all invocations in a given scope instance
+ (where the scope is the scope the cooperative matrix type was created with).
+ All invocations in a given scope instance must be active or all must be
+ inactive. +
+ +
+ 1+|Capability: +
+ *{capability_checked_name}*
+ 1+| 8 | {OpCooperativeMatrixConstructCheckedINTEL_token} | '<id>' +
+ 'Result Type' |'Result <id>' | '<id>' +
+ 'X offset' | '<id>' +
+ 'Y offset' | '<id>' +
+ 'Height' | '<id>' +
+ 'Width' | '<id>' +
+ 'Value' |
+ |=====
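The fill rule of *OpCooperativeMatrixConstructCheckedINTEL* can be read as the Python sketch below. It is one interpretation of the description, not a normative definition: it assumes that 'X offset' is the starting row of the tile and is checked against 'Height', that 'Y offset' is the starting column and is checked against 'Width', and that 'Height' and 'Width' bound the big matrix. The function `construct_checked` is hypothetical.

```python
from typing import List


def construct_checked(value: float, row_offset: int, col_offset: int,
                      height: int, width: int,
                      rows: int, cols: int) -> List[List[float]]:
    """Hypothetical model: in-bounds tile elements get `value`, the rest get 0.

    Assumes row_offset acts as 'X offset' (checked against height) and
    col_offset as 'Y offset' (checked against width).
    """
    return [[value if row_offset + r < height and col_offset + c < width else 0.0
             for c in range(cols)]
            for r in range(rows)]


# 2 x 2 tile at row 1, column 2 of a 3 x 3 big matrix: its right column is out of bounds.
print(construct_checked(7.0, 1, 2, height=3, width=3, rows=2, cols=2))
# prints [[7.0, 0.0], [7.0, 0.0]]
```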
+
==== 3.42.11. Conversion Instructions

If *{bf16_capability_name}* and *BFloat16ConversionINTEL* capabilities are
@@ -324,8 +512,8 @@ Returns (Row, Column) coordinate of dynamically selected element of a matrix. +
contains the row with the selected element, and the second element contains the
column with the selected element. +
+
- 'Matrix' is an ID of *OpTypeCooperativeMatrixKHR* . The instruction returns the
- element's coordinate of this cooperative matrix type . +
+ 'Matrix' is a _cooperative matrix_. The instruction returns the coordinate of the
+ selected element of this _cooperative matrix_. +
+
'Index' must be a 32-bit 'scalar integer'. It is interpreted as an index into the list
of components owned by this work-item in the cooperative matrix. The behavior is
@@ -342,53 +530,43 @@ that *OpCooperativeMatrixLengthKHR* returns for this work-item. +
| '<id>' +
'Matrix'
| '<id>' +
- 'Index'
+ 'Index' |
|=====

- [cols="1,1,5 *3",width="100%"]
+ [cols="1,1,4 *3",width="100%"]
|=====
- 6+|[[OpCooperativeMatrixApplyFunctionINTEL]]*OpCooperativeMatrixApplyFunctionINTEL* +
+ 5+|[[OpCooperativeMatrixApplyFunctionINTEL]]*OpCooperativeMatrixApplyFunctionINTEL* +
+ +
+ *NOTE:* This instruction is experimental. +
+
- Apply the function for each element of the matrix. Results in a new matrix within
+ Apply the function object to each element of the matrix. Results in a new matrix within
the same scope and with the same number of rows and columns. +
+
'Result Type' is the type of the return value of the function. It must be an
- *OpTypeCooperativeMatrix * with the same _Scope_, _Rows_ and _Columns_ as the type of
+ *OpTypeCooperativeMatrixKHR* with the same _Scope_, _Rows_ and _Columns_ as the type of
'Matrix' operand. _Component type_ as well as _Use_ of 'Result Type' and 'Matrix' can
differ. +
+
- 'Function' is an *OpFunction* instruction whose *OpTypeFunction* operand has _Result Type_
- of scalar _numerical type_. This could be a forward reference. The 'Function' will be
- invoked (_Rows_ - 'Y')_x_(_Cols_ - 'X') times within the cooperative matrix scope. The first parameter of the
- 'Function' must be scalar _numerical type_ that corresponds to an element of
- the matrix to which 'Function' is being applied.
+ The type of 'Function object' must be an *OpTypePointer* whose 'Type' operand is an *OpTypeStruct*.
+ The 'Function object' will be invoked within the cooperative matrix scope.
+
'Matrix' is a cooperative matrix whose elements are used as the first parameter of
the 'Function object'. +
+
- 'Argument N' is the object to copy to parameter N. +
- +
- *Note* the first parameter is omitted in this list of parameters, as it is copied
- from the unique element of the 'Matrix'. Following two parameters must be (X, Y)
- coordinate of a first element of the matrix to apply the function, for example
- (0, 0) would mean, that *OpCooperativeMatrixApplyFunctionINTEL* affects the
- entire matrix. +
- +

1+|Capability: +
*{invocation_capability_name}*
- 1+| 4 + variable | {OpCooperativeMatrixApplyFunctionINTEL_token}
+ 1+| 5 | {OpCooperativeMatrixApplyFunctionINTEL_token}
| '<id>' +
'Result Type'
| 'Result <id>'
| '<id>' +
- 'Function'
+ 'Function object'
| '<id>' +
'Matrix'
- | '<id>, <id>, ..., <id>' +
- 'Argument 1', 'Argument 2', ..., 'Argument N'
|=====
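The word-count column in these instruction tables counts every 32-bit word of the instruction, including the first word, which under the SPIR-V physical layout holds the word count in its high 16 bits and the opcode in its low 16 bits. The Python sketch below encodes *OpCooperativeMatrixApplyFunctionINTEL* ('Result Type', 'Result <id>', 'Function object', 'Matrix') and *OpCooperativeMatrixConstructCheckedINTEL* (seven '<id>' operands) with made-up '<id>' values to show how the totals come out. The helper `encode_instruction` and the constant names are hypothetical and not part of any SPIR-V library; the opcode values are the ones from the Token Number Assignments table.

```python
from typing import List, Sequence

OP_COOPERATIVE_MATRIX_CONSTRUCT_CHECKED_INTEL = 6195  # from Token Number Assignments
OP_COOPERATIVE_MATRIX_APPLY_FUNCTION_INTEL = 6448     # from Token Number Assignments


def encode_instruction(opcode: int, operands: Sequence[int]) -> List[int]:
    """Encode one SPIR-V instruction: word 0 holds (word count << 16) | opcode."""
    word_count = 1 + len(operands)  # the first word itself is counted too
    if word_count > 0xFFFF or opcode > 0xFFFF:
        raise ValueError("word count and opcode must each fit in 16 bits")
    return [(word_count << 16) | opcode, *operands]


# Hypothetical <id> values: Result Type, Result <id>, Function object, Matrix.
apply_fn = encode_instruction(OP_COOPERATIVE_MATRIX_APPLY_FUNCTION_INTEL, [10, 11, 12, 13])
# Hypothetical <id> values: Result Type, Result <id>, X offset, Y offset, Height, Width, Value.
construct = encode_instruction(OP_COOPERATIVE_MATRIX_CONSTRUCT_CHECKED_INTEL,
                               [20, 21, 22, 23, 24, 25, 26])

print(apply_fn[0] >> 16, construct[0] >> 16)
# prints 5 8
```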
+
=== Issues

1. Should we keep *OpCooperativeMatrixGetElementCoordINTEL* once we have *OpCooperativeMatrixApplyFunctionINTEL*? +
@@ -419,4 +597,5 @@ Revision History
|13|2023-09-25|Dmitry Sidorov|Add conversion instructions for tf32 and bf16
|14|2023-10-11|Dmitry Sidorov|Add matrix prefetch instruction
|15|2023-11-06|Dmitry Sidorov|Put deprecation note on OpCooperativeMatrixGetElementCoordINTEL
+ |16|2023-11-06|Dmitry Sidorov|Add checked load, store and construct instructions
|========================================