@@ -5538,7 +5538,7 @@ following sections:

  * :ref:`amdgpu-amdhsa-memory-model-gfx6-gfx9`
  * :ref:`amdgpu-amdhsa-memory-model-gfx90a`
- * :ref:`amdgpu-amdhsa-memory-model-gfx940`
+ * :ref:`amdgpu-amdhsa-memory-model-gfx942`
  * :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`

  .. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
@@ -9190,12 +9190,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
  - system for OpenCL.*
  ============ ============ ============== ========== ================================

- .. _amdgpu-amdhsa-memory-model-gfx940:
+ .. _amdgpu-amdhsa-memory-model-gfx942:

- Memory Model GFX940
+ Memory Model GFX942
  +++++++++++++++++++

- For GFX940:
+ For GFX942:

  * Each agent has multiple shader arrays (SA).
  * Each SA has multiple compute units (CU).
@@ -9249,7 +9249,7 @@ For GFX940:
  model. See :ref:`amdgpu-amdhsa-memory-spaces`.
  * The vector and scalar memory operations use an L2 cache.

- * The gfx940 can be configured as a number of smaller agents with each having
+ * The gfx942 can be configured as a number of smaller agents with each having
  a single L2 shared by all CUs on the same agent, or as fewer (possibly one)
  larger agents with groups of CUs on each agent each sharing separate L2
  caches.
@@ -9325,15 +9325,15 @@ only accessed by a single thread, and is always write-before-read, there is
  never a need to invalidate these entries from the L1 cache. Hence all cache
  invalidates are done as ``*_vol`` to only invalidate the volatile cache lines.

- The code sequences used to implement the memory model for GFX940 are defined
- in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
+ The code sequences used to implement the memory model for GFX940, GFX941, GFX942
+ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table`.

- .. table:: AMDHSA Memory Model Code Sequences GFX940
- :name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-table
+ .. table:: AMDHSA Memory Model Code Sequences GFX940, GFX941, GFX942
+ :name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table

  ============ ============ ============== ========== ================================
  LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code
- Ordering Sync Scope Address GFX940
+ Ordering Sync Scope Address GFX940, GFX941, GFX942
  Space
  ============ ============ ============== ========== ================================
  **Non-Atomic**
@@ -9368,12 +9368,20 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
  load *none* *none* - local 1. ds_load
  store *none* *none* - global - !volatile & !nontemporal
  - generic
- - private 1. buffer/global/flat_store
- - constant
+ - private 1. GFX940, GFX941
+ - constant buffer/global/flat_store
+ sc0=1 sc1=1
+ GFX942
+ buffer/global/flat_store
+
  - !volatile & nontemporal

- 1. buffer/global/flat_store
- nt=1
+ 1. GFX940, GFX941
+ buffer/global/flat_store
+ nt=1 sc0=1 sc1=1
+ GFX942
+ buffer/global/flat_store
+ nt=1

  - volatile

@@ -10065,8 +10073,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.

  **Release Atomic**
  ------------------------------------------------------------------------------------
- store atomic release - singlethread - global 1. buffer/global/flat_store
- - wavefront - generic
+ store atomic release - singlethread - global 1. GFX940, GFX941
+ - wavefront - generic buffer/global/flat_store
+ sc0=1 sc1=1
+ GFX942
+ buffer/global/flat_store
+
  store atomic release - singlethread - local *If TgSplit execution mode,
  - wavefront local address space cannot
  be used.*
@@ -10103,7 +10115,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
  store that is being
  released.

- 2. buffer/global/flat_store sc0=1
+ 2. GFX940, GFX941
+ buffer/global/flat_store
+ sc0=1 sc1=1
+ GFX942
+ buffer/global/flat_store
+ sc0=1
  store atomic release - workgroup - local *If TgSplit execution mode,
  local address space cannot
  be used.*
@@ -10162,7 +10179,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
  store that is being
  released.

- 3. buffer/global/flat_store sc1=1
+ 3. GFX940, GFX941
+ buffer/global/flat_store
+ sc0=1 sc1=1
+ GFX942
+ buffer/global/flat_store
+ sc1=1
  store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
  - generic
  - Must happen before
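
As a reading aid for the store rows changed in the hunks above, here is a minimal Python sketch of which cache-control bits each target ends up with. The function name `store_cache_bits` and its parameters are hypothetical and not part of LLVM. The mapping of step 2 to workgroup scope and step 3 to agent scope is an assumption inferred from the context rows that follow each hunk. Cases the diff does not show (for example the system-scope store itself, or a nontemporal atomic store) are rejected rather than guessed.

```python
# Hypothetical helper summarising the store rows touched by this diff.
# It is an illustration only, not code from the LLVM AMDGPU backend.

OLD_TARGETS = {"gfx940", "gfx941"}
ALL_TARGETS = OLD_TARGETS | {"gfx942"}

# None models a non-atomic store; the others model "store atomic release".
# Assumption: step 2 in the diff is workgroup scope, step 3 is agent scope.
KNOWN_SCOPES = (None, "singlethread", "wavefront", "workgroup", "agent")


def store_cache_bits(target, scope=None, nontemporal=False):
    """Return the modifier string for a global/generic store."""
    if target not in ALL_TARGETS:
        raise ValueError(f"target not covered by this diff: {target!r}")
    if scope not in KNOWN_SCOPES:
        raise ValueError(f"scope not covered by this diff: {scope!r}")
    if nontemporal and scope is not None:
        raise ValueError("the diff only shows nontemporal for non-atomic stores")

    bits = ["nt=1"] if nontemporal else []
    if target in OLD_TARGETS:
        # Every GFX940/GFX941 store row in the diff sets both bits.
        bits += ["sc0=1", "sc1=1"]
    elif scope == "workgroup":
        bits.append("sc0=1")
    elif scope == "agent":
        bits.append("sc1=1")
    # GFX942 non-atomic, singlethread and wavefront stores: no sc bits.
    return " ".join(bits)


if __name__ == "__main__":
    for tgt in sorted(ALL_TARGETS):
        for sc in KNOWN_SCOPES:
            print(tgt, sc, repr(store_cache_bits(tgt, sc)))
```

Running the module prints one line per target and scope, which makes the pattern in the diff easy to see: GFX940 and GFX941 always carry `sc0=1 sc1=1`, while GFX942 keeps the scope-dependent bits the table had before this change.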