Commit 8b36a19

AMDGPU/Docs: Memory model updates for GFX940, GFX941, GFX942 (#71091)
- Update memory model sequences for GFX940, GFX941, GFX942 to match implementation
- Re-title "Memory Model GFX940" to "Memory Model GFX942"

Co-authored with @t-tye

Change-Id: I82f1707b7c3e010ce1fe8207fcca18c4570057a3
Co-authored-by: Konstantin Zhuravlyov <[email protected]>
1 parent 7fa9930 commit 8b36a19

1 file changed, +40 -18 lines changed
llvm/docs/AMDGPUUsage.rst

Lines changed: 40 additions & 18 deletions
@@ -5538,7 +5538,7 @@ following sections:
 
 * :ref:`amdgpu-amdhsa-memory-model-gfx6-gfx9`
 * :ref:`amdgpu-amdhsa-memory-model-gfx90a`
-* :ref:`amdgpu-amdhsa-memory-model-gfx940`
+* :ref:`amdgpu-amdhsa-memory-model-gfx942`
 * :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`
 
 .. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
@@ -9190,12 +9190,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
                                                     - system for OpenCL.*
 ============ ============ ============== ========== ================================
 
-.. _amdgpu-amdhsa-memory-model-gfx940:
+.. _amdgpu-amdhsa-memory-model-gfx942:
 
-Memory Model GFX940
+Memory Model GFX942
 +++++++++++++++++++
 
-For GFX940:
+For GFX942:
 
 * Each agent has multiple shader arrays (SA).
 * Each SA has multiple compute units (CU).
@@ -9249,7 +9249,7 @@ For GFX940:
 model. See :ref:`amdgpu-amdhsa-memory-spaces`.
 * The vector and scalar memory operations use an L2 cache.
 
-* The gfx940 can be configured as a number of smaller agents with each having
+* The gfx942 can be configured as a number of smaller agents with each having
 a single L2 shared by all CUs on the same agent, or as fewer (possibly one)
 larger agents with groups of CUs on each agent each sharing separate L2
 caches.
@@ -9325,15 +9325,15 @@ only accessed by a single thread, and is always write-before-read, there is
 never a need to invalidate these entries from the L1 cache. Hence all cache
 invalidates are done as ``*_vol`` to only invalidate the volatile cache lines.
 
-The code sequences used to implement the memory model for GFX940 are defined
-in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
+The code sequences used to implement the memory model for GFX940, GFX941, GFX942
+are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table`.
 
-.. table:: AMDHSA Memory Model Code Sequences GFX940
-:name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-table
+.. table:: AMDHSA Memory Model Code Sequences GFX940, GFX941, GFX942
+:name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table
 
 ============ ============ ============== ========== ================================
 LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code
-             Ordering     Sync Scope     Address    GFX940
+             Ordering     Sync Scope     Address    GFX940, GFX941, GFX942
                                          Space
 ============ ============ ============== ========== ================================
 **Non-Atomic**
@@ -9368,12 +9368,20 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 load         *none*       *none*         - local    1. ds_load
 store        *none*       *none*         - global   - !volatile & !nontemporal
                                          - generic
-                                         - private    1. buffer/global/flat_store
-                                         - constant
+                                         - private    1. GFX940, GFX941
+                                         - constant        buffer/global/flat_store
+                                                           sc0=1 sc1=1
+                                                         GFX942
+                                                           buffer/global/flat_store
+
                                                     - !volatile & nontemporal
 
-                                                      1. buffer/global/flat_store
-                                                         nt=1
+                                                      1. GFX940, GFX941
+                                                           buffer/global/flat_store
+                                                           nt=1 sc0=1 sc1=1
+                                                         GFX942
+                                                           buffer/global/flat_store
+                                                           nt=1
 
                                                     - volatile
 
@@ -10065,8 +10073,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 
 **Release Atomic**
 ------------------------------------------------------------------------------------
-store atomic release      - singlethread - global   1. buffer/global/flat_store
-                          - wavefront    - generic
+store atomic release      - singlethread - global   1. GFX940, GFX941
+                          - wavefront    - generic       buffer/global/flat_store
+                                                         sc0=1 sc1=1
+                                                       GFX942
+                                                         buffer/global/flat_store
+
 store atomic release      - singlethread - local    *If TgSplit execution mode,
                           - wavefront                local address space cannot
                                                      be used.*
@@ -10103,7 +10115,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                                         store that is being
                                                         released.
 
-                                                    2. buffer/global/flat_store sc0=1
+                                                    2. GFX940, GFX941
+                                                         buffer/global/flat_store
+                                                         sc0=1 sc1=1
+                                                       GFX942
+                                                         buffer/global/flat_store
+                                                         sc0=1
 store atomic release      - workgroup    - local    *If TgSplit execution mode,
                                                      local address space cannot
                                                      be used.*
@@ -10162,7 +10179,12 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                                         store that is being
                                                         released.
 
-                                                    3. buffer/global/flat_store sc1=1
+                                                    3. GFX940, GFX941
+                                                         buffer/global/flat_store
+                                                         sc0=1 sc1=1
+                                                       GFX942
+                                                         buffer/global/flat_store
+                                                         sc1=1
 store atomic release      - system       - global   1. buffer_wbl2 sc0=1 sc1=1
                                          - generic
                                                       - Must happen before
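
Reading aid (not part of the commit): the release-store rows changed in this diff can be summarised as a small lookup. The sketch below is a minimal illustration under stated assumptions, not the LLVM implementation. The names ``GFX940_FAMILY``, ``GFX942_RELEASE_STORE_BITS`` and ``release_store_bits`` are hypothetical. The hunks for steps 2 and 3 do not show their sync-scope column, so attributing them to workgroup and agent scope is inferred from the rows that follow each hunk. The system-scope store itself does not appear in the diff, so it is left out.

```python
# Hypothetical names for illustration only; none of these are LLVM APIs.
GFX940_FAMILY = {"gfx940", "gfx941"}

# Cache-scope bits on a GFX942 release store to global/generic memory, as read
# from the "+" rows in this diff. ASSUMPTION: step 2 is the workgroup row and
# step 3 is the agent row; the hunks do not show the scope column directly.
GFX942_RELEASE_STORE_BITS = {
    "singlethread": (),
    "wavefront": (),
    "workgroup": ("sc0=1",),
    "agent": ("sc1=1",),
}


def release_store_bits(target, scope):
    """Return the sc0/sc1 modifiers listed for a release store in this diff."""
    if scope not in GFX942_RELEASE_STORE_BITS:
        raise ValueError(f"scope not covered by this diff: {scope}")
    if target in GFX940_FAMILY:
        # Every GFX940/GFX941 release-store row in the diff lists both bits.
        return ("sc0=1", "sc1=1")
    if target == "gfx942":
        return GFX942_RELEASE_STORE_BITS[scope]
    raise ValueError(f"target not covered by this diff: {target}")


if __name__ == "__main__":
    for target in ("gfx940", "gfx942"):
        for scope in GFX942_RELEASE_STORE_BITS:
            bits = " ".join(release_store_bits(target, scope))
            print(target, scope, ("buffer/global/flat_store " + bits).strip())
```

The lookup makes the shape of the change visible: the GFX942 column keeps the per-scope bits from the old table, while GFX940 and GFX941 list both ``sc0=1`` and ``sc1=1`` on every release store shown here.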
