Commit 3734fe2

L1 is now a buffer + other small fix
1 parent 6509556 commit 3734fe2

1 file changed (+7 -10 lines)

llvm/docs/AMDGPUUsage.rst

Lines changed: 7 additions & 10 deletions
@@ -14159,19 +14159,16 @@ For GFX12:
 work-group:

 * In CU wavefront execution mode, no special action is required.
-* In WGP wavefront execution mode, a ``global_inv scope:SCOPE_CU`` is required
+* In WGP wavefront execution mode, a ``global_inv scope:SCOPE_SE`` is required
 as wavefronts may be executing on SIMDs of different CUs that access different L0s.

 * The scalar memory operations access a scalar L0 cache shared by all wavefronts
 on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
 operations are used in a restricted way so do not impact the memory model. See
 :ref:`amdgpu-amdhsa-memory-spaces`.
-* The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
-the same SA. Therefore, no special action is required for coherence between
-the wavefronts of a single work-group. However, a ``global_inv scope:SCOPE_DEV`` is
-required for coherence between wavefronts executing in different work-groups
-as they may be executing on different SAs that access different L1s.
-* The L1 caches have independent quadrants to service disjoint ranges of virtual
+* The vector and scalar memory L0 caches use an L1 buffer shared by all WGPs on
+the same SA. The L1 buffer acts as a bridge to L2 for clients within a SA.
+* The L1 buffers have independent quadrants to service disjoint ranges of virtual
 addresses.
 * Each L0 cache has a separate request queue per L1 quadrant. Therefore, the
 vector and scalar memory operations performed by different wavefronts, whether
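As a rough illustration of the work-group rule changed in this hunk, the hypothetical Python helper below restates it as a lookup: no special action in CU wavefront execution mode, and (after this commit) a ``global_inv scope:SCOPE_SE`` in WGP mode. The names ``WavefrontMode`` and ``workgroup_acquire_invalidate`` are invented for this sketch. It only restates the documented rule and is not taken from LLVM's SIMemoryLegalizer.

```python
from enum import Enum
from typing import Optional


class WavefrontMode(Enum):
    """Wavefront execution modes named in the GFX12 text (hypothetical type)."""
    CU = "cu"
    WGP = "wgp"


def workgroup_acquire_invalidate(mode: WavefrontMode) -> Optional[str]:
    """Return the cache action the updated text lists for the work-group
    case in the given mode, or None when no special action is required.
    Hypothetical helper for illustration only."""
    if mode is WavefrontMode.CU:
        # CU mode: the text says no special action is required.
        return None
    # WGP mode: wavefronts may run on SIMDs of different CUs that access
    # different L0s. This commit changes the scope from SCOPE_CU to SCOPE_SE.
    return "global_inv scope:SCOPE_SE"


if __name__ == "__main__":
    for mode in WavefrontMode:
        print(mode.name, workgroup_acquire_invalidate(mode))
```

Encoding the rule as data rather than prose makes the one-token change in this commit (``SCOPE_CU`` to ``SCOPE_SE``) easy to spot and to check in a test.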
@@ -14188,7 +14185,7 @@ For GFX12:
 * ``s_wait_bvhcnt 0x0``
 * ``s_wait_storecnt 0x0``

-* The L1 caches use an L2 cache shared by all SAs on the same agent.
+* The L1 buffers use an L2 cache shared by all SAs on the same agent.
 * The L2 cache has independent channels to service disjoint ranges of virtual
 addresses.
 * Each L1 quadrant of a single SA accesses a different L2 channel. Each L1
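The sharing rules in the two hunks above can be summarized in a small sketch that reports the first structure on the memory path two wavefronts have in common. Those rules are: the vector L0 is per CU (implied by the WGP-mode bullet), the scalar L0 is per WGP, the L1 buffer is per SA, and the L2 is per agent. The ``WaveLocation`` coordinates and the ``first_shared_level`` function are hypothetical. The sketch models sharing only. It does not model coherence, and it does not model quadrant or channel interleaving, because the text gives no granularity for either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaveLocation:
    """Hypothetical coordinates of a wavefront on one GFX12 agent.
    Each index is relative to its parent: sa within the agent,
    wgp within the SA, cu within the WGP."""
    agent: int
    sa: int
    wgp: int
    cu: int


def first_shared_level(a: WaveLocation, b: WaveLocation, scalar: bool = False) -> str:
    """Return the first level of the hierarchy shared by two wavefronts
    for vector (default) or scalar memory operations."""
    if a.agent != b.agent:
        raise ValueError("different agents are outside the scope of this sketch")
    if a.sa != b.sa:
        # The L1 buffers use an L2 cache shared by all SAs on the same agent.
        return "L2"
    if a.wgp != b.wgp:
        # The L0 caches use an L1 buffer shared by all WGPs on the same SA.
        return "L1 buffer"
    if scalar:
        # The scalar L0 cache is shared by all wavefronts on a WGP.
        return "scalar L0"
    # Different CUs of one WGP access different vector L0s.
    return "vector L0" if a.cu == b.cu else "L1 buffer"


if __name__ == "__main__":
    here = WaveLocation(agent=0, sa=0, wgp=0, cu=0)
    other_cu = WaveLocation(agent=0, sa=0, wgp=0, cu=1)
    print(first_shared_level(here, other_cu))               # -> L1 buffer
    print(first_shared_level(here, other_cu, scalar=True))  # -> scalar L0
```

The two calls show why WGP mode needs an invalidate while CU mode does not. Two wavefronts on different CUs of the same WGP share the scalar L0 but not a vector L0.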
@@ -14223,7 +14220,7 @@ may change between kernel dispatch executions. See

 For kernarg backing memory:

-* CP invalidates the L0 and L1 caches at the start of each kernel dispatch.
+* CP invalidates caches at the start of each kernel dispatch.
 * On dGPU the kernarg backing memory is accessed as MTYPE UC (uncached) to avoid
 needing to invalidate the L2 cache.
 * On APU the kernarg backing memory is accessed as MTYPE CC (cache coherent) and
@@ -14232,7 +14229,7 @@ For kernarg backing memory:
 Scratch backing memory (which is used for the private address space) is accessed
 with MTYPE NC (non-coherent). Since the private address space is only accessed
 by a single thread, and is always write-before-read, there is never a need to
-invalidate these entries from the L0 or L1 caches.
+invalidate these entries from L0.

 Wavefronts can be executed in WGP or CU wavefront execution mode:

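The memory types named in the kernarg and scratch hunks can be captured in a short lookup. Kernarg backing memory is UC on dGPU and CC on APU, and scratch (private) backing memory is NC. The function ``backing_memory_mtype`` and its string labels are hypothetical. The sketch only restates the documented choices and the reason the text gives for each.

```python
def backing_memory_mtype(kind: str, is_apu: bool) -> str:
    """Return the MTYPE the GFX12 text gives for a kind of backing memory.
    `kind` is "kernarg" or "scratch" (hypothetical labels for this sketch);
    `is_apu` only matters for kernarg memory."""
    if kind == "kernarg":
        # dGPU: UC (uncached) avoids needing to invalidate the L2 cache.
        # APU: CC (cache coherent).
        return "CC" if is_apu else "UC"
    if kind == "scratch":
        # Private memory is accessed by a single thread and is always
        # write-before-read, so NC entries never need invalidating from L0.
        return "NC"
    raise ValueError(f"unknown backing memory kind: {kind!r}")


if __name__ == "__main__":
    for apu in (False, True):
        platform = "APU" if apu else "dGPU"
        print("kernarg", platform, backing_memory_mtype("kernarg", apu))
    print("scratch", backing_memory_mtype("scratch", is_apu=False))
```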