@@ -14159,19 +14159,16 @@ For GFX12:
work-group:
* In CU wavefront execution mode, no special action is required.
- * In WGP wavefront execution mode, a ``global_inv scope:SCOPE_CU`` is required
+ * In WGP wavefront execution mode, a ``global_inv scope:SCOPE_SE`` is required
as wavefronts may be executing on SIMDs of different CUs that access different L0s.
* The scalar memory operations access a scalar L0 cache shared by all wavefronts
on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
operations are used in a restricted way so do not impact the memory model. See
:ref:`amdgpu-amdhsa-memory-spaces`.
- * The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
- the same SA. Therefore, no special action is required for coherence between
- the wavefronts of a single work-group. However, a ``global_inv scope:SCOPE_DEV`` is
- required for coherence between wavefronts executing in different work-groups
- as they may be executing on different SAs that access different L1s.
- * The L1 caches have independent quadrants to service disjoint ranges of virtual
+ * The vector and scalar memory L0 caches use an L1 buffer shared by all WGPs on
+ the same SA. The L1 buffer acts as a bridge to L2 for clients within a SA.
+ * The L1 buffers have independent quadrants to service disjoint ranges of virtual
addresses.
* Each L0 cache has a separate request queue per L1 quadrant. Therefore, the
vector and scalar memory operations performed by different wavefronts, whether
@@ -14188,7 +14185,7 @@ For GFX12:
* ``s_wait_bvhcnt 0x0``
* ``s_wait_storecnt 0x0``
- * The L1 caches use an L2 cache shared by all SAs on the same agent.
+ * The L1 buffers use an L2 cache shared by all SAs on the same agent.
* The L2 cache has independent channels to service disjoint ranges of virtual
addresses.
* Each L1 quadrant of a single SA accesses a different L2 channel. Each L1
@@ -14223,7 +14220,7 @@ may change between kernel dispatch executions. See
For kernarg backing memory:
- * CP invalidates the L0 and L1 caches at the start of each kernel dispatch.
+ * CP invalidates caches at the start of each kernel dispatch.
* On dGPU the kernarg backing memory is accessed as MTYPE UC (uncached) to avoid
needing to invalidate the L2 cache.
* On APU the kernarg backing memory is accessed as MTYPE CC (cache coherent) and
@@ -14232,7 +14229,7 @@ For kernarg backing memory:
Scratch backing memory (which is used for the private address space) is accessed
with MTYPE NC (non-coherent). Since the private address space is only accessed
by a single thread, and is always write-before-read, there is never a need to
- invalidate these entries from the L0 or L1 caches.
+ invalidate these entries from L0.
Wavefronts can be executed in WGP or CU wavefront execution mode:
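
As a reading aid for the first hunk, the work-group rule under the two wavefront execution modes can be sketched as a small Python lookup. This is only an illustration of the text as changed by this diff: the names `ExecMode` and `workgroup_invalidate` are hypothetical and are not part of LLVM or any AMD tool, and the returned string is simply the instruction text quoted in the updated line.

```python
from enum import Enum
from typing import Optional


class ExecMode(Enum):
    """Wavefront execution modes named in the text (hypothetical type)."""
    CU = "cu"
    WGP = "wgp"


def workgroup_invalidate(mode: ExecMode) -> Optional[str]:
    """Return the invalidate needed at work-group scope, or None.

    Mirrors the two bullets in the hunk: CU mode needs no special
    action; WGP mode needs a global_inv because wavefronts may run on
    SIMDs of different CUs that access different L0s.
    """
    if mode is ExecMode.CU:
        return None
    # Instruction text as given in the updated ("+") line of the diff.
    return "global_inv scope:SCOPE_SE"
```

For example, `workgroup_invalidate(ExecMode.WGP)` yields the instruction string, while `workgroup_invalidate(ExecMode.CU)` yields `None`, signalling that no invalidate is required.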