[AMDGPU] Document & Finalize GFX12 Memory Model #98599
@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes: Document the memory model implemented as of #98591

Patch is 160.80 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/98599.diff

1 Files Affected:
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 117fc2cf6bbbc..be8a2fed07b57 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -6094,6 +6094,7 @@ following sections:
* :ref:`amdgpu-amdhsa-memory-model-gfx90a`
* :ref:`amdgpu-amdhsa-memory-model-gfx942`
* :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`
+* :ref:`amdgpu-amdhsa-memory-model-gfx12`
.. _amdgpu-fence-as:
@@ -14074,6 +14075,2266 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`.
- system for OpenCL.*
============ ============ ============== ========== ================================
+
+.. _amdgpu-amdhsa-memory-model-gfx12:
+
+Memory Model GFX12
+++++++++++++++++++++++++
+
+For GFX12:
+
+* Each agent has multiple shader arrays (SA).
+* Each SA has multiple work-group processors (WGP).
+* Each WGP has multiple compute units (CU).
+* Each CU has multiple SIMDs that execute wavefronts.
+* The wavefronts for a single work-group are executed in the same
+ WGP.
+
+ * In CU wavefront execution mode the wavefronts may be executed by different SIMDs
+ in the same CU.
+ * In WGP wavefront execution mode the wavefronts may be executed by different SIMDs
+ in different CUs in the same WGP.
+
+* Each WGP has a single LDS memory shared by the wavefronts of the work-groups
+ executing on it.
+* All LDS operations of a WGP are performed as wavefront wide operations in a
+ global order and involve no caching. Completion is reported to a wavefront in
+ execution order.
+* The LDS memory has multiple request queues shared by the SIMDs of a
+ WGP. Therefore, the LDS operations performed by different wavefronts of a
+ work-group can be reordered relative to each other, which can result in
+ reordering the visibility of vector memory operations with respect to LDS
+ operations of other wavefronts in the same work-group. A ``s_wait_dscnt 0x0``
+ is required to ensure synchronization between LDS operations and
+ vector memory operations between wavefronts of a work-group, but not between
+ operations performed by the same wavefront.
+* The vector memory operations are performed as wavefront wide operations.
+  Vector memory operations are divided into different types. Completion of a
+ vector memory operation is reported to a wavefront in-order within a type,
+ but may be out of order between types. The types of vector memory operations
+ (and their associated ``s_wait`` instructions) are:
+
+ * LDS: ``s_wait_dscnt``
+ * Load (global, scratch, flat, buffer and image): ``s_wait_loadcnt``
+ * Store (global, scratch, flat, buffer and image): ``s_wait_storecnt``
+ * Sample and Gather4: ``s_wait_samplecnt``
+ * BVH: ``s_wait_bvhcnt``
+
+* Vector and scalar memory instructions contain a ``SCOPE`` field with values
+ corresponding to each cache level. The ``SCOPE`` determines whether a cache
+ can complete an operation locally or whether it needs to forward the operation
+ to the next cache level. The ``SCOPE`` values are:
+
+ * ``SCOPE_CU``: Compute Unit (NOTE: not affected by CU/WGP mode)
+ * ``SCOPE_SE``: Shader Engine
+ * ``SCOPE_DEV``: Device/Agent
+ * ``SCOPE_SYS``: System
+
+* When a memory operation with a given ``SCOPE`` reaches a cache with a smaller
+ ``SCOPE`` value, it is forwarded to the next level of cache.
+* When a memory operation with a given ``SCOPE`` reaches a cache with a ``SCOPE``
+ value greater than or equal to its own, the operation can proceed:
+
+  * Reads can hit in the cache.
+ * Writes can happen in this cache and the transaction is acknowledged
+ from this level of cache.
+ * RMW operations can be done locally.
+
+* ``global_inv``, ``global_wb`` and ``global_wbinv`` instructions are used to
+ invalidate, write-back and write-back+invalidate caches. The affected
+ cache(s) are controlled by the ``SCOPE:`` of the instruction.
+* ``global_inv`` invalidates caches whose scope is strictly smaller than the
+ instruction's. The invalidation requests cannot be reordered with pending or
+ upcoming memory operations.
+* ``global_wb`` additionally ensures that previous memory operations done at
+  a lower scope level have reached the ``SCOPE:`` of the ``global_wb``.
+* The vector memory operations access a vector L0 cache. There is a single L0
+ cache per CU. Each SIMD of a CU accesses the same L0 cache. Therefore, no
+ special action is required for coherence between the lanes of a single
+ wavefront. To achieve coherence between wavefronts executing in the same
+ work-group:
+
+ * In CU wavefront execution mode, no special action is required.
+ * In WGP wavefront execution mode, a ``global_inv scope:SCOPE_CU`` is required
+ as wavefronts may be executing on SIMDs of different CUs that access different L0s.
+
+* The scalar memory operations access a scalar L0 cache shared by all wavefronts
+ on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
+ operations are used in a restricted way so do not impact the memory model. See
+ :ref:`amdgpu-amdhsa-memory-spaces`.
+* The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
+ the same SA. Therefore, no special action is required for coherence between
+ the wavefronts of a single work-group. However, a ``global_inv scope:SCOPE_DEV`` is
+ required for coherence between wavefronts executing in different work-groups
+ as they may be executing on different SAs that access different L1s.
+* The L1 caches have independent quadrants to service disjoint ranges of virtual
+ addresses.
+* Each L0 cache has a separate request queue per L1 quadrant. Therefore, the
+ vector and scalar memory operations performed by different wavefronts, whether
+ executing in the same or different work-groups (which may be executing on
+ different CUs accessing different L0s), can be reordered relative to each
+  other. Some or all of the wait instructions below are required to ensure
+  synchronization between vector memory operations of different wavefronts. They
+  ensure a previous vector memory operation has completed before executing a
+  subsequent vector memory or LDS operation and so can be used to meet the
+  requirements of acquire, release and sequential consistency.
+
+ * ``s_wait_loadcnt 0x0``
+ * ``s_wait_samplecnt 0x0``
+ * ``s_wait_bvhcnt 0x0``
+ * ``s_wait_storecnt 0x0``
+
+* The L1 caches use an L2 cache shared by all SAs on the same agent.
+* The L2 cache has independent channels to service disjoint ranges of virtual
+ addresses.
+* Each L1 quadrant of a single SA accesses a different L2 channel. Each L1
+ quadrant has a separate request queue per L2 channel. Therefore, the vector
+ and scalar memory operations performed by wavefronts executing in different
+ work-groups (which may be executing on different SAs) of an agent can be
+ reordered relative to each other. Some or all of the wait instructions below are
+ required to ensure synchronization between vector memory operations of
+  different SAs. They ensure a previous vector memory operation has completed
+  before executing a subsequent vector memory operation and so can be used to
+  meet the requirements of acquire, release and sequential consistency.
+
+ * ``s_wait_loadcnt 0x0``
+ * ``s_wait_samplecnt 0x0``
+ * ``s_wait_bvhcnt 0x0``
+ * ``s_wait_storecnt 0x0``
+
+* The L2 cache can be kept coherent with other agents, or ranges
+ of virtual addresses can be set up to bypass it to ensure system coherence.
+* A memory attached last level (MALL) cache exists for GPU memory.
+ The MALL cache is fully coherent with GPU memory and has no impact on system
+ coherence. All agents (GPU and CPU) access GPU memory through the MALL cache.
+
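+An illustrative, non-normative example of how the independent completion
+counters and instruction scopes interact: a wavefront that issues a
+device-scope load followed by a store can wait on the load alone, because
+loads and stores report completion through separate counters (operand
+choices below are arbitrary):
+
+.. code-block:: text
+
+  global_load_b32  v0, v[2:3], off scope:SCOPE_DEV  ; counted by loadcnt
+  global_store_b32 v[4:5], v1, off                  ; counted by storecnt
+  s_wait_loadcnt 0x0  ; waits for the load only; the store may still be
+                      ; outstanding until a s_wait_storecnt 0x0 executes
+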
+Scalar memory operations are only used to access memory that is proven to not
+change during the execution of the kernel dispatch. This includes constant
+address space and global address space for program scope ``const`` variables.
+Therefore, the kernel machine code does not have to maintain the scalar cache to
+ensure it is coherent with the vector caches. The scalar and vector caches are
+invalidated between kernel dispatches by CP since constant address space data
+may change between kernel dispatch executions. See
+:ref:`amdgpu-amdhsa-memory-spaces`.
+
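+As a sketch of this restriction, a pointer kernel argument can be fetched with
+a scalar load because kernarg memory is invariant for the duration of the
+dispatch (illustrative only; register assignments are arbitrary):
+
+.. code-block:: text
+
+  s_load_b64 s[0:1], s[4:5], 0x0  ; read a pointer from the kernarg segment
+  s_wait_kmcnt 0x0                ; wait for scalar memory completion
+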
+For kernarg backing memory:
+
+* CP invalidates the L0 and L1 caches at the start of each kernel dispatch.
+* On dGPU the kernarg backing memory is accessed as MTYPE UC (uncached) to avoid
+ needing to invalidate the L2 cache.
+* On APU the kernarg backing memory is accessed as MTYPE CC (cache coherent) and
+ so the L2 cache will be coherent with the CPU and other agents.
+
+Scratch backing memory (which is used for the private address space) is accessed
+with MTYPE NC (non-coherent). Since the private address space is only accessed
+by a single thread, and is always write-before-read, there is never a need to
+invalidate these entries from the L0 or L1 caches.
+
+Wavefronts can be executed in WGP or CU wavefront execution mode:
+
+* In WGP wavefront execution mode the wavefronts of a work-group are executed
+ on the SIMDs of both CUs of the WGP. Therefore, explicit management of the per
+ CU L0 caches is required for work-group synchronization. Also accesses to L1
+ at work-group scope need to be explicitly ordered as the accesses from
+ different CUs are not ordered.
+* In CU wavefront execution mode the wavefronts of a work-group are executed on
+  the SIMDs of a single CU of the WGP. Therefore, all global memory accesses by
+  the work-group access the same L0, which in turn ensures L1 accesses are
+ ordered and so do not require explicit management of the caches for
+ work-group synchronization.
+
+See ``WGP_MODE`` field in
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table` and
+:ref:`amdgpu-target-features`.
+
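+As an illustration of the difference between the two modes, a workgroup-scope
+acquire load could be lowered as sketched here (this merely previews the
+normative sequences defined in the code sequences table):
+
+.. code-block:: text
+
+  ; CU wavefront execution mode: all wavefronts of the work-group share one
+  ; L0 cache, so the plain load suffices.
+  global_load_b32 v0, v[2:3], off
+
+  ; WGP wavefront execution mode: wavefronts may execute on different CUs
+  ; with different L0 caches.
+  global_load_b32 v0, v[2:3], off scope:SCOPE_SE
+  s_wait_bvhcnt 0x0
+  s_wait_samplecnt 0x0
+  s_wait_loadcnt 0x0
+  global_inv scope:SCOPE_SE
+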
+The code sequences used to implement the memory model for GFX12 are defined in
+table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-table`.
+
+ .. table:: AMDHSA Memory Model Code Sequences GFX12 - Instruction Scopes
+ :name: amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table
+
+ =================== =================== ===================
+ LLVM syncscope CU wavefront WGP wavefront
+ execution execution
+ mode mode
+ =================== =================== ===================
+ *none* ``scope:SCOPE_SYS`` ``scope:SCOPE_SYS``
+ system ``scope:SCOPE_SYS`` ``scope:SCOPE_SYS``
+ agent ``scope:SCOPE_DEV`` ``scope:SCOPE_DEV``
+ workgroup *none* ``scope:SCOPE_SE``
+ wavefront *none* *none*
+ singlethread *none* *none*
+ one-as ``scope:SCOPE_SYS`` ``scope:SCOPE_SYS``
+ system-one-as ``scope:SCOPE_SYS`` ``scope:SCOPE_SYS``
+ agent-one-as ``scope:SCOPE_DEV`` ``scope:SCOPE_DEV``
+ workgroup-one-as *none* ``scope:SCOPE_SE``
+ wavefront-one-as *none* *none*
+ singlethread-one-as *none* *none*
+ =================== =================== ===================
+
+NOTE: The table above applies if and only if it is explicitly referenced by
+a code sequence in :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-table`.
+
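+For instance, an agent-scope monotonic ``atomicrmw add`` maps to a
+``global_atomic`` instruction, and the table above supplies ``scope:SCOPE_DEV``
+in both execution modes (illustrative; operands are arbitrary):
+
+.. code-block:: text
+
+  global_atomic_add_u32 v[0:1], v2, off scope:SCOPE_DEV
+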
+ .. table:: AMDHSA Memory Model Code Sequences GFX12
+ :name: amdgpu-amdhsa-memory-model-code-sequences-gfx12-table
+
+ ============ ============ ============== ========== ================================
+ LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code
+ Ordering Sync Scope Address GFX12
+ Space
+ ============ ============ ============== ========== ================================
+ **Non-Atomic**
+ ------------------------------------------------------------------------------------
+ load *none* *none* - global - !volatile & !nontemporal
+ - generic
+ - private 1. buffer/global/flat_load
+ - constant
+ - !volatile & nontemporal
+
+ 1. buffer/global/flat_load
+ ``th:TH_LOAD_NT``
+
+ - volatile
+
+ 1. buffer/global/flat_load
+ ``scope:SCOPE_SYS``
+
+ 2. ``s_wait_bvhcnt 0x0``
+ ``s_wait_samplecnt 0x0``
+ ``s_wait_loadcnt 0x0``
+
+ - Must happen before
+ any following volatile
+ global/generic
+ load/store.
+ - Ensures that
+ volatile
+ operations to
+ different
+ addresses will not
+ be reordered by
+ hardware.
+
+ load *none* *none* - local 1. ds_load
+ store *none* *none* - global - !volatile & !nontemporal
+ - generic
+ - private 1. buffer/global/flat_store
+ - constant
+ - !volatile & nontemporal
+
+ 1. buffer/global/flat_store
+ ``th:TH_STORE_NT``
+
+ - volatile
+
+ 1. buffer/global/flat_store
+ ``scope:SCOPE_SYS``
+
+ 2. ``s_wait_storecnt 0x0``
+
+ - Must happen before
+ any following volatile
+ global/generic
+ load/store.
+ - Ensures that
+ volatile
+ operations to
+ different
+ addresses will not
+ be reordered by
+ hardware.
+
+ store *none* *none* - local 1. ds_store
+ **Unordered Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic unordered *any* *any* *Same as non-atomic*.
+ store atomic unordered *any* *any* *Same as non-atomic*.
+ atomicrmw unordered *any* *any* *Same as monotonic atomic*.
+ **Monotonic Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic monotonic - singlethread - global 1. buffer/global/flat_load
+ - wavefront - generic
+ - workgroup - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
+ - agent
+ - system
+ load atomic monotonic - singlethread - local 1. ds_load
+ - wavefront
+ - workgroup
+ store atomic monotonic - singlethread - global 1. buffer/global/flat_store
+ - wavefront - generic
+ - workgroup - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
+ - agent
+ - system
+ store atomic monotonic - singlethread - local 1. ds_store
+ - wavefront
+ - workgroup
+ atomicrmw monotonic - singlethread - global 1. buffer/global/flat_atomic
+ - wavefront - generic
+ - workgroup - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
+ - agent
+ - system
+ atomicrmw monotonic - singlethread - local 1. ds_atomic
+ - wavefront
+ - workgroup
+ **Acquire Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load
+ - wavefront - local
+ - generic
+ load atomic acquire - workgroup - global 1. buffer/global_load ``scope:SCOPE_SE``
+
+ - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
+
+ 2. | ``s_wait_bvhcnt 0x0``
+ | ``s_wait_samplecnt 0x0``
+ | ``s_wait_loadcnt 0x0``
+
+ - If CU wavefront execution
+ mode, omit.
+ - Must happen before
+ the following ``global_inv``
+ and before any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+
+ 3. ``global_inv scope:SCOPE_SE``
+
+ - If CU wavefront execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ load atomic acquire - workgroup - local 1. ds_load
+ 2. ``s_wait_dscnt 0x0``
+
+ - If OpenCL, omit.
+ - Must happen before
+ the following ``global_inv``
+ and before any following
+ ...
[truncated]
LGTM (having previously reviewed this downstream)
I will land this at the end of the week if no comments come up.
>                       execution           execution
>                       mode                mode
>  =================== =================== ===================
>  *none*              ``scope:SCOPE_SYS`` ``scope:SCOPE_SYS``
So if you don't specify a syncscope in IR, it acts like ``system``? Has that always been the case?
Yes, the default is always system scope: https://llvm.org/docs/AMDGPUUsage.html#memory-scopes
It has to be that way otherwise the code generated would not be conservatively correct in the absence of the hint.
@jayfoad Can you please review the changes I made for L1 as a buffer?
> The code sequences used to implement the memory model for GFX12 are defined in
> table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-table`.
>
>  .. table:: AMDHSA Memory Model Code Sequences GFX12 - Instruction Scopes
Do we really want the semantic meaning of syncscope to change according to the hardware mode? My understanding is that syncscope is a language level concept. It is the responsibility of the backend to generate the correct code according to what mode the hardware will be running in, as defined in the kernel descriptor.
It is not my intent to change the meaning of syncscope; this table just provides a mapping of LLVM IR syncscope to GFX12 ISA scope operands.
The idea is that you start with some instruction, say a ``flat_load_b32``, then when you see this table being referenced in the code sequences table below, you look at it and add the relevant scope. So if you want workgroup scope, you either add nothing for CU mode, or ``scope:SCOPE_SE`` for WGP mode.
I'm now realizing the doc may not be as self-evident as I thought, so I will add a short paragraph to explain this.
Mostly LGTM, but I do have some substantive comments :)
I'm confused that c2625c2 changes the text about what […]
Force-pushed 3a17dc5 to 6ccf48a: "Document the memory model implemented as of llvm#98591"
Good catch, and you're right. I think we could replace those SCOPE_DEV with SCOPE_SE, but I'm not really convinced it's the right decision because:
I will bring this up with @t-tye on our next meeting. My intuition is that we should leave SCOPE_DEV, and then add a new paragraph to explain how we approach global_inv/wb emission.
note: I plan to include the code changes in this diff as well and just make it a "finalize GFX12 memory model" patch. It makes more sense and will be easier to review like that IMO. I need a bit of time to finalize the code changes, they'll come later today or tomorrow.
I'm not entirely sure that's true, and in any case I think we shouldn't rely on such details. Let's just have the scopes follow the semantics we want. So like you said, an agent scope release should do a SCOPE_DEV writeback, and an agent scope acquire should do a SCOPE_DEV invalidate.
Yes, let's discuss this as there are several things here that seem questionable :-)
This makes more sense to me. Release is not about invalidating, it is about "writing back" and ensuring it has completed. The WB instructions do this. Even if the cache is write-through, we still have to confirm that the write-through has completed to the scope we want to release to. That is where the WB instruction comes in: it does more than just trigger a write-back, it also confirms the write is complete at a specified scope. We want the hardware instruction scopes to reflect the source language semantics. Unfortunately this is not always the case, and so we have to modify the scope according to the modality of the configuration in some cases. But where the scopes do reflect the language semantics, we can use them and not worry about which caches they manipulate, as the hardware will make sure it controls the appropriate caches for the source language semantic action in conjunction with the hardware modal configuration.
Thanks, LGTM
Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.