-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RFC] Memory Model Relaxation Annotations #78569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-selectiondag Author: Pierre van Houtryve (Pierre-vh) ChangesWork-in-progress patch to implement the core/target-agnostic components of Memory Model Relaxation Annotations. This diff is mostly complete but likely has a few holes, especially in codegen (MMRAs aren't always carried over to MI layer I believe). Most of the work needs to be done in adding tests and analyzing the outputs to find what's missing. Patch is 98.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78569.diff 35 Files Affected:
diff --git a/llvm/docs/MemoryModelRelaxationAnnotations.rst b/llvm/docs/MemoryModelRelaxationAnnotations.rst
new file mode 100644
index 00000000000000..3ec0ddaf289a88
--- /dev/null
+++ b/llvm/docs/MemoryModelRelaxationAnnotations.rst
@@ -0,0 +1,351 @@
+
+===================================
+Memory Model Relaxation Annotations
+===================================
+.. contents::
+ :local:
+Introduction
+============
+Memory Model Relaxation Annotations (MMRAs) are target-defined properties
+on instructions that can be used to selectively relax constraints placed
+by the memory model. For example:
+* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
+ memory operations to be reordered across ``acquire`` or ``release``
+ operations.
+* OpenCL APIs expose primitives to only fence a specific set of address
+ spaces, carrying that information to the backend can enable the
+ use of faster synchronization instructions, rather than fencing all
+ address spaces.
+MMRAs offer an opt-in system for targets to relax the default LLVM
+memory model.
+As such, they are attached to an operation using LLVM metadata which
+can always be dropped without affecting correctness.
+Definitions
+===========
+memory operation
+ A load, a store, an atomic, or a function call that is marked as
+ accessing memory.
+synchronizing operation
+ An instruction that synchronizes memory with other threads (e.g.
+ an atomic or a fence).
+tag
+ Metadata attached to a memory or synchronizing operation
+ that represents some target-defined property regarding memory
+ synchronization.
+ An operation may have multiple tags that each represent a different
+ property.
+ A tag is composed of a pair of metadata nodes:
+ * a *prefix* string.
+ * a *suffix* integer or string.
+ In LLVM IR, the pair is represented using a metadata tuple.
+ In other cases (comments, documentation, etc.), we may use the
+ ``prefix:suffix`` notation.
+ For example:
+ .. code-block::
+ :caption: Example: Tags in Metadata
+ !0 = !{!"scope", !"workgroup"} # scope:workgroup
+ !1 = !{!"scope", !"device"} # scope:device
+ !2 = !{!"scope", !"system"} # scope:system
+ !3 = !{!"sync-as", i32 2} # sync-as:2
+ !4 = !{!"sync-as", i32 1} # sync-as:1
+ !5 = !{!"sync-as", i32 0} # sync-as:0
+ .. note::
+ The only semantics relevant to the optimizer is the
+ "compatibility" relation defined below. All other
+ semantics are target defined.
+ Tags can also be organised in lists to allow operations
+ to specify all of the tags they belong to. Such a list
+ is referred to as a "set of tags".
+ .. code-block::
+ :caption: Example: Set of Tags in Metadata
+ !0 = !{!"scope", !"workgroup"}
+ !1 = !{!"sync-as", !"private"}
+ !2 = !{!0, !2}
+ .. note::
+ If an operation does not have MMRA metadata, it's treated as if
+ it has an empty list (``!{}``) of tags.
+ Note that it is not an error if a tag is not recognized by the
+ instruction it is applied to, or by the current target.
+ Such tags are simply ignored.
+ Both synchronizing operations and memory operations can have
+ zero or more tags attached to them using the ``!mmra`` syntax.
+ For the sake of readability in examples below,
+ we use a (non-functional) short syntax to represent MMMRA metadata:
+ .. code-block::
+ :caption: Short Syntax Example
+ store %ptr1 # foo:bar
+ store %ptr1 !mmra !{!"foo", !"bar"}
+ These two notations can be used in this document and are strictly
+ equivalent. However, only the second version is functional.
+compatibility
+ Two sets of tags are said to be *compatible* iff, for every unique
+ tag prefix P present in at least one set:
+ - the other set contains no tag with prefix P, or
+ - at least one tag with prefix P is common to both sets.
+ The above definition implies that an empty set is always compatible
+ with any other set. This is an important property as it ensures that
+ if a transform drops the metadata on an operation, it can never affect
+ correctness. In other words, the memory model cannot be relaxed further
+ by deleting metadata from instructions.
+.. _HappensBefore:
+The *happens-before* Relation
+==============================
+Compatibility checks can be used to opt out of the *happens-before* relation
+established between two instructions.
+Ordering
+ When two instructions' metadata are not compatible, any program order
+ between them are not in *happens-before*.
+ For example, consider two tags ``foo:bar`` and
+ ``foo:baz`` exposed by a target:
+ .. code-block::
+ A: store %ptr1 # foo:bar
+ B: store %ptr2 # foo:baz
+ X: store atomic release %ptr3 # foo:bar
+ In the above figure, ``A`` is compatible with ``X``, and hence ``A``
+ happens-before ``X``. But ``B`` is not compatible with
+ ``X``, and hence it is not happens-before ``X``.
+Synchronization
+ If an synchronizing operation has one or more tags, then whether it
+ participate in the ``seq_cst`` order with other operations is target
+ dependent.
+ .. code-block::
+ ; Depending on the semantics of foo:bar & foo:bux, this may not
+ ; synchronize with another sequence.
+ fence release # foo:bar
+ store atomic %ptr1 # foo:bux
+Examples
+--------
+.. code-block:: text
+ :caption: Example 1
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:0 vulkan:nonprivate
+A and B are not ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
+e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to
+only synchronize with operations happening in ``addrspace(0)``.
+.. code-block:: text
+ :caption: Example 2
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:1 vulkan:nonprivate
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+Note that A and B may or may not be in *happens-before* due to other reasons.
+.. code-block:: text
+ :caption: Example 3
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # vulkan:nonprivate
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+.. code-block:: text
+ :caption: Example 3
+ A: store ptr addrspace(1) %ptr2 # sync-as:1
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:2
+A and B do not have to be ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+Use-cases
+=========
+SPIRV ``NonPrivatePointer``
+---------------------------
+MMRAs can support the SPIRV capability
+``VulkanMemoryModel``, where synchronizing operations only affect
+memory operations that specify ``NonPrivatePointer`` semantics.
+The example below is generated from a SPIRV program using the
+following recipe:
+- Add ``vulkan:nonprivate`` to every synchronizing operation.
+- Add ``vulkan:nonprivate`` to every non-atomic memory operation
+ that is marked ``NonPrivatePointer``.
+- Add ``vulkan:private`` to tags of every non-atomic memory operation
+ that is not marked ``NonPrivatePointer``.
+.. code-block::
+ Thread T1:
+ A: store %ptr1 # vulkan:nonprivate
+ B: store %ptr2 # vulkan:private
+ X: store atomic release %ptr3 # vulkan:nonprivate
+ Thread T2:
+ Y: load atomic acquire %ptr3 # vulkan:nonprivate
+ C: load %ptr2 # vulkan:private
+ D: load %ptr1 # vulkan:nonprivate
+Compatibility ensures that operation ``A`` is ordered
+relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
+If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
+No such relation can be inferred about operations ``B`` and ``C``.
+.. note::
+ The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
+ considers all atomic operation non-private.
+ Whether ``vulkan:nonprivate`` would be specified on atomic operations is
+ an implementation detail, as an atomic operation is always ``nonprivate``.
+ The implementation may choose to be explicit and emit IR with
+ ``vulkan:nonprivate`` on every atomic operation, or it could choose to
+ only emit ``vulkan::private`` and assume ``vulkan:nonprivate``
+ by default.
+Operations marked with ``vulkan:private`` effectively opt out of the
+happens-before order in a SPIRV program since they are incompatible
+with every synchronizing operation. Note that SPIRV operations that
+are not marked ``NonPrivatePointer`` are not entirely private to the
+thread --- they are implicitly synchronized at the start or end of a
+thread by the Vulkan *system-synchronizes-with* relationship. This
+example assumes that the target-defined semantics of
+``vulkan:private`` correctly implements this property.
+This scheme is general enough to express the interoperability of SPIRV
+programs with other environments.
+.. code-block::
+ Thread T1:
+ A: store %ptr1 # vulkan:nonprivate
+ X: store atomic release %ptr2 # vulkan:nonprivate
+ Thread T2:
+ Y: load atomic acquire %ptr2 # foo:bar
+ B: load %ptr1
+In the above example, thread ``T1`` originates from a SPIRV program
+while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
+can synchronize with ``Y`` is target defined. If ``X`` synchronizes
+with ``Y``, then ``A`` happens before ``B`` (because A/X and
+Y/B are compatible).
+Implementation Example
+~~~~~~~~~~~~~~~~~~~~~~
+Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
+where all memory operations are cached, and the entire cache is
+flushed or invalidated at a ``release`` or ``acquire`` respectively. A
+possible scheme is that when translating a SPIRV program, memory
+operations marked ``NonPrivatePointer`` should not be cached, and the
+cache contents should not be touched during an ``acquire`` and
+``release`` operation.
+This could be implemented using the tags that share the ``vulkan:`` prefix,
+as follows:
+- For memory operations:
+ - Operations with ``vulkan:nonprivate`` should bypass the cache.
+ - Operations with ``vulkan:private`` should be cached.
+ - Operations that specify neither or both should conservatively
+ bypass the cache to ensure correctness.
+- For synchronizing operations:
+ - Operations with ``vulkan:nonprivate`` should not flush or
+ invalidate the cache.
+ - Operations with ``vulkan:private`` should flush or invalidate the cache.
+ - Operations that specify neither or both should conservatively
+ flush or invalidate the cache to ensure correctness.
+.. note::
+ In such an implementation, dropping the metadata on an operation, while
+ not affecting correctness, may have big performance implications.
+ e.g. an operation bypasses the cache when it shouldn't.
+Memory Types
+------------
+MMRAs may express the selective synchronization of
+different memory types.
+As an example, a target may expose an ``sync-as:<N>`` tag to
+pass information about which address spaces are synchronized by the
+execution of a synchronizing operation.
+.. note::
+ Address spaces are used here as a common example, but this concept isn't
+ can apply for other "memory types". What "memory types" means here is
+ up to the target.
+.. code-block::
+ # let 1 = global address space
+ # let 3 = local address space
+ Thread T1:
+ A: store %ptr1 # sync-as:1
+ B: store %ptr2 # sync-as:3
+ X: store atomic release ptr addrspace(0) %ptr3 # sync-as:3
+ Thread T2:
+ Y: load atomic acquire ptr addrspace(0) %ptr3 # sync-as:3
+ C: load %ptr2 # sync-as:3
+ D: load %ptr1 # sync-as:1
+In the above figure, ``X`` and ``Y`` are atomic operations on a
+location in the ``global`` address space. If ``X`` synchronizes with
+``Y``, then ``B`` happens-before ``C`` in the ``local`` address
+space. But no such statement can be made about operations ``A`` and
+``D``, although they are peformed on a location in the ``global``
+address space.
+Implementation Example: Adding Address Space Information to Fences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Languages such as OpenCL C provide fence operations such as
+``atomic_work_item_fence`` that can take an explicit address
+space to fence.
+By default, LLVM has no means to carry that information in the IR, so
+the information is lost during lowering to LLVM IR. This means that
+targets such as AMDGPU have to conservatively emit instructions to
+fence all address spaces in all cases, which can have a noticeable
+performance impact in high-performance applications.
+MMRAs may be used to preserve that information at the IR level, all the
+way through code generation. For example, a fence that only affects the
+global address space ``addrspace(1)`` may be lowered as
+.. code-block::
+ fence release # sync-as:1
+and the target may use the presence of ``sync-as:1`` to infer that it
+must only emit instruction to fence the global address space.
+Note that as MMRAs are opt in, a fence that does not have MMRA metadata
+could still be lowered conservatively, so this optimization would only
+apply if the front-end emits the MMRA metadata on the fence instructions.
+Additional Topics
+=================
+.. note::
+ The following sections are informational.
+Performance Impact
+------------------
+MMRAs are a way to capture optimization opportunities in the program.
+But when an operation mentions no tags or conflicting tags,
+the target may need to produce conservative code to ensure correctness
+at the cost of performance. This can happen in the following situations:
+1. When a target first introduces MMRAs, the
+ frontend might not have been updated to emit them.
+2. An optimization may drop MMRA metadata.
+3. An optimization may add arbitrary tags to an operation.
+Note that targets can always choose to ignore (or even drop) MMRAs
+and revert to the default behavior/codegen heuristics without
+affecting correctness.
+Consequences of the Absence of *happens-before*
+-----------------------------------------------
+In the :ref:`happens-before<HappensBefore>` section, we defined how an
+*happens-before* relation between two instruction can be broken
+by leveraging compatibility between MMRAs. When the instructions
+are incompatible and there is no *happens-before* relation, we say
+that the instructions "do not have to be ordered relative to each
+other".
+"Ordering" in this context is a very broad term which covers both
+static and runtime aspects.
+When there is no ordering constraint, we *could* statically reorder
+the instructions in an optimizer transform if the reordering does
+not break other constraints as single location coherence.
+Static reordering is one consequence of breaking *happens-before*,
+but is not the most interesting one.
+Run-time consequences are more interesting. When there is an
+*happens-before* relation between instructions, the target has to emit
+synchronization code to ensure other threads will observe the effects of
+the instructions in the right order.
+For instance, the target may have to wait for previous loads & stores to
+finish before starting a fence-release, or there may be a need to flush a
+memory cache before executing the next instruction.
+In the absence of *happens-before*, there is no such requirement and
+no waiting or flushing is required. This may noticeably speed up
+execution in some cases.
+Combining Operations
+--------------------
+If a pass can combine multiple memory or synchronizing operations
+into one, then the metadata of the new instruction(s) shall be a
+prefix-wise union of the metadata of the source instructions.
+Let A and B be two tags set, and U be the prefix-wise union of A and B.
+For every unique tag prefix P present in A or B:
+* If either A or B has no tags with prefix P, no tags with prefix
+ P are added to U.
+* If both A and B have at least one tag with prefix P, only the tags
+ common to A and B are added to U.
+Examples:
+.. code-block::
+ A: store release %ptr1 # foo:x, foo:y, bar:x
+ B: store release %ptr2 # foo:x, bar:y
+ # Unique prefixes P = [foo, bar]
+ # "foo:x" is common to A and B so it's added to U.
+ # "bar:x" != "bar:y" so it's not added to U.
+ U: store release %ptr3 # foo:x
+.. code-block::
+ A: store release %ptr1 # foo:x, foo:y
+ B: store release %ptr2 # foo:x, bux:y
+ # Unique prefixes P = [foo, bux]
+ # "foo:x" is common to A and B so it's added to U.
+ # No tags have the prefix "bux" in A.
+ U: store release %ptr3 # foo:x
+.. code-block::
+ A: store release %ptr1
+ B: store release %ptr2 # foo:x, bar:y
+ # Unique prefixes P = [foo, bar]
+ # No tags with "foo" or "bar" in A, so no tags added.
+ U: store release %ptr3
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index 3a1d1665be439e..1661c8c533db1d 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -39,6 +39,7 @@ LLVM and API reference documentation.
PDB/index
PointerAuth
ScudoHardenedAllocator
+ MemoryModelRelaxationAnnotations
MemTagSanitizer
Security
SecurityTransparencyReports
@@ -194,6 +195,9 @@ Additional Topics
:doc:`ScudoHardenedAllocator`
A library that implements a security-hardened `malloc()`.
+:doc:`MemoryModelRelaxationAnnotations`
+ Target-defined relaxation to LLVM's concurrency model.
+
:doc:`MemTagSanitizer`
Security hardening for production code aiming to mitigate memory
related vulnerabilities. Based on the Armv8.5-A Memory Tagging Extension.
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index 7a92e62b53c53d..d1a16fcd7c5ebb 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -301,7 +301,7 @@ MDNode *intersectAccessGroups(const Instruction *Inst1,
const Instruction *Inst2);
/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
-/// MD_nontemporal, MD_access_group].
+/// MD_nontemporal, MD_access_group, MD_MMRA].
/// For K in Kinds, we get the MDNode for K from each of the
/// elements of VL, compute their "intersection" (i.e., the most generic
/// metadata value that covers all of the individual values), and set I's
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
index 1387a0a37561c4..2dc23e86437656 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -52,6 +52,8 @@ struct MachineIRBuilderState {
DebugLoc DL;
/// PC sections metadata to be set to any instruction we create.
MDNode *PCSections = nullptr;
+ /// MMRA Metadata to be set on any instruction we create.
+ MDNode *MMRA = nullptr;
/// \name Fields describing the insertion point.
/// @{
@@ -353,6 +355,7 @@ class MachineIRBuilder {
setMBB(*MI.getParent());
State.II = MI.getIterator();
setPCSections(MI.getPCSections());
+ setMMRAMetadata(MI.getMMRAMetadata());
}
/// @}
@@ -383,9 +386,15 @@ class MachineIRBuilder {
/// Set the PC sections metadata to \p MD for all the next build instructions.
void setPCSections(MDNode *MD) { State.PCSections = MD; }
+ /// Set the PC sections metadata to \p MD for all the next build instructions.
+ void setMMRAMetadata(MDNode *MMRA) { State.MMRA = MMRA; }
+
/// Get the current instruction's PC sections metadata.
MDNode *getPCSections() { return State.PCSections; }
+ /// Get the current instruction's MMRA metadata.
+ MDNode *getMMRAMetadata() { return State.MMRA; }
+
/// Build and insert <empty> = \p Opcode <empty>.
/// The insertion point is the one s...
[truncated]
|
@llvm/pr-subscribers-llvm-globalisel Author: Pierre van Houtryve (Pierre-vh) ChangesWork-in-progress patch to implement the core/target-agnostic components of Memory Model Relaxation Annotations. This diff is mostly complete but likely has a few holes, especially in codegen (MMRAs aren't always carried over to MI layer I believe). Most of the work needs to be done in adding tests and analyzing the outputs to find what's missing. Patch is 98.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78569.diff 35 Files Affected:
diff --git a/llvm/docs/MemoryModelRelaxationAnnotations.rst b/llvm/docs/MemoryModelRelaxationAnnotations.rst
new file mode 100644
index 000000000000000..3ec0ddaf289a881
--- /dev/null
+++ b/llvm/docs/MemoryModelRelaxationAnnotations.rst
@@ -0,0 +1,351 @@
+
+===================================
+Memory Model Relaxation Annotations
+===================================
+.. contents::
+ :local:
+Introduction
+============
+Memory Model Relaxation Annotations (MMRAs) are target-defined properties
+on instructions that can be used to selectively relax constraints placed
+by the memory model. For example:
+* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
+ memory operations to be reordered across ``acquire`` or ``release``
+ operations.
+* OpenCL APIs expose primitives to only fence a specific set of address
+ spaces, carrying that information to the backend can enable the
+ use of faster synchronization instructions, rather than fencing all
+ address spaces.
+MMRAs offer an opt-in system for targets to relax the default LLVM
+memory model.
+As such, they are attached to an operation using LLVM metadata which
+can always be dropped without affecting correctness.
+Definitions
+===========
+memory operation
+ A load, a store, an atomic, or a function call that is marked as
+ accessing memory.
+synchronizing operation
+ An instruction that synchronizes memory with other threads (e.g.
+ an atomic or a fence).
+tag
+ Metadata attached to a memory or synchronizing operation
+ that represents some target-defined property regarding memory
+ synchronization.
+ An operation may have multiple tags that each represent a different
+ property.
+ A tag is composed of a pair of metadata nodes:
+ * a *prefix* string.
+ * a *suffix* integer or string.
+ In LLVM IR, the pair is represented using a metadata tuple.
+ In other cases (comments, documentation, etc.), we may use the
+ ``prefix:suffix`` notation.
+ For example:
+ .. code-block::
+ :caption: Example: Tags in Metadata
+ !0 = !{!"scope", !"workgroup"} # scope:workgroup
+ !1 = !{!"scope", !"device"} # scope:device
+ !2 = !{!"scope", !"system"} # scope:system
+ !3 = !{!"sync-as", i32 2} # sync-as:2
+ !4 = !{!"sync-as", i32 1} # sync-as:1
+ !5 = !{!"sync-as", i32 0} # sync-as:0
+ .. note::
+ The only semantics relevant to the optimizer is the
+ "compatibility" relation defined below. All other
+ semantics are target defined.
+ Tags can also be organised in lists to allow operations
+ to specify all of the tags they belong to. Such a list
+ is referred to as a "set of tags".
+ .. code-block::
+ :caption: Example: Set of Tags in Metadata
+ !0 = !{!"scope", !"workgroup"}
+ !1 = !{!"sync-as", !"private"}
+ !2 = !{!0, !2}
+ .. note::
+ If an operation does not have MMRA metadata, it's treated as if
+ it has an empty list (``!{}``) of tags.
+ Note that it is not an error if a tag is not recognized by the
+ instruction it is applied to, or by the current target.
+ Such tags are simply ignored.
+ Both synchronizing operations and memory operations can have
+ zero or more tags attached to them using the ``!mmra`` syntax.
+ For the sake of readability in examples below,
+ we use a (non-functional) short syntax to represent MMMRA metadata:
+ .. code-block::
+ :caption: Short Syntax Example
+ store %ptr1 # foo:bar
+ store %ptr1 !mmra !{!"foo", !"bar"}
+ These two notations can be used in this document and are strictly
+ equivalent. However, only the second version is functional.
+compatibility
+ Two sets of tags are said to be *compatible* iff, for every unique
+ tag prefix P present in at least one set:
+ - the other set contains no tag with prefix P, or
+ - at least one tag with prefix P is common to both sets.
+ The above definition implies that an empty set is always compatible
+ with any other set. This is an important property as it ensures that
+ if a transform drops the metadata on an operation, it can never affect
+ correctness. In other words, the memory model cannot be relaxed further
+ by deleting metadata from instructions.
+.. _HappensBefore:
+The *happens-before* Relation
+==============================
+Compatibility checks can be used to opt out of the *happens-before* relation
+established between two instructions.
+Ordering
+ When two instructions' metadata are not compatible, any program order
+ between them are not in *happens-before*.
+ For example, consider two tags ``foo:bar`` and
+ ``foo:baz`` exposed by a target:
+ .. code-block::
+ A: store %ptr1 # foo:bar
+ B: store %ptr2 # foo:baz
+ X: store atomic release %ptr3 # foo:bar
+ In the above figure, ``A`` is compatible with ``X``, and hence ``A``
+ happens-before ``X``. But ``B`` is not compatible with
+ ``X``, and hence it is not happens-before ``X``.
+Synchronization
+ If an synchronizing operation has one or more tags, then whether it
+ participate in the ``seq_cst`` order with other operations is target
+ dependent.
+ .. code-block::
+ ; Depending on the semantics of foo:bar & foo:bux, this may not
+ ; synchronize with another sequence.
+ fence release # foo:bar
+ store atomic %ptr1 # foo:bux
+Examples
+--------
+.. code-block:: text
+ :caption: Example 1
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:0 vulkan:nonprivate
+A and B are not ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
+e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to
+only synchronize with operations happening in ``addrspace(0)``.
+.. code-block:: text
+ :caption: Example 2
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:1 vulkan:nonprivate
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+Note that A and B may or may not be in *happens-before* due to other reasons.
+.. code-block:: text
+ :caption: Example 3
+ A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
+ B: store atomic release ptr addrspace(1) %ptr3 # vulkan:nonprivate
+The ordering of A and B is unaffected because their set of tags are
+compatible.
+.. code-block:: text
+ :caption: Example 3
+ A: store ptr addrspace(1) %ptr2 # sync-as:1
+ B: store atomic release ptr addrspace(1) %ptr3 # sync-as:2
+A and B do not have to be ordered relative to each other
+(no *happens-before*) because their sets of tags are not compatible.
+Use-cases
+=========
+SPIRV ``NonPrivatePointer``
+---------------------------
+MMRAs can support the SPIRV capability
+``VulkanMemoryModel``, where synchronizing operations only affect
+memory operations that specify ``NonPrivatePointer`` semantics.
+The example below is generated from a SPIRV program using the
+following recipe:
+- Add ``vulkan:nonprivate`` to every synchronizing operation.
+- Add ``vulkan:nonprivate`` to every non-atomic memory operation
+ that is marked ``NonPrivatePointer``.
+- Add ``vulkan:private`` to tags of every non-atomic memory operation
+ that is not marked ``NonPrivatePointer``.
+.. code-block::
+ Thread T1:
+ A: store %ptr1 # vulkan:nonprivate
+ B: store %ptr2 # vulkan:private
+ X: store atomic release %ptr3 # vulkan:nonprivate
+ Thread T2:
+ Y: load atomic acquire %ptr3 # vulkan:nonprivate
+ C: load %ptr2 # vulkan:private
+ D: load %ptr1 # vulkan:nonprivate
+Compatibility ensures that operation ``A`` is ordered
+relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
+If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
+No such relation can be inferred about operations ``B`` and ``C``.
+.. note::
+ The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
+ considers all atomic operation non-private.
+ Whether ``vulkan:nonprivate`` would be specified on atomic operations is
+ an implementation detail, as an atomic operation is always ``nonprivate``.
+ The implementation may choose to be explicit and emit IR with
+ ``vulkan:nonprivate`` on every atomic operation, or it could choose to
+ only emit ``vulkan::private`` and assume ``vulkan:nonprivate``
+ by default.
+Operations marked with ``vulkan:private`` effectively opt out of the
+happens-before order in a SPIRV program since they are incompatible
+with every synchronizing operation. Note that SPIRV operations that
+are not marked ``NonPrivatePointer`` are not entirely private to the
+thread --- they are implicitly synchronized at the start or end of a
+thread by the Vulkan *system-synchronizes-with* relationship. This
+example assumes that the target-defined semantics of
+``vulkan:private`` correctly implements this property.
+This scheme is general enough to express the interoperability of SPIRV
+programs with other environments.
+.. code-block::
+ Thread T1:
+ A: store %ptr1 # vulkan:nonprivate
+ X: store atomic release %ptr2 # vulkan:nonprivate
+ Thread T2:
+ Y: load atomic acquire %ptr2 # foo:bar
+ B: load %ptr1
+In the above example, thread ``T1`` originates from a SPIRV program
+while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
+can synchronize with ``Y`` is target defined. If ``X`` synchronizes
+with ``Y``, then ``A`` happens before ``B`` (because A/X and
+Y/B are compatible).
+Implementation Example
+~~~~~~~~~~~~~~~~~~~~~~
+Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
+where all memory operations are cached, and the entire cache is
+flushed or invalidated at a ``release`` or ``acquire`` respectively. A
+possible scheme is that when translating a SPIRV program, memory
+operations marked ``NonPrivatePointer`` should not be cached, and the
+cache contents should not be touched during an ``acquire`` and
+``release`` operation.
+This could be implemented using the tags that share the ``vulkan:`` prefix,
+as follows:
+- For memory operations:
+ - Operations with ``vulkan:nonprivate`` should bypass the cache.
+ - Operations with ``vulkan:private`` should be cached.
+ - Operations that specify neither or both should conservatively
+ bypass the cache to ensure correctness.
+- For synchronizing operations:
+ - Operations with ``vulkan:nonprivate`` should not flush or
+ invalidate the cache.
+ - Operations with ``vulkan:private`` should flush or invalidate the cache.
+ - Operations that specify neither or both should conservatively
+ flush or invalidate the cache to ensure correctness.
+.. note::
+ In such an implementation, dropping the metadata on an operation, while
+ not affecting correctness, may have big performance implications.
+ e.g. an operation bypasses the cache when it shouldn't.
+Memory Types
+------------
+MMRAs may express the selective synchronization of
+different memory types.
+As an example, a target may expose an ``sync-as:<N>`` tag to
+pass information about which address spaces are synchronized by the
+execution of a synchronizing operation.
+.. note::
+ Address spaces are used here as a common example, but this concept isn't
+ can apply for other "memory types". What "memory types" means here is
+ up to the target.
+.. code-block::
+ # let 1 = global address space
+ # let 3 = local address space
+ Thread T1:
+ A: store %ptr1 # sync-as:1
+ B: store %ptr2 # sync-as:3
+ X: store atomic release ptr addrspace(0) %ptr3 # sync-as:3
+ Thread T2:
+ Y: load atomic acquire ptr addrspace(0) %ptr3 # sync-as:3
+ C: load %ptr2 # sync-as:3
+ D: load %ptr1 # sync-as:1
+In the above figure, ``X`` and ``Y`` are atomic operations on a
+location in the ``global`` address space. If ``X`` synchronizes with
+``Y``, then ``B`` happens-before ``C`` in the ``local`` address
+space. But no such statement can be made about operations ``A`` and
+``D``, although they are peformed on a location in the ``global``
+address space.
+Implementation Example: Adding Address Space Information to Fences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Languages such as OpenCL C provide fence operations such as
+``atomic_work_item_fence`` that can take an explicit address
+space to fence.
+By default, LLVM has no means to carry that information in the IR, so
+the information is lost during lowering to LLVM IR. This means that
+targets such as AMDGPU have to conservatively emit instructions to
+fence all address spaces in all cases, which can have a noticeable
+performance impact in high-performance applications.
+MMRAs may be used to preserve that information at the IR level, all the
+way through code generation. For example, a fence that only affects the
+global address space ``addrspace(1)`` may be lowered as
+.. code-block::
+ fence release # sync-as:1
+and the target may use the presence of ``sync-as:1`` to infer that it
+must only emit instruction to fence the global address space.
+Note that as MMRAs are opt in, a fence that does not have MMRA metadata
+could still be lowered conservatively, so this optimization would only
+apply if the front-end emits the MMRA metadata on the fence instructions.
+Additional Topics
+=================
+.. note::
+ The following sections are informational.
+Performance Impact
+------------------
+MMRAs are a way to capture optimization opportunities in the program.
+But when an operation mentions no tags or conflicting tags,
+the target may need to produce conservative code to ensure correctness
+at the cost of performance. This can happen in the following situations:
+1. When a target first introduces MMRAs, the
+ frontend might not have been updated to emit them.
+2. An optimization may drop MMRA metadata.
+3. An optimization may add arbitrary tags to an operation.
+Note that targets can always choose to ignore (or even drop) MMRAs
+and revert to the default behavior/codegen heuristics without
+affecting correctness.
+Consequences of the Absence of *happens-before*
+-----------------------------------------------
+In the :ref:`happens-before<HappensBefore>` section, we defined how an
+*happens-before* relation between two instruction can be broken
+by leveraging compatibility between MMRAs. When the instructions
+are incompatible and there is no *happens-before* relation, we say
+that the instructions "do not have to be ordered relative to each
+other".
+"Ordering" in this context is a very broad term which covers both
+static and runtime aspects.
+When there is no ordering constraint, we *could* statically reorder
+the instructions in an optimizer transform if the reordering does
+not break other constraints as single location coherence.
+Static reordering is one consequence of breaking *happens-before*,
+but is not the most interesting one.
+Run-time consequences are more interesting. When there is an
+*happens-before* relation between instructions, the target has to emit
+synchronization code to ensure other threads will observe the effects of
+the instructions in the right order.
+For instance, the target may have to wait for previous loads & stores to
+finish before starting a fence-release, or there may be a need to flush a
+memory cache before executing the next instruction.
+In the absence of *happens-before*, there is no such requirement and
+no waiting or flushing is required. This may noticeably speed up
+execution in some cases.
+Combining Operations
+--------------------
+If a pass can combine multiple memory or synchronizing operations
+into one, then the metadata of the new instruction(s) shall be a
+prefix-wise union of the metadata of the source instructions.
+Let A and B be two tags set, and U be the prefix-wise union of A and B.
+For every unique tag prefix P present in A or B:
+* If either A or B has no tags with prefix P, no tags with prefix
+ P are added to U.
+* If both A and B have at least one tag with prefix P, only the tags
+ common to A and B are added to U.
+Examples:
+.. code-block::
+ A: store release %ptr1 # foo:x, foo:y, bar:x
+ B: store release %ptr2 # foo:x, bar:y
+ # Unique prefixes P = [foo, bar]
+ # "foo:x" is common to A and B so it's added to U.
+ # "bar:x" != "bar:y" so it's not added to U.
+ U: store release %ptr3 # foo:x
+.. code-block::
+ A: store release %ptr1 # foo:x, foo:y
+ B: store release %ptr2 # foo:x, bux:y
+ # Unique prefixes P = [foo, bux]
+ # "foo:x" is common to A and B so it's added to U.
+ # No tags have the prefix "bux" in A.
+ U: store release %ptr3 # foo:x
+.. code-block::
+ A: store release %ptr1
+ B: store release %ptr2 # foo:x, bar:y
+ # Unique prefixes P = [foo, bar]
+ # No tags with "foo" or "bar" in A, so no tags added.
+ U: store release %ptr3
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index 3a1d1665be439e2..1661c8c533db1d2 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -39,6 +39,7 @@ LLVM and API reference documentation.
PDB/index
PointerAuth
ScudoHardenedAllocator
+ MemoryModelRelaxationAnnotations
MemTagSanitizer
Security
SecurityTransparencyReports
@@ -194,6 +195,9 @@ Additional Topics
:doc:`ScudoHardenedAllocator`
A library that implements a security-hardened `malloc()`.
+:doc:`MemoryModelRelaxationAnnotations`
+ Target-defined relaxation to LLVM's concurrency model.
+
:doc:`MemTagSanitizer`
Security hardening for production code aiming to mitigate memory
related vulnerabilities. Based on the Armv8.5-A Memory Tagging Extension.
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index 7a92e62b53c53db..d1a16fcd7c5ebb6 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -301,7 +301,7 @@ MDNode *intersectAccessGroups(const Instruction *Inst1,
const Instruction *Inst2);
/// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
-/// MD_nontemporal, MD_access_group].
+/// MD_nontemporal, MD_access_group, MD_MMRA].
/// For K in Kinds, we get the MDNode for K from each of the
/// elements of VL, compute their "intersection" (i.e., the most generic
/// metadata value that covers all of the individual values), and set I's
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
index 1387a0a37561c4b..2dc23e864376561 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -52,6 +52,8 @@ struct MachineIRBuilderState {
DebugLoc DL;
/// PC sections metadata to be set to any instruction we create.
MDNode *PCSections = nullptr;
+ /// MMRA Metadata to be set on any instruction we create.
+ MDNode *MMRA = nullptr;
/// \name Fields describing the insertion point.
/// @{
@@ -353,6 +355,7 @@ class MachineIRBuilder {
setMBB(*MI.getParent());
State.II = MI.getIterator();
setPCSections(MI.getPCSections());
+ setMMRAMetadata(MI.getMMRAMetadata());
}
/// @}
@@ -383,9 +386,15 @@ class MachineIRBuilder {
/// Set the PC sections metadata to \p MD for all the next build instructions.
void setPCSections(MDNode *MD) { State.PCSections = MD; }
+ /// Set the PC sections metadata to \p MD for all the next build instructions.
+ void setMMRAMetadata(MDNode *MMRA) { State.MMRA = MMRA; }
+
/// Get the current instruction's PC sections metadata.
MDNode *getPCSections() { return State.PCSections; }
+ /// Get the current instruction's MMRA metadata.
+ MDNode *getMMRAMetadata() { return State.MMRA; }
+
/// Build and insert <empty> = \p Opcode <empty>.
/// The insertion point is t...
[truncated]
|
I've added some reviewers that are code owners of various parts that this change affects. Please feel free to add anyone I missed. As for the state of this PR, it's not exactly land-able as -is, but it's close. It needs some bugfixes and additional tests, but I want to do a first round of reviews for two purposes:
Finally, note that the changes to optimizations here are minimal, they're just to make sure the metadata isn't dropped too much at this time. The changes aren't focused on enabling additional optimizations or performance gains, but I posted an example in the RFC of what such a change could be. I plan to make a completely separate PR for those once this lands, so it can get proper attention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there patches available for preserving MMRAs on LLVM IR? Some metadata is preserved by specific optimizations, but I recall there are some common utility functions where MMRAs need to be included for preservation.
@@ -248,15 +260,19 @@ class MachineInstr | |||
size_t numTrailingObjects(OverloadToken<uint32_t>) const { | |||
return HasCFIType; | |||
} | |||
size_t numTrailingObjects(OverloadToken<MMRAMetadata>) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really grok how trailing objects are implemented, but this smells wrong to me. Shouldn't this function be folded into the overload for MDNode* just above? What makes this metadata different from the other metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just a leftover from the previous iteration of the design where I stored the MMRAMetadata object directly
@@ -25,6 +25,7 @@ | |||
#include "llvm/CodeGen/TargetOpcodes.h" | |||
#include "llvm/IR/DebugLoc.h" | |||
#include "llvm/IR/InlineAsm.h" | |||
#include "llvm/IR/MemoryModelRelaxationAnnotations.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this and other headers, do we really need to include this header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, another leftover of previous design iterations :(
A tag is composed of a pair of metadata nodes: | ||
|
||
* a *prefix* string. | ||
* a *suffix* integer or string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered ints aren't supported yet in the implementation.
I'm also doubting they'll be useful at all, and we can always store them as string which avoids questions like "which integer type to use to represent them".
Is it ok to just remove it from the spec? @ssahasra
✅ With the latest revision this PR passed the C/C++ code formatter. |
llvm/lib/IR/Instruction.cpp
Outdated
// all callers and make them check MMRAs. | ||
// OTOH, MMRAs can really alter semantics so this is technically correct | ||
// (the best kind of correct). | ||
if (MMRAMetadata(*this) != MMRAMetadata(*I2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I'm unsure about. It doesn't feel like the right place to put this, but it kind of makes sense to put here given how MMRAs work?
Maybe it should check for compatibility instead? (@sameerds)
This is what prevents SimplifyCFG from merging common code. GVN does it though because it doesn't seem to use this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function must indeed not check metadata. Instead it will get resolved by metadata merging (implemented in Local.cpp).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic understood, so I need to teach each pass individually to check MMRAs and compare them? (= look at callers of this function and update them on a case-by-case basis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you should not need to do anything special. If instructions with different MMRAs get merged, then they will be updated correctly based on whatever merging you implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging MMRAs can be very bad for performance. For example, on AMDGPU, vulkan:private is supposed to be cached and vulkan:nonprivate is supposed to bypass the cache. If a merge results in both tags, then the backend chooses correctness over performance, which means the resulting operation is not cached. So it seems best if an optimization only merges MMRAs when it understand the impact. To continue the Vulkan example, If two loads are being coalesced into a wider load, and one of them is private while the other is nonprivate, then it's best not to coalesce them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- (+) Even if we want to avoid combining MMRAs, it's still a well-defined process. We just want to avoid doing it aggressively (in this case, we want to avoid doing it as part of the IR canonicalization passes such as SimplifyCFG, IC, ...)
In particular, whether an optimization should examine MMRAs is independent of whether it is metadata. Even if MMRAs were an additional operand on the load/store instructions, we still don't want optimizations to merge them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you are approaching this, you can't really drop MMRAs. Like, yes, it's semantically correct to do, but it sounds like every time it would happen in a generic transform, you would actually treat that like a bug and slap an extra MMRA check in that transform. You are actively fighting the metadata model at every step here.
I'm not sure to what degree all these checks you have introduced are actually necessary, as opposed to you being overly conservative. For example, is if (x) { load p, !mmra1 } else { load p, !mmra2 }
really something that you expect to happen a lot in practice? But if these are necessary, then I don't think this is the right mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these checks are necessary, for instance one of the use case we tested basically has (in pseudocode):
if (A)
fence release, !mmra !0
else if (B)
fence release, !mmra !1
else
fence release, !mmra !2
And SimplifyCFG would just remove all CF and leave fence release
, undoing all the work. Such cases would arise in OpenCL code whenever one or two address spaces are being fenced instead of all of them.
I'm really hesitant about making this an instruction operand. I really don't mind doing the work if it makes a better end result, but I'm not yet convinced it's strictly better.
This is unfortunately right in the middle between MD and operand so I guess which one should be used depends on the point of view. IMO it's better to make it metadata because this has much more in common with other MD nodes than with something like syncscope (which can never be dropped/merged/etc.)
Also we wouldn't treat every instance of dropped metadata like a bug, it's more nuanced. Dropping MMRAs doesn't affect correctness, it just has a performance cost of X, but if an optimization being disabled has a performance cost of Y and Y > X, then the optimization must take priority and be allowed to run and drop MMRAs anyway.
If the amount of pass changes is going to be a blocker I'm open to taking a less conservative approach and just leave the Transform changes that I can prove actively hurt MMRAs (so if I have real-world evidence that that pass caused problems). IIRC that'd only leave SimplifyCFG but I'd need to run more tests. I can then make more changes over time if needed. Would that be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the amount of pass changes is going to be a blocker I'm open to taking a less conservative approach and just leave the Transform changes that I can prove actively hurt MMRAs (so if I have real-world evidence that that pass caused problems). IIRC that'd only leave SimplifyCFG but I'd need to run more tests. I can then make more changes over time if needed. Would that be better?
Yes, that would be better. I think your fence example for SimplifyCFG is compelling, but I'm still doubtful about other changes you've made (esp. when it comes to CSE-like transforms).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the amount of pass changes is going to be a blocker I'm open to taking a less conservative approach and just leave the Transform changes that I can prove actively hurt MMRAs (so if I have real-world evidence that that pass caused problems). IIRC that'd only leave SimplifyCFG but I'd need to run more tests. I can then make more changes over time if needed. Would that be better?
I agree.
; RUN: opt -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S %s | FileCheck %s | ||
|
||
; TODO: Should this work? | ||
define i32 @test1(i1 zeroext %flag, i32 %x, ptr %y) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (IMO rightfully) doesn't work. I remember making this change because otherwise, almost all MMRAs ended up dropped because SimplifyCFG (AFAIK) canonicalizes the CFG so it kicks in a ton.
Thoughts @sameerds ?
Note I also fixed AtomicExpand, so now there is (seemingly) no more MMRAs dropped during codegen! I believe the best thing to do to preserve MMRA is to require a strict equality, instead of combining. At least for common operations like hoisting/simplifyCFG. However this may be a heavy price to pay because those are core optimizations and inhibiting them too much could eventually be problematic. |
Ping :) |
Ping |
For some reason I can't apply this patch with |
class MMRAMetadata { | ||
public: | ||
using TagT = std::pair<std::string, std::string>; | ||
using SetT = std::set<TagT>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand this from the programmer's manual. It seems to me that since the components of the tag are strings owned by Metadata, they can be StringRef instead of std::string? Also, the manual kinda says that the std::set is the not the best choice for anything. So maybe DenseSet is a better choice? Especially since a pair of StringRef might satisfy the property "DenseSet is a great way to unique small values that are not simple pointers."
In particular, it seems to me that most uses of this helper class are on the stack, so any set that is efficient on the stack is more suited than std::set.
llvm/lib/Analysis/VectorUtils.cpp
Outdated
switch (Kind) { | ||
case LLVMContext::MD_mmra: { | ||
auto Tags = MMRAMetadata(dyn_cast_or_null<MDTuple>(MD)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be combined into a single static function that takes to MDNode arguments and returns a new MDNode? I get the impression that the MMRAMetadata helper class can be moved to the cpp file if all exposes APIs use only MDNode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the impression that the MMRAMetadata helper class can be moved to the cpp file if all exposes APIs use only MDNode.
No, it's the API used by emitters to create MMRAs (e.g. Clang) and by consumers to parse it (e.g. SIMemoryLegalizer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention in release notes?
I refactored the API a bit to remove the "builder" aspect. It's not really needed and just adds an unnecessary constraint on the class (which needed a |
if (Tags.empty()) | ||
return nullptr; | ||
|
||
if (Tags.size() == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The special case of having a single tag seems like an optimization. Is it okay if MMRAs are always implemented as MDTuple? That should eliminate having to handle this special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean rejecting something like !mmra !{!"foo", !"bar"}
and always requiring !{!{!"foo", !"bar"}}
instead?
I'm not sure its good to remove an already implemented small optimization like this just to get rid of a couple of ifs, are there other benefits to always requiring a tuple around tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't realize that a tuple with a single pair will look that bad in the text form. Then the special case seems worth it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that LLVM would never output it like that, instead we'd get something like:
%0 = load ..., !mmra !1
!0 = !{!"foo", !"bar"}
!1 = !{!0}
When we currently allow using !0
directly
|
||
StringMap<bool> PrefixStatuses; | ||
for (const auto &[P, S] : Tags) | ||
PrefixStatuses[P] |= (Other.hasTag(P, S) || !Other.hasTagWithPrefix(P)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrefixStatuses[P] |= (Other.hasTag(P, S) || !Other.hasTagWithPrefix(P)); | |
if (!Other.hasTag(P, S) && Other.hasTagWithPrefix(P)) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the piecemeal reviews, but I am converging, I promise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work, imagine the following:
- A:
foo:bar foo:bux
- B:
foo:bar
Starting with A's tags:
foo:bar
passesfoo:bux
triggers the earlyreturn false
despite the sets being compatible because:Other.hasTag(P, S)
returns false,B
does not havefoo:bux
Other.hasTagWithPrefix(P)
returns true,B
hasfoo:bar
I have a unit test that covers it and it fails with the suggested change
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marked some typos in the spec. Please do go over the text once. Also a couple of smallish comments in the spec. The implementation looks good to me.
Work-in-progress patch to implement the core/target-agnostic components of Memory Model Relaxation Annotations. This diff is mostly complete but likely has a few holes, especially in codegen (MMRAs aren't always carried over to MI layer I believe). Most of the work needs to be done in adding tests and analyzing the outputs to find what's missing.
It's useful for clang codegen.
I did another pass over the docs + looked at HTML render, it looks good after some small changes |
No concerns from an IR perspective, but can't comment on the feature itself. |
@Pierre-vh Maybe you could ping the RFC and wait till end of Monday to submit? It's a rather safe design that is not triggered if there is no MMRA metadata. I don't think any fundamental issues will crop up. And we can always improve the feature if they do. |
This change appears to have broken several buildbots: With the error:
The bot that builds and uses lld also catches this failure:
which does not appear to be caused by the other change in that changeset. Please investigate and fix this or revert your change if it will keep the bots red for 24 hours. Let me know if there is anyway I can assist, thank you. |
The |
Ah I see you fixed the unused variable shortly after with 806db47, thank you, unfortunately that allowed the second error: in |
No worries, thank you for looking into this, best of luck. |
#78569 did not implement this correctly and an edge case breaks it by triggering `Assertion `!Leafs.empty()' failed.` Fixes SWDEV-507698
Implements the core/target-agnostic components of Memory Model Relaxation Annotations.
RFC: https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5