[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

shiltian · 2024-11-04T00:36:10Z

There are four states to consider: valid, invalid, best, and worst. An AA typically begins in the best state and transitions toward a "worse" state until reaching the worst state. For most AAs, the invalid state is the worst state, but this is not always true, especially for the IntegerRangeState, which we use extensively here.

In the case of IntegerRangeState, the worst state is considered invalid only if Known is set to the worst value (~0U by default). Once Known is modified, there is no longer invalid state, meaning that manifest will always run, and intermediate values are likely to be added. This is why I avoided updating Known in #113018 and all related PRs in this stack unless the changes were definitive (i.e., all call sites were seen).

The situation becomes more complex because the best state does not actually represent a valid pair of values in terms of the attribute. However, it must still be considered valid so that updates can occur, allowing Assumed to be updated accordingly.

Complications arise further because this AA relies on another AA that uses range state. Similarly, the best state of the dependent AA is not a valid pair of values either. If the dependent AA remains in its best (or initial, as introduced in this PR) state, we must skip the update, as its values are effectively nonsensical.

shiltian · 2024-11-04T00:36:17Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2024-11-04T00:36:44Z

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Patch is 31.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114726.diff

7 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+87-18)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+22-24)
(modified) llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll (+5-6)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-multistep.ll (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-recursion-test.ll (+1-1)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 7d51412730d4d5..85500b95eec1a4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -740,6 +740,16 @@ struct AAAMDSizeRangeAttribute
       if (!CallerInfo || !CallerInfo->isValidState())
         return false;
 
+      /// When the caller AA is in its initial state, the state remains valid
+      /// but awaits propagation. We skip processing in this case. Note that we
+      /// must return true since the state is still considered valid.
+      if (CallerInfo->isAtInitialState()) {
+        LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
+                          << Caller->getName()
+                          << " is still at initial state. Skip the update.\n");
+        return true;
+      }
+
       Change |=
           clampStateAndIndicateChange(this->getState(), CallerInfo->getState());
 
@@ -784,6 +794,15 @@ struct AAAMDSizeRangeAttribute
                            /*ForceReplace=*/true);
   }
 
+  /// The initial state of `IntegerRangeState` represents an empty set, which
+  /// does not constitute a valid range. This empty state complicates
+  /// propagation, particularly for arithmetic operations like
+  /// `getAssumed().getUpper() - 1`. Therefore, it is recommended to skip the
+  /// initial state during processing.
+  bool isAtInitialState() const {
+    return isValidState() && getAssumed().isEmptySet();
+  }
+
   const std::string getAsStr(Attributor *) const override {
     std::string Str;
     raw_string_ostream OS(Str);
@@ -840,6 +859,11 @@ struct AAAMDFlatWorkGroupSize : public AAAMDSizeRangeAttribute {
                                                    Attributor &A);
 
   ChangeStatus manifest(Attributor &A) override {
+    if (isAtInitialState()) {
+      LLVM_DEBUG(dbgs() << '[' << getName()
+                        << "] Still at initial state. No manifest.\n";);
+      return ChangeStatus::UNCHANGED;
+    }
     Function *F = getAssociatedFunction();
     auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
     return emitAttributeIfNotDefaultAfterClamp(
@@ -927,31 +951,71 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
     auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
     ChangeStatus Change = ChangeStatus::UNCHANGED;
 
+    Function *F = getAssociatedFunction();
+
+    const auto *AAFlatWorkGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
+        *this, IRPosition::function(*F), DepClassTy::REQUIRED);
+    if (!AAFlatWorkGroupSize || !AAFlatWorkGroupSize->isValidState()) {
+      LLVM_DEBUG(
+          dbgs() << '[' << getName()
+                 << "] AAAMDFlatWorkGroupSize is unavailable or invalid.\n");
+      return ChangeStatus::UNCHANGED;
+    }
+
+    if (AAFlatWorkGroupSize->isAtInitialState()) {
+      LLVM_DEBUG(dbgs() << '[' << getName()
+                        << "] AAAMDFlatWorkGroupSize is still at initial "
+                           "state. Skip the update.\n");
+      return ChangeStatus::UNCHANGED;
+    }
+
+    auto CurrentWorkGroupSize = std::make_pair(
+        AAFlatWorkGroupSize->getAssumed().getLower().getZExtValue(),
+        AAFlatWorkGroupSize->getAssumed().getUpper().getZExtValue() - 1);
+
+    auto DoUpdate = [&](std::pair<unsigned, unsigned> WavesPerEU,
+                        std::pair<unsigned, unsigned> FlatWorkGroupSize) {
+      auto [Min, Max] =
+          InfoCache.getEffectiveWavesPerEU(*F, WavesPerEU, FlatWorkGroupSize);
+      ConstantRange CR(APInt(32, Min), APInt(32, Max + 1));
+      IntegerRangeState IRS(CR);
+      Change |= clampStateAndIndicateChange(this->getState(), IRS);
+    };
+
+    // // We need to clamp once if we are not at initial state, because
+    // // AAAMDFlatWorkGroupSize could be updated in last iteration.
+    if (!isAtInitialState()) {
+      auto CurrentWavesPerEU =
+          std::make_pair(getAssumed().getLower().getZExtValue(),
+                         getAssumed().getUpper().getZExtValue() - 1);
+      DoUpdate(CurrentWavesPerEU, CurrentWorkGroupSize);
+    }
+
     auto CheckCallSite = [&](AbstractCallSite CS) {
       Function *Caller = CS.getInstruction()->getFunction();
-      Function *Func = getAssociatedFunction();
+
       LLVM_DEBUG(dbgs() << '[' << getName() << "] Call " << Caller->getName()
-                        << "->" << Func->getName() << '\n');
+                        << "->" << F->getName() << '\n');
 
-      const auto *CallerInfo = A.getAAFor<AAAMDWavesPerEU>(
+      const auto *AAWavesPerEU = A.getAAFor<AAAMDWavesPerEU>(
           *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
-      const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
-          *this, IRPosition::function(*Func), DepClassTy::REQUIRED);
-      if (!CallerInfo || !AssumedGroupSize || !CallerInfo->isValidState() ||
-          !AssumedGroupSize->isValidState())
+      if (!AAWavesPerEU || !AAWavesPerEU->isValidState()) {
+        LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
+                          << Caller->getName()
+                          << " is unavailable or invalid.\n");
         return false;
+      }
+      if (AAWavesPerEU->isAtInitialState()) {
+        LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
+                          << Caller->getName()
+                          << " is still at initial state. Skip the update.\n");
+        return true;
+      }
 
-      unsigned Min, Max;
-      std::tie(Min, Max) = InfoCache.getEffectiveWavesPerEU(
-          *Caller,
-          {CallerInfo->getAssumed().getLower().getZExtValue(),
-           CallerInfo->getAssumed().getUpper().getZExtValue() - 1},
-          {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-           AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
-      ConstantRange CallerRange(APInt(32, Min), APInt(32, Max + 1));
-      IntegerRangeState CallerRangeState(CallerRange);
-      Change |= clampStateAndIndicateChange(this->getState(), CallerRangeState);
-
+      auto CallerWavesPerEU = std::make_pair(
+          AAWavesPerEU->getAssumed().getLower().getZExtValue(),
+          AAWavesPerEU->getAssumed().getUpper().getZExtValue() - 1);
+      DoUpdate(CallerWavesPerEU, CurrentWorkGroupSize);
       return true;
     };
 
@@ -967,6 +1031,11 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
                                             Attributor &A);
 
   ChangeStatus manifest(Attributor &A) override {
+    if (isAtInitialState()) {
+      LLVM_DEBUG(dbgs() << '[' << getName()
+                        << "] Still at initial state. No manifest.\n";);
+      return ChangeStatus::UNCHANGED;
+    }
     Function *F = getAssociatedFunction();
     auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
     return emitAttributeIfNotDefaultAfterClamp(
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
index 773ba4c8df673d..6d1842167824b1 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
@@ -688,7 +688,7 @@ define void @func_call_asm() #3 {
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_call_asm
 ; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
-; ATTRIBUTOR_HSA-NEXT:    call void asm sideeffect "", ""() #[[ATTR26:[0-9]+]]
+; ATTRIBUTOR_HSA-NEXT:    call void asm sideeffect "", ""() #[[ATTR24:[0-9]+]]
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   call void asm sideeffect "", ""() #3
@@ -717,7 +717,7 @@ define amdgpu_kernel void @func_kern_defined() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_kern_defined
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @defined.func()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -845,7 +845,7 @@ define amdgpu_kernel void @kern_sanitize_address() #4 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    store volatile i32 0, ptr addrspace(1) null, align 4
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -861,7 +861,7 @@ define void @func_sanitize_address() #4 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR17]] {
 ; ATTRIBUTOR_HSA-NEXT:    store volatile i32 0, ptr addrspace(1) null, align 4
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -877,7 +877,7 @@ define void @func_indirect_sanitize_address() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_indirect_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR19:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @func_sanitize_address()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -893,7 +893,7 @@ define amdgpu_kernel void @kern_indirect_sanitize_address() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_indirect_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR19]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @func_sanitize_address()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -928,7 +928,7 @@ define internal void @enqueue_block_def() #6 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@enqueue_block_def
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR21:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   ret void
@@ -941,7 +941,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_decl() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_decl
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @enqueue_block_decl()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -956,7 +956,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_def() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_def
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR24:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @enqueue_block_def()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -969,7 +969,7 @@ define void @unused_enqueue_block() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@unused_enqueue_block
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR25:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR23]] {
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   ret void
@@ -980,7 +980,7 @@ define internal void @known_func() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@known_func
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR25]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR23]] {
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   ret void
@@ -994,8 +994,8 @@ define amdgpu_kernel void @kern_callsite_enqueue_block() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_callsite_enqueue_block
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR24]] {
-; ATTRIBUTOR_HSA-NEXT:    call void @known_func() #[[ATTR27:[0-9]+]]
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR23]] {
+; ATTRIBUTOR_HSA-NEXT:    call void @known_func() #[[ATTR25:[0-9]+]]
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   call void @known_func() #6
@@ -1041,17 +1041,15 @@ attributes #6 = { "enqueued-block" }
 ; ATTRIBUTOR_HSA: attributes #[[ATTR14]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx900" "uniform-work-group-size"="false" }
 ; ATTRIBUTOR_HSA: attributes #[[ATTR15]] = { nounwind "uniform-work-group-size"="false" }
 ; ATTRIBUTOR_HSA: attributes #[[ATTR16]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR17]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR18]] = { nounwind sanitize_address "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR19]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR20:[0-9]+]] = { nounwind sanitize_address "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR21:[0-9]+]] = { "enqueued-block" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR22]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "enqueued-block" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR23]] = { "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR24]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR25]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR26]] = { nounwind }
-; ATTRIBUTOR_HSA: attributes #[[ATTR27]] = { "enqueued-block" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR17]] = { nounwind sanitize_address "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR18]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR19:[0-9]+]] = { nounwind sanitize_address "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR20:[0-9]+]] = { "enqueued-block" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR21]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "enqueued-block" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR22]] = { "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR23]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR24]] = { nounwind }
+; ATTRIBUTOR_HSA: attributes #[[ATTR25]] = { "enqueued-block" }
 ;.
 ; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll b/llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll
index 8481cea4d7c353..a9efcdcb0af6d0 100644
--- a/llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll
+++ b/llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll
@@ -51,7 +51,7 @@ bb5:                                              ; preds = %bb5, %bb3
 
 define amdgpu_kernel void @entry() {
 ; CHECK-LABEL: define {{[^@]+}}@entry
-; CHECK-SAME: () #[[ATTR1:[0-9]+]] {
+; CHECK-SAME: () #[[ATTR0]] {
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [[TMP0:%.*]], align 8, addrspace(5)
 ; CHECK-NEXT:    [[CAST:%.*]] = addrspacecast ptr addrspace(5) [[ALLOCA]] to ptr
 ; CHECK-NEXT:    [[ARST:%.*]] = call double @baz(ptr [[CAST]])
@@ -64,5 +64,4 @@ define amdgpu_kernel void @entry() {
 }
 ;.
 ; CHECK: attributes #[[ATTR0]] = { "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll b/llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll
index d9be8cbc73a5e9..145d730d356537 100644
--- a/llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll
@@ -27,7 +27,7 @@ define internal void @direct() {
 
 define amdgpu_kernel void @test_direct_indirect_call() {
 ; CHECK-LABEL: define {{[^@]+}}@test_direct_indirect_call
-; CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+; CHECK-SAME: () #[[ATTR1]] {
 ; CHECK-NEXT:    call void @direct()
 ; CHECK-NEXT:    ret vo...
[truncated]

arsenm · 2024-11-04T17:19:10Z

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

+      /// must return true since the state is still considered valid.
+      if (CallerInfo->isAtInitialState()) {
+        LLVM_DEBUG(dbgs() << '[' << getName() << "] Caller "
+                          << Caller->getName()


use printAsOperand to handle anonymous functions correctly

This is copied from existing code. I'll do a follow up patch to change them all.

shiltian · 2024-11-05T17:49:33Z

bump

shiltian · 2024-12-10T17:48:27Z

ping, I prefer not to update tests again and again...

arsenm · 2024-12-11T11:03:51Z

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

+    if (AAFlatWorkGroupSize->isAtInitialState()) {
+      LLVM_DEBUG(dbgs() << '[' << getName()
+                        << "] AAAMDFlatWorkGroupSize is still at initial "
+                           "state. Skip the update.\n");
+      return ChangeStatus::UNCHANGED;
+    }


The isAtInitialState check is a superset of the above isValidState, just replace the isValidState call?

An initial state is always a valid state, but the reverse is not necessarily true. Once the value is updated, it is no longer in the initial state; however, it remains valid. As such, updates should continue, and the manifest function must still be invoked.

There are four states to consider: valid, invalid, best, and worst. An AA typically begins in the best state and transitions toward a "worse" state until reaching the worst state. For most AAs, the invalid state is the worst state, but this is not always true, especially for the IntegerRangeState, which we use extensively here.

In the case of IntegerRangeState, the worst state is considered invalid only if Known is set to the worst value (~0U by default). Once Known is modified, there is no longer invalid state, meaning that manifest will always run, and half-baked values are likely to be added. This is why I avoided updating Known in #113018 and all related PRs in this stack unless the changes were definitive (i.e., all call sites were seen).

The situation becomes more complex because the best state does not actually represent a valid pair of values in terms of the attribute. However, it must still be considered valid so that updates can occur, allowing Assumed to be updated accordingly.

Complications arise further because this AA relies on another AA that uses range state. Similarly, the best state of the dependent AA is not a valid pair of values either. If the dependent AA remains in its best (or initial, as introduced in this PR) state, we must skip the update, as its values are effectively nonsensical.

arsenm · 2024-12-11T11:04:59Z

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

        return false;
+      }
+      if (AAWavesPerEU->isAtInitialState()) {


Same as above

shiltian · 2024-12-30T03:49:40Z

gentle ping +1

shiltian · 2025-01-07T05:22:02Z

ping @arsenm

shiltian · 2025-01-30T16:24:31Z

Close this in favor of #123995.

This was referenced Nov 4, 2024

[AMDGPU][Attributor] Make AAAMDWavesPerEU honor existing attribute #114438

Merged

[AMDGPU][Attributor] Make AAAMDFlatWorkGroupSize honor existing attribute #114357

Merged

llvmbot added the backend:AMDGPU label Nov 4, 2024

shiltian force-pushed the users/shiltian/skip-initial-state branch from b7612ed to 4e38059 Compare November 4, 2024 00:37

arsenm reviewed Nov 4, 2024

View reviewed changes

shiltian mentioned this pull request Nov 5, 2024

AMDGPU: Propagate amdgpu-max-num-workgroups attribute #113018

Merged

shiltian force-pushed the users/shiltian/honor-attribute-wave-per-eu branch from 79f8883 to de91e17 Compare December 9, 2024 16:13

shiltian force-pushed the users/shiltian/skip-initial-state branch from 4e38059 to 47f7697 Compare December 9, 2024 16:13

shiltian force-pushed the users/shiltian/honor-attribute-wave-per-eu branch from de91e17 to eb0b434 Compare December 10, 2024 17:47

shiltian force-pushed the users/shiltian/skip-initial-state branch from 47f7697 to f3a3afa Compare December 10, 2024 17:47

arsenm reviewed Dec 11, 2024

View reviewed changes

shiltian force-pushed the users/shiltian/honor-attribute-wave-per-eu branch from eb0b434 to afda3da Compare December 11, 2024 16:59

shiltian force-pushed the users/shiltian/skip-initial-state branch 2 times, most recently from f93730e to b2c5c63 Compare December 11, 2024 17:49

shiltian changed the title ~~[AMDGPU][Attributor] Skip update if an AA is at its initial state~~ [AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state Dec 11, 2024

shiltian force-pushed the users/shiltian/honor-attribute-wave-per-eu branch from afda3da to c9dd751 Compare December 11, 2024 21:48

Base automatically changed from users/shiltian/honor-attribute-wave-per-eu to main December 11, 2024 21:50

shiltian force-pushed the users/shiltian/skip-initial-state branch from b2c5c63 to f1b9fd1 Compare December 11, 2024 21:52

[AMDGPU][Attributor] Skip update if an AA is at its initial state

deeb119

shiltian force-pushed the users/shiltian/skip-initial-state branch from f1b9fd1 to deeb119 Compare January 6, 2025 17:52

shiltian closed this Jan 30, 2025

shiltian deleted the users/shiltian/skip-initial-state branch January 30, 2025 16:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

Uh oh!

shiltian commented Nov 4, 2024 •

edited

Loading

Uh oh!

shiltian commented Nov 4, 2024 •

edited

Loading

Uh oh!

llvmbot commented Nov 4, 2024

Uh oh!

arsenm Nov 4, 2024

Uh oh!

shiltian Nov 4, 2024

Uh oh!

shiltian commented Nov 5, 2024

Uh oh!

shiltian commented Dec 10, 2024

Uh oh!

arsenm Dec 11, 2024

Uh oh!

shiltian Dec 11, 2024

Uh oh!

shiltian Dec 11, 2024

Uh oh!

arsenm Dec 11, 2024

Uh oh!

shiltian commented Dec 30, 2024

Uh oh!

shiltian commented Jan 7, 2025

Uh oh!

shiltian commented Jan 30, 2025

Uh oh!

Uh oh!

[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

Uh oh!

Conversation

shiltian commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shiltian commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 4, 2024

Uh oh!

arsenm Nov 4, 2024

Choose a reason for hiding this comment

Uh oh!

shiltian Nov 4, 2024

Choose a reason for hiding this comment

Uh oh!

shiltian commented Nov 5, 2024

Uh oh!

shiltian commented Dec 10, 2024

Uh oh!

arsenm Dec 11, 2024

Choose a reason for hiding this comment

Uh oh!

shiltian Dec 11, 2024

Choose a reason for hiding this comment

Uh oh!

shiltian Dec 11, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm Dec 11, 2024

Choose a reason for hiding this comment

Uh oh!

shiltian commented Dec 30, 2024

Uh oh!

shiltian commented Jan 7, 2025

Uh oh!

shiltian commented Jan 30, 2025

Uh oh!

Uh oh!

shiltian commented Nov 4, 2024 •

edited

Loading

shiltian commented Nov 4, 2024 •

edited

Loading