[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor #94647

Merged

jwanggit86
Contributor

The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and "amdgpu-stack-objects" attributes, which are used to decide whether flat scratch needs to be initialized. This is, however, imprecise. Instead, we should use AMDGPUAttributor to infer amdgpu-no-flat-scratch-init on kernels. Refer to #63586.
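A rough way to picture the proposed inference: start by assuming every function can carry amdgpu-no-flat-scratch-init, drop that assumption for functions that contain an alloca or addrspacecast (or a non-asm call with no known callee), and keep propagating the loss up the call graph until nothing changes. The sketch below is a self-contained toy model under those assumptions; FuncModel and inferNoFlatScratchInit are hypothetical names, and the real pass works on LLVM IR through the Attributor framework rather than on a string-keyed map. Intrinsic calls are assumed to be filtered out of Callees beforehand.

```cpp
// Toy model of the proposed inference (hypothetical names; not the
// AMDGPUAttributor code). Intrinsic calls are assumed to be filtered out of
// Callees already, since the patch treats them as harmless.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct FuncModel {
  bool HasAllocaOrASCast = false; // local reason to need flat scratch init
  bool HasUnknownCall = false;    // non-asm call with no known callee
  std::vector<std::string> Callees;
};

// Returns the functions that may keep "amdgpu-no-flat-scratch-init".
// Starts optimistic and removes functions until a fixpoint is reached,
// similar in spirit to removeAssumedBits(FLAT_SCRATCH_INIT).
std::set<std::string>
inferNoFlatScratchInit(const std::map<std::string, FuncModel> &Module) {
  std::set<std::string> Assumed;
  for (const auto &[Name, F] : Module)
    if (!F.HasAllocaOrASCast && !F.HasUnknownCall)
      Assumed.insert(Name);

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const auto &[Name, F] : Module) {
      if (!Assumed.count(Name))
        continue;
      for (const std::string &Callee : F.Callees) {
        // In this model, a callee that needs init, or one with no
        // definition in Module, forces the caller to need init too.
        if (!Assumed.count(Callee)) {
          Assumed.erase(Name);
          Changed = true;
          break;
        }
      }
    }
  }
  return Assumed;
}
```

The optimistic start matters for recursion: a function that only calls itself keeps the attribute, because nothing ever gives a reason to drop it.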

@llvmbot
Member

llvmbot commented Jun 6, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Jun Wang (jwanggit86)

Changes

The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and "amdgpu-stack-objects" attributes, which are used to decide whether flat scratch needs to be initialized. This is, however, imprecise. Instead, we should use AMDGPUAttributor to infer amdgpu-no-flat-scratch-init on kernels. Refer to #63586.


Patch is 1.65 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94647.diff

56 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributes.def (+1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+43)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+3-7)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll (+14-4)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll (+11-10)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+58-54)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+25-23)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features.ll (+9-9)
  • (added) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll (+1028)
  • (added) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll (+914)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/duplicate-attribute-indirect.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-address-space.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll (+15-4)
  • (modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+24-72)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.kernel.id.ll (+7-10)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+3-12)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+3-12)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll (+1380)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll (+75)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll (+1380)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll (+1380)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll (+66)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll (+1365)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll (+1320)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll (+273)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-nontemporal.ll (+15)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll (+276)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll (+261)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll (+18)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll (+276)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll (+276)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll (+9)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll (+34-25)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll (+18-12)
  • (modified) llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll (+22-22)
  • (modified) llvm/test/CodeGen/AMDGPU/recursive_global_initializer.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-multistep.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-recursion-test.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-test.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributes.def b/llvm/lib/Target/AMDGPU/AMDGPUAttributes.def
index bacc8e4e821e5..8c1c8219690ba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributes.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributes.def
@@ -30,5 +30,6 @@ AMDGPU_ATTRIBUTE(WORKITEM_ID_Z, "amdgpu-no-workitem-id-z")
 AMDGPU_ATTRIBUTE(LDS_KERNEL_ID, "amdgpu-no-lds-kernel-id")
 AMDGPU_ATTRIBUTE(DEFAULT_QUEUE, "amdgpu-no-default-queue")
 AMDGPU_ATTRIBUTE(COMPLETION_ACTION, "amdgpu-no-completion-action")
+AMDGPU_ATTRIBUTE(FLAT_SCRATCH_INIT, "amdgpu-no-flat-scratch-init")
 
 #undef AMDGPU_ATTRIBUTE
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 43bfd0f13f875..8bdc9eab577a9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -433,6 +433,19 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
       indicatePessimisticFixpoint();
       return;
     }
+
+    bool HasAllocaOrASCast = false;
+    for (BasicBlock &BB : *F) {
+      for (Instruction &I : BB) {
+        if (isa<AllocaInst>(I) || isa<AddrSpaceCastInst>(I)) {
+          HasAllocaOrASCast = true;
+          removeAssumedBits(FLAT_SCRATCH_INIT);
+          break;
+        }
+      }
+      if (HasAllocaOrASCast)
+        break;
+    }
   }
 
   ChangeStatus updateImpl(Attributor &A) override {
@@ -519,6 +532,9 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     if (isAssumed(COMPLETION_ACTION) && funcRetrievesCompletionAction(A, COV))
       removeAssumedBits(COMPLETION_ACTION);
 
+    if (isAssumed(FLAT_SCRATCH_INIT) && needFlatScratchInit(A))
+      removeAssumedBits(FLAT_SCRATCH_INIT);
+
     return getAssumed() != OrigAssumed ? ChangeStatus::CHANGED
                                        : ChangeStatus::UNCHANGED;
   }
@@ -677,6 +693,33 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this,
                                               UsedAssumedInformation);
   }
+
+  // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is
+  // not to be set.
+  bool needFlatScratchInit(Attributor &A) {
+    // This is called on each callee; false means callee shouldn't have
+    // no-flat-scratch-init.
+    auto CheckForNoFlatScratchInit = [&](Instruction &I) {
+      const auto &CB = cast<CallBase>(I);
+      const Value *CalleeOp = CB.getCalledOperand();
+      const Function *Callee = dyn_cast<Function>(CalleeOp);
+      if (!Callee) // indirect call
+        return CB.isInlineAsm();
+
+      if (Callee->isIntrinsic())
+        return true;
+
+      const auto *CalleeInfo = A.getAAFor<AAAMDAttributes>(
+          *this, IRPosition::function(*Callee), DepClassTy::REQUIRED);
+      return CalleeInfo && CalleeInfo->isAssumed(FLAT_SCRATCH_INIT);
+    };
+
+    bool UsedAssumedInformation = false;
+    // If any callee is false (i.e. need FlatScratchInit),
+    // checkForAllCallLikeInstructions returns false
+    return !A.checkForAllCallLikeInstructions(CheckForNoFlatScratchInit, *this,
+                                              UsedAssumedInformation);
+  }
 };
 
 AAAMDAttributes &AAAMDAttributes::createForPosition(const IRPosition &IRP,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
index 94ee4ac78142d..511e711bf724d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -1040,12 +1040,8 @@ GCNUserSGPRUsageInfo::GCNUserSGPRUsageInfo(const Function &F,
   const CallingConv::ID CC = F.getCallingConv();
   const bool IsKernel =
       CC == CallingConv::AMDGPU_KERNEL || CC == CallingConv::SPIR_KERNEL;
-  // FIXME: Should have analysis or something rather than attribute to detect
-  // calls.
-  const bool HasCalls = F.hasFnAttribute("amdgpu-calls");
-  // FIXME: This attribute is a hack, we just need an analysis on the function
-  // to look for allocas.
-  const bool HasStackObjects = F.hasFnAttribute("amdgpu-stack-objects");
+  const bool NoFlatScratchInit =
+      F.hasFnAttribute("amdgpu-no-flat-scratch-init");
 
   if (IsKernel && (!F.arg_empty() || ST.getImplicitArgNumBytes(F) != 0))
     KernargSegmentPtr = true;
@@ -1073,7 +1069,7 @@ GCNUserSGPRUsageInfo::GCNUserSGPRUsageInfo(const Function &F,
   // lowering.
   if (ST.hasFlatAddressSpace() && AMDGPU::isEntryFunctionCC(CC) &&
       (IsAmdHsaOrMesa || ST.enableFlatScratch()) &&
-      (HasCalls || HasStackObjects || ST.enableFlatScratch()) &&
+      (!NoFlatScratchInit || ST.enableFlatScratch()) &&
       !ST.flatScratchIsArchitected()) {
     FlatScratchInit = true;
   }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
index 8859ac69923a9..74ba17d4bf59d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
@@ -12,7 +12,9 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX8V4:       ; %bb.0:
 ; GFX8V4-NEXT:    s_load_dwordx2 s[0:1], s[6:7], 0x0
 ; GFX8V4-NEXT:    s_load_dwordx2 s[2:3], s[4:5], 0x40
-; GFX8V4-NEXT:    v_mov_b32_e32 v2, 1
+; GFX8V4-NEXT:    s_add_i32 s8, s8, s11
+; GFX8V4-NEXT:    s_lshr_b32 flat_scratch_hi, s8, 8
+; GFX8V4-NEXT:    s_mov_b32 flat_scratch_lo, s9
 ; GFX8V4-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX8V4-NEXT:    s_mov_b32 s4, s0
 ; GFX8V4-NEXT:    s_mov_b32 s5, s3
@@ -23,6 +25,7 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX8V4-NEXT:    s_cmp_lg_u32 s1, -1
 ; GFX8V4-NEXT:    v_mov_b32_e32 v0, s4
 ; GFX8V4-NEXT:    s_cselect_b64 s[0:1], s[6:7], 0
+; GFX8V4-NEXT:    v_mov_b32_e32 v2, 1
 ; GFX8V4-NEXT:    v_mov_b32_e32 v1, s5
 ; GFX8V4-NEXT:    flat_store_dword v[0:1], v2
 ; GFX8V4-NEXT:    s_waitcnt vmcnt(0)
@@ -37,7 +40,9 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX8V5:       ; %bb.0:
 ; GFX8V5-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; GFX8V5-NEXT:    s_load_dwordx2 s[2:3], s[4:5], 0xc8
-; GFX8V5-NEXT:    v_mov_b32_e32 v2, 1
+; GFX8V5-NEXT:    s_add_i32 s6, s6, s9
+; GFX8V5-NEXT:    s_lshr_b32 flat_scratch_hi, s6, 8
+; GFX8V5-NEXT:    s_mov_b32 flat_scratch_lo, s7
 ; GFX8V5-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX8V5-NEXT:    s_mov_b32 s4, s0
 ; GFX8V5-NEXT:    s_mov_b32 s5, s2
@@ -47,6 +52,7 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX8V5-NEXT:    s_cmp_lg_u32 s1, -1
 ; GFX8V5-NEXT:    v_mov_b32_e32 v0, s4
 ; GFX8V5-NEXT:    s_cselect_b64 s[0:1], s[2:3], 0
+; GFX8V5-NEXT:    v_mov_b32_e32 v2, 1
 ; GFX8V5-NEXT:    v_mov_b32_e32 v1, s5
 ; GFX8V5-NEXT:    flat_store_dword v[0:1], v2
 ; GFX8V5-NEXT:    s_waitcnt vmcnt(0)
@@ -60,9 +66,10 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX9V4-LABEL: addrspacecast:
 ; GFX9V4:       ; %bb.0:
 ; GFX9V4-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX9V4-NEXT:    s_add_u32 flat_scratch_lo, s6, s9
+; GFX9V4-NEXT:    s_addc_u32 flat_scratch_hi, s7, 0
 ; GFX9V4-NEXT:    s_mov_b64 s[2:3], src_private_base
 ; GFX9V4-NEXT:    s_mov_b64 s[4:5], src_shared_base
-; GFX9V4-NEXT:    v_mov_b32_e32 v2, 1
 ; GFX9V4-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9V4-NEXT:    s_mov_b32 s2, s0
 ; GFX9V4-NEXT:    s_cmp_lg_u32 s0, -1
@@ -71,6 +78,7 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX9V4-NEXT:    s_cmp_lg_u32 s1, -1
 ; GFX9V4-NEXT:    v_mov_b32_e32 v0, s2
 ; GFX9V4-NEXT:    s_cselect_b64 s[0:1], s[4:5], 0
+; GFX9V4-NEXT:    v_mov_b32_e32 v2, 1
 ; GFX9V4-NEXT:    v_mov_b32_e32 v1, s3
 ; GFX9V4-NEXT:    flat_store_dword v[0:1], v2
 ; GFX9V4-NEXT:    s_waitcnt vmcnt(0)
@@ -84,9 +92,10 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX9V5-LABEL: addrspacecast:
 ; GFX9V5:       ; %bb.0:
 ; GFX9V5-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX9V5-NEXT:    s_add_u32 flat_scratch_lo, s6, s9
+; GFX9V5-NEXT:    s_addc_u32 flat_scratch_hi, s7, 0
 ; GFX9V5-NEXT:    s_mov_b64 s[2:3], src_private_base
 ; GFX9V5-NEXT:    s_mov_b64 s[4:5], src_shared_base
-; GFX9V5-NEXT:    v_mov_b32_e32 v2, 1
 ; GFX9V5-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9V5-NEXT:    s_mov_b32 s2, s0
 ; GFX9V5-NEXT:    s_cmp_lg_u32 s0, -1
@@ -95,6 +104,7 @@ define amdgpu_kernel void @addrspacecast(ptr addrspace(5) %ptr.private, ptr addr
 ; GFX9V5-NEXT:    s_cmp_lg_u32 s1, -1
 ; GFX9V5-NEXT:    v_mov_b32_e32 v0, s2
 ; GFX9V5-NEXT:    s_cselect_b64 s[0:1], s[4:5], 0
+; GFX9V5-NEXT:    v_mov_b32_e32 v2, 1
 ; GFX9V5-NEXT:    v_mov_b32_e32 v1, s3
 ; GFX9V5-NEXT:    flat_store_dword v[0:1], v2
 ; GFX9V5-NEXT:    s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
index cff9ce0506679..96bbcb7ed2149 100644
--- a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
@@ -233,9 +233,9 @@ attributes #1 = { nounwind }
 ; AKF_HSA: attributes #[[ATTR1]] = { nounwind }
 ;.
 ; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR3]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR3]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
 ; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
index 33b1cc65dc569..5ace66fd2dd76 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
@@ -116,7 +116,7 @@ define amdgpu_kernel void @kernel_calls_extern() {
 define amdgpu_kernel void @kernel_calls_extern_marked_callsite() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_extern_marked_callsite(
 ; CHECK-SAME: ) #[[ATTR4]] {
-; CHECK-NEXT:    call void @unknown() #[[ATTR9:[0-9]+]]
+; CHECK-NEXT:    call void @unknown() #[[ATTR10:[0-9]+]]
 ; CHECK-NEXT:    ret void
 ;
   call void @unknown() #0
@@ -136,7 +136,7 @@ define amdgpu_kernel void @kernel_calls_indirect(ptr %indirect) {
 define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(ptr %indirect) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(
 ; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR4]] {
-; CHECK-NEXT:    call void [[INDIRECT]]() #[[ATTR9]]
+; CHECK-NEXT:    call void [[INDIRECT]]() #[[ATTR10]]
 ; CHECK-NEXT:    ret void
 ;
   call void %indirect() #0
@@ -229,7 +229,7 @@ define amdgpu_kernel void @kernel_calls_workitem_id_x(ptr addrspace(1) %out) {
 
 define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
 ; CHECK-LABEL: define amdgpu_kernel void @indirect_calls_none_agpr(
-; CHECK-SAME: i1 [[COND:%.*]]) #[[ATTR0]] {
+; CHECK-SAME: i1 [[COND:%.*]]) #[[ATTR7:[0-9]+]] {
 ; CHECK-NEXT:    [[FPTR:%.*]] = select i1 [[COND]], ptr @empty, ptr @also_empty
 ; CHECK-NEXT:    call void [[FPTR]]()
 ; CHECK-NEXT:    ret void
@@ -242,14 +242,15 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
 
 attributes #0 = { "amdgpu-no-agpr" }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR3:[0-9]+]] = { "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR4]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
 ; CHECK: attributes #[[ATTR6:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR7:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR8:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-agpr" }
+; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR8:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR9:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR10]] = { "amdgpu-no-agpr" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll b/llvm/test/CodeGen/AMDGPU/annotate-kern...
[truncated]
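The AMDGPUSubtarget.cpp hunk above replaces the (HasCalls || HasStackObjects) test with a check of the new attribute. Written as a plain predicate, and assuming the subtarget and calling-convention queries have already been evaluated into booleans, the patched condition reads roughly as follows. The FlatScratchQuery struct and needsFlatScratchInit are hypothetical names used only for this sketch.

```cpp
// Hypothetical model of the patched FlatScratchInit condition in
// GCNUserSGPRUsageInfo; each field stands for a query made in the real code.
#include <cassert>

struct FlatScratchQuery {
  bool HasFlatAddressSpace;      // ST.hasFlatAddressSpace()
  bool IsEntryFunction;          // AMDGPU::isEntryFunctionCC(CC)
  bool IsAmdHsaOrMesa;           // IsAmdHsaOrMesa
  bool EnableFlatScratch;        // ST.enableFlatScratch()
  bool FlatScratchIsArchitected; // ST.flatScratchIsArchitected()
  bool NoFlatScratchInit;        // F has "amdgpu-no-flat-scratch-init"
};

// True when the entry function must set up the flat scratch registers.
bool needsFlatScratchInit(const FlatScratchQuery &Q) {
  return Q.HasFlatAddressSpace && Q.IsEntryFunction &&
         (Q.IsAmdHsaOrMesa || Q.EnableFlatScratch) &&
         (!Q.NoFlatScratchInit || Q.EnableFlatScratch) &&
         !Q.FlatScratchIsArchitected;
}
```

When ST.enableFlatScratch() is true the attribute has no effect, because the disjunction (!NoFlatScratchInit || ST.enableFlatScratch()) is then always satisfied; the attribute only removes the setup in the default (buffer scratch) mode.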

Comment on lines 437 to 461
bool HasAllocaOrASCast = false;
for (BasicBlock &BB : *F) {
  for (Instruction &I : BB) {
    if (isa<AllocaInst>(I) || isa<AddrSpaceCastInst>(I)) {
      HasAllocaOrASCast = true;
      removeAssumedBits(FLAT_SCRATCH_INIT);
      break;
    }
  }
  if (HasAllocaOrASCast)
    break;
}
Contributor

You shouldn't need to do anything in initialize

Member

At some point, you need to look at the non-call instructions.
It makes sense to do that in update if we ever do value simplification or dead code elimination as part of the pass. Otherwise, init is fine, I think.
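If the non-call scan stays in initialize(), the flag-plus-two-breaks pattern in the quoted code can be replaced by a helper that returns on the first match. The sketch models instructions with a hypothetical InstKind enum so it compiles on its own; in LLVM the test would be isa<AllocaInst>(I) || isa<AddrSpaceCastInst>(I), and the instructions(F) range from llvm/IR/InstIterator.h could flatten the two nested loops.

```cpp
// Hypothetical stand-ins so the sketch compiles on its own; in LLVM these
// would be llvm::Function / llvm::BasicBlock and isa<> checks.
#include <cassert>
#include <vector>

enum class InstKind { Alloca, AddrSpaceCast, Call, Other };
using BasicBlockModel = std::vector<InstKind>;
using FunctionModel = std::vector<BasicBlockModel>;

// Returns on the first alloca or addrspacecast instead of tracking a flag
// and breaking out of two loops.
bool hasAllocaOrASCast(const FunctionModel &F) {
  for (const BasicBlockModel &BB : F)
    for (InstKind K : BB)
      if (K == InstKind::Alloca || K == InstKind::AddrSpaceCast)
        return true;
  return false;
}
```

With an LLVM version of such a helper, initialize() would reduce to a single `if (hasAllocaOrASCast(*F)) removeAssumedBits(FLAT_SCRATCH_INIT);`, and the same helper could later move into updateImpl() if the pass starts simplifying values or removing dead code.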

Comment on lines 704 to 705
const Value *CalleeOp = CB.getCalledOperand();
const Function *Callee = dyn_cast<Function>(CalleeOp);
Contributor

Can hide the cast with getCalledFunction

Contributor Author

There's a subtle difference between using getCalledFunction() and using getCalledOperand(). Consider the following example:

define i32 @use_dispatch_ptr_ret_type() #1 {
  %dispatch.ptr = call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
  store volatile ptr addrspace(4) %dispatch.ptr, ptr addrspace(1) undef
  ret i32 0
}

define float @func_indirect_use_dispatch_ptr_constexpr_cast_func() #1 {
  %f = call float @use_dispatch_ptr_ret_type()
  %fadd = fadd float %f, 1.0
  ret float %fadd
}

Note that the callee's return type is i32, but the caller calls it as if it returned float. Because of this, getCalledFunction() would return nullptr, which would eventually lead to an incorrect analysis result.

Contributor

Yes, it returns null when the call type mismatches the callee type. This is UB and doesn't matter; you don't need optimal handling. In the real world the call would have been replaced with unreachable.

Contributor Author

@arsenm So you are saying that we should treat such cases as if the call didn't exist?

Contributor

They should be treated like indirect calls
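Treating mismatched calls like indirect calls means that any call whose getCalledFunction() would be null takes the same conservative path, whether the operand is a function pointer or a function called with the wrong type. The toy model below makes that explicit. FunctionDecl, CallSiteModel, and the *Model functions are hypothetical, function types are reduced to strings, and the rule that only inline asm is tolerated among calls without a known callee follows the patch's `return CB.isInlineAsm();`.

```cpp
// Hypothetical model of call-site resolution; not LLVM's CallBase.
#include <cassert>
#include <string>

struct FunctionDecl {
  std::string Type;          // e.g. "i32 ()"
  bool NeedsFlatScratchInit; // result already inferred for this callee
};

struct CallSiteModel {
  const FunctionDecl *CalledOperand; // nullptr for asm or a pointer call
  std::string CallType;              // function type used at the call site
  bool IsInlineAsm = false;
};

// Mimics getCalledFunction(): the callee is only returned when the call
// site's type matches the function's type.
const FunctionDecl *getCalledFunctionModel(const CallSiteModel &CB) {
  if (CB.CalledOperand && CB.CalledOperand->Type == CB.CallType)
    return CB.CalledOperand;
  return nullptr;
}

// True if this call site still allows "amdgpu-no-flat-scratch-init" on the
// caller. Mismatched (UB) calls fall into the indirect-call bucket.
bool callSiteAllowsNoFlatScratchInit(const CallSiteModel &CB) {
  const FunctionDecl *Callee = getCalledFunctionModel(CB);
  if (!Callee)
    return CB.IsInlineAsm;
  return !Callee->NeedsFlatScratchInit;
}
```

For the @func_indirect_use_dispatch_ptr_constexpr_cast_func example, this model reports that the caller needs flat scratch init: a pessimistic but safe answer for a UB call.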

const Value *CalleeOp = CB.getCalledOperand();
const Function *Callee = dyn_cast<Function>(CalleeOp);
if (!Callee) // indirect call
  return CB.isInlineAsm();
Contributor

In an ideal world, users would have to mark the flat_scr use in inline asm, but they probably won't.

Member

You should replace all this handling, starting from here till the end of this lambda, with a call to

/// Check \p Pred on all potential Callees of \p CB.
///
/// This method will evaluate \p Pred with all potential callees of \p CB as
/// input and return true if \p Pred does. If some callees might be unknown
/// this function will return false.
bool checkForAllCallees(
    function_ref<bool(ArrayRef<const Function *> Callees)> Pred,
    const AbstractAttribute &QueryingAA, const CallBase &CB);

This will work for indirect calls as well.
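The contract quoted above can be modeled in a few lines: the predicate sees the full list of potential callees, and the query fails outright when that list might be incomplete. This is only a sketch of the documented behavior, not the Attributor implementation; Fn, ResolvedCall, and checkForAllCalleesModel are hypothetical, std::vector stands in for ArrayRef, and std::nullopt stands in for "some callees might be unknown".

```cpp
// Hypothetical model of the documented checkForAllCallees contract.
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct Fn {
  std::string Name;
};

// Either the complete set of potential callees is known, or it is not.
struct ResolvedCall {
  std::optional<std::vector<const Fn *>> PotentialCallees;
};

// Evaluate Pred on all potential callees; return false if some callees
// might be unknown.
bool checkForAllCalleesModel(
    const std::function<bool(const std::vector<const Fn *> &)> &Pred,
    const ResolvedCall &CB) {
  if (!CB.PotentialCallees)
    return false;
  return Pred(*CB.PotentialCallees);
}
```

With this shape, an indirect call whose possible targets have all been identified goes through the same predicate as a direct call, and only a truly unresolved call site forces the pessimistic result.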

Contributor Author

jwanggit86 commented Jun 17, 2024

@jdoerfert I'm not sure I follow. checkForAllCallees() requires a predicate. So should we wrap the existing code of the lambda in a new lambda and pass that new lambda to checkForAllCallees()? Also, could you please explain what the potential problem is if we don't use this function?

Contributor Author

@jdoerfert ping here.

Contributor

Instead of checking the call instructions with checkForAllCallLikeInstructions, use checkForAllCallees for all direct and possible indirect call candidates (I'm not 100% sure how this interacts with asm). That way you don't have to concern yourself with indirect calls, since any possible call targets should also appear among the callees.

Contributor Author

If we were to use checkForAllCallees(), then based on my understanding the code would look something like this:

   bool needFlatScratchInit(Attributor &A) {
     // This is called on each callee; false means callee shouldn't have
     // no-flat-scratch-init.
     auto CheckForNoFlatScratchInit = [&](Instruction &I) {
       const auto &CB = cast<CallBase>(I);
-      const Value *CalleeOp = CB.getCalledOperand();
-      const Function *Callee = dyn_cast<Function>(CalleeOp);
-      if (!Callee) // indirect call
-        return CB.isInlineAsm();
-
-      if (Callee->isIntrinsic())
-        return Callee->getIntrinsicID() != Intrinsic::amdgcn_addrspacecast_nonnull;
-
-      const auto *CalleeInfo = A.getAAFor<AAAMDAttributes>(
-          *this, IRPosition::function(*Callee), DepClassTy::REQUIRED);
-      return CalleeInfo && CalleeInfo->isAssumed(FLAT_SCRATCH_INIT);
+      auto PredOnCallees = [&](ArrayRef<const Function *> Callees) {
+        bool Ret = true;
+        for (const Function *Callee : Callees) {
+          if (!Callee) { // indirect call
+            // non-asm indirect call is already handled in updateImpl()
+            assert(CB.isInlineAsm());
+            continue;
+          }
+
+          if (Callee->isIntrinsic()) {
+            Ret &= (Callee->getIntrinsicID() != Intrinsic::amdgcn_addrspacecast_nonnull);
+            if (!Ret)
+              return false;
+            continue;
+          }
+
+          const auto *CalleeInfo = A.getAAFor<AAAMDAttributes>(
+              *this, IRPosition::function(*Callee), DepClassTy::REQUIRED);
+          Ret &= (CalleeInfo && CalleeInfo->isAssumed(FLAT_SCRATCH_INIT));
+          if (!Ret)
+            return false;
+        }
+        return Ret;
+      };
+
+      return A.checkForAllCallees(PredOnCallees, *this, CB);
     };

     bool UsedAssumedInformation = false;
     // If any callee is false (i.e. need FlatScratchInit),
     // checkForAllCallLikeInstructions returns false, in which case this
     // function returns true.
     return !A.checkForAllCallLikeInstructions(CheckForNoFlatScratchInit, *this,
                                               UsedAssumedInformation);
   }
  }

Here, inside the lambda CheckForNoFlatScratchInit, a new lambda (PredOnCallees) is created and passed to checkForAllCallees().
@arsenm @jdoerfert Please let me know if this is what you had in mind.

Contributor

I assume you shouldn't need to handle a null callee in the callback. Also, I don't think you have the CallBase here.

It's easier to review an actual patch, so submit a new one using checkForAllCallees?

Contributor Author

Code has been updated after fixing a bug regarding indirect calls with known callees. I'm not sure using checkForAllCallees() is necessary, for two reasons: (1) the code is now very simple, boiling down to just checking for intrinsics; (2) the first part of AAAMDAttributesFunction::updateImpl() already does work similar to checkForAllCallees():

  ChangeStatus updateImpl(Attributor &A) override {
    ...
    // Check for Intrinsics and propagate attributes.
    const AACallEdges *AAEdges = A.getAAFor<AACallEdges>(
        *this, this->getIRPosition(), DepClassTy::REQUIRED);
    if (!AAEdges || AAEdges->hasNonAsmUnknownCallee())
      return indicatePessimisticFixpoint();
...
    for (Function *Callee : AAEdges->getOptimisticEdges()) { // Jun - what checkForAllCallees() does is essentially calling getOptimisticEdges() and then running the pred on the edges
      Intrinsic::ID IID = Callee->getIntrinsicID();
      if (IID == Intrinsic::not_intrinsic) {
        const AAAMDAttributes *AAAMD = A.getAAFor<AAAMDAttributes>(
            *this, IRPosition::function(*Callee), DepClassTy::REQUIRED);
        if (!AAAMD)
          return indicatePessimisticFixpoint();
        *this &= *AAAMD;
        continue;
      }
...
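The loop above intersects each function's assumed bits with those of its callees (`*this &= *AAAMD`). A minimal sketch of that propagation, assuming a toy call graph and a single hypothetical bit standing in for FLAT_SCRATCH_INIT, iterates until nothing changes. `FnModel`, `NO_FSI_BIT` and `propagateToFixpoint` are invented names; the real Attributor schedules and caches these updates differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical bit: set means "assumed not to need flat scratch init".
constexpr uint32_t NO_FSI_BIT = 1u << 0;

struct FnModel {
  uint32_t Assumed;                  // optimistic bits left after local checks
  std::vector<std::string> Callees;  // optimistic call edges
};

// Intersect every function's bits with its callees' bits (the
// "*this &= *AAAMD" step) until a full pass changes nothing.
void propagateToFixpoint(std::map<std::string, FnModel> &Fns) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (auto &Entry : Fns) {
      FnModel &F = Entry.second;
      uint32_t New = F.Assumed;
      for (const std::string &Callee : F.Callees)
        New &= Fns.at(Callee).Assumed;
      if (New != F.Assumed) {
        F.Assumed = New;
        Changed = true;
      }
    }
  }
}
```

Because bits are only ever cleared, the loop terminates, and a self-recursive function keeps its optimistic bit unless something reachable from it clears the bit.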

Contributor Author

On the other hand, a patch where checkForAllCallees() is used is as follows:

   bool needFlatScratchInit(Attributor &A) {
     assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set
     auto CheckForNoFlatScratchInit = [&](Instruction &I) {
       const auto &CB = cast<CallBase>(I);
-      const Function *Callee = CB.getCalledFunction();
-
-      if (Callee && Callee->isIntrinsic())
-        return Callee->getIntrinsicID() !=
-               Intrinsic::amdgcn_addrspacecast_nonnull;
-
+      auto PredOnCallees = [&](ArrayRef<const Function *> Callees) {
+        for (const Function *Callee : Callees) {
+          if (Callee->isIntrinsic())
+            if (Callee->getIntrinsicID() == Intrinsic::amdgcn_addrspacecast_nonnull)
+              return false;
+        }
+        return true;
+      };
+      if(!A.checkForAllCallees(PredOnCallees, *this, CB))
+        return CB.isInlineAsm();
       return true;
     };

     bool UsedAssumedInformation = false;
     return !A.checkForAllCallLikeInstructions(CheckForNoFlatScratchInit, *this,

Contributor

The point of using checkForAllCallees is that it also works for indirect calls, not that it makes the code simpler or better. You can do that in a follow-up patch if you really want, but you should still use it.

// to look for allocas.
const bool HasStackObjects = F.hasFnAttribute("amdgpu-stack-objects");
const bool NoFlatScratchInit =
F.hasFnAttribute("amdgpu-no-flat-scratch-init");
Contributor

can sink this down to the use

Contributor Author

Done.

;
; GFX10-LABEL: define amdgpu_kernel void @without_alloca_cc_kernel(i1 %arg0)
; GFX10-SAME: #[[ATTR_GFX10_NOFSI2:[0-9]+]]
store volatile i1 %arg0, ptr addrspace(1) undef
Contributor

Don't use undef, use null instead

;
; GFX10-LABEL: define amdgpu_kernel void @with_region_to_flat_addrspacecast_cc_kernel(ptr addrspace(2) %ptr)
; GFX10-SAME: #[[ATTR_GFX10_NO_NOFSI2:[0-9]+]]
%stof = addrspacecast ptr addrspace(2) %ptr to ptr
Contributor

Also add a test with the amdgcn addrspacecast nonnull intrinsic

Contributor Author

Done.

bool UsedAssumedInformation = false;
// If any callee is false (i.e. need FlatScratchInit),
// checkForAllCallLikeInstructions returns false
return !A.checkForAllCallLikeInstructions(CheckForNoFlatScratchInit, *this,
Contributor

This should look more like how the queue pointer is handled. This is just a slightly more complicated version of checkForQueuePtr. The instruction walk you put in initialize should be handled by checkForAllInstructions looking for addrspacecast

Collaborator

Doing it like QueuePtr is certainly more "canonical". It seems preferable to doing it in initialize(), although the difference won't be noticeable unless the attributor is also simplifying the program at the same time.

const Value *CalleeOp = CB.getCalledOperand();
const Function *Callee = dyn_cast<Function>(CalleeOp);
if (!Callee) // indirect call
return CB.isInlineAsm();
Member

You should replace all this handling, starting from here till the end of this lambda, with a call to

  /// Check \p Pred on all potential Callees of \p CB.
  ///
  /// This method will evaluate \p Pred with all potential callees of \p CB as
  /// input and return true if \p Pred does. If some callees might be unknown
  /// this function will return false.
  bool checkForAllCallees(
      function_ref<bool(ArrayRef<const Function *> Callees)> Pred,
      const AbstractAttribute &QueryingAA, const CallBase &CB);

This will work for indirect calls as well.

@jwanggit86 jwanggit86 force-pushed the set-no-flat-scratch-init-in-amdgpu-attributor branch from b28a77a to 46097b1 Compare August 18, 2024 00:26
@jwanggit86 jwanggit86 requested review from arsenm and jdoerfert August 18, 2024 00:45
@@ -433,6 +433,13 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
indicatePessimisticFixpoint();
return;
}

for (Instruction &I : instructions(F)) {
if (isa<AllocaInst>(I) || isa<AddrSpaceCastInst>(I)) {
Contributor

Only addrspacecasts from address space 5 require flat_scr. An alloca alone does not imply flat_scr is required; it is required only if the alloca is ever cast to flat.

I suppose for robustness you could double-check that the alloca address space is correct.
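The rule above can be stated as a small predicate over a toy instruction list. This is a hypothetical model, not the AMDGPUAttributor code: `InstModel` and `needsFlatScratchInit` are invented for illustration, and the only AMDGPU facts used are the address-space numbers (3 for local/LDS, 5 for private). An alloca on its own does not trigger the requirement; a cast whose source is address space 5 does.

```cpp
#include <cassert>
#include <vector>

// AMDGPU address-space numbers (flat is 0).
constexpr unsigned LOCAL_ADDRESS = 3;
constexpr unsigned PRIVATE_ADDRESS = 5;

enum class Kind { Alloca, AddrSpaceCast, Other };

// Hypothetical instruction: SrcAS is only meaningful for casts.
struct InstModel {
  Kind K;
  unsigned SrcAS = 0;
};

// Only a cast out of private memory forces flat scratch init;
// an alloca that is never cast does not.
bool needsFlatScratchInit(const std::vector<InstModel> &Insts) {
  for (const InstModel &I : Insts)
    if (I.K == Kind::AddrSpaceCast && I.SrcAS == PRIVATE_ADDRESS)
      return true;
  return false;
}
```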

Contributor Author

@jwanggit86 jwanggit86 Aug 21, 2024

@arsenm Did you mean this:

@@ -435,7 +435,7 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     }

     for (Instruction &I : instructions(F)) {
-      if (isa<AllocaInst>(I) || isa<AddrSpaceCastInst>(I)) {
+      if (isa<AddrSpaceCastInst>(I) && static_cast<AddrSpaceCastInst&>(I).getSrcAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS) {
         removeAssumedBits(FLAT_SCRATCH_INIT);
         return;
       }

Contributor

Approximately yes, but you should use dyn_cast and never static_cast

Contributor Author

Approximately yes, but you should use dyn_cast and never static_cast

Doesn't the check isa<AddrSpaceCastInst>(I) already make sure that the instruction is an AddrSpaceCastInst?

Contributor Author

@arsenm ping.

Collaborator

static_cast is a C++ feature, but rarely used in LLVM code. Instead we use dyn_cast or cast. You should be doing something like this:

if (auto *AddrCastInst = dyn_cast<AddrSpaceCastInst>(&I)) { ... }
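For reference, LLVM's `isa<>`, `cast<>` and `dyn_cast<>` are built on a static `classof` hook in each class. The sketch below hand-rolls that pattern with hypothetical names (`Inst`, `ASCastInst`, `dynCastModel`) so it compiles without LLVM. `dyn_cast` folds the type test and the downcast into one step and yields a null pointer on mismatch; after a separate `isa<>` test, LLVM code would spell the downcast `cast<>`, which asserts the type, rather than `static_cast`.

```cpp
#include <cassert>

// Base class carrying a kind tag, as in LLVM-style RTTI.
struct Inst {
  enum KindTy { K_AddrSpaceCast, K_Alloca };
  explicit Inst(KindTy K) : Kind(K) {}
  KindTy getKind() const { return Kind; }

private:
  KindTy Kind;
};

struct ASCastInst : Inst {
  explicit ASCastInst(unsigned Src) : Inst(K_AddrSpaceCast), SrcAS(Src) {}
  static bool classof(const Inst *I) { return I->getKind() == K_AddrSpaceCast; }
  unsigned getSrcAddressSpace() const { return SrcAS; }

private:
  unsigned SrcAS;
};

struct AllocaInstModel : Inst {
  AllocaInstModel() : Inst(K_Alloca) {}
  static bool classof(const Inst *I) { return I->getKind() == K_Alloca; }
};

template <typename To> bool isaModel(const Inst *I) { return To::classof(I); }

// Checked downcast: nullptr on mismatch instead of a bogus pointer.
template <typename To> const To *dynCastModel(const Inst *I) {
  return isaModel<To>(I) ? static_cast<const To *>(I) : nullptr;
}

// The idiom from the review: test and cast in one step.
bool isCastFromPrivate(const Inst &I) {
  if (const auto *C = dynCastModel<ASCastInst>(&I))
    return C->getSrcAddressSpace() == 5;  // 5 = AMDGPU private
  return false;
}
```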

Contributor Author

Thanks. This has been updated.

const auto &CB = cast<CallBase>(I);
const Function *Callee = CB.getCalledFunction();

if (Callee && Callee->isIntrinsic())
Contributor

You don't need to check isIntrinsic(); getIntrinsicID() will just return not_intrinsic for an arbitrary call.
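A toy version of the simplified callee predicate: because a non-intrinsic function reports `not_intrinsic`, comparing the ID directly is enough and no `isIntrinsic()` guard is needed. `IntrinsicIDModel`, `CalleeModel` and `noCalleeIsASCastNonnull` are hypothetical stand-ins for LLVM's `Intrinsic::ID` and `Function`, not real APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical intrinsic IDs; ordinary functions report not_intrinsic.
enum class IntrinsicIDModel {
  not_intrinsic,
  amdgcn_addrspacecast_nonnull,
  other_intrinsic
};

struct CalleeModel {
  IntrinsicIDModel ID;
  IntrinsicIDModel getIntrinsicID() const { return ID; }
};

// Predicate over all potential callees: false if any callee is the
// addrspacecast.nonnull intrinsic, which the patch treats as needing
// flat scratch init. No isIntrinsic() check is required.
bool noCalleeIsASCastNonnull(const std::vector<CalleeModel> &Callees) {
  for (const CalleeModel &C : Callees)
    if (C.getIntrinsicID() == IntrinsicIDModel::amdgcn_addrspacecast_nonnull)
      return false;
  return true;
}
```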

Contributor Author

Good point.

// call, and (3)indirect call with known callees. For (2) and (3)
// updateImpl() already checked the callees and we know their
// FLAT_SCRATCH_INIT bit is set.
return true;
Contributor

Can just return the boolean expression above

Contributor Author

Improved a little.

Comment on lines 1057 to 1062
// FIXME: Should have analysis or something rather than attribute to detect
// calls.
const bool HasCalls = F.hasFnAttribute("amdgpu-calls");
// FIXME: This attribute is a hack, we just need an analysis on the function
// to look for allocas.
const bool HasStackObjects = F.hasFnAttribute("amdgpu-stack-objects");
Contributor

In the interest of making the test changes smaller, I think this part should be done in a separate patch (which also removes AMDGPUAnnotateKernelFeatures at the same time). This patch should just do the pure inference.

Although now that the attributor is moved, I'm worried about passes between amdgpuattributor and codegen introducing new addrspacecasts

Contributor Author

This is done now. Changes to AMDGPUAnnotateKernelFeatures will be in a separate PR.

@jwanggit86 jwanggit86 force-pushed the set-no-flat-scratch-init-in-amdgpu-attributor branch from 46097b1 to 19aedb9 Compare September 4, 2024 21:24
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 4, 2024
@jwanggit86 jwanggit86 requested a review from arsenm September 4, 2024 23:59
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
indicatePessimisticFixpoint();
return;
}

for (Instruction &I : instructions(F)) {
if (isa<AddrSpaceCastInst>(I) &&
Contributor

For a nightmare of an edge case, addrspacecasts from private to flat can exist somewhere in constant expressions. For now, as long as addrspace(5) globals are forbidden, this would only be valid with literal addresses.

I'm not sure how defined we should consider that case.

But if you follow along with the queue pointer handling, it will work. It already has to handle the 3->0 case in constant expressions
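A cast out of private memory may sit several levels deep inside a constant expression, so the check has to recurse through operands. The sketch below models constant expressions as a small DAG with hypothetical types (`ConstModel`, `hasCastFromPrivate`) rather than LLVM's `ConstantExpr`. It marks a node as visited only on entry to the recursive call, so every node is examined exactly once, including shared subexpressions.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

constexpr unsigned PRIVATE_ADDRESS = 5;  // AMDGPU private address space

// Hypothetical constant-expression node. Only address-space casts carry
// a meaningful source address space; every node may have operands.
struct ConstModel {
  bool IsAddrSpaceCast = false;
  unsigned SrcAS = 0;
  std::vector<const ConstModel *> Operands;
};

// Depth-first walk over C and all nested operands, visiting each node at
// most once, looking for a cast out of private memory.
bool hasCastFromPrivate(const ConstModel *C,
                        std::unordered_set<const ConstModel *> &Visited) {
  if (!Visited.insert(C).second)
    return false;  // already examined via another path
  if (C->IsAddrSpaceCast && C->SrcAS == PRIVATE_ADDRESS)
    return true;
  for (const ConstModel *Op : C->Operands)
    if (hasCastFromPrivate(Op, Visited))
      return true;
  return false;
}
```

If the caller inserted an operand into the visited set before recursing, the recursive call would treat it as already seen and return without examining it, so the visited check belongs in exactly one place.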

Contributor Author

@arsenm Could you please give me an example of a constant expression having an addrSpaceCast?

Contributor Author

@arsenm Following constants to see if they contain addrSpaceCast is now done. An example is: store i32 7, ptr addrspace(3) addrspacecast (ptr addrspace(5) null to ptr addrspace(3)).
However, I'm not sure it's required or even correct. For the above example, opt with -O2 would optimize away the addrspacecast, and the result would be the opposite.

Contributor

5->3 is an illegal address space cast, but the round trip cast can fold away. You don't want the cast back to the original address space.

Contributor

@arsenm arsenm Sep 25, 2024

Simple example, where the cast is still directly the operand. It could be further nested inside another constant expression https://godbolt.org/z/cM6q78dnb

Contributor Author

This test has been added.

// already removed in updateImpl() and execution won't reach here.
if (!Callee)
return true;
else
Contributor

No else after return

Contributor Author

Done.

Comment on lines +701 to +718
auto CheckForNoFlatScratchInit = [&](Instruction &I) {
const auto &CB = cast<CallBase>(I);
Contributor

I would hope checkForAllCallLikeInstructions would have a CallBase-typed argument to begin with.

@jwanggit86 jwanggit86 requested a review from arsenm September 19, 2024 02:07
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we need to
initialize flat scratch. This is, however, not precise. Instead, we should use
AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on kernels.
Refer to llvm#63586 .
minor code change based on reviews (3) fix test files.
Those code changes will be in a follow-up PR.
This undo is simply achieved by merging code from upstream because
a recent commit has changed that file.
The changes therein will be included in a separate PR.
@jwanggit86 jwanggit86 force-pushed the set-no-flat-scratch-init-in-amdgpu-attributor branch from 74d9ef1 to 09012f4 Compare October 7, 2024 17:42
Comment on lines 741 to 761
bool constHasASCast(const Constant *C,
                    SmallPtrSetImpl<const Constant *> &Visited) {
  if (!Visited.insert(C).second)
    return false;

  if (const auto *CE = dyn_cast<ConstantExpr>(C))
    if (CE->getOpcode() == Instruction::AddrSpaceCast &&
        CE->getOperand(0)->getType()->getPointerAddressSpace() ==
            AMDGPUAS::PRIVATE_ADDRESS)
      return true;

  for (const Use &U : C->operands()) {
    const auto *OpC = dyn_cast<Constant>(U);
    if (!OpC)
      continue;

    // The recursive call performs the Visited check itself.
    if (constHasASCast(OpC, Visited))
      return true;
  }
  return false;
}
Contributor

I do not want to duplicate the same function that already exists for the LDS case. Unify these.

We also should try to avoid doing this walk over all instructions through all constant expressions twice for the two attributes
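One way to avoid walking the same constant expressions twice, once per attribute, is to summarize each constant once into a bitmask and let both checks read the cached summary. The sketch below is a hypothetical design under that assumption (`ConstNode`, `ConstAccessCache`, `CAST_FROM_LOCAL`, `CAST_FROM_PRIVATE` are invented names); it does not reproduce `getConstantAccess()` or any existing AMDGPU helper. It relies on constant expressions being acyclic, so the memoized recursion terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr unsigned LOCAL_ADDRESS = 3;
constexpr unsigned PRIVATE_ADDRESS = 5;

// Hypothetical access bits gathered in a single walk.
enum : uint8_t { CAST_FROM_LOCAL = 1u << 0, CAST_FROM_PRIVATE = 1u << 1 };

struct ConstNode {
  bool IsAddrSpaceCast = false;
  unsigned SrcAS = 0;
  std::vector<const ConstNode *> Operands;
};

class ConstAccessCache {
public:
  // OR of all access bits in C and its nested operands. Each constant is
  // summarized once; later queries (from either attribute) hit the cache.
  uint8_t getAccess(const ConstNode *C) {
    auto It = Cache.find(C);
    if (It != Cache.end())
      return It->second;
    uint8_t Bits = 0;
    if (C->IsAddrSpaceCast && C->SrcAS == LOCAL_ADDRESS)
      Bits |= CAST_FROM_LOCAL;
    if (C->IsAddrSpaceCast && C->SrcAS == PRIVATE_ADDRESS)
      Bits |= CAST_FROM_PRIVATE;
    for (const ConstNode *Op : C->Operands)
      Bits |= getAccess(Op);
    Cache[C] = Bits;
    return Bits;
  }
  std::size_t numSummarized() const { return Cache.size(); }

private:
  std::unordered_map<const ConstNode *, uint8_t> Cache;
};
```

Each attribute then tests its own bit in the shared summary instead of re-walking the operands.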

Contributor Author

@arsenm What is the LDS case? LDS_KERNEL_ID? Are you saying there's another attribute that checks ConstantExpr for AddrSpaceCast?

Contributor Author

Found getConstantAccess(), which might be what you were referring to. I'm looking into it.

Contributor Author

I do not want to duplicate the same function that already exists for the LDS case. Unify these.

We also should try to avoid doing this walk over all instructions through all constant expressions twice for the two attributes

The latest commit creates a new function in AMDGPUInformationCache that makes use of the existing getConstantAccess(), so the code in getConstantAccess() is not duplicated. However, the two attributes still walk over all instructions to check constants separately. Unifying these two walks would require the walk to happen earlier and the results (all the constants) to be collected. Please let me know your thoughts on this.

Contributor Author

@arsenm ping here.

Contributor Author

@arsenm Any more comments?


SmallPtrSet<const Constant *, 8> VisitedConsts;

for (Instruction &I : instructions(F)) {
Contributor

Should use checkForAllInstructions instead of manually looking at all instructions

@jwanggit86 jwanggit86 requested review from arsenm and shiltian October 30, 2024 17:44
@@ -262,6 +262,18 @@ class AMDGPUInformationCache : public InformationCache {
return !HasAperture && (Access & ADDR_SPACE_CAST);
}

bool constHasASCastFromPrivate(const Constant *C, Function &Fn) {
Contributor

nit: function starts with a verb

// Check all AddrSpaceCast instructions. FlatScratchInit is needed if
// there is a cast from PRIVATE_ADDRESS.
auto AddrSpaceCastNotFromPrivate = [&](Instruction &I) {
return static_cast<AddrSpaceCastInst &>(I).getSrcAddressSpace() !=
Contributor

cast<...> might be more proper here

@jwanggit86 jwanggit86 requested a review from shiltian November 1, 2024 18:00
// Check for addrSpaceCast from PRIVATE_ADDRESS in constant expressions
auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());

Function *F = getAssociatedFunction();
Contributor

Is it possible to merge this into AddrSpaceCastNotFromPrivate?

Contributor Author

AddrSpaceCastNotFromPrivate is a predicate used in the checks on the AddrSpaceCast instructions. The for-loop on the other hand checks all the constants. It's not clear to me how these two can be merged.

Contributor

Yeah, that is not ideal, because A.checkForAllInstructions also uses liveness analysis, so it can skip dead instructions, while the explicit iteration over instructions doesn't. But indeed we lack an interface that just goes through all instructions without filtering on opcodes.

if (!Callee)
return true;

return Callee->getIntrinsicID() !=
Contributor

IIUC, this attribute should propagate from callee to caller, so you will need to check all function calls, and ask Attributor whether the callee needs it or not.

Contributor Author

IIUC, this attribute should propagate from callee to caller, so you will need to check all function calls, and ask Attributor whether the callee needs it or not.

Callees are already checked at the beginning of updateImpl() (See the for-loop at lines 475-494). When needFlatScratchInit() is reached, only inline asm and intrinsics are left to be further checked.

@jwanggit86 jwanggit86 requested a review from shiltian November 4, 2024 23:45
Contributor

@shiltian shiltian left a comment

LGTM. As a follow-up, it might be a good idea to add an interface to the Attributor framework that walks through all instructions without an opcode filter, and have the existing one call it.
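The suggested follow-up interface can be sketched as an unfiltered, liveness-aware walker with the opcode-filtered variant layered on top. All names here (`InstNode`, `forAllLiveInstructions`, `forAllLiveInstructionsWithOpcodes`) are hypothetical and do not correspond to existing Attributor APIs; the `Live` flag stands in for the Attributor's liveness analysis.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <vector>

enum class Opcode { Alloca, AddrSpaceCast, Call, Store };

struct InstNode {
  Opcode Op;
  bool Live = true;  // stand-in for the result of liveness analysis
};

using InstPred = std::function<bool(const InstNode &)>;

// Unfiltered walker: apply Pred to every live instruction and stop early,
// returning false, as soon as Pred fails.
bool forAllLiveInstructions(const std::vector<InstNode> &Insts,
                            const InstPred &Pred) {
  for (const InstNode &I : Insts) {
    if (!I.Live)
      continue;  // dead code is skipped, as in a liveness-aware walk
    if (!Pred(I))
      return false;
  }
  return true;
}

// The opcode-filtered variant is built on the unfiltered one.
bool forAllLiveInstructionsWithOpcodes(const std::vector<InstNode> &Insts,
                                       const InstPred &Pred,
                                       std::initializer_list<Opcode> Ops) {
  return forAllLiveInstructions(Insts, [&](const InstNode &I) {
    for (Opcode O : Ops)
      if (I.Op == O)
        return Pred(I);
    return true;  // opcode not requested: ignore this instruction
  });
}
```

With this layering, a check that must inspect every instruction (for example, to look at constant operands) could share the liveness filtering instead of iterating the function by hand.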

@jwanggit86
Contributor Author

I plan to submit this PR next week, by Nov 26. Please let me know if you have additional comments.

@jwanggit86 jwanggit86 merged commit e6aec2c into llvm:main Dec 4, 2024
8 checks passed

bool UsedAssumedInformation = false;
if (!A.checkForAllInstructions(AddrSpaceCastNotFromPrivate, *this,
{Instruction::AddrSpaceCast},
Contributor

Can't this handle the call case instead of a separate checkForAllCallLikeInstructions?

Alternatively, we should finally add the nonnull flag to addrspacecast

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b1 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/9635

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-rel-x86-64-b1/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX10 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX10 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...
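The failure is a shift in the kernel's SGPR argument layout: the check expects kernargSegmentPtr in $sgpr6_sgpr7 directly after dispatchPtr, while the output also contains queuePtr in $sgpr6_sgpr7, which pushes kernargSegmentPtr to $sgpr8_sgpr9. Assuming enabled inputs are packed consecutively in the order shown in the dump (an assumption read off this log, not taken from a specification), the arithmetic can be sketched with the hypothetical helper `firstSGPRFor`:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One kernel input, in the order it appears in the argumentInfo dump,
// with the number of SGPRs it occupies (sizes as seen in the log).
struct UserInput {
  std::string Name;
  unsigned NumSGPRs;
  bool Enabled;
};

// Assign consecutive SGPRs to enabled inputs, in order, and return the
// first SGPR index given to Wanted, or -1 if it is not enabled.
int firstSGPRFor(const std::vector<UserInput> &Inputs,
                 const std::string &Wanted) {
  unsigned Next = 0;
  for (const UserInput &In : Inputs) {
    if (!In.Enabled)
      continue;
    if (In.Name == Wanted)
      return static_cast<int>(Next);
    Next += In.NumSGPRs;
  }
  return -1;
}
```

With queuePtr enabled, kernargSegmentPtr starts at SGPR 8 (the output); with it disabled, it starts at SGPR 6 (the expectation).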

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b1 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/9759

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-dev-x86-64-b1/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX10 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
+ /b/ml-opt-dev-x86-64-b1/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX10 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-sles-build-only running on rocm-worker-hw-04-sles while building clang,llvm at step 8 "Add check check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/12341

Here is the relevant piece of the build log for the reference
Step 8 (Add check check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck -check-prefixes=GFX10 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck -check-prefixes=GFX10 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...
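Every failure report above shows the same mismatch. The GFX10-NEXT check expects `kernargSegmentPtr` to come right after `dispatchPtr` in `$sgpr6_sgpr7`. In the actual MIR, `queuePtr` occupies `$sgpr6_sgpr7`, so `kernargSegmentPtr` and every later input move up by two registers. The sketch below is a simplified model of why this happens, assuming that enabled preloaded inputs are packed into consecutive SGPRs in a fixed order and that a disabled input takes no registers. The order and widths are read directly from the dump. The model covers only the inputs that appear there and is not the backend's actual lowering code. The names `INPUT_ORDER`, `sgpr_range`, and `layout` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified model: (MIR field name, number of SGPRs), in the
# order the inputs appear in the argumentInfo dump above.
INPUT_ORDER: List[Tuple[str, int]] = [
    ("privateSegmentBuffer", 4),
    ("dispatchPtr", 2),
    ("queuePtr", 2),
    ("kernargSegmentPtr", 2),
    ("dispatchID", 2),
    ("flatScratchInit", 2),
    ("workGroupIDX", 1),
    ("workGroupIDY", 1),
    ("workGroupIDZ", 1),
]


def sgpr_range(first: int, width: int) -> str:
    """Format a register range the way the MIR dump prints it, e.g. '$sgpr6_sgpr7'."""
    return "$" + "_".join(f"sgpr{first + i}" for i in range(width))


def layout(enabled: Set[str]) -> Dict[str, str]:
    """Assign consecutive SGPRs to each enabled input, following INPUT_ORDER."""
    regs: Dict[str, str] = {}
    next_sgpr = 0
    for name, width in INPUT_ORDER:
        if name in enabled:  # disabled inputs consume no registers
            regs[name] = sgpr_range(next_sgpr, width)
            next_sgpr += width
    return regs


all_inputs = {name for name, _ in INPUT_ORDER}
print(layout(all_inputs)["kernargSegmentPtr"])                 # $sgpr8_sgpr9 (matches the dump)
print(layout(all_inputs - {"queuePtr"})["kernargSegmentPtr"])  # $sgpr6_sgpr7 (what the check line expected)
```

With every input enabled, the model reproduces the dumped layout from `$sgpr0_sgpr1_sgpr2_sgpr3` through `workGroupIDZ` in `$sgpr16`. Dropping only `queuePtr` yields the layout the check line expected. This is why a single extra input is enough to make FileCheck report "possible intended match" two registers later.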

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b1 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/9633

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-devrel-x86-64-b1/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX10 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX10 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder clang-x86_64-debian-fast running on gribozavr4 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/13726

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX10 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX10 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building clang,llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/14375

Here is the relevant piece of the build log for reference:
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-x86_64-debian-dylib/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX10 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX10 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

@llvm-ci
Collaborator

llvm-ci commented Dec 4, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/10095

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX10 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX10 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

preames added a commit that referenced this pull request Dec 4, 2024
…UAttributor (#94647)"

This reverts commit e6aec2c.  Commit breaks "ninja check-llvm" on x86 host.

@llvm-ci
Collaborator

llvm-ci commented Dec 5, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/7687

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX10 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX10 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

@llvm-ci
Collaborator

llvm-ci commented Dec 5, 2024

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building clang,llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/16586

Here is the relevant piece of the build log for reference:
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /build/buildbot/premerge-monolithic-linux/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll | /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX10 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/opt -S -O2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -global-isel -stop-after=irtranslator
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX10 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll:521:15: error: GFX10-NEXT: expected string not found in input
; GFX10-NEXT: kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
              ^
<stdin>:4864:38: note: scanning from here
 dispatchPtr: { reg: '$sgpr4_sgpr5' }
                                     ^
<stdin>:4866:2: note: possible intended match here
 kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
 ^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-globalisel.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
         4859:  stackPtrOffsetReg: '$sp_reg' 
         4860:  bytesInStackArgArea: 0 
         4861:  returnsVoid: true 
         4862:  argumentInfo: 
         4863:  privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' } 
         4864:  dispatchPtr: { reg: '$sgpr4_sgpr5' } 
next:521'0                                          X error: no match found
         4865:  queuePtr: { reg: '$sgpr6_sgpr7' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4866:  kernargSegmentPtr: { reg: '$sgpr8_sgpr9' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:521'1      ?                                           possible intended match
         4867:  dispatchID: { reg: '$sgpr10_sgpr11' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4868:  flatScratchInit: { reg: '$sgpr12_sgpr13' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4869:  workGroupIDX: { reg: '$sgpr14' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4870:  workGroupIDY: { reg: '$sgpr15' } 
next:521'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         4871:  workGroupIDZ: { reg: '$sgpr16' } 
...

jwanggit86 added a commit to jwanggit86/llvm-project that referenced this pull request Dec 6, 2024
jwanggit86 added a commit to jwanggit86/llvm-project that referenced this pull request Dec 6, 2024
…PUAttributor (llvm#94647)"

This reverts commit 1ef9410.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
jwanggit86 added a commit that referenced this pull request Dec 10, 2024
…PUAttributor (#94647)" (#118907)

This reverts commit 1ef9410.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
Labels: backend:AMDGPU, clang (Clang issues not falling into any other category), llvm:globalisel

7 participants