[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types #132358

Merged: 1 commit into llvm:main, Apr 15, 2025

vikramRH
Contributor

Fixes #128650

Also re-adds a few permlane64 tests that existed previously but were removed at some point.

@llvmbot
Member

llvmbot commented Mar 21, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Vikram Hegde (vikramRH)

Changes

Fixes #128650

Also re-adds a few permlane64 tests that existed previously but were removed at some point.


Patch is 138.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132358.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+10-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll (+1028)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll (+614-11)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll (+148-21)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll (+168)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll (+795)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..158cd1bc60f46 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -5565,6 +5565,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
     return false;
 
   LLT PartialResTy = LLT::scalar(SplitSize);
+  bool NeedsBitcast = false;
   if (Ty.isVector()) {
     LLT EltTy = Ty.getElementType();
     unsigned EltSize = EltTy.getSizeInBits();
@@ -5573,8 +5574,10 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
     } else if (EltSize == 16 || EltSize == 32) {
       unsigned NElem = SplitSize / EltSize;
       PartialResTy = Ty.changeElementCount(ElementCount::getFixed(NElem));
+    } else {
+      // Handle all other cases via S32/S64 pieces
+      NeedsBitcast = true;
     }
-    // Handle all other cases via S32/S64 pieces;
   }
 
   SmallVector<Register, 4> PartialRes;
@@ -5600,7 +5603,12 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
     PartialRes.push_back(createLaneOp(Src0, Src1, Src2, PartialResTy));
   }
 
-  B.buildMergeLikeInstr(DstReg, PartialRes);
+  if (NeedsBitcast)
+    B.buildBitcast(DstReg, B.buildMergeLikeInstr(
+                               LLT::scalar(Ty.getSizeInBits()), PartialRes));
+  else
+    B.buildMergeLikeInstr(DstReg, PartialRes);
+
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
index 076cf09678b57..65d27f97733e0 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
@@ -9430,3 +9430,1031 @@ define void @v_permlanex16_v8i16(ptr addrspace(1) %out, <8 x i16> %src0, i32 %sr
   store <8 x i16> %v, ptr addrspace(1) %out
   ret void
 }
+
+define void @v_permlane16_v2i64(ptr addrspace(1) %out, <2 x i64> %src0, i32 %src1, i32 %src2) {
+; GFX10-SDAG-LABEL: v_permlane16_v2i64:
+; GFX10-SDAG:       ; %bb.0:
+; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s4, v6
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s5, v7
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: v_permlane16_v2i64:
+; GFX10-GISEL:       ; %bb.0:
+; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s4, v6
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s5, v7
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-GISEL-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: v_permlane16_v2i64:
+; GFX11-SDAG:       ; %bb.0:
+; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT:    v_readfirstlane_b32 s0, v6
+; GFX11-SDAG-NEXT:    v_readfirstlane_b32 s1, v7
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX11-SDAG-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: v_permlane16_v2i64:
+; GFX11-GISEL:       ; %bb.0:
+; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT:    v_readfirstlane_b32 s0, v6
+; GFX11-GISEL-NEXT:    v_readfirstlane_b32 s1, v7
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX11-GISEL-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-SDAG-LABEL: v_permlane16_v2i64:
+; GFX12-SDAG:       ; %bb.0:
+; GFX12-SDAG-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_expcnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_samplecnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_kmcnt 0x0
+; GFX12-SDAG-NEXT:    v_readfirstlane_b32 s0, v6
+; GFX12-SDAG-NEXT:    v_readfirstlane_b32 s1, v7
+; GFX12-SDAG-NEXT:    s_wait_alu 0xf1ff
+; GFX12-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX12-SDAG-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX12-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-GISEL-LABEL: v_permlane16_v2i64:
+; GFX12-GISEL:       ; %bb.0:
+; GFX12-GISEL-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_expcnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_samplecnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
+; GFX12-GISEL-NEXT:    v_readfirstlane_b32 s0, v6
+; GFX12-GISEL-NEXT:    v_readfirstlane_b32 s1, v7
+; GFX12-GISEL-NEXT:    s_wait_alu 0xf1ff
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX12-GISEL-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX12-GISEL-NEXT:    s_setpc_b64 s[30:31]
+  %v = call <2 x i64> @llvm.amdgcn.permlane16.v2i64(<2 x i64> %src0, <2 x i64> %src0, i32 %src1, i32 %src2, i1 false, i1 false)
+  store <2 x i64> %v, ptr addrspace(1) %out
+  ret void
+}
+
+define void @v_permlane16_v3i64(ptr addrspace(1) %out, <3 x i64> %src0, i32 %src1, i32 %src2) {
+; GFX10-SDAG-LABEL: v_permlane16_v3i64:
+; GFX10-SDAG:       ; %bb.0:
+; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s4, v8
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s5, v9
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v7, v7, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v6, v6, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-SDAG-NEXT:    global_store_dwordx2 v[0:1], v[6:7], off offset:16
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: v_permlane16_v3i64:
+; GFX10-GISEL:       ; %bb.0:
+; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s4, v8
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s5, v9
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v6, v6, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v7, v7, s4, s5
+; GFX10-GISEL-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-GISEL-NEXT:    global_store_dwordx2 v[0:1], v[6:7], off offset:16
+; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: v_permlane16_v3i64:
+; GFX11-SDAG:       ; %bb.0:
+; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT:    v_readfirstlane_b32 s0, v8
+; GFX11-SDAG-NEXT:    v_readfirstlane_b32 s1, v9
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX11-SDAG-NEXT:    s_clause 0x1
+; GFX11-SDAG-NEXT:    global_store_b64 v[0:1], v[6:7], off offset:16
+; GFX11-SDAG-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: v_permlane16_v3i64:
+; GFX11-GISEL:       ; %bb.0:
+; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT:    v_readfirstlane_b32 s0, v8
+; GFX11-GISEL-NEXT:    v_readfirstlane_b32 s1, v9
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX11-GISEL-NEXT:    s_clause 0x1
+; GFX11-GISEL-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX11-GISEL-NEXT:    global_store_b64 v[0:1], v[6:7], off offset:16
+; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-SDAG-LABEL: v_permlane16_v3i64:
+; GFX12-SDAG:       ; %bb.0:
+; GFX12-SDAG-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_expcnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_samplecnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_kmcnt 0x0
+; GFX12-SDAG-NEXT:    v_readfirstlane_b32 s0, v8
+; GFX12-SDAG-NEXT:    v_readfirstlane_b32 s1, v9
+; GFX12-SDAG-NEXT:    s_wait_alu 0xf1ff
+; GFX12-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX12-SDAG-NEXT:    s_clause 0x1
+; GFX12-SDAG-NEXT:    global_store_b64 v[0:1], v[6:7], off offset:16
+; GFX12-SDAG-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX12-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-GISEL-LABEL: v_permlane16_v3i64:
+; GFX12-GISEL:       ; %bb.0:
+; GFX12-GISEL-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_expcnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_samplecnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
+; GFX12-GISEL-NEXT:    v_readfirstlane_b32 s0, v8
+; GFX12-GISEL-NEXT:    v_readfirstlane_b32 s1, v9
+; GFX12-GISEL-NEXT:    s_wait_alu 0xf1ff
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX12-GISEL-NEXT:    s_clause 0x1
+; GFX12-GISEL-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX12-GISEL-NEXT:    global_store_b64 v[0:1], v[6:7], off offset:16
+; GFX12-GISEL-NEXT:    s_setpc_b64 s[30:31]
+  %v = call <3 x i64> @llvm.amdgcn.permlane16.v3i64(<3 x i64> %src0, <3 x i64> %src0, i32 %src1, i32 %src2, i1 false, i1 false)
+  store <3 x i64> %v, ptr addrspace(1) %out
+  ret void
+}
+
+define void @v_permlane16_v4f64(ptr addrspace(1) %out, <4 x double> %src0, i32 %src1, i32 %src2) {
+; GFX10-SDAG-LABEL: v_permlane16_v4f64:
+; GFX10-SDAG:       ; %bb.0:
+; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s4, v10
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s5, v11
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v9, v9, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v8, v8, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v7, v7, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v6, v6, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[6:9], off offset:16
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: v_permlane16_v4f64:
+; GFX10-GISEL:       ; %bb.0:
+; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s4, v10
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s5, v11
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v6, v6, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v7, v7, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v8, v8, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v9, v9, s4, s5
+; GFX10-GISEL-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-GISEL-NEXT:    global_store_dwordx4 v[0:1], v[6:9], off offset:16
+; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: v_permlane16_v4f64:
+; GFX11-SDAG:       ; %bb.0:
+; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT:    v_readfirstlane_b32 s0, v10
+; GFX11-SDAG-NEXT:    v_readfirstlane_b32 s1, v11
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v9, v9, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v8, v8, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX11-SDAG-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX11-SDAG-NEXT:    s_clause 0x1
+; GFX11-SDAG-NEXT:    global_store_b128 v[0:1], v[6:9], off offset:16
+; GFX11-SDAG-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: v_permlane16_v4f64:
+; GFX11-GISEL:       ; %bb.0:
+; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT:    v_readfirstlane_b32 s0, v10
+; GFX11-GISEL-NEXT:    v_readfirstlane_b32 s1, v11
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v8, v8, s0, s1
+; GFX11-GISEL-NEXT:    v_permlane16_b32 v9, v9, s0, s1
+; GFX11-GISEL-NEXT:    s_clause 0x1
+; GFX11-GISEL-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX11-GISEL-NEXT:    global_store_b128 v[0:1], v[6:9], off offset:16
+; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-SDAG-LABEL: v_permlane16_v4f64:
+; GFX12-SDAG:       ; %bb.0:
+; GFX12-SDAG-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_expcnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_samplecnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-SDAG-NEXT:    s_wait_kmcnt 0x0
+; GFX12-SDAG-NEXT:    v_readfirstlane_b32 s0, v10
+; GFX12-SDAG-NEXT:    v_readfirstlane_b32 s1, v11
+; GFX12-SDAG-NEXT:    s_wait_alu 0xf1ff
+; GFX12-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v9, v9, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v8, v8, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX12-SDAG-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX12-SDAG-NEXT:    s_clause 0x1
+; GFX12-SDAG-NEXT:    global_store_b128 v[0:1], v[6:9], off offset:16
+; GFX12-SDAG-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX12-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-GISEL-LABEL: v_permlane16_v4f64:
+; GFX12-GISEL:       ; %bb.0:
+; GFX12-GISEL-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_expcnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_samplecnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-GISEL-NEXT:    s_wait_kmcnt 0x0
+; GFX12-GISEL-NEXT:    v_readfirstlane_b32 s0, v10
+; GFX12-GISEL-NEXT:    v_readfirstlane_b32 s1, v11
+; GFX12-GISEL-NEXT:    s_wait_alu 0xf1ff
+; GFX12-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v2, v2, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v3, v3, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v4, v4, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v5, v5, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v6, v6, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v7, v7, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v8, v8, s0, s1
+; GFX12-GISEL-NEXT:    v_permlane16_b32 v9, v9, s0, s1
+; GFX12-GISEL-NEXT:    s_clause 0x1
+; GFX12-GISEL-NEXT:    global_store_b128 v[0:1], v[2:5], off
+; GFX12-GISEL-NEXT:    global_store_b128 v[0:1], v[6:9], off offset:16
+; GFX12-GISEL-NEXT:    s_setpc_b64 s[30:31]
+  %v = call <4 x double> @llvm.amdgcn.permlane16.v4f64(<4 x double> %src0, <4 x double> %src0, i32 %src1, i32 %src2, i1 false, i1 false)
+  store <4 x double> %v, ptr addrspace(1) %out
+  ret void
+}
+
+define void @v_permlane16_v8f64(ptr addrspace(1) %out, <8 x double> %src0, i32 %src1, i32 %src2) {
+; GFX10-SDAG-LABEL: v_permlane16_v8f64:
+; GFX10-SDAG:       ; %bb.0:
+; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s4, v18
+; GFX10-SDAG-NEXT:    v_readfirstlane_b32 s5, v19
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v17, v17, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v16, v16, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v15, v15, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v14, v14, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v13, v13, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v12, v12, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v11, v11, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v10, v10, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v9, v9, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v8, v8, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v7, v7, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v6, v6, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-SDAG-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[14:17], off offset:48
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[10:13], off offset:32
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[6:9], off offset:16
+; GFX10-SDAG-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
+; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: v_permlane16_v8f64:
+; GFX10-GISEL:       ; %bb.0:
+; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s4, v18
+; GFX10-GISEL-NEXT:    v_readfirstlane_b32 s5, v19
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v2, v2, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v3, v3, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v4, v4, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v5, v5, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v6, v6, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v7, v7, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v8, v8, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v9, v9, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v10, v10, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v11, v11, s4, s5
+; GFX10-GISEL-NEXT:    v_permlane16_b32 v12, v12, s4, s5
+; GFX10-GISEL-NEXT:    v_permlan...
[truncated]
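
For readers skimming the `legalizeLaneOp` hunk above, the type selection it performs can be sketched outside of LLVM. This is only a rough model under stated assumptions, not the code in `AMDGPULegalizerInfo.cpp`: `SimpleTy`, `SplitPlan` and `planLaneOpSplit` are hypothetical names, and the `EltBits == SplitSize` branch is an assumption, because that branch sits above the hunk and is not visible in the diff. What the diff does show is that vectors with 16-bit or 32-bit elements are split into sub-vector pieces, and that every other vector now takes the new `else` branch: the pieces stay scalar, they are merged into one wide scalar of `Ty.getSizeInBits()` bits, and that scalar is bitcast to the destination type.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an LLT: a scalar (NumElts == 0) or a fixed vector.
struct SimpleTy {
  unsigned NumElts; // 0 means scalar
  unsigned EltBits; // width of the scalar, or of each vector element
  bool isVector() const { return NumElts != 0; }
  unsigned sizeInBits() const {
    return isVector() ? NumElts * EltBits : EltBits;
  }
  std::string str() const {
    if (!isVector())
      return "s" + std::to_string(EltBits);
    return "<" + std::to_string(NumElts) + " x s" + std::to_string(EltBits) +
           ">";
  }
};

// Result of the sketch: the type of each piece, how many pieces there are,
// and whether the merged result must be bitcast to the destination type.
struct SplitPlan {
  bool Valid;
  SimpleTy PieceTy;
  unsigned NumPieces;
  bool NeedsBitcast;
};

// Models only the type selection. SplitSize is 32 or 64 in the patch comment
// ("S32/S64 pieces"); how the real code chooses it is not modeled here.
SplitPlan planLaneOpSplit(SimpleTy Ty, unsigned SplitSize) {
  SplitPlan Plan = {false, SimpleTy{0, SplitSize}, 0, false};
  unsigned Size = Ty.sizeInBits();
  if (SplitSize == 0 || Size % SplitSize != 0)
    return Plan; // the sketch gives up on sizes that do not divide evenly

  Plan.Valid = true;
  Plan.NumPieces = Size / SplitSize;
  if (Ty.isVector()) {
    if (Ty.EltBits == SplitSize) {
      // ASSUMPTION: this branch is above the hunk and not shown in the diff.
      Plan.PieceTy = SimpleTy{0, Ty.EltBits};
    } else if (Ty.EltBits == 16 || Ty.EltBits == 32) {
      // Visible in the diff: pieces are sub-vectors of SplitSize / EltSize.
      Plan.PieceTy = SimpleTy{SplitSize / Ty.EltBits, Ty.EltBits};
    } else {
      // New in this patch: scalar pieces, merged wide, then bitcast.
      Plan.NeedsBitcast = true;
    }
  }
  return Plan;
}
```

Under this model, `planLaneOpSplit(SimpleTy{2, 64}, 32)` yields four `s32` pieces with `NeedsBitcast` set, which lines up with the four `v_permlane16_b32` instructions in the `v_permlane16_v2i64` GISEL checks. `SimpleTy{8, 16}` with the same split size yields four `<2 x s16>` pieces and no bitcast.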

@vikramRH
Contributor Author

Gentle ping.

@vikramRH
Contributor Author

vikramRH commented Apr 3, 2025

Ping.

@vikramRH
Contributor Author

vikramRH commented Apr 9, 2025

Ping @arsenm @jayfoad

@vikramRH vikramRH merged commit 62ef10a into llvm:main Apr 15, 2025
12 checks passed
@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/16692

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/16942

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
/b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/28836

Here is the relevant piece of the build log for reference:
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.permlane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX10,GFX10-SDAG /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll # RUN: at line 2
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX10,GFX10-SDAG /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
/build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=1 -global-isel-abort=2 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX10,GFX10-GISEL /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll # RUN: at line 3
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=1 -global-isel-abort=2 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX10,GFX10-GISEL /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
warning: Instruction selection used fallback path for v_permlane16_bfloat
warning: Instruction selection used fallback path for v_permlanex16_bfloat
/build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX11,GFX11-SDAG /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll # RUN: at line 4
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX11,GFX11-SDAG /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
/build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=1 -global-isel-abort=2 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX11,GFX11-GISEL /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll # RUN: at line 5
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=1 -global-isel-abort=2 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX11,GFX11-GISEL /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
warning: Instruction selection used fallback path for v_permlane16_bfloat
warning: Instruction selection used fallback path for v_permlanex16_bfloat
/build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX12,GFX12-SDAG /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll # RUN: at line 6
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefixes=GFX12,GFX12-SDAG /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll:9463:20: error: GFX12-SDAG-NEXT: expected string not found in input
; GFX12-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
                   ^
<stdin>:8217:19: note: scanning from here
 s_wait_alu 0xf1ff
                  ^
<stdin>:8223:2: note: possible intended match here
 s_setpc_b64 s[30:31]
 ^
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll:9568:20: error: GFX12-SDAG-NEXT: expected string not found in input
; GFX12-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
                   ^
<stdin>:8267:19: note: scanning from here
 s_wait_alu 0xf1ff
                  ^
<stdin>:8274:2: note: possible intended match here
 s_clause 0x1
 ^
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll:9689:20: error: GFX12-SDAG-NEXT: expected string not found in input
; GFX12-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
                   ^
<stdin>:8321:19: note: scanning from here
 s_wait_alu 0xf1ff
                  ^
<stdin>:8330:2: note: possible intended match here
 s_clause 0x1
...

kazutakahirata added a commit that referenced this pull request Apr 15, 2025
…ger vector types (#132358)"

This reverts commit 62ef10a.

Multiple buildbot failures have been reported:
#132358
@kazutakahirata
Contributor

@vikramRH I've reverted this PR because of buildbot failures. I suspect some tests need updating because this PR was created a while ago. Anyway, I'm happy to try your revised patch. Thanks!

@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/16869

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
/b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@vikramRH
Contributor Author

@vikramRH I've reverted this PR because of buildbot failures. I suspect some tests need updating because this PR was created a while ago. Anyway, I'm happy to try your revised patch. Thanks!

@kazutakahirata, thanks. I will raise a revised PR.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 15, 2025
…ing for larger vector types (#132358)"

This reverts commit 62ef10a.

Multiple buildbot failures have been reported:
llvm/llvm-project#132358
@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/17372

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder clang-x86_64-debian-fast running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/23451

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX802-SDAG /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX802-SDAG /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
/b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/24744

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX802-SDAG /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
/b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
/b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@llvm-ci
Collaborator

llvm-ci commented Apr 15, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/14882

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/llvm.amdgcn.writelane.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX802-SDAG /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 2
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX802-SDAG /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 3
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX1010-SDAG /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll # RUN: at line 4
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefixes=GFX1100-SDAG /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll:2870:22: error: GFX1100-SDAG-NEXT: expected string not found in input
; GFX1100-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
                     ^
<stdin>:2199:20: note: scanning from here
 s_waitcnt vmcnt(1)
                   ^
<stdin>:2207:2: note: possible intended match here
 s_clause 0x1
 ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
          2194:  v_readfirstlane_b32 s4, v4 
          2195:  v_readfirstlane_b32 s5, v3 
          2196:  v_readfirstlane_b32 s6, v2 
          2197:  v_readfirstlane_b32 s0, v7 
          2198:  v_readfirstlane_b32 s2, v6 
          2199:  s_waitcnt vmcnt(1) 
next:2870'0                        X error: no match found
          2200:  v_writelane_b32 v14, s0, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2201:  s_waitcnt vmcnt(0) 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~
          2202:  v_writelane_b32 v12, s3, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2203:  v_writelane_b32 v11, s4, s1 
next:2870'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          2204:  v_writelane_b32 v10, s5, s1 
...

@vikramRH vikramRH deleted the laneop_gisel_fix branch April 15, 2025 07:41
vikramRH added a commit to vikramRH/llvm-project that referenced this pull request Apr 15, 2025
vikramRH added a commit that referenced this pull request Apr 16, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 16, 2025
…ring for larger vector types (#132358)" (#135758)

reapply llvm/llvm-project#132358, tests updated.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…or types (llvm#132358)

Fixes llvm#128650

Also adds a few previously existing permlane64 tests that were somehow
removed in between.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…ger vector types (llvm#132358)"

This reverts commit 62ef10a.

Multiple buildbot failures have been reported:
llvm#132358
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
Development

Successfully merging this pull request may close these issues.

AMDGPU GlobalIsel mishandles readfirstlane lowering with 64-bit element vectors
5 participants