[AMDGPU][True16][CodeGen] True16 Add OpSel when optimizing exec mask #128928


Merged: broxigarchen merged 2 commits into llvm:main on Feb 28, 2025

Conversation

broxigarchen
Contributor

@broxigarchen broxigarchen commented Feb 26, 2025

True16 Add OpSel when optimizing exec mask

True16 VOPCX instructions take an op_sel operand. Copy it from the original
V_CMP when SIOptimizeExecMasking creates the V_CMPX instruction.

@broxigarchen broxigarchen marked this pull request as ready for review February 27, 2025 21:28
@broxigarchen broxigarchen changed the title True16 Add OpSel when optimizing exec mask [AMDGPU][True16][CodeGen] True16 Add OpSel when optimizing exec mask Feb 27, 2025
@llvmbot
Member

llvmbot commented Feb 27, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

True16 Add OpSel when optimizing exec mask

True16 VOPCX have the opsel argument. Add it when we create these
instructions in SIOptimizeExecMasking.


Full diff: https://github.com/llvm/llvm-project/pull/128928.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp (+2)
  • (added) llvm/test/CodeGen/AMDGPU/true16-saveexec.mir (+64)
diff --git a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
index 920c3e11e4718..745e4086bc7fe 100644
--- a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
+++ b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
@@ -632,6 +632,8 @@ bool SIOptimizeExecMasking::optimizeVCMPSaveExecSequence(
 
   TryAddImmediateValueFromNamedOperand(AMDGPU::OpName::clamp);
 
+  TryAddImmediateValueFromNamedOperand(AMDGPU::OpName::op_sel);
+
   // The kill flags may no longer be correct.
   if (Src0->isReg())
     MRI->clearKillFlags(Src0->getReg());
diff --git a/llvm/test/CodeGen/AMDGPU/true16-saveexec.mir b/llvm/test/CodeGen/AMDGPU/true16-saveexec.mir
new file mode 100644
index 0000000000000..07259a1d031f3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/true16-saveexec.mir
@@ -0,0 +1,64 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize64 -run-pass=si-optimize-exec-masking -verify-machineinstrs -o - %s | FileCheck %s
+
+---
+name:            int
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: int
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $vgpr20
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $sgpr0_sgpr1 = S_MOV_B64 $exec
+  ; CHECK-NEXT:   V_CMPX_LT_I16_t16_nosdst_e64 0, 15, 0, $vgpr20_lo16, 0, implicit-def $exec, implicit $exec
+  ; CHECK-NEXT:   renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+  ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.1, implicit $exec
+  ; CHECK-NEXT:   S_BRANCH %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.1:
+    liveins: $vgpr20
+    $vcc = V_CMP_LT_I16_t16_e64 0, 15, 0, $vgpr20_lo16, 0, implicit $exec
+    renamable $sgpr0_sgpr1 = COPY $exec, implicit-def $exec
+    renamable $sgpr2_sgpr3 = S_AND_B64 renamable $sgpr0_sgpr1, killed $vcc, implicit-def dead $scc
+    renamable $sgpr0_sgpr1 = S_XOR_B64 renamable $sgpr2_sgpr3, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+    $exec = S_MOV_B64_term killed renamable $sgpr2_sgpr3
+    S_CBRANCH_EXECZ %bb.2, implicit $exec
+    S_BRANCH %bb.2
+
+  bb.2:
+    S_ENDPGM 0
+...
+
+---
+name:            float
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: float
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $vgpr20
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $sgpr0_sgpr1 = S_MOV_B64 $exec
+  ; CHECK-NEXT:   V_CMPX_LT_F16_t16_nosdst_e64 0, 15, 0, $vgpr20_lo16, 1, 0, implicit-def $exec, implicit $mode, implicit $exec
+  ; CHECK-NEXT:   renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+  ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.1, implicit $exec
+  ; CHECK-NEXT:   S_BRANCH %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.1:
+    liveins: $vgpr20
+    $vcc = V_CMP_LT_F16_t16_e64 0, 15, 0, $vgpr20_lo16, 1, 0, implicit $exec, implicit $mode
+    renamable $sgpr0_sgpr1 = COPY $exec, implicit-def $exec
+    renamable $sgpr2_sgpr3 = S_AND_B64 renamable $sgpr0_sgpr1, killed $vcc, implicit-def dead $scc
+    renamable $sgpr0_sgpr1 = S_XOR_B64 renamable $sgpr2_sgpr3, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+    $exec = S_MOV_B64_term killed renamable $sgpr2_sgpr3
+    S_CBRANCH_EXECZ %bb.2, implicit $exec
+    S_BRANCH %bb.2
+
+  bb.2:
+    S_ENDPGM 0
+...
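
To make the effect of the one-line change easier to follow, here is a minimal, self-contained sketch of the "copy a named operand only if the source instruction has it" pattern. It is a toy model, not LLVM code: `ToyInst`, `findImm`, `tryAddImm`, and `buildCmpx` are hypothetical names, and every operand (including the register `src1`) is reduced to a plain integer. The operand order is read off the MIR test above and is an assumption about the general layout.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a machine instruction: an ordered list of
// (operand name, immediate value) pairs. This is NOT LLVM's MachineInstr.
struct ToyInst {
  std::string Opcode;
  std::vector<std::pair<std::string, int64_t>> Operands;

  // Returns a pointer to the named operand's value, or nullptr if absent.
  const int64_t *findImm(const std::string &Name) const {
    for (const auto &Op : Operands)
      if (Op.first == Name)
        return &Op.second;
    return nullptr;
  }
};

// Toy analogue of the pass's helper: append the operand to the new
// instruction only when the original compare actually carries it.
inline void tryAddImm(const ToyInst &From, ToyInst &To,
                      const std::string &Name) {
  if (const int64_t *Imm = From.findImm(Name))
    To.Operands.emplace_back(Name, *Imm);
}

// Builds the toy V_CMPX from the toy V_CMP. The last entry, "op_sel", models
// the operand this PR starts forwarding; without it the True16 V_CMPX would
// end up one operand short.
inline ToyInst buildCmpx(const ToyInst &VCmp, const std::string &CmpxOpcode) {
  ToyInst Cmpx{CmpxOpcode, {}};
  for (const char *Name : {"src0_modifiers", "src0", "src1_modifiers", "src1",
                           "clamp", "op_sel"})
    tryAddImm(VCmp, Cmpx, Name);
  return Cmpx;
}
```

In this model, an integer compare without `clamp` yields five operands ending in `op_sel`, while a float compare with `clamp` yields six, which matches the operand lists in the two CHECK lines of the test.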

@broxigarchen broxigarchen merged commit db973ce into llvm:main Feb 28, 2025
5 of 9 checks passed
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
…lvm#128928)

True16 Add OpSel when optimizing exec mask

True16 VOPCX have the opsel argument. Add it when we create these
instructions in SIOptimizeExecMasking.

---------

Co-authored-by: Matt Arsenault <[email protected]>