[AMDGPU][True16] fix a bug in codeGen causing e64 with wrong vgpr type to shrink (#102942)

broxigarchen · web-flow · commit 6b7afaa9db8f · 2024-08-12T17:03:05.000-04:00
This bug was introduced in #102198. That patch changed the shrinking check to use the useRealTrue16Insts flag. However, some t16 instructions are still implemented as fake16 and use Lo128 register types, so the shrinking check should still use the hasTrue16BitInsts flag.

Co-authored-by: guochen2 <guochen2@amd.com>
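
To illustrate the effect of the flag swap, here is a minimal standalone sketch of the guard, not the actual LLVM code. The `Subtarget` and `Inst` structs are hypothetical stand-ins, and `shouldShrinkTrue16` is reduced to an assumed "VGPR index below 128" rule that matches the two test cases in this commit. It also assumes a fake16 configuration is one where hasTrue16BitInsts is true and useRealTrue16Insts is false.

```cpp
// Simplified model of the guard in SIShrinkInstructions::runOnMachineFunction.
// All types here are hypothetical stand-ins, not LLVM classes.

struct Subtarget {
  bool HasTrue16BitInsts;  // stands in for ST->hasTrue16BitInsts()
  bool UseRealTrue16Insts; // stands in for ST->useRealTrue16Insts()
};

struct Inst {
  bool IsTrue16;      // stands in for AMDGPU::isTrue16Inst(MI.getOpcode())
  unsigned VGPRIndex; // index N of the instruction's single $vgprN operand
};

// Assumption: a t16 instruction may shrink only if its VGPR is in the low 128.
constexpr bool shouldShrinkTrue16(const Inst &I) { return I.VGPRIndex < 128; }

// Guard as written before this commit: true means "skip, do not shrink".
constexpr bool skipsShrinkBefore(const Subtarget &ST, const Inst &I) {
  return ST.UseRealTrue16Insts && I.IsTrue16 && !shouldShrinkTrue16(I);
}

// Guard as written after this commit.
constexpr bool skipsShrinkAfter(const Subtarget &ST, const Inst &I) {
  return ST.HasTrue16BitInsts && I.IsTrue16 && !shouldShrinkTrue16(I);
}

// Assumed fake16 configuration: 16-bit instructions exist, real true16 is off.
constexpr Subtarget Fake16{true, false};

// Old guard never fires under fake16, so a $vgpr128 operand is not protected.
static_assert(!skipsShrinkBefore(Fake16, Inst{true, 128}), "old guard misses it");
// New guard skips $vgpr128 and still lets $vgpr127 proceed to shrinking.
static_assert(skipsShrinkAfter(Fake16, Inst{true, 128}), "vgpr128 is skipped");
static_assert(!skipsShrinkAfter(Fake16, Inst{true, 127}), "vgpr127 may shrink");
```

Under these assumptions the old guard lets the `$vgpr128` case fall through to the shrinking logic, which is the situation the `16bit_lo128_no_shrink` test is meant to catch.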
diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -1048,7 +1048,7 @@ bool SIShrinkInstructions::runOnMachineFunction(MachineFunction &MF) {
               MachineFunctionProperties::Property::NoVRegs))
         continue;
 
-      if (ST->useRealTrue16Insts() && AMDGPU::isTrue16Inst(MI.getOpcode()) &&
+      if (ST->hasTrue16BitInsts() && AMDGPU::isTrue16Inst(MI.getOpcode()) &&
           !shouldShrinkTrue16(MI))
         continue;
 
diff --git a/llvm/test/CodeGen/AMDGPU/shrink-true16.mir b/llvm/test/CodeGen/AMDGPU/shrink-true16.mir
@@ -0,0 +1,28 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=si-shrink-instructions -verify-machineinstrs -o - %s | FileCheck -check-prefix=GFX1100 %s
+
+---
+name: 16bit_lo128_shrink
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr127
+    ; GFX1100-LABEL: name: 16bit_lo128_shrink
+    ; GFX1100: liveins: $vgpr127
+    ; GFX1100-NEXT: {{  $}}
+    ; GFX1100-NEXT: V_CMP_EQ_U16_t16_e32 0, $vgpr127, implicit-def $vcc, implicit $exec, implicit $exec
+    $vcc_lo = V_CMP_EQ_U16_t16_e64 0, $vgpr127, implicit-def $vcc, implicit $exec
+...
+
+---
+name: 16bit_lo128_no_shrink
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr128
+    ; GFX1100-LABEL: name: 16bit_lo128_no_shrink
+    ; GFX1100: liveins: $vgpr128
+    ; GFX1100-NEXT: {{  $}}
+    ; GFX1100-NEXT: $vcc_lo = V_CMP_EQ_U16_t16_e64 0, $vgpr128, implicit-def $vcc_lo, implicit $exec
+    $vcc_lo = V_CMP_EQ_U16_t16_e64 0, $vgpr128, implicit-def $vcc, implicit $exec
+...