[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450

broxigarchen · 2025-06-24T03:21:50Z

The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration

llvmbot · 2025-06-24T14:20:13Z

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration

Full diff: https://github.com/llvm/llvm-project/pull/145450.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+6-1)
(modified) llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir (+26)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 333e91bf37df5..ca29c71030942 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -8716,6 +8716,8 @@ void SIInstrInfo::splitScalar64BitCountOp(SIInstrWorklist &Worklist,
 void SIInstrInfo::addUsersToMoveToVALUWorklist(
     Register DstReg, MachineRegisterInfo &MRI,
     SIInstrWorklist &Worklist) const {
+  SmallVector<std::pair<MachineInstr *, unsigned>, 4> LegalizeList;
+
   for (MachineRegisterInfo::use_iterator I = MRI.use_begin(DstReg),
          E = MRI.use_end(); I != E;) {
     MachineInstr &UseMI = *I->getParent();
@@ -8744,11 +8746,14 @@ void SIInstrInfo::addUsersToMoveToVALUWorklist(
         ++I;
       } while (I != E && I->getParent() == &UseMI);
     } else {
-      legalizeOperandsVALUt16(UseMI, OpNo, MRI);
+      LegalizeList.push_back(std::make_pair(&UseMI, OpNo));
 
       ++I;
     }
   }
+
+  for (auto &MI : LegalizeList)
+    legalizeOperandsVALUt16(*MI.first, MI.second, MRI);
 }
 
 void SIInstrInfo::movePackToVALU(SIInstrWorklist &Worklist,
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
index 9b6a2f3a1aa1e..59184499de2f0 100644
--- a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
@@ -108,6 +108,32 @@ body:             |
     %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode
 ...
 
+---
+name:            legalize_with_multi_user
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: legalize_with_multi_user
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[DEF]], %subreg.lo16, [[DEF1]], %subreg.hi16
+    ; GCN-NEXT: [[V_ADD_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_ADD_F16_t16_e64 0, [[REG_SEQUENCE]].lo16, 0, 1, 0, 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 32768
+    ; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF3]], %subreg.hi16
+    ; GCN-NEXT: [[V_PK_FMA_F16_:%[0-9]+]]:vgpr_32 = V_PK_FMA_F16 11, [[S_MOV_B32_]], 0, [[REG_SEQUENCE1]], 8, [[DEF2]], 0, 0, 0, 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF4:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF4]], %subreg.hi16
+    ; GCN-NEXT: [[V_XOR_B32_e64_:%[0-9]+]]:vgpr_32 = V_XOR_B32_e64 [[REG_SEQUENCE2]], [[S_MOV_B32_]], implicit $exec
+    %0:vgpr_16 = IMPLICIT_DEF
+    %1:sreg_32 = COPY %0:vgpr_16
+    %2:sreg_32 = S_ADD_F16 %1:sreg_32, 1, implicit $mode
+    %3:sreg_32 = S_MOV_B32 32768
+    %4:vgpr_32 = IMPLICIT_DEF
+    %5:vgpr_32 = V_PK_FMA_F16 11, %3:sreg_32, 0, %2:sreg_32, 8, %4:vgpr_32, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
+    %6:sreg_32 = S_XOR_B32 %2:sreg_32, %3:sreg_32, implicit-def dead $scc
+...
+
 ---
 name:            vgpr16_to_spgr32
 body:             |

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Sisyph · 2025-06-24T21:10:46Z

llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir

+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 32768
+    ; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF3]], %subreg.hi16


Is it reasonable to have one REG_SEQUENCE per use? I guess most or all of them will eventually fold out?

reg_sequence should be fold out in the two-address, rewrite virtual reg and coalescer

Sisyph

Besides the noted cosmetic issues, LGTM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm-ci · 2025-06-25T04:27:20Z

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-ubuntu running on as-builder-9 while building llvm at step 16 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/10898

Here is the relevant piece of the build log for the reference

Step 16 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
...
PASS: lldb-api :: types/TestIntegerType.py (1274 of 1283)
PASS: lldb-api :: types/TestRecursiveTypes.py (1275 of 1283)
PASS: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py (1276 of 1283)
PASS: lldb-api :: types/TestIntegerTypeExpr.py (1277 of 1283)
UNSUPPORTED: lldb-api :: windows/launch/missing-dll/TestMissingDll.py (1278 of 1283)
PASS: lldb-api :: types/TestShortType.py (1279 of 1283)
PASS: lldb-api :: types/TestLongTypes.py (1280 of 1283)
PASS: lldb-api :: types/TestShortTypeExpr.py (1281 of 1283)
PASS: lldb-api :: types/TestLongTypesExpr.py (1282 of 1283)
TIMEOUT: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py (1283 of 1283)
******************** TEST 'lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py' FAILED ********************
Script:
--
/usr/bin/python3.12 /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --libcxx-include-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/c++/v1 --libcxx-include-target-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/aarch64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib/aarch64-unknown-linux-gnu --arch aarch64 --build-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/lldb --compiler /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang --dsymutil /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --lldb-obj-root /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/tools/lldb --lldb-libs-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --cmake-build-type Release --platform-url connect://jetson-agx-2198.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot /mnt/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux --skip-category=lldb-server /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/python_api/process/cancel_attach -p TestCancelAttach.py
--
Exit Code: -9
Timeout: Reached timeout of 600 seconds

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 505906bff6ddf9813666f0404fb604a2b6e02722)
  clang revision 505906bff6ddf9813666f0404fb604a2b6e02722
  llvm revision 505906bff6ddf9813666f0404fb604a2b6e02722

--
Command Output (stderr):
--
WARNING:root:Custom libc++ is not supported for remote runs: ignoring --libcxx arguments
FAIL: LLDB (/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang-aarch64) :: test_scripted_implementation (TestCancelAttach.AttachCancelTestCase.test_scripted_implementation)

--

********************
Slowest Tests:
--------------------------------------------------------------------------
600.04s: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py
180.99s: lldb-api :: commands/command/script_alias/TestCommandScriptAlias.py
70.55s: lldb-api :: commands/process/attach/TestProcessAttach.py
39.55s: lldb-api :: functionalities/data-formatter/data-formatter-stl/libcxx-simulators/string/TestDataFormatterLibcxxStringSimulator.py
34.89s: lldb-api :: functionalities/completion/TestCompletion.py
34.43s: lldb-api :: functionalities/single-thread-step/TestSingleThreadStepTimeout.py
27.03s: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
23.93s: lldb-api :: commands/statistics/basic/TestStats.py
20.66s: lldb-api :: functionalities/gdb_remote_client/TestPlatformClient.py
18.86s: lldb-api :: functionalities/thread/state/TestThreadStates.py
17.97s: lldb-api :: commands/dwim-print/TestDWIMPrint.py
14.47s: lldb-api :: commands/expression/expr-in-syscall/TestExpressionInSyscall.py
14.39s: lldb-api :: functionalities/data-formatter/data-formatter-stl/generic/set/TestDataFormatterGenericSet.py
13.94s: lldb-api :: python_api/find_in_memory/TestFindRangesInMemory.py

jayfoad · 2025-06-25T09:18:31Z

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan

But you are still calling legalizeOperandsVALUt16 from addUsersToMoveToVALUWorklist, right? Can you explain why this is needed in the first place? It feels wrong, because that function is now doing more than just adding things to a work list. I guess it comes from #138734, but that review is hard to follow.

broxigarchen · 2025-06-25T13:33:21Z

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan

But you are still calling legalizeOperandsVALUt16 from addUsersToMoveToVALUWorklist, right? Can you explain why this is needed in the first place? It feels wrong, because that function is now doing more than just adding things to a work list. I guess it comes from #138734, but that review is hard to follow.

Hi Jay. The problem is mainly from the mismatched register size in the MIR after we introduce VGPR16 but no SGPR16.

There are a few places that are generating mismatched register size in the moveToVALU sequence. i.e. A SALU32 used by SALU16. Before moved to VALU this is legal as they all use sgpr32. When this def-use is moved to VALU. SALU32 becomes a VALU32 (sgpr32 to vpgr32), and SALU16 becomes a VALU16 (sgpr32 to vgpr16). The register size is mismatched in the def-use chain.

Thus, in moveToVALU sequence, we need to legalize these operands. We cannot simply legalize the operands of the current instruction, but also need to legalize the users of the current instruction, because the iteration order of the moveToVALU sequence is bottom up (see #138734 (comment)).

For this reason, we need to add user list scan in multiple places in the movetoVALU function to do the legalization which it's pretty ugly. Thus, I think since the addUsersToMoveToVALUWorklist is doing user scan already, doing the legalization there could remove a lots of the dulicated code

llvm#145450) The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration

broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 16cef3a to 38e2154 Compare June 24, 2025 03:22

broxigarchen changed the title ~~fix mutli use list modification~~ [AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan Jun 24, 2025

broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 38e2154 to 9b3841a Compare June 24, 2025 14:19

broxigarchen marked this pull request as ready for review June 24, 2025 14:19

broxigarchen requested a review from Sisyph June 24, 2025 14:19

llvmbot added the backend:AMDGPU label Jun 24, 2025

broxigarchen requested review from arsenm and kosarev June 24, 2025 14:19

jayfoad reviewed Jun 24, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

Sisyph reviewed Jun 24, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

fix mutli use list modification

f53b5ca

broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 9b3841a to f53b5ca Compare June 24, 2025 15:09

use early_inc_range

40bc7a3

Sisyph reviewed Jun 24, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

Sisyph reviewed Jun 24, 2025

View reviewed changes

Sisyph approved these changes Jun 24, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp Outdated Show resolved Hide resolved

remove dup check

75ce40d

broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 56e2b07 to 75ce40d Compare June 24, 2025 21:45

broxigarchen mentioned this pull request Jun 24, 2025

[AMDGPU][True16][CodeGen] stop emitting spgr_lo16 from isel #144819

Open

broxigarchen merged commit 505906b into llvm:main Jun 25, 2025
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450

broxigarchen commented Jun 24, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jun 24, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Sisyph Jun 24, 2025

Uh oh!

broxigarchen Jun 24, 2025

Uh oh!

Sisyph left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jun 25, 2025

Uh oh!

jayfoad commented Jun 25, 2025

Uh oh!

broxigarchen commented Jun 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450

Conversation

broxigarchen commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 24, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Sisyph Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

broxigarchen Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

Sisyph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jun 25, 2025

Uh oh!

jayfoad commented Jun 25, 2025

Uh oh!

broxigarchen commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

broxigarchen commented Jun 24, 2025 •

edited

Loading

broxigarchen commented Jun 25, 2025 •

edited

Loading