Skip to content

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan #145450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 25, 2025

Conversation

broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Jun 24, 2025

The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration

@broxigarchen broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 16cef3a to 38e2154 Compare June 24, 2025 03:22
@broxigarchen broxigarchen changed the title fix mutli use list modification [AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan Jun 24, 2025
@broxigarchen broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 38e2154 to 9b3841a Compare June 24, 2025 14:19
@broxigarchen broxigarchen marked this pull request as ready for review June 24, 2025 14:19
@broxigarchen broxigarchen requested a review from Sisyph June 24, 2025 14:19
@broxigarchen broxigarchen requested review from arsenm and kosarev June 24, 2025 14:19
@llvmbot
Copy link
Member

llvmbot commented Jun 24, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

The legalize t16 operand function could insert a reg_sequence which modify the user list of the targetted register, and we should not call it in the middle of an user list iteration


Full diff: https://github.com/llvm/llvm-project/pull/145450.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir (+26)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 333e91bf37df5..ca29c71030942 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -8716,6 +8716,8 @@ void SIInstrInfo::splitScalar64BitCountOp(SIInstrWorklist &Worklist,
 void SIInstrInfo::addUsersToMoveToVALUWorklist(
     Register DstReg, MachineRegisterInfo &MRI,
     SIInstrWorklist &Worklist) const {
+  SmallVector<std::pair<MachineInstr *, unsigned>, 4> LegalizeList;
+
   for (MachineRegisterInfo::use_iterator I = MRI.use_begin(DstReg),
          E = MRI.use_end(); I != E;) {
     MachineInstr &UseMI = *I->getParent();
@@ -8744,11 +8746,14 @@ void SIInstrInfo::addUsersToMoveToVALUWorklist(
         ++I;
       } while (I != E && I->getParent() == &UseMI);
     } else {
-      legalizeOperandsVALUt16(UseMI, OpNo, MRI);
+      LegalizeList.push_back(std::make_pair(&UseMI, OpNo));
 
       ++I;
     }
   }
+
+  for (auto &MI : LegalizeList)
+    legalizeOperandsVALUt16(*MI.first, MI.second, MRI);
 }
 
 void SIInstrInfo::movePackToVALU(SIInstrWorklist &Worklist,
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
index 9b6a2f3a1aa1e..59184499de2f0 100644
--- a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
@@ -108,6 +108,32 @@ body:             |
     %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode
 ...
 
+---
+name:            legalize_with_multi_user
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: legalize_with_multi_user
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[DEF]], %subreg.lo16, [[DEF1]], %subreg.hi16
+    ; GCN-NEXT: [[V_ADD_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_ADD_F16_t16_e64 0, [[REG_SEQUENCE]].lo16, 0, 1, 0, 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 32768
+    ; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF3]], %subreg.hi16
+    ; GCN-NEXT: [[V_PK_FMA_F16_:%[0-9]+]]:vgpr_32 = V_PK_FMA_F16 11, [[S_MOV_B32_]], 0, [[REG_SEQUENCE1]], 8, [[DEF2]], 0, 0, 0, 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF4:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF4]], %subreg.hi16
+    ; GCN-NEXT: [[V_XOR_B32_e64_:%[0-9]+]]:vgpr_32 = V_XOR_B32_e64 [[REG_SEQUENCE2]], [[S_MOV_B32_]], implicit $exec
+    %0:vgpr_16 = IMPLICIT_DEF
+    %1:sreg_32 = COPY %0:vgpr_16
+    %2:sreg_32 = S_ADD_F16 %1:sreg_32, 1, implicit $mode
+    %3:sreg_32 = S_MOV_B32 32768
+    %4:vgpr_32 = IMPLICIT_DEF
+    %5:vgpr_32 = V_PK_FMA_F16 11, %3:sreg_32, 0, %2:sreg_32, 8, %4:vgpr_32, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
+    %6:sreg_32 = S_XOR_B32 %2:sreg_32, %3:sreg_32, implicit-def dead $scc
+...
+
 ---
 name:            vgpr16_to_spgr32
 body:             |

@broxigarchen broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 9b3841a to f53b5ca Compare June 24, 2025 15:09
; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 32768
; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
; GCN-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_ADD_F16_t16_e64_]], %subreg.lo16, [[DEF3]], %subreg.hi16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to have one REG_SEQUENCE per use? I guess most or all of them will eventually fold out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reg_sequence should be fold out in the two-address, rewrite virtual reg and coalescer

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the noted cosmetic issues, LGTM

@broxigarchen broxigarchen force-pushed the main-sgpr-copy-multi-use branch from 56e2b07 to 75ce40d Compare June 24, 2025 21:45
@broxigarchen broxigarchen merged commit 505906b into llvm:main Jun 25, 2025
7 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 25, 2025

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-ubuntu running on as-builder-9 while building llvm at step 16 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/10898

Here is the relevant piece of the build log for the reference
Step 16 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
...
PASS: lldb-api :: types/TestIntegerType.py (1274 of 1283)
PASS: lldb-api :: types/TestRecursiveTypes.py (1275 of 1283)
PASS: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py (1276 of 1283)
PASS: lldb-api :: types/TestIntegerTypeExpr.py (1277 of 1283)
UNSUPPORTED: lldb-api :: windows/launch/missing-dll/TestMissingDll.py (1278 of 1283)
PASS: lldb-api :: types/TestShortType.py (1279 of 1283)
PASS: lldb-api :: types/TestLongTypes.py (1280 of 1283)
PASS: lldb-api :: types/TestShortTypeExpr.py (1281 of 1283)
PASS: lldb-api :: types/TestLongTypesExpr.py (1282 of 1283)
TIMEOUT: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py (1283 of 1283)
******************** TEST 'lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py' FAILED ********************
Script:
--
/usr/bin/python3.12 /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --libcxx-include-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/c++/v1 --libcxx-include-target-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/aarch64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib/aarch64-unknown-linux-gnu --arch aarch64 --build-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/lldb --compiler /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang --dsymutil /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --lldb-obj-root /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/tools/lldb --lldb-libs-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --cmake-build-type Release --platform-url connect://jetson-agx-2198.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot /mnt/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux --skip-category=lldb-server /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/python_api/process/cancel_attach -p TestCancelAttach.py
--
Exit Code: -9
Timeout: Reached timeout of 600 seconds

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 505906bff6ddf9813666f0404fb604a2b6e02722)
  clang revision 505906bff6ddf9813666f0404fb604a2b6e02722
  llvm revision 505906bff6ddf9813666f0404fb604a2b6e02722

--
Command Output (stderr):
--
WARNING:root:Custom libc++ is not supported for remote runs: ignoring --libcxx arguments
FAIL: LLDB (/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang-aarch64) :: test_scripted_implementation (TestCancelAttach.AttachCancelTestCase.test_scripted_implementation)

--

********************
Slowest Tests:
--------------------------------------------------------------------------
600.04s: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py
180.99s: lldb-api :: commands/command/script_alias/TestCommandScriptAlias.py
70.55s: lldb-api :: commands/process/attach/TestProcessAttach.py
39.55s: lldb-api :: functionalities/data-formatter/data-formatter-stl/libcxx-simulators/string/TestDataFormatterLibcxxStringSimulator.py
34.89s: lldb-api :: functionalities/completion/TestCompletion.py
34.43s: lldb-api :: functionalities/single-thread-step/TestSingleThreadStepTimeout.py
27.03s: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
23.93s: lldb-api :: commands/statistics/basic/TestStats.py
20.66s: lldb-api :: functionalities/gdb_remote_client/TestPlatformClient.py
18.86s: lldb-api :: functionalities/thread/state/TestThreadStates.py
17.97s: lldb-api :: commands/dwim-print/TestDWIMPrint.py
14.47s: lldb-api :: commands/expression/expr-in-syscall/TestExpressionInSyscall.py
14.39s: lldb-api :: functionalities/data-formatter/data-formatter-stl/generic/set/TestDataFormatterGenericSet.py
13.94s: lldb-api :: python_api/find_in_memory/TestFindRangesInMemory.py

@jayfoad
Copy link
Contributor

jayfoad commented Jun 25, 2025

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan

But you are still calling legalizeOperandsVALUt16 from addUsersToMoveToVALUWorklist, right? Can you explain why this is needed in the first place? It feels wrong, because that function is now doing more than just adding things to a work list. I guess it comes from #138734, but that review is hard to follow.

@broxigarchen
Copy link
Contributor Author

broxigarchen commented Jun 25, 2025

[AMDGPU][True16][CodeGen] do not legalize t16 operand during user scan

But you are still calling legalizeOperandsVALUt16 from addUsersToMoveToVALUWorklist, right? Can you explain why this is needed in the first place? It feels wrong, because that function is now doing more than just adding things to a work list. I guess it comes from #138734, but that review is hard to follow.

Hi Jay. The problem is mainly from the mismatched register size in the MIR after we introduce VGPR16 but no SGPR16.

There are a few places that are generating mismatched register size in the moveToVALU sequence. i.e. A SALU32 used by SALU16. Before moved to VALU this is legal as they all use sgpr32. When this def-use is moved to VALU. SALU32 becomes a VALU32 (sgpr32 to vpgr32), and SALU16 becomes a VALU16 (sgpr32 to vgpr16). The register size is mismatched in the def-use chain.

Thus, in moveToVALU sequence, we need to legalize these operands. We cannot simply legalize the operands of the current instruction, but also need to legalize the users of the current instruction, because the iteration order of the moveToVALU sequence is bottom up (see #138734 (comment)).

For this reason, we need to add user list scan in multiple places in the movetoVALU function to do the legalization which it's pretty ugly. Thus, I think since the addUsersToMoveToVALUWorklist is doing user scan already, doing the legalization there could remove a lots of the dulicated code

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
llvm#145450)

The legalize t16 operand function could insert a reg_sequence which
modify the user list of the targetted register, and we should not call
it in the middle of an user list iteration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants