Skip to content

SystemZ: Handle gr128 to fp128 copies in copyPhysReg #90861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 2, 2024

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented May 2, 2024

No description provided.

@llvmbot
Copy link
Member

llvmbot commented May 2, 2024

@llvm/pr-subscribers-backend-systemz

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/90861.diff

2 Files Affected:

  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp (+16)
  • (added) llvm/test/CodeGen/SystemZ/copy-phys-reg-gr128-to-fp128.mir (+82)
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
index b3517fb0ea77f5..f4e521c824d6e0 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
@@ -856,6 +856,22 @@ void SystemZInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     return;
   }
 
+  if (SystemZ::FP128BitRegClass.contains(DestReg) &&
+      SystemZ::GR128BitRegClass.contains(SrcReg)) {
+    MCRegister DestRegHi = RI.getSubReg(DestReg, SystemZ::subreg_h64);
+    MCRegister DestRegLo = RI.getSubReg(DestReg, SystemZ::subreg_l64);
+    MCRegister SrcRegHi = RI.getSubReg(SrcReg, SystemZ::subreg_h64);
+    MCRegister SrcRegLo = RI.getSubReg(SrcReg, SystemZ::subreg_l64);
+
+    BuildMI(MBB, MBBI, DL, get(SystemZ::LDGR), DestRegHi)
+      .addReg(SrcRegHi)
+      .addReg(DestReg, RegState::ImplicitDefine);
+
+    BuildMI(MBB, MBBI, DL, get(SystemZ::LDGR), DestRegLo)
+      .addReg(SrcRegLo, getKillRegState(KillSrc));
+    return;
+  }
+
   // Move CC value from a GR32.
   if (DestReg == SystemZ::CC) {
     unsigned Opcode =
diff --git a/llvm/test/CodeGen/SystemZ/copy-phys-reg-gr128-to-fp128.mir b/llvm/test/CodeGen/SystemZ/copy-phys-reg-gr128-to-fp128.mir
new file mode 100644
index 00000000000000..d1de915e81cef7
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/copy-phys-reg-gr128-to-fp128.mir
@@ -0,0 +1,82 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -mtriple=s390x-ibm-linux -mcpu=z13 -run-pass=postrapseudos -verify-machineinstrs -o - %s | FileCheck %s
+
+---
+name:            copy_gr128_to_fp128__r0q_to_f0q
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $r0q
+    ; CHECK-LABEL: name: copy_gr128_to_fp128__r0q_to_f0q
+    ; CHECK: liveins: $r0q
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $f0d = LDGR $r0d, implicit-def $f0q
+    ; CHECK-NEXT: $f2d = LDGR $r1d
+    ; CHECK-NEXT: Return implicit $f0q
+    $f0q = COPY $r0q
+    Return implicit $f0q
+...
+
+---
+name:            copy_gr128_to_fp128__r0q_to_f0q_killed
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $r0q
+    ; CHECK-LABEL: name: copy_gr128_to_fp128__r0q_to_f0q_killed
+    ; CHECK: liveins: $r0q
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $f0d = LDGR $r0d, implicit-def $f0q
+    ; CHECK-NEXT: $f2d = LDGR killed $r1d
+    ; CHECK-NEXT: Return implicit $f0q
+    $f0q = COPY killed $r0q
+    Return implicit $f0q
+...
+
+---
+name:            copy_gr128_to_fp128__r0q_to_f0q_undef
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $r0q
+    ; CHECK-LABEL: name: copy_gr128_to_fp128__r0q_to_f0q_undef
+    ; CHECK: liveins: $r0q
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $f0q = KILL undef $r0q
+    ; CHECK-NEXT: Return implicit $f0q
+    $f0q = COPY undef $r0q
+    Return implicit $f0q
+...
+
+---
+name:            copy_gr128_to_fp128__r0q_to_f0q_subreg0
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $r0q
+    ; CHECK-LABEL: name: copy_gr128_to_fp128__r0q_to_f0q_subreg0
+    ; CHECK: liveins: $r0q
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $f0d = LDGR $r0d, implicit-def $f0q
+    ; CHECK-NEXT: $f2d = LDGR $r1d
+    ; CHECK-NEXT: Return implicit $f0q
+    $f0q = COPY $r0q
+    Return implicit $f0q
+...
+
+---
+name:            copy_gr128_to_fp128__r0q_to_f0q_subreg1
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $r0q
+    ; CHECK-LABEL: name: copy_gr128_to_fp128__r0q_to_f0q_subreg1
+    ; CHECK: liveins: $r0q
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $f0d = LDGR $r0d, implicit-def $f0q
+    ; CHECK-NEXT: $f2d = LDGR $r1d
+    ; CHECK-NEXT: Return implicit $f0q
+    $f0q = COPY $r0q
+    Return implicit $f0q
+...
+

Copy link

github-actions bot commented May 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@arsenm arsenm force-pushed the systemz-copy-gpr128-to-fp128 branch from 00eff16 to 3076920 Compare May 2, 2024 14:57
Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but see comment about (apparently) duplicated test cases.

@arsenm arsenm merged commit d11afe1 into llvm:main May 2, 2024
@arsenm arsenm deleted the systemz-copy-gpr128-to-fp128 branch May 2, 2024 15:46
arsenm added a commit that referenced this pull request May 2, 2024
shouldCastAtomicLoadInIR is a hack that should be removed. Simple
bitcasting of operations should be in the domain of ordinary type
legalization and does not need to be done in the IR.

This introduces a code quality regression due to the hack currently used
to avoid using 128-bit values in the case where the floating point value
is ultimately used as an integer. This would be avoidable if there were
always a legal 128-bit type (like v2i64). This is a pretty niche
situation so I assume it's not important.

I implemented about 85% of the work necessary to make v2i64 legal, but
it was taking too long and I lack the necessary familiarity with systemz
to complete it. I've pushed it here for someone to pick up:
https://github.com/arsenm/llvm-project/pull/new/systemz-legal-v2i64

Depends #90861
@uweigand
Copy link
Member

uweigand commented May 8, 2024

Hi @arsenm, coming back to this I don't see how this is actually needed (any more). There are no types that can live in both FP128 and GR128, so nothing should ever generate such COPYs. I guess you ran into this with an early version of the f128 atomic load/store bitcast patch, but the current version splits into two 64-bit copies very early and never trigger this code path either.

Unless I'm missing something here, I'd prefer to revert this PR again rather than keep this as dead code.

@arsenm
Copy link
Contributor Author

arsenm commented May 8, 2024

I guess you ran into this with an early version of the f128 atomic load/store bitcast patch, but the current version splits into two 64-bit copies very early and never trigger this code path either.

No, I ran into this with the better version (https://github.com/arsenm/llvm-project/tree/systemz-legal-v2i64). v2i64 should be legal, and avoids all of this custom lowering hacking around the SelectionDAG requirement to avoid illegal types. What's pushed now is where I landed after I gave up on making sure none of the ABIs changed

Unless I'm missing something here, I'd prefer to revert this PR again rather than keep this as dead code.

copyPhysReg really should be comprehensive. This is tested code so I don't see a compelling reason to remove it and I strongly suggest leaving it. This is avoiding future work for someone

It's not guaranteed that the DAG never emits these copies. When I was working on the legal type patch, these were slipping in from not-100% precise selection pattern predicates. Having mandatory optimization is bad, and this should always be available as a fallback.

The DAG is also not the only possible source of copies, any machine pass should be able to insert them. GlobalIsel also does away with the "type = register class" DAG reasoning. The fact this was missing also shows there's a missed opportunity for RA to avoid spills by copying to these other classes. This would work if there was a new allocatable synthetic class that was the union of FPR128/GPR128

@uweigand
Copy link
Member

uweigand commented May 8, 2024

OK, fair enough. Thanks for the explanation. Let's keep it in then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants