Skip to content

[AArch64][GlobalISel] Legalize 128-bit types for FABS #104753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 3, 2024

Conversation

Him188
Copy link
Member

@Him188 Him188 commented Aug 19, 2024

This patch adds a common lower action for G_FABS, which generates and x8, x8, #0x7fffffffffffffff to reset the sign bit. The action does not support vectors since G_AND does not support fp128.

This approach is different than what SDAG is doing. SDAG stores the value onto stack, clears the sign bit in the most significant byte, and loads the value back into register. This involves multiple memory ops and sounds slower.

@Him188 Him188 changed the title Legalize 128-bit types for FABS [AArch64][GlobalISel] Legalize 128-bit types for FABS Aug 19, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 19, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Him188 (Him188)

Changes

Generate and x8, x8, #<!-- -->0x7fffffffffffffff to reset the sign bit. Vectors are scalarized.

This approach is different than what SDAG is doing. SDAG stores the value onto stack, clears the sign bit in the most significant byte, and loads the value back into register. This involves multiple memory ops and sounds slower.


Full diff: https://github.com/llvm/llvm-project/pull/104753.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+38-3)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h (+2)
  • (added) llvm/test/CodeGen/AArch64/fabs-fp128.ll (+82)
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index d42d5511a82422..cae0ce1f4567f2 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -241,9 +241,9 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .widenScalarToNextPow2(0);
 
   getActionDefinitionsBuilder({G_FADD, G_FSUB, G_FMUL, G_FDIV, G_FMA, G_FNEG,
-                               G_FABS, G_FSQRT, G_FMAXNUM, G_FMINNUM,
-                               G_FMAXIMUM, G_FMINIMUM, G_FCEIL, G_FFLOOR,
-                               G_FRINT, G_FNEARBYINT, G_INTRINSIC_TRUNC,
+                               G_FSQRT, G_FMAXNUM, G_FMINNUM, G_FMAXIMUM,
+                               G_FMINIMUM, G_FCEIL, G_FFLOOR, G_FRINT,
+                               G_FNEARBYINT, G_INTRINSIC_TRUNC,
                                G_INTRINSIC_ROUND, G_INTRINSIC_ROUNDEVEN})
       .legalFor({MinFPScalar, s32, s64, v2s32, v4s32, v2s64})
       .legalIf([=](const LegalityQuery &Query) {
@@ -257,6 +257,20 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .clampNumElements(0, v2s64, v2s64)
       .moreElementsToNextPow2(0);
 
+  getActionDefinitionsBuilder(G_FABS)
+      .legalFor({MinFPScalar, s32, s64, v2s32, v4s32, v2s64})
+      .legalIf([=](const LegalityQuery &Query) {
+        const auto &Ty = Query.Types[0];
+        return (Ty == v8s16 || Ty == v4s16) && HasFP16;
+      })
+      .customFor({s128})
+      .scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
+      .minScalarOrElt(0, MinFPScalar)
+      .clampNumElements(0, v4s16, v8s16)
+      .clampNumElements(0, v2s32, v4s32)
+      .clampNumElements(0, v2s64, v2s64)
+      .moreElementsToNextPow2(0);
+
   getActionDefinitionsBuilder(G_FREM)
       .libcallFor({s32, s64})
       .minScalar(0, s32)
@@ -1336,6 +1350,8 @@ bool AArch64LegalizerInfo::legalizeCustom(
     return legalizePrefetch(MI, Helper);
   case TargetOpcode::G_ABS:
     return Helper.lowerAbsToCNeg(MI);
+  case TargetOpcode::G_FABS:
+    return legalizeFABS(MI, MRI, MIRBuilder);
   case TargetOpcode::G_ICMP:
     return legalizeICMP(MI, MRI, MIRBuilder);
   }
@@ -1396,6 +1412,25 @@ bool AArch64LegalizerInfo::legalizeFunnelShift(MachineInstr &MI,
   return true;
 }
 
+bool AArch64LegalizerInfo::legalizeFABS(MachineInstr &MI,
+                                        MachineRegisterInfo &MRI,
+                                        MachineIRBuilder &MIRBuilder) const {
+  Register SrcReg = MI.getOperand(1).getReg();
+  Register DstReg = MI.getOperand(0).getReg();
+
+  constexpr LLT S128 = LLT::scalar(128);
+  if (MRI.getType(SrcReg) != S128 || MRI.getType(DstReg) != S128)
+    return false;
+
+  MIRBuilder.buildAnd(
+      DstReg, SrcReg,
+      MIRBuilder.buildConstant(
+          S128, APInt::getSignedMaxValue(128)));
+
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AArch64LegalizerInfo::legalizeICMP(MachineInstr &MI,
                                         MachineRegisterInfo &MRI,
                                         MachineIRBuilder &MIRBuilder) const {
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
index 00d85a36e4b2ca..8bf642d1745aa9 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
@@ -50,6 +50,8 @@ class AArch64LegalizerInfo : public LegalizerInfo {
                                LegalizerHelper &Helper) const;
   bool legalizeRotate(MachineInstr &MI, MachineRegisterInfo &MRI,
                       LegalizerHelper &Helper) const;
+  bool legalizeFABS(MachineInstr &MI, MachineRegisterInfo &MRI,
+                    MachineIRBuilder &MIRBuilder) const;
   bool legalizeICMP(MachineInstr &MI, MachineRegisterInfo &MRI,
                     MachineIRBuilder &MIRBuilder) const;
   bool legalizeFunnelShift(MachineInstr &MI, MachineRegisterInfo &MRI,
diff --git a/llvm/test/CodeGen/AArch64/fabs-fp128.ll b/llvm/test/CodeGen/AArch64/fabs-fp128.ll
new file mode 100644
index 00000000000000..131af5e0a3281e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fabs-fp128.ll
@@ -0,0 +1,82 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu -global-isel=1 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-GI
+
+define fp128 @fabs_f128(fp128 %a) {
+; CHECK-SD-LABEL: fabs_f128:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    str q0, [sp, #-16]!
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    ldrb w8, [sp, #15]
+; CHECK-SD-NEXT:    and w8, w8, #0x7f
+; CHECK-SD-NEXT:    strb w8, [sp, #15]
+; CHECK-SD-NEXT:    ldr q0, [sp], #16
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fabs_f128:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    mov x8, v0.d[1]
+; CHECK-GI-NEXT:    mov v0.d[0], v0.d[0]
+; CHECK-GI-NEXT:    and x8, x8, #0x7fffffffffffffff
+; CHECK-GI-NEXT:    mov v0.d[1], x8
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = call fp128 @llvm.fabs.f128(fp128 %a)
+  ret fp128 %c
+}
+
+define <1 x fp128> @fabs_v1f128(<1 x fp128> %a) {
+; CHECK-SD-LABEL: fabs_v1f128:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    str q0, [sp, #-16]!
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    ldrb w8, [sp, #15]
+; CHECK-SD-NEXT:    and w8, w8, #0x7f
+; CHECK-SD-NEXT:    strb w8, [sp, #15]
+; CHECK-SD-NEXT:    ldr q0, [sp], #16
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fabs_v1f128:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    mov x8, v0.d[1]
+; CHECK-GI-NEXT:    mov v0.d[0], v0.d[0]
+; CHECK-GI-NEXT:    and x8, x8, #0x7fffffffffffffff
+; CHECK-GI-NEXT:    mov v0.d[1], x8
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = call <1 x fp128> @llvm.fabs.v1f128(<1 x fp128> %a)
+  ret <1 x fp128> %c
+}
+
+define <2 x fp128> @fabs_v2f128(<2 x fp128> %a) {
+; CHECK-SD-LABEL: fabs_v2f128:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    stp q0, q1, [sp, #-32]!
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-SD-NEXT:    ldrb w8, [sp, #15]
+; CHECK-SD-NEXT:    and w8, w8, #0x7f
+; CHECK-SD-NEXT:    strb w8, [sp, #15]
+; CHECK-SD-NEXT:    ldrb w8, [sp, #31]
+; CHECK-SD-NEXT:    and w8, w8, #0x7f
+; CHECK-SD-NEXT:    strb w8, [sp, #31]
+; CHECK-SD-NEXT:    ldp q0, q1, [sp], #32
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fabs_v2f128:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    mov x8, v0.d[1]
+; CHECK-GI-NEXT:    mov x9, v1.d[1]
+; CHECK-GI-NEXT:    mov v0.d[0], v0.d[0]
+; CHECK-GI-NEXT:    mov v1.d[0], v1.d[0]
+; CHECK-GI-NEXT:    and x8, x8, #0x7fffffffffffffff
+; CHECK-GI-NEXT:    and x9, x9, #0x7fffffffffffffff
+; CHECK-GI-NEXT:    mov v0.d[1], x8
+; CHECK-GI-NEXT:    mov v1.d[1], x9
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = call <2 x fp128> @llvm.fabs.v2f128(<2 x fp128> %a)
+  ret <2 x fp128> %c
+}
+
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}

Copy link

github-actions bot commented Aug 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

const auto &Ty = Query.Types[0];
return (Ty == v8s16 || Ty == v4s16) && HasFP16;
})
.customFor({s128})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it possible to make it lowerIf and have the generic LegalizerHelper::lower handle the expansion into an AND? It should be very similar code I hope (maybe generalized to more types), but could be useful to other targets / fp types too. FP16 without +fullfp16 might be able to use it, for example.

Comment on lines 1425 to 1428
MIRBuilder.buildAnd(
DstReg, SrcReg,
MIRBuilder.buildConstant(
S128, APInt::getSignedMaxValue(128)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already be the default lower action. This shouldn't require custom lowering

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I failed to find such a handling. lowerFor({128}) also failed without adding a lower action. So I think there isn't a default lower action for G_FABS.

Copy link
Contributor

@arsenm arsenm Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you should put this in the generic LegalizerHelper. As a rule, if your custom lowering only emits generic instructions, it should not be in a specific target

@Him188 Him188 requested review from davemgreen and arsenm August 19, 2024 14:03
Comment on lines 8466 to 8467
if (MRI.getType(SrcReg) != Ty)
return UnableToLegalize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this check, the IR is malformed if this happens

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment on lines 8469 to 8470
if (!Ty.isScalar())
return UnableToLegalize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this, this code should handle vectors just fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@Him188 Him188 requested a review from arsenm August 20, 2024 13:46
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nit

Comment on lines 8469 to 8470
MIRBuilder.buildConstant(
Ty, APInt::getSignedMaxValue(Ty.getSizeInBits())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure this works for ppc_fp128 but we don't support that yet anyway

Copy link

@tschuett tschuett Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty.getScalarSizeInBits() to support vectors? And vector support is not testet.

Copy link
Member Author

@Him188 Him188 Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tschuett How do I test vectors? Is there a way to unit-test the helper or do I need to find a target which does use the helper for vectors (AArch64 does not).

But if we remove the scalarizeIf in AArch64 then AArch64 will use the helper and hence this can be tested

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

getActionDefinitionsBuilder(G_FABS)
.legalFor({MinFPScalar, s32, s64, v2s32, v4s32, v2s64})
.legalIf([=](const LegalityQuery &Query) {
const auto &Ty = Query.Types[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no auto, and no reference. this is just LLT. You can also express this as a conditional legalFor, or swap out the legalFor list based on the HasFP16 feature

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually copied from above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that could also be better. These predicates can be called many times so they ideally should be as simple as possible to get to the legal case

return (Ty == v8s16 || Ty == v4s16) && HasFP16;
})
.lowerFor({s128})
.scalarizeIf(scalarOrEltWiderThan(0, 64), 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the scalarizeIf when lowerFAbs supports vectors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G_AND does not support s128 so we still need this. Left a TODO.

LLVM ERROR: unable to legalize instruction: %3:_(<2 x s128>) = G_AND %0:_, %7:_ (in function: fabs_v2f128)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happened to be adding i128 support for add/sub recently, and and/or/xor was caught up in them. There might not be a lot of benefit at the moment choosing one method vs the other as we will just scalarize in either case. That could be improved in the future if we change how they lower.

; CHECK-GI-NEXT: mov v1.d[1], x9
; CHECK-GI-NEXT: ret
entry:
%c = call <2 x fp128> @llvm.fabs.v2f128(<2 x fp128> %a)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While at it <4 x fp128> would be nice (low priority).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One moment later, non-power of two would be even nicer, e.g., <3 x fp128>.

Copy link
Member Author

@Him188 Him188 Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we scalarize vectors so I assumed the tests are not necessary, but yes let me add them to be safer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not scalarize in the G_FABS legalizer, some other component will do the scalarization for us.

@Him188 Him188 requested a review from tschuett August 23, 2024 09:52
// A legality predicate that returns true if the subtarget has FP16 support.
// To be used in combination with other predicates, e.g:
// .legalIf(all(hasFP16(), typeInSet(0, {v8s16, v4s16})))
const auto hasFP16 = [=]() -> LegalityPredicate {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it has more layers than it needs.

But it might be worth keeping as it was before. I feel like we will have a lot of conditions like these, and we should have something like .legalFor(hasFP16, {v4f16, v8f16})

return (Ty == v8s16 || Ty == v4s16) && HasFP16;
})
.lowerFor({s128})
.scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happened to be adding i128 support for add/sub recently, and and/or/xor was caught up in them. There might not be a lot of benefit at the moment choosing one method vs the other as we will just scalarize in either case. That could be improved in the future if we change how they lower.

Comment on lines 2 to 3
; RUN: llc -mtriple=aarch64-unknown-linux-gnu -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc -mtriple=aarch64-unknown-linux-gnu -global-isel=1 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-GI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need -verify-machineinstrs. Also should use -global-isel=0 if you're explicitly trying to handle both selectors

@Him188 Him188 requested a review from davemgreen August 30, 2024 17:39
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

@Him188 Him188 merged commit 0748f42 into llvm:main Sep 3, 2024
6 of 8 checks passed
@Him188 Him188 deleted the tguan/gisel-fabs branch September 3, 2024 11:47
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 3, 2024

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/2230

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants