[PowerPC] Add missing patterns for lround when i32 is returned. #111863

Merged: 2 commits into llvm:main on Oct 16, 2024

Conversation

stefanp-synopsys
Contributor

This patch adds the missing PowerPC selection patterns for lround when the result type is i32.
Patterns for an i64 result already existed before this patch.
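
As a rough illustration of what the new i32 patterns compute, here is a minimal C sketch of the FAST sequence checked in the tests (`xsrdpi`, `fctiw`, `mffprwz`). The helper names `lround_i32_model` and `lround_i32_examples` are hypothetical and not part of the patch. The sketch assumes the input is already within the 32-bit range; out-of-range inputs are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the selected sequence, not code from the patch.
 * Assumes d is finite and the rounded value fits in int32_t; the casts
 * below are undefined in C otherwise. */
int32_t lround_i32_model(double d) {
    /* Step 1 (xsrdpi): round to the nearest integer, ties away from zero.
     * Done by truncating toward zero and adjusting on the fraction, which
     * is exact for inputs in this range. */
    double t = (double)(int64_t)d;
    double frac = d - t;
    if (frac >= 0.5)
        t += 1.0;
    else if (frac <= -0.5)
        t -= 1.0;

    /* Step 2 (fctiw + mffprwz): convert the integral value to a 32-bit
     * word and move it to a general-purpose register. */
    return (int32_t)t;
}

/* A few spot checks of the tie-breaking behaviour. */
void lround_i32_examples(void) {
    assert(lround_i32_model(2.5) == 3);   /* tie rounds away from zero */
    assert(lround_i32_model(-2.5) == -3); /* same for negative ties */
    assert(lround_i32_model(0.4) == 0);
    assert(lround_i32_model(-0.6) == -1);
}
```

The f32 pattern differs only in first copying the operand into the VSFRC register class; the rounding and conversion steps are the same.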

@llvmbot
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Stefan Pintilie (stefanp-ibm)

Changes

The patch adds support for lround when the output type of the rounding is i32.
The support for a rounding result of type i64 existed before this patch.


Full diff: https://github.com/llvm/llvm-project/pull/111863.diff

2 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCInstrVSX.td (+4)
  • (modified) llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll (+84)
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index dd07892794d599..fe9ab22c576349 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -3606,6 +3606,10 @@ def : Pat<(i64 (lround f64:$S)),
           (i64 (MFVSRD (FCTID (XSRDPI $S))))>;
 def : Pat<(i64 (lround f32:$S)),
           (i64 (MFVSRD (FCTID (XSRDPI (COPY_TO_REGCLASS $S, VSFRC)))))>;
+def : Pat<(i32 (lround f64:$S)),
+          (i32 (MFVSRWZ (FCTIW (XSRDPI $S))))>;
+def : Pat<(i32 (lround f32:$S)),
+          (i32 (MFVSRWZ (FCTIW (XSRDPI (COPY_TO_REGCLASS $S, VSFRC)))))>;
 def : Pat<(i64 (llround f64:$S)),
           (i64 (MFVSRD (FCTID (XSRDPI $S))))>;
 def : Pat<(i64 (llround f32:$S)),
diff --git a/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll b/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
index e950c0a2efac49..f393bdeb8626eb 100644
--- a/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
+++ b/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
@@ -214,6 +214,48 @@ entry:
 
 declare i64 @llvm.lround.i64.f64(double)
 
+define dso_local i32 @test_lround32(double %d) local_unnamed_addr {
+; BE-LABEL: test_lround32:
+; BE:       # %bb.0: # %entry
+; BE-NEXT:    mflr r0
+; BE-NEXT:    stdu r1, -112(r1)
+; BE-NEXT:    std r0, 128(r1)
+; BE-NEXT:    .cfi_def_cfa_offset 112
+; BE-NEXT:    .cfi_offset lr, 16
+; BE-NEXT:    bl lround
+; BE-NEXT:    nop
+; BE-NEXT:    addi r1, r1, 112
+; BE-NEXT:    ld r0, 16(r1)
+; BE-NEXT:    mtlr r0
+; BE-NEXT:    blr
+;
+; CHECK-LABEL: test_lround32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lround
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
+; CHECK-NEXT:    blr
+;
+; FAST-LABEL: test_lround32:
+; FAST:       # %bb.0: # %entry
+; FAST-NEXT:    xsrdpi f0, f1
+; FAST-NEXT:    fctiw f0, f0
+; FAST-NEXT:    mffprwz r3, f0
+; FAST-NEXT:    blr
+entry:
+  %0 = tail call i32 @llvm.lround.i32.f64(double %d)
+  ret i32 %0
+}
+
+declare i32 @llvm.lround.i32.f64(double)
+
 define dso_local i64 @test_lroundf(float %f) local_unnamed_addr {
 ; BE-LABEL: test_lroundf:
 ; BE:       # %bb.0: # %entry
@@ -256,6 +298,48 @@ entry:
 
 declare i64 @llvm.lround.i64.f32(float)
 
+define dso_local i32 @test_lroundf32(float %d) local_unnamed_addr {
+; BE-LABEL: test_lroundf32:
+; BE:       # %bb.0: # %entry
+; BE-NEXT:    mflr r0
+; BE-NEXT:    stdu r1, -112(r1)
+; BE-NEXT:    std r0, 128(r1)
+; BE-NEXT:    .cfi_def_cfa_offset 112
+; BE-NEXT:    .cfi_offset lr, 16
+; BE-NEXT:    bl lroundf
+; BE-NEXT:    nop
+; BE-NEXT:    addi r1, r1, 112
+; BE-NEXT:    ld r0, 16(r1)
+; BE-NEXT:    mtlr r0
+; BE-NEXT:    blr
+;
+; CHECK-LABEL: test_lroundf32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lroundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
+; CHECK-NEXT:    blr
+;
+; FAST-LABEL: test_lroundf32:
+; FAST:       # %bb.0: # %entry
+; FAST-NEXT:    xsrdpi f0, f1
+; FAST-NEXT:    fctiw f0, f0
+; FAST-NEXT:    mffprwz r3, f0
+; FAST-NEXT:    blr
+entry:
+  %0 = tail call i32 @llvm.lround.i32.f32(float %d)
+  ret i32 %0
+}
+
+declare i32 @llvm.lround.i32.f32(float)
+
 define dso_local i64 @test_llround(double %d) local_unnamed_addr {
 ; BE-LABEL: test_llround:
 ; BE:       # %bb.0: # %entry

Contributor

@amy-kwan amy-kwan left a comment

Just one nit but I think LGTM.

@@ -214,6 +214,48 @@ entry:

declare i64 @llvm.lround.i64.f64(double)

define dso_local i32 @test_lround32(double %d) local_unnamed_addr {
Contributor

nit: We're passing a double into the intrinsic in this test. I think the name of this test might sound a bit too similar to the other one (test_lroundf32). Maybe we should update this to indicate something like f64 (but returning an i32)?

Anyway, this is just a picky nit of mine but if you don't feel it is necessary you can feel free to disregard.

@stefanp-synopsys stefanp-synopsys merged commit dcc5ba4 into llvm:main Oct 16, 2024
5 of 7 checks passed
3 participants