[PowerPC] Implement llvm.set.rounding intrinsic #67302

ecnelises · 2023-09-25T09:11:41Z

According to LangRef, llvm.set.rounding sets the rounding mode from an integer argument:

0 - toward zero
1 - to nearest, ties to even
2 - toward positive infinity
3 - toward negative infinity
4 - to nearest, ties away from zero

The PowerPC ISA, however, uses a different encoding:

0 - to nearest
1 - toward zero
2 - toward positive infinity
3 - toward negative infinity

This patch maps the argument to the PowerPC encoding and writes it into the last two bits of FPSCR (the rounding mode field).
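
The two encodings listed above can be put side by side in a small standalone sketch. The enum and function names here (`LLVMRounding`, `PPCRounding`, `toPPCRounding`) are hypothetical and are not taken from the patch. Reporting mode 4 as unmapped is an illustrative choice only, since the PowerPC list has no matching entry; it is not necessarily how the patch treats that value.

```cpp
#include <cstdint>
#include <optional>

// LLVM's platform-independent encoding, as listed in the description.
enum class LLVMRounding : std::uint32_t {
  TowardZero = 0,
  NearestTiesToEven = 1,
  TowardPositive = 2,
  TowardNegative = 3,
  NearestTiesToAway = 4,
};

// PowerPC rounding-mode field encoding, as listed in the description.
enum class PPCRounding : std::uint32_t {
  Nearest = 0,
  TowardZero = 1,
  TowardPositive = 2,
  TowardNegative = 3,
};

// Hypothetical helper: translate an LLVM mode to the PowerPC field value.
// Mode 4 has no entry in the PowerPC list, so this sketch returns nullopt
// for it (an assumption made for illustration).
constexpr std::optional<PPCRounding> toPPCRounding(LLVMRounding Mode) {
  switch (Mode) {
  case LLVMRounding::TowardZero:
    return PPCRounding::TowardZero;
  case LLVMRounding::NearestTiesToEven:
    return PPCRounding::Nearest;
  case LLVMRounding::TowardPositive:
    return PPCRounding::TowardPositive;
  case LLVMRounding::TowardNegative:
    return PPCRounding::TowardNegative;
  case LLVMRounding::NearestTiesToAway:
    return std::nullopt;
  }
  return std::nullopt;
}
```

Only modes 0 and 1 swap places; modes 2 and 3 carry over unchanged.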

Migrated from https://reviews.llvm.org/D154933

spavloff

The patch looks good, but I am not familiar enough with PPC instructions. Could you please run the runtime tests from here: https://github.com/llvm/llvm-test-suite/tree/main/MultiSource/UnitTests/Float/rounding? You just need to build an application from two files: clang rounding.c rounding-dynamic.c.

ecnelises · 2023-11-27T10:35:04Z

> The patch looks good, but I am not familiar enough with PPC instructions. Could you please run the runtime tests from here: https://github.com/llvm/llvm-test-suite/tree/main/MultiSource/UnitTests/Float/rounding? You just need to build an application from two files: clang rounding.c rounding-dynamic.c.

Sure. All passed on ppc64le, ppc64 and ppc32. (Of course, changes to test-suite and clang are needed; I'll update then.)

spavloff · 2023-11-28T12:50:50Z

LGTM.

chenzheng1030

Maybe we can run a perf test comparing this expansion for setting the rounding mode against the system library's fesetround(). On AIX, I saw that some improvements were introduced in the system library's implementation.

chenzheng1030 · 2024-01-02T06:26:33Z

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

+  SDValue Chain = Op.getOperand(0);
+
+  // If requested mode is constant, just use simpler mtfsb.
+  if (auto *CVal = dyn_cast<ConstantSDNode>(Op.getOperand(1))) {


Can we use DAG.computeKnownBits() to handle more cases instead of just the constant inputs?

ecnelises · Jan 10, 2024

Here we want to make sure the higher bits are all zeroes. KnownBits and a constant don't make a difference, do they?

chenzheng1030 · 2024-01-02T07:21:25Z

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

+    return SDValue(SetLo, 0);
+  }
+
+  // Use x ^ (~(x >> 1) & 1) to transform LLVM rounding mode to Power format.


The comment does not match the logic below. Should x be (x & 3)?

Also, LLVM mode 4 (to nearest, ties away from zero) is mapped to Power mode 1 (toward zero)? I think LLVM mode 4 should map to Power mode 0 (round to nearest).

ecnelises · Jan 10, 2024

I think we are using a best-effort approach. The meaning looks implementation-defined:

> The llvm.set.rounding intrinsic sets the current rounding mode. It is similar to C library function ‘fesetround’, however this intrinsic does not return any value and uses platform-independent representation of IEEE rounding modes.
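
The arithmetic discussed in this thread can be checked in isolation. The sketch below applies the formula from the code comment, x ^ (~(x >> 1) & 1), after masking the input with 3 as the quoted hunk does. The function name `llvmToPowerRN` is hypothetical, and the sketch only reproduces the bit manipulation on plain integers, not the SelectionDAG lowering itself.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the expression quoted in the review:
// keep the low two bits, then flip bit 0 when bit 1 is clear.
constexpr std::uint32_t llvmToPowerRN(std::uint32_t Mode) {
  const std::uint32_t X = Mode & 3u;
  return X ^ (~(X >> 1) & 1u);
}

// LLVM 0 (toward zero)     -> Power 1 (toward zero)
static_assert(llvmToPowerRN(0) == 1, "toward zero");
// LLVM 1 (to nearest)      -> Power 0 (to nearest)
static_assert(llvmToPowerRN(1) == 0, "to nearest");
// LLVM 2 and 3 are unchanged.
static_assert(llvmToPowerRN(2) == 2, "toward positive infinity");
static_assert(llvmToPowerRN(3) == 3, "toward negative infinity");
// LLVM 4 is masked to 0 first, so it lands on Power 1 (toward zero),
// which is the behavior the reviewer points out.
static_assert(llvmToPowerRN(4) == 1, "mode 4 after masking");
```

The last assertion confirms the reviewer's reading: with the mask applied, mode 4 takes the same path as mode 0.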

chenzheng1030 · 2024-01-02T07:23:12Z

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

+  // Use x ^ (~(x >> 1) & 1) to transform LLVM rounding mode to Power format.
+  SDValue One = DAG.getConstant(1, Dl, MVT::i32);
+  SDValue SrcFlag = DAG.getNode(ISD::AND, Dl, MVT::i32, Op.getOperand(1),
+                                DAG.getConstant(3, Dl, MVT::i32));


Can we add an assert here too, for the case where the compiler can infer that the high 29 bits of operand 1 are non-zero?

llvm/test/CodeGen/PowerPC/frounds.ll

ecnelises · 2024-01-10T09:29:26Z

> Maybe we can do some perf test between this expansion for set rounding mode and the system library's version for fesetround().

The expansions are faster than the system fesetround on both Linux and AIX. Linux glibc optimizes fesetround with the faster mffscrn on P9; I just exploited that instruction here.
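
For reference, the library path being compared against can be sketched with the standard `<cfenv>` interface. The helper names `llvmModeToFenv` and `setRoundingViaLibrary` are hypothetical, and the sketch assumes the platform defines all four `FE_*` rounding macros. It is not a benchmark; it only shows what setting the mode through `fesetround` looks like when starting from LLVM's numbering.

```cpp
#include <cfenv>

// Hypothetical helper: translate LLVM's rounding-mode numbering to the
// C library's FE_* macros. Returns -1 for values without a standard macro
// (FE_* values are nonnegative, so -1 cannot collide with them).
inline int llvmModeToFenv(unsigned Mode) {
  switch (Mode) {
  case 0:
    return FE_TOWARDZERO;
  case 1:
    return FE_TONEAREST;
  case 2:
    return FE_UPWARD;
  case 3:
    return FE_DOWNWARD;
  default:
    return -1;
  }
}

// Hypothetical helper: set the rounding mode through the system library.
// Returns true on success, false if the mode is unsupported or the
// library call fails.
inline bool setRoundingViaLibrary(unsigned Mode) {
  const int FeMode = llvmModeToFenv(Mode);
  return FeMode != -1 && std::fesetround(FeMode) == 0;
}
```

Timing a loop of such calls against the intrinsic's expansion would be one way to reproduce the comparison described above.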

ecnelises · 2024-08-19T09:34:25Z

Ping

spavloff

LGTM.

llvm-ci · 2024-09-10T06:46:15Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 10 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/5247

Here is the relevant piece of the build log for the reference

Step 10 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: sanitizer/ptr_outside_alloc_2.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# RUN: at line 3
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c --check-prefixes=CHECK
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c:21:11: error: CHECK: expected string not found in input
# | // CHECK: OFFLOAD ERROR: Memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
# |           ^
# | <stdin>:1:1: note: scanning from here
# | AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |           1: AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. 
# | check:21     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |           2: AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           3: "PluginInterface" error: Failure to allocate device memory: Failed to allocate from memory manager 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           4: omptarget error: Call to getTargetPointer returned null pointer (device failure or illegal mapping). 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           5: omptarget error: Call to targetDataBegin failed, abort target. 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           6: omptarget error: Failed to process data before launching the kernel. 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           .
# |           .
# |           .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************

chmeeedalf reviewed Sep 25, 2023

ecnelises added 2 commits November 16, 2023 15:21

Merge commit '212a60ec37322f853e91e171b305479b1abff2f2' into ppc_setr…

cef28ea

Exclude SPE

f1c1a5c

ecnelises requested review from chenzheng1030 and stefanp-synopsys November 16, 2023 07:25

spavloff reviewed Nov 23, 2023

Merge commit '25b5e5c4e9a39a86ca3c1a05ad6eae33771ab052' into ppc_setr…

f19ccd0

ecnelises requested a review from nemanjai November 27, 2023 09:41

ecnelises added 2 commits November 27, 2023 17:42

Use assert instead of unreachable

a2c1490

Fixup

00a5ae8

ecnelises requested a review from bzEq December 21, 2023 09:49

chenzheng1030 reviewed Jan 2, 2024

ecnelises added 3 commits January 10, 2024 15:32

Merge commit 'c9124adfd8291a5f5b1d23295308d8940648c596' into ppc_setr…

0122b6f

Exploit P9 mffscrn

228d184

Address comments

9636dea

Use rlwimi/rldimi

f9af667

spavloff approved these changes Sep 6, 2024

Merge commit '02ab43596f7aac857d0b55f3f551721594ffb484' into ppc_setr…

1f57705

ecnelises merged commit 06c3311 into llvm:main Sep 10, 2024
8 checks passed

ecnelises deleted the ppc_setrounding branch September 10, 2024 06:30
