Skip to content

[AMDGPU] Extend promotion of alloca to vectors #127973

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 12, 2025

Conversation

perlfu
Copy link
Contributor

@perlfu perlfu commented Feb 20, 2025

  • Add multi dimensional array support
  • Make maximum vector size tunable
  • Make ratio of VGPRs used for vector promotion tunable

@llvmbot
Copy link
Member

llvmbot commented Feb 20, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Carl Ritson (perlfu)

Changes
  • Add multi dimensional array support
  • Make maximum vector size tunable
  • Make ratio of VGPRs used for vector promotion tunable

Patch is 93.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127973.diff

9 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+169-165)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+86-25)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+7-2)
  • (modified) llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll (+1-7)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-max-elements.ll (+236)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll (+3-9)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll (+292)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-no-opts.ll (+2-2)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-vgpr-ratio.ll (+276)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d580be1eb8cfc..734434641b4bd 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1546,180 +1546,184 @@ The AMDGPU backend supports the following LLVM IR attributes.
   .. table:: AMDGPU LLVM IR Attributes
      :name: amdgpu-llvm-ir-attributes-table
 
-     ============================================ ==========================================================
-     LLVM Attribute                               Description
-     ============================================ ==========================================================
-     "amdgpu-flat-work-group-size"="min,max"      Specify the minimum and maximum flat work group sizes that
-                                                  will be specified when the kernel is dispatched. Generated
-                                                  by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
-                                                  The IR implied default value is 1,1024. Clang may emit this attribute
-                                                  with more restrictive bounds depending on language defaults.
-                                                  If the actual block or workgroup size exceeds the limit at any point during
-                                                  the execution, the behavior is undefined. For example, even if there is
-                                                  only one active thread but the thread local id exceeds the limit, the
-                                                  behavior is undefined.
-
-     "amdgpu-implicitarg-num-bytes"="n"           Number of kernel argument bytes to add to the kernel
-                                                  argument block size for the implicit arguments. This
-                                                  varies by OS and language (for OpenCL see
-                                                  :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
-     "amdgpu-num-sgpr"="n"                        Specifies the number of SGPRs to use. Generated by
-                                                  the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
-     "amdgpu-num-vgpr"="n"                        Specifies the number of VGPRs to use. Generated by the
-                                                  ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
-     "amdgpu-waves-per-eu"="m,n"                  Specify the minimum and maximum number of waves per
-                                                  execution unit. Generated by the ``amdgpu_waves_per_eu``
-                                                  CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
-                                                  and the backend may not be able to satisfy the request. If
-                                                  the specified range is incompatible with the function's
-                                                  "amdgpu-flat-work-group-size" value, the implied occupancy
-                                                  bounds by the workgroup size takes precedence.
-
-     "amdgpu-ieee" true/false.                    GFX6-GFX11 Only
-                                                  Specify whether the function expects the IEEE field of the
-                                                  mode register to be set on entry. Overrides the default for
-                                                  the calling convention.
-     "amdgpu-dx10-clamp" true/false.              GFX6-GFX11 Only
-                                                  Specify whether the function expects the DX10_CLAMP field of
-                                                  the mode register to be set on entry. Overrides the default
-                                                  for the calling convention.
-
-     "amdgpu-no-workitem-id-x"                    Indicates the function does not depend on the value of the
-                                                  llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
-                                                  attribute, or reached through a call site marked with this attribute, and
-                                                  that intrinsic is called, the behavior of the program is undefined. (Whole-program
-                                                  undefined behavior is used here because, for example, the absence of a required workitem
-                                                  ID in the preloaded register set can mean that all other preloaded registers
-                                                  are earlier than the compilation assumed they would be.) The backend can
-                                                  generally infer this during code generation, so typically there is no
-                                                  benefit to frontends marking functions with this.
-
-     "amdgpu-no-workitem-id-y"                    The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workitem.id.y intrinsic.
-
-     "amdgpu-no-workitem-id-z"                    The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workitem.id.z intrinsic.
-
-     "amdgpu-no-workgroup-id-x"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.x intrinsic.
-
-     "amdgpu-no-workgroup-id-y"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.y intrinsic.
-
-     "amdgpu-no-workgroup-id-z"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.z intrinsic.
-
-     "amdgpu-no-dispatch-ptr"                     The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.dispatch.ptr intrinsic.
-
-     "amdgpu-no-implicitarg-ptr"                  The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.implicitarg.ptr intrinsic.
-
-     "amdgpu-no-dispatch-id"                      The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.dispatch.id intrinsic.
-
-     "amdgpu-no-queue-ptr"                        Similar to amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
-                                                  attributes, the queue pointer may be required in situations where the
-                                                  intrinsic call does not directly appear in the program. Some subtargets
-                                                  require the queue pointer for to handle some addrspacecasts, as well
-                                                  as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
-                                                  llvm.debug intrinsics.
-
-     "amdgpu-no-hostcall-ptr"                     Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the pointer to the hostcall buffer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-heap-ptr"                         Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the pointer to an initialized memory buffer
-                                                  that conforms to the requirements of the malloc/free device library V1
-                                                  version implementation. If this attribute is absent, then the
-                                                  amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-multigrid-sync-arg"               Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the multigrid synchronization pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-default-queue"                    Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the default queue pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-completion-action"                Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the completion action pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-lds-size"="min[,max]"                Min is the minimum number of bytes that will be allocated in the Local
-                                                  Data Store at address zero. Variables are allocated within this frame
-                                                  using absolute symbol metadata, primarily by the AMDGPULowerModuleLDS
-                                                  pass. Optional max is the maximum number of bytes that will be allocated.
-                                                  Note that min==max indicates that no further variables can be added to
-                                                  the frame. This is an internal detail of how LDS variables are lowered,
-                                                  language front ends should not set this attribute.
-
-     "amdgpu-gds-size"                            Bytes expected to be allocated at the start of GDS memory at entry.
-
-     "amdgpu-git-ptr-high"                        The hard-wired high half of the address of the global information table
-                                                  for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
-                                                  current hardware only allows a 16 bit value.
-
-     "amdgpu-32bit-address-high-bits"             Assumed high 32-bits for 32-bit address spaces which are really truncated
-                                                  64-bit addresses (i.e., addrspace(6))
-
-     "amdgpu-color-export"                        Indicates shader exports color information if set to 1.
-                                                  Defaults to 1 for :ref:`amdgpu_ps <amdgpu-cc>`, and 0 for other calling
-                                                  conventions. Determines the necessity and type of null exports when a shader
-                                                  terminates early by killing lanes.
-
-     "amdgpu-depth-export"                        Indicates shader exports depth information if set to 1. Determines the
-                                                  necessity and type of null exports when a shader terminates early by killing
-                                                  lanes. A depth-only shader will export to depth channel when no null export
-                                                  target is available (GFX11+).
-
-     "InitialPSInputAddr"                         Set the initial value of the `spi_ps_input_addr` register for
-                                                  :ref:`amdgpu_ps <amdgpu-cc>` shaders. Any bits enabled by this value will
-                                                  be enabled in the final register value.
-
-     "amdgpu-wave-priority-threshold"             VALU instruction count threshold for adjusting wave priority. If exceeded,
-                                                  temporarily raise the wave priority at the start of the shader function
-                                                  until its last VMEM instructions to allow younger waves to issue their VMEM
-                                                  instructions as well.
+     =============================================== ==========================================================
+     LLVM Attribute                                  Description
+     =============================================== ==========================================================
+     "amdgpu-flat-work-group-size"="min,max"         Specify the minimum and maximum flat work group sizes that
+                                                     will be specified when the kernel is dispatched. Generated
+                                                     by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
+                                                     The IR implied default value is 1,1024. Clang may emit this attribute
+                                                     with more restrictive bounds depending on language defaults.
+                                                     If the actual block or workgroup size exceeds the limit at any point during
+                                                     the execution, the behavior is undefined. For example, even if there is
+                                                     only one active thread but the thread local id exceeds the limit, the
+                                                     behavior is undefined.
+
+     "amdgpu-implicitarg-num-bytes"="n"              Number of kernel argument bytes to add to the kernel
+                                                     argument block size for the implicit arguments. This
+                                                     varies by OS and language (for OpenCL see
+                                                     :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
+     "amdgpu-num-sgpr"="n"                           Specifies the number of SGPRs to use. Generated by
+                                                     the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
+     "amdgpu-num-vgpr"="n"                           Specifies the number of VGPRs to use. Generated by the
+                                                     ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
+     "amdgpu-waves-per-eu"="m,n"                     Specify the minimum and maximum number of waves per
+                                                     execution unit. Generated by the ``amdgpu_waves_per_eu``
+                                                     CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
+                                                     and the backend may not be able to satisfy the request. If
+                                                     the specified range is incompatible with the function's
+                                                     "amdgpu-flat-work-group-size" value, the implied occupancy
+                                                     bounds by the workgroup size takes precedence.
+
+     "amdgpu-ieee" true/false.                       GFX6-GFX11 Only
+                                                     Specify whether the function expects the IEEE field of the
+                                                     mode register to be set on entry. Overrides the default for
+                                                     the calling convention.
+     "amdgpu-dx10-clamp" true/false.                 GFX6-GFX11 Only
+                                                     Specify whether the function expects the DX10_CLAMP field of
+                                                     the mode register to be set on entry. Overrides the default
+                                                     for the calling convention.
+
+     "amdgpu-no-workitem-id-x"                       Indicates the function does not depend on the value of the
+                                                     llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
+                                                     attribute, or reached through a call site marked with this attribute, and
+                                                     that intrinsic is called, the behavior of the program is undefined. (Whole-program
+                                                     undefined behavior is used here because, for example, the absence of a required workitem
+                                                     ID in the preloaded register set can mean that all other preloaded registers
+                                                     are earlier than the compilation assumed they would be.) The backend can
+                                                     generally infer this during code generation, so typically there is no
+                                                     benefit to frontends marking functions with this.
+
+     "amdgpu-no-workitem-id-y"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workitem.id.y intrinsic.
+
+     "amdgpu-no-workitem-id-z"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workitem.id.z intrinsic.
+
+     "amdgpu-no-workgroup-id-x"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workgroup.id.x intrinsic.
+
+     "amdgpu-no-workgroup-id-y"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workgroup.id.y intrinsic.
+
+     "amdgpu-no-workgroup-id-z"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workgroup.id.z intrinsic.
+
+     "amdgpu-no-dispatch-ptr"                        The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.dispatch.ptr intrinsic.
+
+     "amdgpu-no-implicitarg-ptr"                     The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.implicitarg.ptr intrinsic.
+
+     "amdgpu-no-dispatch-id"                         The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.dispatch.id intrinsic.
+
+     "amdgpu-no-queue-ptr"                           Similar to amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
+                                                     attributes, the queue pointer may be required in situations where the
+                                                     intrinsic call does not directly appear in the program. Some subtargets
+                                                     require the queue pointer for to handle some addrspacecasts, as well
+                                                     as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
+                                                     llvm.debug intrinsics.
+
+     "amdgpu-no-hostcall-ptr"                        Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                     kernel argument that holds the pointer to the hostcall buffer. If this
+                                                     attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-no-heap-ptr"                            Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                     kernel...
[truncated]

Copy link

github-actions bot commented Feb 20, 2025

✅ With the latest revision this PR passed the undef deprecator.

Comment on lines 69 to 73
static cl::opt<unsigned> PromoteAllocaToVectorMaxElements(
"amdgpu-promote-alloca-to-vector-max-elements",
cl::desc("Maximum vector size (in elements) to use when promoting alloca"),
cl::init(16));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should turn these into pass parameters instead of opts.

Elements seems like a strange way to express this. Ideally we would pack the sub-32-bit element vectors into access of 32-bit vector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect end users to use these options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elements seems like a strange way to express this.

Element count is how the limit is currently defined in the code. I agree in terms of 32-bit words (registers) would make more sense. I'll change to this model, but it does mean this patch will not preserve the existing limit so some edge case promotion will change.

Do we expect end users to use these options?

Graphics front end will use these for shader tuning, which is why they are accessible via function attributes as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. The reason I was asking is, if we expect any uses from users, we need to expose them as an option instead of a pass option, but based on your description, we don't expect from end users, and compiler front end can tune it via function attributes.

@arsenm arsenm requested a review from Pierre-vh February 20, 2025 09:06
* Add multi dimensional array support
* Make maximum vector size tunable
* Make ratio of VGPRs used for vector promotion tunable
- Move options into pass attributes
- Change vector size limit from max elements to max 32b registers
- Add tests for i16, float and ptr in multi-dimensional arrays
@perlfu perlfu force-pushed the amdgpu-promote-alloca-multidim branch from 2afc3ac to 77d30c3 Compare February 21, 2025 10:23
@perlfu
Copy link
Contributor Author

perlfu commented Feb 21, 2025

Note: I have changed the max vector size to be based on number of 32b registers. With the current value of 16, this mean some alloca which were previously promoted are no longer promoted.
See promote-alloca-subvecs.ll and vector-alloca-limits.ll where I had to increase the limit for the tests to pass.
Alternatively we could raise the default to 32 to allow 16 x i64/double/ptr to be promoted as previously, with the effect that 32 x i32/float would now be promoted.

@perlfu
Copy link
Contributor Author

perlfu commented Mar 5, 2025

Ping

@perlfu perlfu merged commit d921bf2 into llvm:main Mar 12, 2025
12 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 12, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/1133

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)
...
1.114 [4/29/672] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.cbrt.dir/cbrt.cpp.o
1.117 [4/28/673] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.asinhf.dir/asinhf.cpp.o
1.118 [4/27/674] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16sqrt.dir/f16sqrt.cpp.o
1.124 [4/26/675] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16subl.dir/f16subl.cpp.o
1.129 [4/25/676] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16addf.dir/f16addf.cpp.o
1.129 [4/24/677] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.fsqrtl.dir/fsqrtl.cpp.o
1.131 [4/23/678] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vasprintf.dir/vasprintf.cpp.o
1.133 [4/22/679] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16sqrtl.dir/f16sqrtl.cpp.o
1.139 [4/21/680] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalblnf.dir/scalblnf.cpp.o
1.140 [4/20/681] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o
FAILED: libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o 
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc -isystem /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp
clang++: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/llvm/lib/IR/Instructions.cpp:2694: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::InsertPosition): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc -isystem /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "require<globals-aa>,function(invalidate<aa>),require<profile-summary>,cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide)),function(invalidate<should-not-run-function-passes>),cgscc(devirt<4>())" on module "/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp"
4.	Running pass "cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide))" on module "/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp"
5.	Running pass "amdgpu-promote-alloca-to-vector" on function "tanhf"
 #0 0x00007758b23e5790 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMSupport.so.21.0git+0x1e5790)
 #1 0x00007758b23e2b8f llvm::sys::RunSignalHandlers() (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMSupport.so.21.0git+0x1e2b8f)
 #2 0x00007758b22dfca8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007758b1a42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007758b1a969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007758b1a969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007758b1a969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007758b1a42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007758b1a287f3 abort ./stdlib/abort.c:81:7
 #9 0x00007758b1a2871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007758b1a39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x00007758b285a0d5 (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMCore.so.21.0git+0x25a0d5)
#12 0x00007758b2867500 llvm::BinaryOperator::CreateNeg(llvm::Value*, llvm::Twine const&, llvm::InsertPosition) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMCore.so.21.0git+0x267500)
#13 0x00007758b8374da6 llvm::IRBuilderBase::CreateMul(llvm::Value*, llvm::Value*, llvm::Twine const&, bool, bool) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x174da6)
#14 0x00007758b853bc34 GEPToVectorIndex(llvm::GetElementPtrInst*, llvm::AllocaInst*, llvm::Type*, llvm::DataLayout const&, llvm::SmallVector<llvm::Instruction*, 6u>&) AMDGPUPromoteAlloca.cpp:0:0
#15 0x00007758b8541f53 (anonymous namespace)::AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(llvm::AllocaInst&) AMDGPUPromoteAlloca.cpp:0:0
#16 0x00007758b8543a94 (anonymous namespace)::AMDGPUPromoteAllocaImpl::run(llvm::Function&, bool) (.part.0) AMDGPUPromoteAlloca.cpp:0:0
#17 0x00007758b85468cd llvm::AMDGPUPromoteAllocaToVectorPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x3468cd)
#18 0x00007758b85be476 llvm::detail::PassModel<llvm::Function, llvm::AMDGPUPromoteAllocaToVectorPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x3be476)
#19 0x00007758b28f9d0f llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMCore.so.21.0git+0x2f9d0f)
#20 0x00007758b94ab036 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMX86CodeGen.so.21.0git+0xab036)
#21 0x00007758b2d62964 llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x162964)
#22 0x00007758b85becf6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x3becf6)
#23 0x00007758b2d5b337 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x15b337)
#24 0x00007758b5810346 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMipo.so.21.0git+0x210346)
#25 0x00007758b2d63c1d llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x163c1d)
#26 0x00007758b58102f6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMipo.so.21.0git+0x2102f6)
#27 0x00007758b2d5e3ee llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x15e3ee)

@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 12, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/14431

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)
...
3.394 [106/34/565] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalbnl.dir/scalbnl.cpp.o
3.395 [105/34/566] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.setpayloadsigf16.dir/setpayloadsigf16.cpp.o
3.557 [104/34/567] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.snprintf.dir/snprintf.cpp.o
3.572 [103/34/568] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.asprintf.dir/asprintf.cpp.o
3.574 [102/34/569] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.cbrtf.dir/cbrtf.cpp.o
3.585 [101/34/570] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vsnprintf.dir/vsnprintf.cpp.o
3.596 [100/34/571] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.sprintf.dir/sprintf.cpp.o
3.622 [99/34/572] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vasprintf.dir/vasprintf.cpp.o
3.626 [98/34/573] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vsprintf.dir/vsprintf.cpp.o
3.646 [97/34/574] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o
FAILED: libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o 
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc -isystem /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp
clang++: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/lib/IR/Instructions.cpp:2694: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::InsertPosition): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc -isystem /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "require<globals-aa>,function(invalidate<aa>),require<profile-summary>,cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide)),function(invalidate<should-not-run-function-passes>),cgscc(devirt<4>())" on module "/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp"
4.	Running pass "cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide))" on module "/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp"
5.	Running pass "amdgpu-promote-alloca-to-vector" on function "tanhf"
 #0 0x000055fa2b6399ff llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x2ad09ff)
 #1 0x000055fa2b637514 llvm::sys::CleanupOnSignal(unsigned long) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x2ace514)
 #2 0x000055fa2b576988 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f7202281420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007f7201d4e00b raise /build/glibc-FcRMwW/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007f7201d2d859 abort /build/glibc-FcRMwW/glibc-2.31/stdlib/abort.c:81:7
 #6 0x00007f7201d2d729 get_sysdep_segment_value /build/glibc-FcRMwW/glibc-2.31/intl/loadmsgcat.c:509:8
 #7 0x00007f7201d2d729 _nl_load_domain /build/glibc-FcRMwW/glibc-2.31/intl/loadmsgcat.c:970:34
 #8 0x00007f7201d3efd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
 #9 0x000055fa2af055c5 (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x239c5c5)
#10 0x000055fa2af10500 llvm::BinaryOperator::CreateNeg(llvm::Value*, llvm::Twine const&, llvm::InsertPosition) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x23a7500)
#11 0x000055fa299c1885 llvm::IRBuilderBase::CreateMul(llvm::Value*, llvm::Value*, llvm::Twine const&, bool, bool) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0xe58885)
#12 0x000055fa2a1f8b6b GEPToVectorIndex(llvm::GetElementPtrInst*, llvm::AllocaInst*, llvm::Type*, llvm::DataLayout const&, llvm::SmallVector<llvm::Instruction*, 6u>&) AMDGPUPromoteAlloca.cpp:0:0
#13 0x000055fa2a1feb7d (anonymous namespace)::AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(llvm::AllocaInst&) AMDGPUPromoteAlloca.cpp:0:0
#14 0x000055fa2a201ee0 (anonymous namespace)::AMDGPUPromoteAllocaImpl::run(llvm::Function&, bool) (.part.0) AMDGPUPromoteAlloca.cpp:0:0
#15 0x000055fa2a20443d llvm::AMDGPUPromoteAllocaToVectorPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x169b43d)
#16 0x000055fa29dcc7b6 llvm::detail::PassModel<llvm::Function, llvm::AMDGPUPromoteAllocaToVectorPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x12637b6)
#17 0x000055fa2afa8d99 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x243fd99)
#18 0x000055fa2999cef6 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0xe33ef6)
#19 0x000055fa2a48aeaa llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x1921eaa)
#20 0x000055fa29dccec6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x1263ec6)
#21 0x000055fa2a48207a llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x191907a)
#22 0x000055fa2cc12476 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x40a9476)
#23 0x000055fa2a487a8d llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x191ea8d)
#24 0x000055fa2cc124c6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x40a94c6)
#25 0x000055fa2a485848 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x191c848)
#26 0x000055fa2cc12426 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x40a9426)
#27 0x000055fa2afa7021 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x243e021)

perlfu added a commit that referenced this pull request Mar 12, 2025
Fix type error when GEP uses i64 offset introduced in #127973.
perlfu added a commit to perlfu/llvm-project that referenced this pull request Mar 18, 2025
Fix type error when GEP uses i64 index introduced in llvm#127973.
perlfu added a commit to perlfu/llvm-project that referenced this pull request Mar 18, 2025
Fix type error when GEP uses i64 index introduced in llvm#127973.
perlfu added a commit to perlfu/llvm-project that referenced this pull request Mar 18, 2025
Fix type error when GEP uses i64 index introduced in llvm#127973.
perlfu added a commit that referenced this pull request Mar 18, 2025
Fix type error when GEP uses i64 index introduced in #127973.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants