[AMDGPU] Extend promotion of alloca to vectors #127973

perlfu · 2025-02-20T09:01:17Z

Add multi-dimensional array support
Make maximum vector size tunable
Make ratio of VGPRs used for vector promotion tunable
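
For readers unfamiliar with the idea behind the first item, here is a minimal sketch of how an access to a multi-dimensional array can be mapped onto one flat vector. It is not taken from the patch: `Rows`, `Cols`, `flatIndex`, `store2D` and `load2D` are hypothetical names, and row-major flattening is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative dimensions, chosen for this sketch only.
constexpr std::size_t Rows = 2;
constexpr std::size_t Cols = 4;

// Hypothetical helper: map a [Rows x [Cols x T]] index pair to a lane of
// a flat <Rows * Cols x T> vector, assuming row-major order.
constexpr std::size_t flatIndex(std::size_t row, std::size_t col) {
  return row * Cols + col;
}

// A flat array stands in for the vector that replaces the alloca.
using FlatVector = std::array<int, Rows * Cols>;

// A store through the 2-D view touches exactly one lane of the flat vector.
inline void store2D(FlatVector &v, std::size_t row, std::size_t col, int value) {
  v[flatIndex(row, col)] = value;
}

// A load through the 2-D view reads exactly one lane of the flat vector.
inline int load2D(const FlatVector &v, std::size_t row, std::size_t col) {
  return v[flatIndex(row, col)];
}
```

With these dimensions, element `[1][2]` lands in lane 6 of the flat vector.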

llvmbot · 2025-02-20T09:01:51Z

@llvm/pr-subscribers-backend-amdgpu

Author: Carl Ritson (perlfu)

Patch is 93.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127973.diff

9 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+169-165)
(modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+86-25)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+7-2)
(modified) llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll (+1-7)
(added) llvm/test/CodeGen/AMDGPU/promote-alloca-max-elements.ll (+236)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll (+3-9)
(added) llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll (+292)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-no-opts.ll (+2-2)
(added) llvm/test/CodeGen/AMDGPU/promote-alloca-vgpr-ratio.ll (+276)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d580be1eb8cfc..734434641b4bd 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1546,180 +1546,184 @@ The AMDGPU backend supports the following LLVM IR attributes.
   .. table:: AMDGPU LLVM IR Attributes
      :name: amdgpu-llvm-ir-attributes-table
 
-     ============================================ ==========================================================
-     LLVM Attribute                               Description
-     ============================================ ==========================================================
-     "amdgpu-flat-work-group-size"="min,max"      Specify the minimum and maximum flat work group sizes that
-                                                  will be specified when the kernel is dispatched. Generated
-                                                  by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
-                                                  The IR implied default value is 1,1024. Clang may emit this attribute
-                                                  with more restrictive bounds depending on language defaults.
-                                                  If the actual block or workgroup size exceeds the limit at any point during
-                                                  the execution, the behavior is undefined. For example, even if there is
-                                                  only one active thread but the thread local id exceeds the limit, the
-                                                  behavior is undefined.
-
-     "amdgpu-implicitarg-num-bytes"="n"           Number of kernel argument bytes to add to the kernel
-                                                  argument block size for the implicit arguments. This
-                                                  varies by OS and language (for OpenCL see
-                                                  :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
-     "amdgpu-num-sgpr"="n"                        Specifies the number of SGPRs to use. Generated by
-                                                  the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
-     "amdgpu-num-vgpr"="n"                        Specifies the number of VGPRs to use. Generated by the
-                                                  ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
-     "amdgpu-waves-per-eu"="m,n"                  Specify the minimum and maximum number of waves per
-                                                  execution unit. Generated by the ``amdgpu_waves_per_eu``
-                                                  CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
-                                                  and the backend may not be able to satisfy the request. If
-                                                  the specified range is incompatible with the function's
-                                                  "amdgpu-flat-work-group-size" value, the implied occupancy
-                                                  bounds by the workgroup size takes precedence.
-
-     "amdgpu-ieee" true/false.                    GFX6-GFX11 Only
-                                                  Specify whether the function expects the IEEE field of the
-                                                  mode register to be set on entry. Overrides the default for
-                                                  the calling convention.
-     "amdgpu-dx10-clamp" true/false.              GFX6-GFX11 Only
-                                                  Specify whether the function expects the DX10_CLAMP field of
-                                                  the mode register to be set on entry. Overrides the default
-                                                  for the calling convention.
-
-     "amdgpu-no-workitem-id-x"                    Indicates the function does not depend on the value of the
-                                                  llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
-                                                  attribute, or reached through a call site marked with this attribute, and
-                                                  that intrinsic is called, the behavior of the program is undefined. (Whole-program
-                                                  undefined behavior is used here because, for example, the absence of a required workitem
-                                                  ID in the preloaded register set can mean that all other preloaded registers
-                                                  are earlier than the compilation assumed they would be.) The backend can
-                                                  generally infer this during code generation, so typically there is no
-                                                  benefit to frontends marking functions with this.
-
-     "amdgpu-no-workitem-id-y"                    The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workitem.id.y intrinsic.
-
-     "amdgpu-no-workitem-id-z"                    The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workitem.id.z intrinsic.
-
-     "amdgpu-no-workgroup-id-x"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.x intrinsic.
-
-     "amdgpu-no-workgroup-id-y"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.y intrinsic.
-
-     "amdgpu-no-workgroup-id-z"                   The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.workgroup.id.z intrinsic.
-
-     "amdgpu-no-dispatch-ptr"                     The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.dispatch.ptr intrinsic.
-
-     "amdgpu-no-implicitarg-ptr"                  The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.implicitarg.ptr intrinsic.
-
-     "amdgpu-no-dispatch-id"                      The same as amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.dispatch.id intrinsic.
-
-     "amdgpu-no-queue-ptr"                        Similar to amdgpu-no-workitem-id-x, except for the
-                                                  llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
-                                                  attributes, the queue pointer may be required in situations where the
-                                                  intrinsic call does not directly appear in the program. Some subtargets
-                                                  require the queue pointer for to handle some addrspacecasts, as well
-                                                  as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
-                                                  llvm.debug intrinsics.
-
-     "amdgpu-no-hostcall-ptr"                     Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the pointer to the hostcall buffer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-heap-ptr"                         Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the pointer to an initialized memory buffer
-                                                  that conforms to the requirements of the malloc/free device library V1
-                                                  version implementation. If this attribute is absent, then the
-                                                  amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-multigrid-sync-arg"               Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the multigrid synchronization pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-default-queue"                    Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the default queue pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-no-completion-action"                Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
-                                                  kernel argument that holds the completion action pointer. If this
-                                                  attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
-
-     "amdgpu-lds-size"="min[,max]"                Min is the minimum number of bytes that will be allocated in the Local
-                                                  Data Store at address zero. Variables are allocated within this frame
-                                                  using absolute symbol metadata, primarily by the AMDGPULowerModuleLDS
-                                                  pass. Optional max is the maximum number of bytes that will be allocated.
-                                                  Note that min==max indicates that no further variables can be added to
-                                                  the frame. This is an internal detail of how LDS variables are lowered,
-                                                  language front ends should not set this attribute.
-
-     "amdgpu-gds-size"                            Bytes expected to be allocated at the start of GDS memory at entry.
-
-     "amdgpu-git-ptr-high"                        The hard-wired high half of the address of the global information table
-                                                  for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
-                                                  current hardware only allows a 16 bit value.
-
-     "amdgpu-32bit-address-high-bits"             Assumed high 32-bits for 32-bit address spaces which are really truncated
-                                                  64-bit addresses (i.e., addrspace(6))
-
-     "amdgpu-color-export"                        Indicates shader exports color information if set to 1.
-                                                  Defaults to 1 for :ref:`amdgpu_ps <amdgpu-cc>`, and 0 for other calling
-                                                  conventions. Determines the necessity and type of null exports when a shader
-                                                  terminates early by killing lanes.
-
-     "amdgpu-depth-export"                        Indicates shader exports depth information if set to 1. Determines the
-                                                  necessity and type of null exports when a shader terminates early by killing
-                                                  lanes. A depth-only shader will export to depth channel when no null export
-                                                  target is available (GFX11+).
-
-     "InitialPSInputAddr"                         Set the initial value of the `spi_ps_input_addr` register for
-                                                  :ref:`amdgpu_ps <amdgpu-cc>` shaders. Any bits enabled by this value will
-                                                  be enabled in the final register value.
-
-     "amdgpu-wave-priority-threshold"             VALU instruction count threshold for adjusting wave priority. If exceeded,
-                                                  temporarily raise the wave priority at the start of the shader function
-                                                  until its last VMEM instructions to allow younger waves to issue their VMEM
-                                                  instructions as well.
+     =============================================== ==========================================================
+     LLVM Attribute                                  Description
+     =============================================== ==========================================================
+     "amdgpu-flat-work-group-size"="min,max"         Specify the minimum and maximum flat work group sizes that
+                                                     will be specified when the kernel is dispatched. Generated
+                                                     by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
+                                                     The IR implied default value is 1,1024. Clang may emit this attribute
+                                                     with more restrictive bounds depending on language defaults.
+                                                     If the actual block or workgroup size exceeds the limit at any point during
+                                                     the execution, the behavior is undefined. For example, even if there is
+                                                     only one active thread but the thread local id exceeds the limit, the
+                                                     behavior is undefined.
+
+     "amdgpu-implicitarg-num-bytes"="n"              Number of kernel argument bytes to add to the kernel
+                                                     argument block size for the implicit arguments. This
+                                                     varies by OS and language (for OpenCL see
+                                                     :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
+     "amdgpu-num-sgpr"="n"                           Specifies the number of SGPRs to use. Generated by
+                                                     the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
+     "amdgpu-num-vgpr"="n"                           Specifies the number of VGPRs to use. Generated by the
+                                                     ``amdgpu_num_vgpr`` CLANG attribute [CLANG-ATTR]_.
+     "amdgpu-waves-per-eu"="m,n"                     Specify the minimum and maximum number of waves per
+                                                     execution unit. Generated by the ``amdgpu_waves_per_eu``
+                                                     CLANG attribute [CLANG-ATTR]_. This is an optimization hint,
+                                                     and the backend may not be able to satisfy the request. If
+                                                     the specified range is incompatible with the function's
+                                                     "amdgpu-flat-work-group-size" value, the implied occupancy
+                                                     bounds by the workgroup size takes precedence.
+
+     "amdgpu-ieee" true/false.                       GFX6-GFX11 Only
+                                                     Specify whether the function expects the IEEE field of the
+                                                     mode register to be set on entry. Overrides the default for
+                                                     the calling convention.
+     "amdgpu-dx10-clamp" true/false.                 GFX6-GFX11 Only
+                                                     Specify whether the function expects the DX10_CLAMP field of
+                                                     the mode register to be set on entry. Overrides the default
+                                                     for the calling convention.
+
+     "amdgpu-no-workitem-id-x"                       Indicates the function does not depend on the value of the
+                                                     llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
+                                                     attribute, or reached through a call site marked with this attribute, and
+                                                     that intrinsic is called, the behavior of the program is undefined. (Whole-program
+                                                     undefined behavior is used here because, for example, the absence of a required workitem
+                                                     ID in the preloaded register set can mean that all other preloaded registers
+                                                     are earlier than the compilation assumed they would be.) The backend can
+                                                     generally infer this during code generation, so typically there is no
+                                                     benefit to frontends marking functions with this.
+
+     "amdgpu-no-workitem-id-y"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workitem.id.y intrinsic.
+
+     "amdgpu-no-workitem-id-z"                       The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workitem.id.z intrinsic.
+
+     "amdgpu-no-workgroup-id-x"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workgroup.id.x intrinsic.
+
+     "amdgpu-no-workgroup-id-y"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workgroup.id.y intrinsic.
+
+     "amdgpu-no-workgroup-id-z"                      The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.workgroup.id.z intrinsic.
+
+     "amdgpu-no-dispatch-ptr"                        The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.dispatch.ptr intrinsic.
+
+     "amdgpu-no-implicitarg-ptr"                     The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.implicitarg.ptr intrinsic.
+
+     "amdgpu-no-dispatch-id"                         The same as amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.dispatch.id intrinsic.
+
+     "amdgpu-no-queue-ptr"                           Similar to amdgpu-no-workitem-id-x, except for the
+                                                     llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
+                                                     attributes, the queue pointer may be required in situations where the
+                                                     intrinsic call does not directly appear in the program. Some subtargets
+                                                     require the queue pointer for to handle some addrspacecasts, as well
+                                                     as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
+                                                     llvm.debug intrinsics.
+
+     "amdgpu-no-hostcall-ptr"                        Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                     kernel argument that holds the pointer to the hostcall buffer. If this
+                                                     attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.
+
+     "amdgpu-no-heap-ptr"                            Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
+                                                     kernel...
[truncated]

github-actions · 2025-02-20T09:04:53Z

✅ With the latest revision this PR passed the undef deprecator.

arsenm · 2025-02-20T09:02:36Z

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

+static cl::opt<unsigned> PromoteAllocaToVectorMaxElements(
+    "amdgpu-promote-alloca-to-vector-max-elements",
+    cl::desc("Maximum vector size (in elements) to use when promoting alloca"),
+    cl::init(16));


Should turn these into pass parameters instead of opts.

Elements seems like a strange way to express this. Ideally we would pack vectors of sub-32-bit elements into accesses of a 32-bit vector.

Do we expect end users to use these options?

Elements seems like a strange way to express this.

Element count is how the limit is currently defined in the code. I agree that expressing it in terms of 32-bit words (registers) would make more sense. I'll change to this model, but it does mean this patch will not preserve the existing limit, so promotion will change in some edge cases.

Do we expect end users to use these options?

The graphics front end will use these for shader tuning, which is why they are accessible via function attributes as well.

Okay. The reason I was asking is that, if we expected end users to use these, we would need to expose them as an option instead of a pass option. Based on your description, we don't expect end users to use them, and the compiler front end can tune them via function attributes.

* Add multi dimensional array support
* Make maximum vector size tunable
* Make ratio of VGPRs used for vector promotion tunable

- Move options into pass attributes
- Change vector size limit from max elements to max 32b registers
- Add tests for i16, float and ptr in multi-dimensional arrays

perlfu · 2025-02-21T10:27:25Z

Note: I have changed the max vector size to be based on the number of 32-bit registers. With the current value of 16, this means some allocas that were previously promoted are no longer promoted.
See promote-alloca-subvecs.ll and vector-alloca-limits.ll, where I had to increase the limit for the tests to pass.
Alternatively, we could raise the default to 32 so that 16 x i64/double/ptr is promoted as before, with the side effect that 32 x i32/float would now be promoted as well.
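
The arithmetic behind this trade-off can be sketched as follows. This is an illustration only, not code from the patch: `regs32`, `fitsElementLimit` and `fitsRegisterLimit` are hypothetical helpers, and the sketch assumes sub-32-bit elements are packed tightly when counting registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of 32-bit registers needed for a vector of
// `numElements` elements of `elementBits` bits each, rounding up.
// Assumption: sub-32-bit elements are packed tightly.
constexpr std::uint64_t regs32(std::uint64_t numElements,
                               std::uint64_t elementBits) {
  return (numElements * elementBits + 31) / 32;
}

// Old model: the limit is an element count, independent of element width.
constexpr bool fitsElementLimit(std::uint64_t numElements,
                                std::uint64_t maxElements) {
  return numElements <= maxElements;
}

// New model: the limit is a count of 32-bit registers.
constexpr bool fitsRegisterLimit(std::uint64_t numElements,
                                 std::uint64_t elementBits,
                                 std::uint64_t maxRegs) {
  return regs32(numElements, elementBits) <= maxRegs;
}

// 16 x i64 passes the old 16-element limit but needs 32 registers,
// so it fails a 16-register limit and passes a 32-register limit.
static_assert(fitsElementLimit(16, 16), "16 elements fit the old limit");
static_assert(!fitsRegisterLimit(16, 64, 16), "32 regs exceed a limit of 16");
static_assert(fitsRegisterLimit(16, 64, 32), "32 regs fit a limit of 32");

// 32 x i32 fails the old 16-element limit but fits a 32-register limit.
static_assert(!fitsElementLimit(32, 16), "32 elements exceed the old limit");
static_assert(fitsRegisterLimit(32, 32, 32), "32 regs fit a limit of 32");
```

Under these assumptions, a default of 32 registers restores promotion of 16 x i64/double/ptr and also admits 32 x i32/float, which matches the behaviour described above.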

perlfu · 2025-03-05T02:48:53Z

Ping

llvm-ci · 2025-03-12T06:14:51Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/1133

Here is the relevant piece of the build log for the reference

Step 5 (compile-openmp) failure: build (failure)
...
1.114 [4/29/672] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.cbrt.dir/cbrt.cpp.o
1.117 [4/28/673] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.asinhf.dir/asinhf.cpp.o
1.118 [4/27/674] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16sqrt.dir/f16sqrt.cpp.o
1.124 [4/26/675] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16subl.dir/f16subl.cpp.o
1.129 [4/25/676] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16addf.dir/f16addf.cpp.o
1.129 [4/24/677] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.fsqrtl.dir/fsqrtl.cpp.o
1.131 [4/23/678] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vasprintf.dir/vasprintf.cpp.o
1.133 [4/22/679] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.f16sqrtl.dir/f16sqrtl.cpp.o
1.139 [4/21/680] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalblnf.dir/scalblnf.cpp.o
1.140 [4/20/681] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o
FAILED: libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o 
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc -isystem /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp
clang++: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/llvm/lib/IR/Instructions.cpp:2694: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::InsertPosition): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc -isystem /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "require<globals-aa>,function(invalidate<aa>),require<profile-summary>,cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm
<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide)),function(invalidate<should-not-run-function-passes>),cgscc(devirt<4>())" on module "/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp"
4.	Running pass "cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-f
orward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide))" on module "/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/math/generic/tanhf.cpp"
5.	Running pass "amdgpu-promote-alloca-to-vector" on function "tanhf"
 #0 0x00007758b23e5790 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMSupport.so.21.0git+0x1e5790)
 #1 0x00007758b23e2b8f llvm::sys::RunSignalHandlers() (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMSupport.so.21.0git+0x1e2b8f)
 #2 0x00007758b22dfca8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007758b1a42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007758b1a969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007758b1a969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007758b1a969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007758b1a42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007758b1a287f3 abort ./stdlib/abort.c:81:7
 #9 0x00007758b1a2871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007758b1a39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x00007758b285a0d5 (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMCore.so.21.0git+0x25a0d5)
#12 0x00007758b2867500 llvm::BinaryOperator::CreateNeg(llvm::Value*, llvm::Twine const&, llvm::InsertPosition) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMCore.so.21.0git+0x267500)
#13 0x00007758b8374da6 llvm::IRBuilderBase::CreateMul(llvm::Value*, llvm::Value*, llvm::Twine const&, bool, bool) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x174da6)
#14 0x00007758b853bc34 GEPToVectorIndex(llvm::GetElementPtrInst*, llvm::AllocaInst*, llvm::Type*, llvm::DataLayout const&, llvm::SmallVector<llvm::Instruction*, 6u>&) AMDGPUPromoteAlloca.cpp:0:0
#15 0x00007758b8541f53 (anonymous namespace)::AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(llvm::AllocaInst&) AMDGPUPromoteAlloca.cpp:0:0
#16 0x00007758b8543a94 (anonymous namespace)::AMDGPUPromoteAllocaImpl::run(llvm::Function&, bool) (.part.0) AMDGPUPromoteAlloca.cpp:0:0
#17 0x00007758b85468cd llvm::AMDGPUPromoteAllocaToVectorPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x3468cd)
#18 0x00007758b85be476 llvm::detail::PassModel<llvm::Function, llvm::AMDGPUPromoteAllocaToVectorPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x3be476)
#19 0x00007758b28f9d0f llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMCore.so.21.0git+0x2f9d0f)
#20 0x00007758b94ab036 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMX86CodeGen.so.21.0git+0xab036)
#21 0x00007758b2d62964 llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x162964)
#22 0x00007758b85becf6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAMDGPUCodeGen.so.21.0git+0x3becf6)
#23 0x00007758b2d5b337 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x15b337)
#24 0x00007758b5810346 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMipo.so.21.0git+0x210346)
#25 0x00007758b2d63c1d llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x163c1d)
#26 0x00007758b58102f6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMipo.so.21.0git+0x2102f6)
#27 0x00007758b2d5e3ee llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMAnalysis.so.21.0git+0x15e3ee)

llvm-ci · 2025-03-12T06:16:00Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/14431

Here is the relevant piece of the build log for the reference

Step 5 (compile-openmp) failure: build (failure)
...
3.394 [106/34/565] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.scalbnl.dir/scalbnl.cpp.o
3.395 [105/34/566] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.setpayloadsigf16.dir/setpayloadsigf16.cpp.o
3.557 [104/34/567] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.snprintf.dir/snprintf.cpp.o
3.572 [103/34/568] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.asprintf.dir/asprintf.cpp.o
3.574 [102/34/569] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.cbrtf.dir/cbrtf.cpp.o
3.585 [101/34/570] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vsnprintf.dir/vsnprintf.cpp.o
3.596 [100/34/571] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.sprintf.dir/sprintf.cpp.o
3.622 [99/34/572] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vasprintf.dir/vasprintf.cpp.o
3.626 [98/34/573] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.vsprintf.dir/vsprintf.cpp.o
3.646 [97/34/574] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o
FAILED: libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o 
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc -isystem /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp
clang++: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/lib/IR/Instructions.cpp:2694: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::InsertPosition): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc -isystem /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -D__LIBC_USE_BUILTIN_CEIL_FLOOR_RINT_TRUNC -D__LIBC_USE_BUILTIN_ROUND -D__LIBC_USE_BUILTIN_ROUNDEVEN -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -MD -MT libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -MF libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o.d -o libc/src/math/generic/CMakeFiles/libc.src.math.generic.tanhf.dir/tanhf.cpp.o -c /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "require<globals-aa>,function(invalidate<aa>),require<profile-summary>,cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm
<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide)),function(invalidate<should-not-run-function-passes>),cgscc(devirt<4>())" on module "/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp"
4.	Running pass "cgscc(devirt<4>(inline,function-attrs<skip-non-recursive-function-attrs>,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution<only-if-divergent-target>,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,loop(loop-idiom,indvars,extra-simple-loop-unswitch-passes,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-f
orward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,instcombine<max-iterations=1;no-verify-fixpoint>,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require<should-not-run-function-passes>),coro-split,coro-annotation-elide))" on module "/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/libc/src/math/generic/tanhf.cpp"
5.	Running pass "amdgpu-promote-alloca-to-vector" on function "tanhf"
 #0 0x000055fa2b6399ff llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x2ad09ff)
 #1 0x000055fa2b637514 llvm::sys::CleanupOnSignal(unsigned long) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x2ace514)
 #2 0x000055fa2b576988 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f7202281420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007f7201d4e00b raise /build/glibc-FcRMwW/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007f7201d2d859 abort /build/glibc-FcRMwW/glibc-2.31/stdlib/abort.c:81:7
 #6 0x00007f7201d2d729 get_sysdep_segment_value /build/glibc-FcRMwW/glibc-2.31/intl/loadmsgcat.c:509:8
 #7 0x00007f7201d2d729 _nl_load_domain /build/glibc-FcRMwW/glibc-2.31/intl/loadmsgcat.c:970:34
 #8 0x00007f7201d3efd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
 #9 0x000055fa2af055c5 (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x239c5c5)
#10 0x000055fa2af10500 llvm::BinaryOperator::CreateNeg(llvm::Value*, llvm::Twine const&, llvm::InsertPosition) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x23a7500)
#11 0x000055fa299c1885 llvm::IRBuilderBase::CreateMul(llvm::Value*, llvm::Value*, llvm::Twine const&, bool, bool) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0xe58885)
#12 0x000055fa2a1f8b6b GEPToVectorIndex(llvm::GetElementPtrInst*, llvm::AllocaInst*, llvm::Type*, llvm::DataLayout const&, llvm::SmallVector<llvm::Instruction*, 6u>&) AMDGPUPromoteAlloca.cpp:0:0
#13 0x000055fa2a1feb7d (anonymous namespace)::AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(llvm::AllocaInst&) AMDGPUPromoteAlloca.cpp:0:0
#14 0x000055fa2a201ee0 (anonymous namespace)::AMDGPUPromoteAllocaImpl::run(llvm::Function&, bool) (.part.0) AMDGPUPromoteAlloca.cpp:0:0
#15 0x000055fa2a20443d llvm::AMDGPUPromoteAllocaToVectorPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x169b43d)
#16 0x000055fa29dcc7b6 llvm::detail::PassModel<llvm::Function, llvm::AMDGPUPromoteAllocaToVectorPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x12637b6)
#17 0x000055fa2afa8d99 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x243fd99)
#18 0x000055fa2999cef6 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0xe33ef6)
#19 0x000055fa2a48aeaa llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x1921eaa)
#20 0x000055fa29dccec6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x1263ec6)
#21 0x000055fa2a48207a llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x191907a)
#22 0x000055fa2cc12476 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x40a9476)
#23 0x000055fa2a487a8d llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x191ea8d)
#24 0x000055fa2cc124c6 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x40a94c6)
#25 0x000055fa2a485848 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x191c848)
#26 0x000055fa2cc12426 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x40a9426)
#27 0x000055fa2afa7021 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang+++0x243e021)

perlfu requested review from jayfoad, arsenm, shiltian, ruiling and dstutt February 20, 2025 09:01

llvmbot added the backend:AMDGPU label Feb 20, 2025

arsenm reviewed Feb 20, 2025

arsenm requested a review from Pierre-vh February 20, 2025 09:06

Pierre-vh reviewed Feb 20, 2025

llvm/docs/AMDGPUUsage.rst (outdated, resolved)

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (outdated, resolved)

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (outdated, resolved)

perlfu added 2 commits February 21, 2025 16:38

[AMDGPU] Extend promotion of alloca to vectors

9293d1a

* Add multi dimensional array support
* Make maximum vector size tunable
* Make ratio of VGPRs used for vector promotion tunable

Address reviewer comments:

77d30c3

- Move options into pass attributes
- Change vector size limit from max elements to max 32b registers
- Add tests for i16, float and ptr in multi-dimensional arrays

perlfu force-pushed the amdgpu-promote-alloca-multidim branch from 2afc3ac to 77d30c3 Compare February 21, 2025 10:23

- Hide generated undefs in lit test to avoid deprecation warning

a7d5123

shiltian approved these changes Mar 11, 2025

perlfu merged commit d921bf2 into llvm:main Mar 12, 2025
12 checks passed

perlfu added a commit that referenced this pull request Mar 12, 2025

[AMDGPU] Fix typing error introduce in promote alloca change

525d412

Fix type error when GEP uses i64 offset introduced in #127973.

perlfu added a commit to perlfu/llvm-project that referenced this pull request Mar 18, 2025

[AMDGPU] Fix typing error in multi dimensional promote alloca

0d6d749

Fix type error when GEP uses i64 index introduced in llvm#127973.

perlfu mentioned this pull request Mar 18, 2025

[AMDGPU] Fix typing error in multi dimensional promote alloca #131763

Merged

perlfu added a commit to perlfu/llvm-project that referenced this pull request Mar 18, 2025

[AMDGPU] Fix typing error in multi dimensional promote alloca

4657185

Fix type error when GEP uses i64 index introduced in llvm#127973.

perlfu added a commit to perlfu/llvm-project that referenced this pull request Mar 18, 2025

[AMDGPU] Fix typing error in multi dimensional promote alloca

4a784b3

Fix type error when GEP uses i64 index introduced in llvm#127973.

perlfu added a commit that referenced this pull request Mar 18, 2025

[AMDGPU] Fix typing error in multi dimensional promote alloca (#131763)

0e4116a

Fix type error when GEP uses i64 index introduced in #127973.
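
For readers following the buildbot failures above, here is a rough sketch of the kind of type mismatch the assertion reports. It is a toy model, not the code in AMDGPUPromoteAlloca.cpp: `TypedInt`, `mulSameType`, `addSameType`, `castTo` and `flattenIndex` are hypothetical names, and the row-major flattening is only an assumption about how a multi-dimensional index could be reduced to a single vector index. The point it illustrates is that an i64 index multiplied by an i32 constant trips a same-type check unless both are first brought to one width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy stand-in for an IR integer value: a bit width plus a payload.
// Hypothetical type for illustration only; this is not an LLVM class.
struct TypedInt {
  unsigned bits;
  std::int64_t value;
};

// Mirrors the rule behind the assertion in the log: a binary operation
// requires both operands to have the same type.
TypedInt mulSameType(TypedInt a, TypedInt b) {
  if (a.bits != b.bits)
    throw std::invalid_argument("binary operator with operands of differing type");
  return TypedInt{a.bits, a.value * b.value};
}

TypedInt addSameType(TypedInt a, TypedInt b) {
  if (a.bits != b.bits)
    throw std::invalid_argument("binary operator with operands of differing type");
  return TypedInt{a.bits, a.value + b.value};
}

// Relabels a value with the requested width. The toy model assumes the
// payload fits in the target width, so no truncation is simulated.
TypedInt castTo(TypedInt v, unsigned bits) { return TypedInt{bits, v.value}; }

// Flattens row-major indices into one index. Every index is normalized to
// indexBits before any arithmetic, so an i64 index never meets an i32 extent.
TypedInt flattenIndex(const std::vector<TypedInt> &indices,
                      const std::vector<std::int64_t> &dims,
                      unsigned indexBits) {
  if (indices.empty() || indices.size() != dims.size())
    throw std::invalid_argument("need exactly one index per dimension");
  TypedInt flat = castTo(indices[0], indexBits);
  for (std::size_t k = 1; k < indices.size(); ++k) {
    TypedInt extent{indexBits, dims[k]};
    flat = addSameType(mulSameType(flat, extent),
                       castTo(indices[k], indexBits));
  }
  assert(flat.bits == indexBits);
  return flat;
}
```

In this model, indices `{i32 1, i64 2}` over a `[3 x [4 x T]]` shape flatten to `1 * 4 + 2 = 6` at 32 bits, whereas calling `mulSameType` directly on an i64 index and an i32 extent throws.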
