Commit 7e596d4

HIP: force max threads per block to be 1024
Some old compilers still default to a maximum of 256 threads per block. Explicitly set it to 1024 so that ops like ARGMAX and GROUP_NORM produce correct results.

Related: #10610, #11619

Signed-off-by: fxzjshm <[email protected]>
1 parent d92cb67 commit 7e596d4

1 file changed: +3 -0 lines changed

ggml/src/ggml-hip/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,9 @@ find_package(hip REQUIRED)
 find_package(hipblas REQUIRED)
 find_package(rocblas REQUIRED)
 
+# Workaround old compilers
+set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
+
 if (${hip_VERSION} VERSION_LESS 5.5)
     message(FATAL_ERROR "At least ROCM/HIP V5.5 is required")
 endif()
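
The workaround above appends the flag unconditionally. As a minimal sketch, and not part of this commit, the same setting could be guarded so that a `--gpu-max-threads-per-block` value already present in `CMAKE_HIP_FLAGS` is left alone. `GGML_HIP_MAX_THREADS` is a hypothetical variable name introduced here for illustration only.

```cmake
# Sketch only, not the committed change.
# GGML_HIP_MAX_THREADS is a hypothetical name used for illustration.
set(GGML_HIP_MAX_THREADS 1024)

# Append the flag only if the user has not already passed one.
if (NOT CMAKE_HIP_FLAGS MATCHES "--gpu-max-threads-per-block=")
    string(APPEND CMAKE_HIP_FLAGS
           " --gpu-max-threads-per-block=${GGML_HIP_MAX_THREADS}")
endif()

message(STATUS "HIP flags: ${CMAKE_HIP_FLAGS}")
```

The commit itself takes the simpler route of always appending the flag, which keeps the workaround to a single line.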
