Commit 59ad593

HIP: force max threads per block to be 1024
Some old compilers still use 256 as the maximum number of threads per block. Explicitly set it to 1024 to get correct results from ops like ARGMAX and GROUP_NORM.

Related: #10610, #11619

Signed-off-by: fxzjshm <[email protected]>
1 parent d92cb67 commit 59ad593

1 file changed: +3 -0 lines changed


ggml/src/ggml-hip/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
@@ -46,6 +46,9 @@ endif()
 
 message(STATUS "HIP and hipBLAS found")
 
+# Workaround old compilers
+set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
+
 file(GLOB GGML_HEADERS_ROCM "../ggml-cuda/*.cuh")
 list(APPEND GGML_HEADERS_ROCM "../../include/ggml-cuda.h")
 
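The added lines append the flag unconditionally. As a hedged illustration only (not part of this commit), a variant could skip the append when the user has already passed their own `--gpu-max-threads-per-block` value through `CMAKE_HIP_FLAGS`. The guard below is an assumption about how one might want user overrides to behave, not something the commit does:

```cmake
# Hypothetical variant, not what this commit does:
# only append the workaround if the flag is not already present,
# so a value supplied by the user on the command line is left alone.
if (NOT "${CMAKE_HIP_FLAGS}" MATCHES "gpu-max-threads-per-block")
    # Same flag as in the commit: tell the HIP compiler that kernels
    # may be launched with up to 1024 threads per block.
    set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
endif()
```

The unconditional form in the commit is simpler and, assuming the compiler honors the last occurrence of a repeated flag, always forces 1024. The guarded form trades that guarantee for respecting an explicit user setting.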
