Commit 7b41c5f

daniandtheweb authored and Neo Zhang committed
CUDA: revert part of the RDNA1 optimizations (ggml-org#8309)
The __launch_bounds__ change for RDNA1 was causing a small performance drop of 25 t/s when running perplexity.
1 parent cebe433 commit 7b41c5f

1 file changed: +2 -2 lines changed

ggml/src/ggml-cuda/mmq.cuh

Lines changed: 2 additions & 2 deletions
@@ -2263,9 +2263,9 @@ static __device__ void mul_mat_q_process_tile(
 
 template <ggml_type type, int mmq_x, int nwarps, bool need_check>
 #if defined(GGML_USE_HIPBLAS) && defined(__HIP_PLATFORM_AMD__)
-#if defined(RDNA3) || defined(RDNA2) || defined(RDNA1)
+#if defined(RDNA3) || defined(RDNA2)
 __launch_bounds__(WARP_SIZE*nwarps, 2)
-#endif // defined(RDNA3) || defined(RDNA2) || defined(RDNA1)
+#endif // defined(RDNA3) || defined(RDNA2)
 #else
 #if __CUDA_ARCH__ >= CC_VOLTA
 __launch_bounds__(WARP_SIZE*nwarps, 1)
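
To make the effect of the guard change easier to see, here is a minimal host-side C++ sketch that models which __launch_bounds__ arguments the visible part of the hunk selects after this commit. It is not code from ggml: the Target enum and the functions min_blocks_hint and max_threads_per_block are hypothetical names introduced only for illustration, and the branches of the real file that fall outside the hunk (for example NVIDIA GPUs older than Volta) are deliberately left out. In CUDA, the first argument of __launch_bounds__ is the maximum number of threads per block and the second is the minimum number of resident blocks per multiprocessor that the compiler should aim for.

```cpp
// Hypothetical model of the preprocessor guard shown in the hunk above.
// Not part of ggml; it only mirrors the branches visible in the diff.

enum class Target {
    RDNA1,              // AMD RDNA1: removed from the guard by this commit
    RDNA2,              // AMD RDNA2
    RDNA3,              // AMD RDNA3
    OtherAMD,           // any other AMD GPU built through HIP
    NvidiaVoltaOrNewer  // __CUDA_ARCH__ >= CC_VOLTA
};

// Second argument passed to __launch_bounds__ for the given target,
// or 0 when the kernel is compiled without any launch bounds.
constexpr int min_blocks_hint(Target t) {
    return (t == Target::RDNA2 || t == Target::RDNA3) ? 2
         : (t == Target::NvidiaVoltaOrNewer)          ? 1
         : 0;  // RDNA1 and other AMD targets: no __launch_bounds__
}

// First argument passed to __launch_bounds__: WARP_SIZE*nwarps in the diff.
// The warp size is a parameter here because its value is not shown in the hunk.
constexpr int max_threads_per_block(int warp_size, int nwarps) {
    return warp_size * nwarps;
}
```

Under this model, min_blocks_hint(Target::RDNA1) yields 0 after the commit, whereas with the old guard it would have matched the RDNA2/RDNA3 branch and yielded 2. That difference is the whole revert: RDNA1 kernels go back to being compiled without a launch-bounds constraint.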
