ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs #13107
Conversation
It would be better to simply move the accelerated versions to the CPU backend, since that's the only place where these functions are going to affect evaluation performance. Keep the basic C-only functions in […]
@slaren Hi. You're right that moving the accelerated implementations to the CPU backend would make the separation cleaner. However, there's a practical issue here: unless we explicitly register the accelerated implementation function pointers from the CPU backend during […]

Let me know if you'd prefer this approach.
Yes, the optimized version will not be available to applications, but that's not really a problem, since these functions are not generally used by applications in performance-sensitive paths. So, to be clear: applications can continue to use the basic C implementation from […]
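To make the "basic C implementation" concrete, here is a minimal, self-contained sketch of the kind of portable scalar FP32 -> FP16 conversion such a fallback performs, with round-to-nearest-even. This is an illustration under stated assumptions, not ggml's actual code; the function name `fp32_to_fp16_scalar` is hypothetical:

```c
#include <stdint.h>
#include <string.h>

// Illustrative scalar FP32 -> FP16 (IEEE binary16) conversion with
// round-to-nearest-even. Hypothetical helper, not ggml's implementation.
static uint16_t fp32_to_fp16_scalar(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);                  // bit-cast without UB
    uint32_t sign = (x >> 16) & 0x8000u;
    int32_t  exp  = (int32_t)((x >> 23) & 0xFFu) - 127 + 15;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp >= 31) {                           // overflow, inf, or NaN
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));
    }
    if (exp <= 0) {                            // subnormal half or zero
        if (exp < -10) return (uint16_t)sign;  // too small: rounds to +/-0
        mant |= 0x800000u;                     // restore implicit leading 1
        uint32_t shift   = (uint32_t)(14 - exp);
        uint32_t half    = mant >> shift;
        uint32_t rem     = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (half & 1u))) half++;
        return (uint16_t)(sign | half);
    }
    // normal case: drop 13 mantissa bits, round to nearest even
    uint32_t half = ((uint32_t)exp << 10) | (mant >> 13);
    uint32_t rem  = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) half++;
    return (uint16_t)(sign | half);            // carry into exponent is correct
}
```

Note that a rounding carry out of the mantissa naturally increments the exponent field, so a value that rounds up past the largest finite half (65504) correctly becomes infinity (`0x7C00`).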
Got it, I understand your point now. I've updated the implementation accordingly: the vectorized versions have been moved to the CPU backend, and the CPU backend now uses its own optimized functions instead of relying on the ones from […]

Additionally, I added four new exported functions:

```c
GGML_BACKEND_API void ggml_cpu_fp32_to_fp16(const float *, ggml_fp16_t *, int64_t);
GGML_BACKEND_API void ggml_cpu_fp16_to_fp32(const ggml_fp16_t *, float *, int64_t);
GGML_BACKEND_API void ggml_cpu_fp32_to_bf16(const float *, ggml_bf16_t *, int64_t);
GGML_BACKEND_API void ggml_cpu_bf16_to_fp32(const ggml_bf16_t *, float *, int64_t);
```
… conversion APIs (ggml-org#13107)

* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion
* move fp converter to ggml-cpu
* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
This PR introduces two main improvements to the x86_64 implementations of low-precision floating-point conversions in GGML:

1. Added runtime detection of CPU capabilities (e.g., AVX2, AVX-512) for the functions `ggml_bf16_to_fp32_row()` and `ggml_fp32_to_fp16_row()`.
2. Moved the vectorized `ggml_fp32_to_fp16_row()` implementation to `ggml-cpu`.
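The runtime-detection idea in item 1 can be sketched as follows. This is a hedged illustration of the dispatch pattern, not the PR's actual code: the names `to_fp32_row_t`, `bf16_to_fp32_row_scalar`, and `pick_bf16_to_fp32_row` are hypothetical, and the vector kernels are elided; only the selection logic and the portable scalar BF16 -> FP32 fallback are shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*to_fp32_row_t)(const uint16_t *src, float *dst, size_t n);

// Portable fallback: BF16 is the high 16 bits of an FP32, so widening
// is just a left shift plus a bit-cast.
static void bf16_to_fp32_row_scalar(const uint16_t *src, float *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t bits = (uint32_t)src[i] << 16;
        float f;
        memcpy(&f, &bits, sizeof f);
        dst[i] = f;
    }
}

// Hypothetical dispatcher: pick the best available kernel at runtime
// using GCC/Clang's __builtin_cpu_supports on x86_64.
static to_fp32_row_t pick_bf16_to_fp32_row(void) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx512f")) {
        // return bf16_to_fp32_row_avx512;  (vector kernel elided)
    } else if (__builtin_cpu_supports("avx2")) {
        // return bf16_to_fp32_row_avx2;    (vector kernel elided)
    }
#endif
    return bf16_to_fp32_row_scalar;  // always-correct baseline
}
```

Resolving the function pointer once (e.g., at backend initialization) keeps the per-row hot path free of feature checks.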
Benchmark (Ryzen 9950X)
build: 13be08d (5186)
Before Optimization: *(benchmark table not preserved in this excerpt)*
After Optimization: *(benchmark table not preserved in this excerpt)*
`pp512`: ~5% throughput increase