Releases · ggml-org/llama.cpp
b4445
llama : avoid hardcoded QK_K (#11061) ggml-ci
b4443
ggml : allow loading backend with env variable (ggml/1059) ref: #1058
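For context: ggml exposes a dynamic backend-loading API in ggml-backend.h, and this change makes the loader consult an environment variable as well. Below is a minimal sketch of how that mechanism can be used; `ggml_backend_load()`, `ggml_backend_load_all()`, and `ggml_backend_reg_count()` are the public API, while the variable name `GGML_BACKEND_PATH` is an assumption for illustration, not necessarily the name this commit uses.

```cpp
// Sketch: load a ggml backend from a shared library chosen at runtime.
// The GGML_BACKEND_PATH variable name is an assumption for illustration.
#include <cstdio>
#include <cstdlib>
#include "ggml-backend.h"

int main() {
    // If the user pointed us at a specific backend library, load just that one...
    if (const char * path = std::getenv("GGML_BACKEND_PATH")) {
        ggml_backend_reg_t reg = ggml_backend_load(path);
        if (!reg) {
            std::fprintf(stderr, "failed to load backend from %s\n", path);
            return 1;
        }
    } else {
        // ...otherwise register every backend found in the default search paths.
        ggml_backend_load_all();
    }
    std::printf("registered %zu backend(s)\n", ggml_backend_reg_count());
    return 0;
}
```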
b4440
llamafile : ppc64le MMA INT8 implementation (#10912) This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the quantised int8 datatype. It results in a 10%-70% improvement in total speed (i.e. all tokens / total time) across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine. Signed-off-by: Amrita H S <[email protected]>
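For background, the kernels in this patch are built on the POWER10 Matrix-Multiply Assist (MMA) builtins. Here is a minimal standalone sketch (not code from the PR) of the primitive such kernels tile over: a single rank-4 int8 outer-product accumulate into a 4x4 int32 tile. It assumes GCC targeting POWER10.

```cpp
// Build with: g++ -mcpu=power10 -O2 mma_demo.cpp  (MMA requires POWER10)
#include <altivec.h>
#include <cstdio>

int main() {
    // 16 int8 values per operand; xvi8ger4 consumes each as a 4x4 tile,
    // accumulating 4 byte-products into every int32 result element.
    vector unsigned char a = vec_splats((unsigned char)2);
    vector unsigned char b = vec_splats((unsigned char)3);

    __vector_quad acc;                      // 512-bit accumulator register
    __builtin_mma_xxsetaccz(&acc);          // zero the 4x4 int32 tile
    __builtin_mma_xvi8ger4pp(&acc, a, b);   // acc += rank-4 outer product of a, b

    vector signed int rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);
    std::printf("acc[0][0] = %d (expect 2*3*4 = 24)\n", rows[0][0]);
    return 0;
}
```

Each `xvi8ger4pp` call folds four int8 products into every element of the accumulator tile, which is what makes the instruction attractive for quantised int8 matmul.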
b4439
ci : fix cmake option (#11125)
b4438
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …
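The GLSL extension GL_KHR_cooperative_matrix can only be used when the device exposes the corresponding Vulkan device extension, VK_KHR_cooperative_matrix. A generic sketch of such a runtime check (illustrative only, not the llama.cpp implementation):

```cpp
// Query the device's extension list and only enable cooperative-matrix
// shader variants when VK_KHR_cooperative_matrix is actually present.
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

bool has_cooperative_matrix(VkPhysicalDevice dev) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, exts.data());
    for (const auto & e : exts) {
        if (std::strcmp(e.extensionName, "VK_KHR_cooperative_matrix") == 0) {
            return true;  // safe to compile/dispatch coopmat shader variants
        }
    }
    return false;         // fall back to plain matrix-multiply shaders
}
```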
b4437
fix: Vulkan shader gen binary path when cross-compiling (#11096)
b4435
ggml-backend : only offload from host buffers (fix) (#11124)
b4434
ggml-backend : only offload from host buffers (#11120)
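Together with the follow-up fix in b4435 above, the point of this change is that the scheduler should only offload tensor data that actually resides in a host (CPU-addressable) buffer. A hedged sketch of the kind of guard this implies, using the public `ggml_backend_buffer_is_host()` predicate; the wrapper function itself is illustrative only:

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Illustrative guard: tensors already placed in device memory (or with no
// buffer assigned yet) must not be treated as host data to copy from.
static bool can_offload(const struct ggml_tensor * t) {
    return t->buffer != nullptr && ggml_backend_buffer_is_host(t->buffer);
}
```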
b4433
rpc : code cleanup (#11107) Remove duplicated macros and use GGML_LOG_ERROR for errors.
b4432
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#1…
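For context: `accessor::get_pointer()` is deprecated in SYCL 2020 in favor of the explicitly decorated `get_multi_ptr<>()`. A before/after sketch of this migration (illustrative only, not the wkv6 kernel itself):

```cpp
#include <sycl/sycl.hpp>

// Scale an array in place on the device, showing the SYCL 2020 pointer API.
void scale_in_place(sycl::queue & q, float * data, size_t n) {
    sycl::buffer<float, 1> buf(data, sycl::range<1>(n));
    q.submit([&](sycl::handler & cgh) {
        sycl::accessor acc(buf, cgh, sycl::read_write);
        cgh.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            // Deprecated SYCL 1.2.1 style:  float * p = acc.get_pointer();
            // SYCL 2020 replacement, with an explicit decoration choice:
            float * p = acc.get_multi_ptr<sycl::access::decorated::no>().get();
            p[i] *= 2.0f;
        });
    });
    q.wait();
}
```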