
Releases: ggml-org/llama.cpp

b4445

08 Jan 15:42
c07d437
llama : avoid hardcoded QK_K (#11061)

ggml-ci

b4443

08 Jan 12:24
c792dcf
ggml : allow loading backend with env variable (ggml/1059)

ref: #1058

b4440

08 Jan 11:31
8cef75c
llamafile : ppc64le MMA INT8 implementation (#10912)

This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the quantised int8 data type.

This change results in a 10% to 70% improvement
in total speed (i.e. all tokens / total time)
across various batch sizes.

The patch is tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
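The entry above can be illustrated with a minimal sketch of what a quantised int8 matmul kernel computes. This is not the ppc64le MMA code itself (which uses POWER10 accumulator builtins); it is an assumed, simplified NumPy model of the general scheme: quantise rows to int8 with per-row scales, accumulate int8 products in int32, then rescale to float.

```python
import numpy as np

def quantize_rows(a: np.ndarray):
    """Quantise each row of a float matrix to int8 with a per-row scale."""
    scales = np.abs(a).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.round(a / scales).astype(np.int8)
    return q, scales

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Approximate C = A @ B.T via int8 quantisation.

    The int32 accumulation mirrors what hardware matmul accumulators
    (such as POWER10 MMA accumulators) hold before rescaling.
    """
    qa, sa = quantize_rows(a)
    qb, sb = quantize_rows(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32).T
    return acc * (sa * sb.T)

np.random.seed(0)
a = np.random.randn(4, 64).astype(np.float32)
b = np.random.randn(8, 64).astype(np.float32)
c = int8_matmul(a, b)
# Quantisation introduces a small error relative to the float result.
assert np.allclose(c, a @ b.T, atol=0.5)
```

The real kernels vectorise the int32 accumulation with MMA builtins and tile the matrices to fit the accumulator registers; the arithmetic they perform is the same as this sketch.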

b4439

08 Jan 10:15
0d52a69
ci : fix cmake option (#11125)

b4438

08 Jan 09:11
02f0430
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …

b4437

08 Jan 09:06
bec2183
fix: Vulkan shader gen binary path when Cross-compiling (#11096)

b4435

07 Jan 16:01
017cc5f
ggml-backend : only offload from host buffers (fix) (#11124)

b4434

07 Jan 12:25
a3d50bc
ggml-backend : only offload from host buffers (#11120)

b4433

07 Jan 07:24
a4dd490
rpc : code cleanup (#11107)

Remove duplicated macros; use GGML_LOG_ERROR for errors.

b4432

07 Jan 07:08
c0d6f79
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#1…