Releases: ggml-org/llama.cpp

b5646

12 Jun 10:19
7d51644
server : re-enable SWA speculative decoding (#14131)

ggml-ci

b5645

12 Jun 10:09
f6e1a7a
context : simplify output counting logic during decode (#14142)

* batch : remove logits_all flag

ggml-ci

* context : simplify output counting logic during decode

ggml-ci

* cont : fix comments

b5644

12 Jun 09:53
c3ee46f
batch : remove logits_all flag (#14141)

ggml-ci

b5642

12 Jun 08:38
9596506
kv-cache : fix split_equal handling in unified implementation (#14130)

ggml-ci

b5641

12 Jun 08:12
a20b2b0
context : round n_tokens to next multiple of n_seqs when reserving (#…
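
The rounding named in this title is plain integer arithmetic. A minimal sketch, assuming unsigned counts and a non-zero n_seqs; the helper name round_up_to_multiple is hypothetical and is not taken from the llama.cpp sources:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the llama.cpp implementation): round n_tokens up
// to the nearest multiple of n_seqs. A value that is already a multiple is
// returned unchanged.
static uint32_t round_up_to_multiple(uint32_t n_tokens, uint32_t n_seqs) {
    assert(n_seqs > 0); // dividing by zero sequences makes no sense
    return ((n_tokens + n_seqs - 1) / n_seqs) * n_seqs;
}
```

For example, 10 tokens with 4 sequences would round up to 12, presumably so that the reserved tokens divide evenly among the sequences.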

b5640

11 Jun 20:49
2e89f76
common: fix issue with regex_escape routine on windows (#14133)

b5639

11 Jun 19:28
532802f
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)

* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT>, which needs to be set
in cmake, instead of the GGML_<FEAT> options that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

This works like the x86 implementation. However, to pass arch flags around
within cmake, we use GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants.

This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch, which restores the previous behavior.
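
The idea of building several variants per arch version and letting a scoring function pick the usable one can be sketched as follows. This is an illustration only: the feature bits, the cpu_variant struct, and both functions are hypothetical and do not reproduce the actual ggml code. The sketch assumes a score of 0 means "cannot run on this host" and that a variant compiled with more features scores higher.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; not the real ggml feature list.
constexpr uint32_t FEAT_DOTPROD = 1u << 0;
constexpr uint32_t FEAT_FP16    = 1u << 1;
constexpr uint32_t FEAT_SVE     = 1u << 2;

// Hypothetical description of one compiled backend variant.
struct cpu_variant {
    std::string name;     // e.g. "armv8.2_1"
    uint32_t    required; // features this variant was compiled with
};

// Returns 0 if the host lacks any required feature, otherwise
// 1 plus the number of required features.
static int score_variant(const cpu_variant & v, uint32_t host_features) {
    if ((v.required & ~host_features) != 0) {
        return 0; // variant uses an instruction set the host does not have
    }
    int score = 1;
    for (uint32_t bits = v.required; bits != 0; bits &= bits - 1) {
        score++; // one point per set bit
    }
    return score;
}

// Returns the highest-scoring usable variant, or nullptr if none can run.
static const cpu_variant * pick_variant(const std::vector<cpu_variant> & variants,
                                        uint32_t host_features) {
    const cpu_variant * best = nullptr;
    int best_score = 0;
    for (const auto & v : variants) {
        const int s = score_variant(v, host_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    assert(best == nullptr || best_score > 0); // a chosen variant is always usable
    return best;
}
```

With this scheme, a host that supports only some optional features falls back to the richest variant it can actually execute, instead of failing on an unsupported instruction.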

b5638

11 Jun 17:20
d4e0d95
chore : clean up relative source dir paths (#14128)

b5637

11 Jun 16:22
cc66a7f
tests : add test-tokenizers-repo (#14017)

b5636

11 Jun 15:27
bd248d4
vulkan: Better thread-safety for command pools/buffers (#14116)

This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
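
The ownership layout described above can be illustrated with a toy model that uses no Vulkan calls. All names here (toy_command_pool, toy_context, toy_device) are hypothetical stand-ins, integers replace real command buffer handles, and the sketch only shows why per-context pools keep contexts from interfering with each other; it is not the actual vk_command_pool code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a command pool: it tracks the buffers it has handed out.
struct toy_command_pool {
    std::vector<int> buffers;  // stand-ins for command buffer handles
    size_t           next = 0; // index of the next reusable buffer

    // Hand out the next buffer, creating one if all existing ones are in use.
    int acquire() {
        assert(next <= buffers.size());
        if (next == buffers.size()) {
            buffers.push_back((int) buffers.size());
        }
        return buffers[next++];
    }

    // Mark every buffer of this pool as reusable.
    void reset() { next = 0; }
};

// Two pools per context (compute + transfer) ...
struct toy_context {
    toy_command_pool compute;
    toy_command_pool transfer;
};

// ... and two per device for operations that do not go through a context.
struct toy_device {
    toy_command_pool compute;
    toy_command_pool transfer;
};
```

Because each context owns its pools, resetting one context's pool leaves the buffers acquired by another context untouched.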