Releases: ggml-org/llama.cpp
b5646
b5645
context : simplify output counting logic during decode (#14142)
* batch : remove logits_all flag
  ggml-ci
* context : simplify output counting logic during decode
  ggml-ci
* cont : fix comments
b5644
batch : remove logits_all flag (#14141) ggml-ci
b5642
kv-cache : fix split_equal handling in unified implementation (#14130) ggml-ci
b5641
context : round n_tokens to next multiple of n_seqs when reserving (#…
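The rounding named in this commit title can be illustrated with a short sketch. It assumes that a value which is already a multiple of n_seqs is left unchanged. The helper name round_up_to_multiple is hypothetical and is not taken from the llama.cpp sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the llama.cpp function): round n_tokens up to the
// nearest multiple of n_seqs. Values that are already a multiple are kept.
static uint32_t round_up_to_multiple(uint32_t n_tokens, uint32_t n_seqs) {
    assert(n_seqs > 0);  // a zero sequence count would divide by zero
    // Integer ceiling division, then scale back up to a token count.
    return ((n_tokens + n_seqs - 1) / n_seqs) * n_seqs;
}
```

With this arithmetic, a reservation of 10 tokens across 4 sequences becomes 12, so each sequence gets the same whole number of tokens.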
b5640
common: fix issue with regex_escape routine on windows (#14133)
b5639
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring
  This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> which need to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags.
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
  Like x86, however to pass around arch flags within cmake, we use GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used.
* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
  The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch which restores the previous behavior.
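As a rough illustration of "let the scoring function sort out which one can be used", the sketch below scores several backend variants against the features detected on the host and picks the best usable one. Everything in it is an assumption made for illustration: the feature bits, the Variant struct, the score formula, and the variant names other than armv8.2_1 and armv8.2_2 are hypothetical and do not reflect the actual ggml-cpu code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; not the real ggml feature list.
enum : uint32_t {
    FEAT_DOTPROD = 1u << 0,
    FEAT_FP16    = 1u << 1,
    FEAT_SVE     = 1u << 2,
};

// Hypothetical description of one compiled backend variant.
struct Variant {
    std::string name;
    uint32_t    required;  // features this variant was compiled to use
};

static int count_bits(uint32_t v) {
    int n = 0;
    while (v) { n += static_cast<int>(v & 1u); v >>= 1; }
    return n;
}

// Score 0 means "cannot run on this host". Otherwise, a variant that uses
// more features scores higher (assumed policy for this sketch).
static int score(const Variant & v, uint32_t host_features) {
    if ((v.required & ~host_features) != 0) {
        return 0;  // needs a feature the host lacks
    }
    return 1 + count_bits(v.required);
}

// Return the highest-scoring usable variant, or nullptr if none can run.
static const Variant * pick_best(const std::vector<Variant> & variants,
                                 uint32_t host_features) {
    const Variant * best = nullptr;
    int best_score = 0;
    for (const auto & v : variants) {
        const int s = score(v, host_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

Building one variant per optional-feature combination and deferring the choice to run time is what lets a single binary distribution serve CPUs with different capabilities.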
b5638
chore : clean up relative source dir paths (#14128)
b5637
tests : add test-tokenizers-repo (#14017)
b5636
vulkan: Better thread-safety for command pools/buffers (#14116)
This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other.
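The ownership layout described here can be sketched without any Vulkan calls. The code below is a simplified model under stated assumptions: CommandBuffer is a stand-in integer handle, and the CommandPool, Context, and Device types and their fields are hypothetical, not the real vk_command_pool definition.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a Vulkan command buffer handle (a real pool would hold
// VkCommandBuffer objects allocated from a VkCommandPool).
using CommandBuffer = int;

// Hypothetical model of the idea behind vk_command_pool: the pool owns its
// buffers and tracks how many are currently handed out.
struct CommandPool {
    std::vector<CommandBuffer> buffers;
    std::size_t                in_use = 0;

    // Reuse an idle buffer if one exists, otherwise create a new one.
    CommandBuffer acquire() {
        if (in_use == buffers.size()) {
            buffers.push_back(static_cast<CommandBuffer>(buffers.size()));
        }
        return buffers[in_use++];
    }

    // Mark every buffer in this pool as idle again.
    void reset() { in_use = 0; }
};

// Two pools per context: one for compute work, one for transfers.
struct Context {
    CommandPool compute;
    CommandPool transfer;
};

// Two more pools per device for operations that bypass a context.
struct Device {
    CommandPool compute;
    CommandPool transfer;
};
```

Because each context owns its own pools in this model, resetting one context's pool cannot invalidate buffers that another context is still recording into.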