Releases: ggml-org/llama.cpp

b5508

27 May 13:09
bc583e3
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#…

b5506

27 May 12:43
7fe03e7
ggml-cpu: x86 feature detection is specific to x86 (#13811)

b5505

27 May 12:01
952f395
ggml : allow CUDA graphs when using pipeline parallelism (#13814)

b5504

27 May 11:53
8171312
kv-cells : track min/max used cells and per-sequence positions (#13808)

* kv-cells : track min/max used cells and per-sequence positions

ggml-ci

* kv-cells : fix pos-modification updates for seq_pos

ggml-ci

* kv-cells : add comments

ggml-ci
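
To illustrate what tracking "min/max used cells and per-sequence positions" can look like, here is a minimal Python sketch. It is not the llama.cpp implementation: the class name `KvCells` and all of its methods are hypothetical, and the values are recomputed on demand rather than maintained incrementally.

```python
class KvCells:
    """Toy KV-cell table (hypothetical; not llama.cpp's data structure)."""

    def __init__(self, size):
        self.pos = [-1] * size                   # -1 marks an empty cell
        self.seq = [set() for _ in range(size)]  # sequence ids per cell
        self.used = set()                        # indices of occupied cells

    def set(self, i, pos, seq_id):
        """Store position `pos` for sequence `seq_id` in cell `i`."""
        self.pos[i] = pos
        self.seq[i].add(seq_id)
        self.used.add(i)

    def rm(self, i):
        """Free cell `i`."""
        self.pos[i] = -1
        self.seq[i].clear()
        self.used.discard(i)

    def used_min(self):
        """Lowest occupied cell index, or -1 if no cell is used."""
        return min(self.used, default=-1)

    def used_max(self):
        """Highest occupied cell index, or -1 if no cell is used."""
        return max(self.used, default=-1)

    def seq_pos_min(self, seq_id):
        """Smallest position stored for `seq_id`, or -1 if it has none."""
        return min((self.pos[i] for i in self.used if seq_id in self.seq[i]),
                   default=-1)

    def seq_pos_max(self, seq_id):
        """Largest position stored for `seq_id`, or -1 if it has none."""
        return max((self.pos[i] for i in self.used if seq_id in self.seq[i]),
                   default=-1)


cells = KvCells(8)
cells.set(2, pos=0, seq_id=0)
cells.set(5, pos=1, seq_id=0)
cells.set(3, pos=7, seq_id=1)
```

Knowing the used range lets a caller scan only the occupied part of the cache, and the per-sequence bounds answer "what positions does this sequence cover?" without visiting every cell.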

b5503

27 May 09:43
f9cd683
sampling : make sure samplers return at least 1 token (#13822)

* sampling : min-p should always return at least one token

ggml-ci

* sampling : same for typical sampling

* tests : sampling tests use min_keep == 0

ggml-ci
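
The guarantee described above (a sampler never returns an empty candidate set) can be sketched as follows. This is an illustrative Python version of a min-p style filter, not the llama.cpp code; the function name `min_p_filter` and its signature are assumptions made for this example.

```python
import math


def min_p_filter(logits, p, min_keep=0):
    """Return indices of the tokens kept by a min-p style filter.

    A token is kept when prob >= p * max_prob, which in logit space is
    logit >= max_logit + log(p). Indices are returned most likely first.
    Hypothetical helper; names are not taken from llama.cpp.
    """
    if not logits:
        return []

    max_logit = max(logits)
    threshold = max_logit + math.log(p) if p > 0 else -math.inf

    # Candidate indices sorted by descending logit.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = [i for i in order if logits[i] >= threshold]

    # Even with min_keep == 0, always keep at least one token.
    floor = max(1, min_keep)
    if len(kept) < floor:
        kept = order[:floor]
    return kept
```

With `p` greater than 1 the threshold lies above every logit, so the plain filter would return nothing; the final clamp falls back to the single most likely token instead.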

b5502

27 May 07:00
4f81b33
llama : validate seq id batch input (#13809)

* llama : validate seq id batch input

ggml-ci

* cont : fix the fix

ggml-ci
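
As a rough illustration of validating sequence ids before a batch is processed, the sketch below checks each id against an upper bound. The release note does not spell out the exact rule, so the range check `0 <= seq_id < n_seq_max`, the function name `validate_seq_ids`, and the batch layout are all assumptions made for this example.

```python
def validate_seq_ids(batch_seq_ids, n_seq_max):
    """Check every sequence id in a batch (hypothetical helper).

    batch_seq_ids: one list of sequence ids per token in the batch.
    n_seq_max:     assumed exclusive upper bound for a valid sequence id.
    Returns (ok, message); message is empty when the batch is valid.
    """
    for tok_idx, seq_ids in enumerate(batch_seq_ids):
        for seq_id in seq_ids:
            if seq_id < 0 or seq_id >= n_seq_max:
                return False, (
                    f"invalid seq_id {seq_id} for token {tok_idx} "
                    f"(expected 0 <= seq_id < {n_seq_max})"
                )
    return True, ""
```

Rejecting the batch up front turns an out-of-range id into a clear error message rather than an out-of-bounds access later on.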

b5501

26 May 22:06
cdf94a1
server: --offline mode (#13804)

* server: --offline mode (env: LLAMA_OFFLINE)

---------

Co-authored-by: Xuan-Son Nguyen

b5499

26 May 19:51
4265a87
cuda : avoid cuGetErrorString (#13791)

ggml-ci

b5498

26 May 17:46
6f180b9
SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)

* SYCL: Add non contiguous input support to norm kernel

* refactor and add RMS_NORM non contiguous input support

ggml-ci

* restore subgroup reduction for multi-subgroup thread blocks in norm kernels

* Swap grid dims of nsamples and nrows

ggml-ci

* Revert "Swap grid dims of nsamples and nrows"

This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.

* restore not required changes
ggml-ci

* address review comments: change it to more like SYCL

* Use a common function to calculate offset

* remove wrap around logic for handling broadcasts

* remove static from calculate_offset fn and use ceil_div
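
The two helpers named in the last items can be illustrated in a few lines. This Python sketch only borrows the names `calculate_offset` and `ceil_div` from the commit messages; the signatures and the stride layout are assumptions, and the real SYCL kernels are not reproduced here.

```python
def ceil_div(a, b):
    """Integer division rounded up, for positive b."""
    return (a + b - 1) // b


def calculate_offset(strides, indices):
    """Linear element offset of a multi-dimensional index.

    For a non-contiguous tensor the strides are not the products of the
    inner dimensions, so the offset must be built from explicit strides.
    Signature is an assumption for illustration.
    """
    return sum(s * i for s, i in zip(strides, indices))


# A 4-wide row stored with padding to 8 elements: the row stride is 8, not 4.
strides = (1, 8, 32)
offset = calculate_offset(strides, (2, 1, 1))  # 2*1 + 1*8 + 1*32

# Number of 32-wide work-groups needed to cover 100 columns.
groups = ceil_div(100, 32)
```

Routing every index computation through one offset function keeps contiguous and non-contiguous inputs on the same code path; only the strides differ.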

b5497

26 May 16:07
03f582a
server: fix streaming crashes (#13786)

* add preludes to content on partial regex match

* allow all parsers to parse non-tool-call content.

* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format. Still not ideal, but hopefully less prone to crashing.