Releases · ggml-org/llama.cpp
b5508
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#…
b5506
ggml-cpu: x86 feature detection is specific to x86 (#13811)
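This change guards x86 CPU-feature probing so it only compiles on x86 targets. As a general illustration of the technique (not the actual ggml-cpu code), a sketch assuming GCC/Clang's `<cpuid.h>`:

```cpp
// Sketch only: arch-specific feature probing guarded so non-x86 builds take
// the fallback path. Not the actual ggml-cpu implementation.
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

static bool cpu_has_avx2(void) {
    unsigned int eax, ebx, ecx, edx;
    // CPUID leaf 7, sub-leaf 0: EBX bit 5 reports AVX2 support.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ebx & (1u << 5)) != 0;
}
#else
// On non-x86 targets there is nothing to probe; report no x86 features.
static bool cpu_has_avx2(void) { return false; }
#endif
```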
b5505
ggml : allow CUDA graphs when using pipeline parallelism (#13814)
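This release enables CUDA graphs together with pipeline parallelism. As general background (not llama.cpp's actual code path), CUDA graphs are usually built by capturing a stream and replaying the instantiated graph; a minimal sketch using only the CUDA runtime API:

```cpp
// Generic CUDA graph capture/replay sketch using only the runtime API;
// illustrative, not llama.cpp's implementation.
#include <cuda_runtime.h>

void run_with_graph(float * d_dst, const float * d_src, size_t n, cudaStream_t stream) {
    cudaGraph_t     graph;
    cudaGraphExec_t graph_exec;

    // Record the async work issued on `stream` into a graph instead of running it.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeThreadLocal);
    cudaMemcpyAsync(d_dst, d_src, n * sizeof(float), cudaMemcpyDeviceToDevice, stream);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then replay many times with minimal launch overhead.
    cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);
    for (int i = 0; i < 10; ++i) {
        cudaGraphLaunch(graph_exec, stream);
    }
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(graph_exec);
    cudaGraphDestroy(graph);
}
```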
b5504
kv-cells : track min/max used cells and per-sequence positions (#13808)
* kv-cells : track min/max used cells and per-sequence positions ggml-ci
* kv-cells : fix pos-modification updates for seq_pos ggml-ci
* kv-cells : add comments ggml-ci
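The commit tracks, for the KV-cell pool, the min/max used cell indices and per-sequence position ranges. A hedged sketch of that kind of bookkeeping (type and member names are hypothetical, not the llama.cpp structures):

```cpp
// Illustrative bookkeeping for a KV-cell pool: track the range of used cell
// indices and, per sequence, the min/max token positions held in the cache.
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>

struct kv_cells_stats {
    uint32_t used_min = std::numeric_limits<uint32_t>::max();
    uint32_t used_max = 0;

    struct pos_range { int32_t pos_min; int32_t pos_max; };
    std::map<int32_t, pos_range> seq_pos; // seq_id -> position range

    void on_cell_used(uint32_t cell, int32_t seq_id, int32_t pos) {
        used_min = std::min(used_min, cell);
        used_max = std::max(used_max, cell);

        auto it = seq_pos.find(seq_id);
        if (it == seq_pos.end()) {
            seq_pos[seq_id] = { pos, pos };
        } else {
            it->second.pos_min = std::min(it->second.pos_min, pos);
            it->second.pos_max = std::max(it->second.pos_max, pos);
        }
    }
};
```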
b5503
sampling : make sure samplers return at least 1 token (#13822)
* sampling : min-p should always return at least one token ggml-ci
* sampling : same for typical sampling
* tests : sampling tests use min_keep == 0 ggml-ci
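The fix makes min-p (and typical) sampling keep at least one candidate even when `min_keep == 0`. A minimal sketch of that invariant over a simple candidate list (illustrative, not the actual llama.cpp sampler code):

```cpp
// Sketch of min-p filtering that always retains at least one candidate,
// even when min_keep == 0. Candidate type is illustrative.
#include <algorithm>
#include <cstddef>
#include <vector>

struct candidate { int id; float p; };

void min_p_filter(std::vector<candidate> & cands, float p_min, size_t min_keep) {
    if (cands.empty()) return;

    // Sort by probability, highest first.
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.p > b.p; });

    // min-p keeps candidates whose probability is at least p_min * max prob.
    const float threshold = p_min * cands.front().p;

    size_t keep = 0;
    while (keep < cands.size() && cands[keep].p >= threshold) {
        ++keep;
    }

    // The invariant from this release: never drop everything, even if min_keep == 0.
    keep = std::max<size_t>({keep, min_keep, 1});
    keep = std::min(keep, cands.size());

    cands.resize(keep);
}
```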
b5502
llama : validate seq id batch input (#13809)
* llama : validate seq id batch input ggml-ci
* cont : fix the fix ggml-ci
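The change validates the sequence ids supplied with a batch before they reach the KV cache. A hedged sketch of the kind of check involved; the batch struct here is a simplified stand-in, not the actual `llama_batch` definition:

```cpp
// Illustrative batch validation: every token's sequence ids must lie in
// [0, n_seq_max). Not the actual llama.cpp validation code.
#include <cstdint>
#include <cstdio>

struct batch_view {
    int32_t    n_tokens;
    int32_t *  n_seq_id;  // per-token count of sequence ids
    int32_t ** seq_id;    // per-token array of sequence ids
};

bool validate_seq_ids(const batch_view & batch, int32_t n_seq_max) {
    for (int32_t i = 0; i < batch.n_tokens; ++i) {
        for (int32_t j = 0; j < batch.n_seq_id[i]; ++j) {
            const int32_t sid = batch.seq_id[i][j];
            if (sid < 0 || sid >= n_seq_max) {
                std::fprintf(stderr, "invalid seq_id %d at token %d (n_seq_max = %d)\n",
                             sid, i, n_seq_max);
                return false;
            }
        }
    }
    return true;
}
```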
b5501
server: --offline mode (#13804)
* server: --offline mode (env: LLAMA_OFFLINE)
Co-authored-by: Xuan-Son Nguyen <[email protected]>
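The release note names the `--offline` flag and the `LLAMA_OFFLINE` environment variable; presumably the switch prevents any network fetches. A hedged sketch of wiring such a switch (only the flag and env-var names come from the release; everything else is illustrative):

```cpp
// Illustrative wiring of an offline switch: honor both the CLI flag and the
// LLAMA_OFFLINE environment variable. The download hook is a placeholder,
// not the server's actual code.
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

static bool is_offline(int argc, char ** argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--offline") == 0) {
            return true;
        }
    }
    const char * env = std::getenv("LLAMA_OFFLINE");
    return env != nullptr && std::strcmp(env, "0") != 0;
}

static void maybe_download(const std::string & url, bool offline) {
    if (offline) {
        // In offline mode, only locally cached files may be used.
        throw std::runtime_error("offline mode: refusing to fetch " + url);
    }
    // ... perform the network fetch here ...
}
```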
b5499
cuda : avoid cuGetErrorString (#13791) ggml-ci
b5498
SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)
* SYCL: Add non contiguous input support to norm kernel
* refactor and add RMS_NORM non contiguous input support ggml-ci
* restore subgroup reduction for multi-subgroup thread blocks in norm kernels
* Swap grid dims of nsamples and nrows ggml-ci
* Revert "Swap grid dims of nsamples and nrows" (reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf)
* restore not required changes ggml-ci
* address review comments: change it to more like SYCL
* Use a common function to calculate offset
* remove wrap around logic for handling broadcasts
* remove static from calculate_offset fn and use ceil_div
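The commit mentions a shared offset-calculation helper for non-contiguous inputs and a `ceil_div` helper for sizing grids. As a general sketch of those two ideas (an illustration, not the SYCL kernels; the stride layout is assumed):

```cpp
// Generic helpers for strided (non-contiguous) tensors; names mirror the
// commit message but the code is illustrative.
#include <cstddef>

// Byte offset of element (i0, i1, i2, i3) given per-dimension byte strides.
inline std::size_t calculate_offset(const std::size_t nb[4],
                                    std::size_t i0, std::size_t i1,
                                    std::size_t i2, std::size_t i3) {
    return i0 * nb[0] + i1 * nb[1] + i2 * nb[2] + i3 * nb[3];
}

// Ceiling division, typically used to size work-group grids:
// ceil_div(10, 4) == 3 groups of 4 cover 10 items.
inline std::size_t ceil_div(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}
```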
b5497
server: fix streaming crashes (#13786)
* add preludes to content on partial regex match
* allow all parsers to parse non-tool-call content
* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format (still not ideal but hopefully less prone to crash)
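The commit describes emitting "prelude" content when a tool-call marker only partially matches, so streamed non-tool-call text is passed through instead of crashing the parser. A hedged sketch of the underlying idea: hold back only the suffix that could still be the start of a marker and flush the rest as plain content (the marker string and split logic are simplified stand-ins for the server's parsers):

```cpp
// Illustrative handling of a streaming tool-call marker: flush text that can
// no longer be the start of the marker ("prelude" content), keep the rest
// buffered until the marker either completes or is ruled out.
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Length of the longest suffix of `buf` that is a prefix of `marker`.
static std::size_t partial_marker_len(const std::string & buf, const std::string & marker) {
    const std::size_t max_len = std::min(buf.size(), marker.size());
    for (std::size_t len = max_len; len > 0; --len) {
        if (buf.compare(buf.size() - len, len, marker, 0, len) == 0) {
            return len;
        }
    }
    return 0;
}

// Returns {content to emit now, text to keep buffered}.
static std::pair<std::string, std::string> split_stream_chunk(
        const std::string & buf, const std::string & marker = "<function=") {
    const std::size_t pos = buf.find(marker);
    if (pos != std::string::npos) {
        // Full marker present: everything before it is plain content.
        return { buf.substr(0, pos), buf.substr(pos) };
    }
    const std::size_t held = partial_marker_len(buf, marker);
    return { buf.substr(0, buf.size() - held), buf.substr(buf.size() - held) };
}
```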