Releases: ngxson/llama.cpp

b5504

27 May 11:13
8171312
kv-cells : track min/max used cells and per-sequence positions (#13808)

* kv-cells : track min/max used cells and per-sequence positions

ggml-ci

* kv-cells : fix pos-modification updates for seq_pos

ggml-ci

* kv-cells : add comments

ggml-ci

b5503

27 May 09:33
f9cd683
sampling : make sure samplers return at least 1 token (#13822)

* sampling : min-p should always return at least one token

ggml-ci

* sampling : same for typical sampling

* tests : sampling tests use min_keep == 0

ggml-ci
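
To make the "at least 1 token" guarantee concrete, here is a minimal standalone sketch of a min-p filter that never returns an empty candidate list, even when `min_keep == 0`. The type `token_prob` and the function `min_p_filter` are hypothetical names used for illustration only; this is not the llama.cpp sampler API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate type: a token id and its probability.
struct token_prob {
    int   id;
    float p;
};

// Keep the tokens whose probability is at least min_p times the top
// probability, but never fewer than max(min_keep, 1) tokens.
std::vector<token_prob> min_p_filter(std::vector<token_prob> cand, float min_p, std::size_t min_keep) {
    assert(!cand.empty());

    // Sort by descending probability so the kept tokens form a prefix.
    std::sort(cand.begin(), cand.end(),
              [](const token_prob & a, const token_prob & b) { return a.p > b.p; });

    const float threshold = min_p * cand.front().p;

    std::size_t n_keep = 0;
    while (n_keep < cand.size() && cand[n_keep].p >= threshold) {
        ++n_keep;
    }

    // Clamp: at least one token survives, even for min_keep == 0 or an
    // out-of-range min_p that would otherwise reject every candidate.
    n_keep = std::max(n_keep, std::max<std::size_t>(min_keep, 1));
    n_keep = std::min(n_keep, cand.size());

    cand.resize(n_keep);
    return cand;
}
```

With probabilities 0.5, 0.3 and 0.2 and `min_p = 0.5`, the threshold is 0.25 and two tokens remain. With an out-of-range `min_p = 2.0` nothing passes the threshold, and the clamp keeps the top token.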

b5502

27 May 06:56
4f81b33
llama : validate seq id batch input (#13809)

* llama : validate seq id batch input

ggml-ci

* cont : fix the fix

ggml-ci
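
A sketch of what validating sequence ids in a batch can look like, assuming a batch is represented as one list of sequence ids per token and that valid ids lie in `[0, n_seq_max)`. The function name, the batch layout, and the bound are assumptions for illustration, not the library's actual types or checks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical check: seq_ids[i] lists the sequence ids attached to token i.
// Returns false as soon as any id falls outside [0, n_seq_max).
bool validate_seq_ids(const std::vector<std::vector<std::int32_t>> & seq_ids, std::int32_t n_seq_max) {
    assert(n_seq_max > 0);
    for (const auto & ids : seq_ids) {
        for (std::int32_t id : ids) {
            if (id < 0 || id >= n_seq_max) {
                return false; // reject the whole batch before decoding
            }
        }
    }
    return true;
}
```

Rejecting the batch up front turns an out-of-range id into a reportable input error instead of an out-of-bounds access later in the pipeline.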

b5501

26 May 21:49
cdf94a1
server: --offline mode (#13804)

* server: --offline mode (env: LLAMA_OFFLINE)

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>

b5499

26 May 19:34
4265a87
cuda : avoid cuGetErrorString (#13791)

ggml-ci

b5498

26 May 16:11
6f180b9
SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)

* SYCL: Add non contiguous input support to norm kernel

* refactor and add RMS_NORM non contiguous input support

ggml-ci

* restore subgroup reduction for multi-subgroup thread blocks in norm kernels

* Swap grid dims of nsamples and nrows

ggml-ci

* Revert "Swap grid dims of nsamples and nrows"

This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.

* restore not required changes

ggml-ci

* address review comments: change it to more like SYCL

* Use a common function to calculate offset

* remove wrap around logic for handling broadcasts

* remove static from calculate_offset fn and use ceil_div
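
For readers unfamiliar with non-contiguous inputs, the sketch below shows the general idea behind a shared offset helper and a `ceil_div` utility. The names `calculate_offset` and `ceil_div` come from the commit messages above, but the signatures and the three-dimensional, element-based stride layout used here are assumptions, not the SYCL backend's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Integer division rounded up, e.g. to size a grid over n elements.
inline std::size_t ceil_div(std::size_t a, std::size_t b) {
    assert(b != 0);
    return (a + b - 1) / b;
}

// Element offset of index idx in a tensor whose strides (in elements) are
// given per dimension. With arbitrary strides, rows need not be adjacent
// in memory, which is what lets a kernel read non-contiguous input.
inline std::size_t calculate_offset(const std::array<std::size_t, 3> & strides,
                                    const std::array<std::size_t, 3> & idx) {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < 3; ++d) {
        offset += strides[d] * idx[d];
    }
    return offset;
}
```

For example, with strides `{100, 10, 1}` and index `{2, 3, 4}` the offset is 234, and `ceil_div(10, 4)` is 3.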

b5497

26 May 15:25
03f582a
server: fix streaming crashes (#13786)

* add preludes to content on partial regex match

* allow all parsers to parse non-tool-call content.

* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format. still not ideal but hopefully less prone to crash

b5495

26 May 14:33
d74e94c
`server`: fix format of streamed tool call deltas (diff name, fix id …

b5494

26 May 13:33
f13847c
server: fix regression on streamed non-chat completion w/ stops (#13785)

* more forgiving message diffs: partial stop words aren't erased, full stops are

* Add (slow) server test for completion + stream + stop

b5493

26 May 11:25
79c137f
examples : allow extracting embeddings from decoder contexts (#13797)

ggml-ci