Releases · ngxson/llama.cpp

27 May 11:13

8171312

b5504

kv-cells : track min/max used cells and per-sequence positions (#13808)

* kv-cells : track min/max used cells and per-sequence positions

ggml-ci

* kv-cells : fix pos-modification updates for seq_pos

ggml-ci

* kv-cells : add comments

ggml-ci
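
To make the bookkeeping named in this release concrete, here is a minimal sketch in Python. It is not llama.cpp's C++ implementation: the class `KVCells` and every method on it are hypothetical names chosen for illustration. It assumes each cell stores one position and a set of sequence ids, and shows one way to keep the used-cell range and the per-sequence min/max positions consistent, including when a cell's position is modified.

```python
from collections import defaultdict


class KVCells:
    """Hypothetical sketch: tracks used cells and per-sequence positions."""

    def __init__(self, size):
        self.pos = [-1] * size                   # position per cell, -1 = empty
        self.seq = [set() for _ in range(size)]  # sequence ids owning each cell
        self.used = set()                        # indices of non-empty cells
        # per-sequence multiset of positions: seq_id -> {pos: count}
        self.seq_pos = defaultdict(lambda: defaultdict(int))

    def _seq_pos_add(self, i):
        for s in self.seq[i]:
            self.seq_pos[s][self.pos[i]] += 1

    def _seq_pos_rm(self, i):
        for s in self.seq[i]:
            counts = self.seq_pos[s]
            counts[self.pos[i]] -= 1
            if counts[self.pos[i]] == 0:
                del counts[self.pos[i]]

    def set(self, i, pos, seq_ids):
        assert self.pos[i] == -1, "cell already in use"
        self.pos[i] = pos
        self.seq[i] = set(seq_ids)
        self.used.add(i)
        self._seq_pos_add(i)

    def clear(self, i):
        self._seq_pos_rm(i)
        self.pos[i] = -1
        self.seq[i] = set()
        self.used.discard(i)

    def pos_add(self, i, delta):
        # A position change must also update the per-sequence index.
        self._seq_pos_rm(i)
        self.pos[i] += delta
        self._seq_pos_add(i)

    def used_min(self):
        return min(self.used) if self.used else None

    def used_max(self):
        return max(self.used) if self.used else None

    def seq_pos_min(self, s):
        counts = self.seq_pos.get(s)
        return min(counts) if counts else -1

    def seq_pos_max(self, s):
        counts = self.seq_pos.get(s)
        return max(counts) if counts else -1
```

The sketch recomputes min/max on demand for brevity; a real implementation would likely maintain them incrementally.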

Assets 18

27 May 09:33

github-actions

b5503

f9cd683

sampling : make sure samplers return at least 1 token (#13822)

* sampling : min-p should always return at least one token

ggml-ci

* sampling : same for typical sampling

* tests : sampling tests use min_keep == 0

ggml-ci
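
The guarantee described above can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the llama.cpp sampler: the function name `min_p_filter` is hypothetical, and it uses the common definition of min-p (keep tokens whose probability is at least `p` times the highest probability). The point is the last step: even with `min_keep == 0` and a threshold that nothing passes, one token is still returned.

```python
def min_p_filter(probs, p, min_keep=0):
    """Return indices of kept tokens, most probable first (hypothetical sketch)."""
    if not probs:
        return []
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    threshold = p * probs[order[0]]
    # Tokens passing the threshold form a prefix of the sorted order.
    n_pass = sum(1 for i in order if probs[i] >= threshold)
    # Never return fewer than one token, even when min_keep == 0.
    floor = min(len(order), max(1, min_keep))
    return order[:max(n_pass, floor)]
```

For example, with `p` greater than 1 the threshold exceeds the top probability, so no token passes the filter and the most probable token is returned as the fallback.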

Assets 18

27 May 06:56

github-actions

b5502

4f81b33

llama : validate seq id batch input (#13809)

* llama : validate seq id batch input

ggml-ci

* cont : fix the fix

ggml-ci
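
As a rough illustration of what validating sequence ids in a batch can look like, here is a Python sketch. The function `validate_seq_ids`, its arguments, and the assumed valid range `0 <= seq_id < n_seq_max` are hypothetical and are not taken from the llama.cpp source.

```python
def validate_seq_ids(batch_seq_ids, n_seq_max):
    """Return a list of error strings; an empty list means the batch is valid.

    batch_seq_ids: one list of sequence ids per token in the batch.
    n_seq_max:     assumed exclusive upper bound for a valid sequence id.
    """
    errors = []
    for i, seq_ids in enumerate(batch_seq_ids):
        for s in seq_ids:
            if s < 0 or s >= n_seq_max:
                errors.append(
                    f"token {i}: invalid seq_id {s} "
                    f"(expected 0 <= seq_id < {n_seq_max})"
                )
    return errors
```

Collecting every error instead of failing on the first one is a design choice of this sketch; rejecting the batch at the first invalid id would work equally well.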

Assets 18

26 May 21:49

github-actions

b5501

cdf94a1

server: --offline mode (#13804)

* server: --offline mode (env: LLAMA_OFFLINE)

---------

Co-authored-by: Xuan-Son Nguyen

Assets 18

26 May 19:34

github-actions

b5499

4265a87

cuda : avoid cuGetErrorString (#13791)

ggml-ci

Assets 18

26 May 16:11

github-actions

b5498

6f180b9

SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)

* SYCL: Add non contiguous input support to norm kernel

* refactor and add RMS_NORM non contiguous input support

ggml-ci

* restore subgroup reduction for multi-subgroup thread blocks in norm kernels

* Swap grid dims of nsamples and nrows

ggml-ci

* Revert "Swap grid dims of nsamples and nrows"

This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.

* restore not required changes

ggml-ci

* address review comments: change it to more like SYCL

* Use a common function to calculate offset

* remove wrap around logic for handling broadcasts

* remove static from calculate_offset fn and use ceil_div
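
The idea behind non-contiguous input support can be sketched in Python: elements are addressed through per-dimension strides instead of assuming densely packed rows. This is not the SYCL kernel. The names `calculate_offset` and `ceil_div` appear in the notes above, but their signatures here are assumptions, and `rms_norm_rows` is a hypothetical helper.

```python
import math


def ceil_div(a, b):
    """Integer division rounded up, e.g. work-groups needed to cover a items."""
    return (a + b - 1) // b


def calculate_offset(strides, indices):
    """Flat element offset for the given indices (strides in elements)."""
    return sum(s * i for s, i in zip(strides, indices))


def rms_norm_rows(data, ncols, nrows, strides, eps=1e-6):
    """RMS-normalize each row of a strided 2D view over a flat buffer.

    strides = (column stride, row stride); the rows need not be contiguous.
    """
    out = []
    for r in range(nrows):
        row = [data[calculate_offset(strides, (c, r))] for c in range(ncols)]
        scale = 1.0 / math.sqrt(sum(x * x for x in row) / ncols + eps)
        out.append([x * scale for x in row])
    return out
```

With a row stride larger than `ncols`, the same function reads a padded or sliced view without first copying it into a contiguous buffer.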

Assets 18

26 May 15:25

github-actions

b5497

03f582a

server: fix streaming crashes (#13786)

* add preludes to content on partial regex match

* allow all parsers to parse non-tool-call content.

* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format. still not ideal but hopefully less prone to crash

Assets 18

26 May 14:33

github-actions

b5495

d74e94c

`server`: fix format of streamed tool call deltas (diff name, fix id …

Assets 18

26 May 13:33

github-actions

b5494

f13847c

server: fix regression on streamed non-chat completion w/ stops (#13785)

* more forgiving message diffs: partial stop words aren't erased, full stops are

* Add (slow) server test for completion + stream + stop

Assets 18

26 May 11:25

github-actions

b5493

79c137f

examples : allow extracting embeddings from decoder contexts (#13797)

ggml-ci

Assets 18
