Releases: ggml-org/llama.cpp
b5614
cuda : fix device sync on buffer clear (#14033)
b5613
graph : fix geglu (#14077) ggml-ci
b5612
CANN: Simplify the environment variable setting (#13104)
* Simplify the environment variable setting used to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* Update CANN.md.
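The case-insensitive flag parsing described for GGML_CANN_ASYNC_MODE can be sketched as below. This is an illustration of the accepted values, not the actual CANN backend code; `env_flag_enabled` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Sketch: treat an environment variable as enabled when it is set to
// "yes", "enable", "1", or "on", compared case-insensitively.
// Illustrative helper, not the real CANN backend implementation.
static bool env_flag_enabled(const char * name) {
    const char * val = std::getenv(name);
    if (val == nullptr) {
        return false;
    }
    std::string s(val);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s == "yes" || s == "enable" || s == "1" || s == "on";
}
```

Note the `unsigned char` cast in the lambda: passing a negative `char` to `std::tolower` is undefined behavior, so the cast is needed for portability.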
b5610
server : fix LRU check (#14079) ggml-ci
b5609
sycl: Add reorder to Q6_K mmvq implementation (#13885)
* Add reorder to the Q6_K mmvq implementation.
* Address PR comments: clean up comments.
* Remove an unused parameter left over after refactoring q4_k.
* Add inline to a function and remove an unnecessary reference to int.

Signed-off-by: nscipione <[email protected]>
b5608
add geglu activation function (#14074)
Co-authored-by: dinhhuy <[email protected]>
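GEGLU is the GELU-gated linear unit from "GLU Variants Improve Transformer": the input row is split in half, GELU is applied to the gate half, and the result multiplies the other half element-wise. A minimal scalar sketch follows; the tanh-based GELU approximation and the gate-first split layout are assumptions for illustration, not the ggml kernel itself.

```cpp
#include <cmath>
#include <vector>

// tanh approximation of GELU, in the style commonly used in ggml-like code
static float gelu(float x) {
    const float c = 0.7978845608028654f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

// GEGLU sketch: split the row into two halves (gate, up), apply GELU to
// the gate half, and multiply element-wise with the up half.
static std::vector<float> geglu(const std::vector<float> & x) {
    const size_t n = x.size() / 2;
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = gelu(x[i]) * x[n + i];
    }
    return out;
}
```

Compared with a plain GELU MLP, the gated form halves the output width of the combined projection, which is why the two halves come from a single fused gate/up tensor.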
b5606
cuda : fix buffer type check with integrated GPUs (#14069)
b5604
SYCL: Implement a few same-quantized-type copy kernels (#13739)
* Implement copy kernels for tensors of the same quantized type.
* Use memcpy for copying contiguous tensors.
* Add contiguous tensor copy support and device checks: a memcpy path for contiguous tensors of the same type optimizes data transfer, and updated device support checks recognize contiguous tensor operations, improving compatibility and performance.
* Replace the specific block copy functions (e.g. cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function, cpy_blck_q_q. The generic template works for any block type and is instantiated with specific block types (e.g. block_q8_0) where needed, reducing duplication while preserving the same functionality.
* Exclude BF16 support for COPY tensors for now.
* Adjust the SYCL copy kernel block sizes for efficiency: use ceil_div to ensure full element coverage and update the nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization.
b5603
llama : fix llama_model_chat_template with template name (LLM_KV with…
b5602
llama : deprecate llama_kv_self_ API (#14030)
* Deprecate the llama_kv_self_ API.
* Allow llama_memory_(nullptr).
* Add a flag for an optional data clear in llama_memory_clear.