Releases: ggml-org/llama.cpp

b5614

09 Jun 15:34
8f47e25
cuda : fix device sync on buffer clear (#14033)

b5613

09 Jun 15:32
201b31d
graph : fix geglu (#14077)

ggml-ci

b5612

09 Jun 12:12
e21d2d4
CANN: Simplify the environment variable setting (#13104)

* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md
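
The case-insensitive matching described for GGML_CANN_ASYNC_MODE can be sketched roughly as follows. This is a minimal illustration, not the CANN backend's code: the helper names `parse_truthy` and `async_mode_enabled` are hypothetical, and treating an unset variable as "off" is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true when `value` is one of the accepted
// spellings (yes, enable, 1, on), compared case-insensitively.
static bool parse_truthy(const char * value) {
    if (value == nullptr) {
        return false; // assumption: an unset variable means "disabled"
    }
    std::string v(value);
    std::transform(v.begin(), v.end(), v.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return v == "yes" || v == "enable" || v == "1" || v == "on";
}

// Hypothetical wrapper: reads the variable from the process environment.
static bool async_mode_enabled() {
    return parse_truthy(std::getenv("GGML_CANN_ASYNC_MODE"));
}
```

With this sketch, values such as `ON`, `Yes`, or `1` enable the mode, while anything else (including `true`, which is not in the list above) leaves it disabled.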

b5610

09 Jun 10:24
87d34b3
server : fix LRU check (#14079)

ggml-ci

b5609

09 Jun 10:17
b460d16
sycl: Add reorder to Q6_K mmvq implementation (#13885)

* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <[email protected]>

b5608

09 Jun 04:41
91a8ee6
add geglu activation function (#14074)

Co-authored-by: dinhhuy <[email protected]>
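
For readers unfamiliar with the activation, GEGLU is commonly defined as GELU applied to one input, multiplied element-wise by a second "gate" input. The sketch below illustrates that definition only. It is not the ggml operator: the function names are hypothetical, the exact erf-based GELU is an assumption (implementations often use a tanh approximation), and which operand receives the GELU is also an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact (erf-based) GELU: x * Phi(x), where Phi is the standard normal CDF.
static float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// GEGLU sketch: out[i] = GELU(x[i]) * gate[i].
// Only the overlapping prefix is processed if the two inputs differ in length.
static std::vector<float> geglu(const std::vector<float> & x,
                                const std::vector<float> & gate) {
    const std::size_t n = std::min(x.size(), gate.size());
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = gelu(x[i]) * gate[i];
    }
    return out;
}
```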

b5606

08 Jun 19:31
247e5c6
cuda : fix buffer type check with integrated GPUs (#14069)

b5604

07 Jun 13:43
228f34c
SYCL: Implement few same quantized type copy kernels (#13739)

* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
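
Two ideas from the notes above, a single templated same-type block copy and a ceil_div-sized launch grid, can be sketched in plain host C++ as follows. This is an illustration under stated assumptions, not the SYCL kernels themselves: `toy_block`, `copy_block`, and `copy_blocks` are hypothetical names, `toy_block` does not reproduce any real ggml block layout, and the nested loops merely stand in for work-groups and work-items.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a quantized block (not a real ggml layout).
struct toy_block {
    std::uint16_t scale;
    std::int8_t   quants[32];
};

// One template replaces a family of per-type copy functions:
// copying a block onto a block of the same type is a byte-wise copy.
template <typename T>
static void copy_block(const char * src, char * dst) {
    std::memcpy(dst, src, sizeof(T));
}

// Integer ceiling division: smallest k such that k * d >= n (for d > 0).
static std::size_t ceil_div(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Copies n_blocks blocks in groups of group_size and returns the group count.
// Rounding up means the trailing partial group is still launched; the bounds
// check keeps it from touching memory past the end.
template <typename T>
static std::size_t copy_blocks(const T * src, T * dst,
                               std::size_t n_blocks, std::size_t group_size) {
    const std::size_t n_groups = ceil_div(n_blocks, group_size);
    for (std::size_t g = 0; g < n_groups; ++g) {
        for (std::size_t l = 0; l < group_size; ++l) {
            const std::size_t i = g * group_size + l;
            if (i >= n_blocks) {
                break;
            }
            copy_block<T>(reinterpret_cast<const char *>(src + i),
                          reinterpret_cast<char *>(dst + i));
        }
    }
    return n_groups;
}
```

With truncating division, 5 blocks in groups of 2 would launch only 2 groups and leave the last block uncopied; rounding up gives 3 groups, which is the "full element coverage" the note refers to.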

b5603

07 Jun 13:13
0974ad7
llama : fix llama_model_chat_template with template name (LLM_KV with…

b5602

06 Jun 13:05
745aa53
llama : deprecate llama_kv_self_ API (#14030)

* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci
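
The last two items, tolerating a null memory handle and making the data clear optional, follow a common pattern that can be sketched as below. This is a toy model only: `toy_memory` and `toy_memory_clear` are hypothetical and do not reproduce the llama.cpp types or implementation. It assumes the flag separates resetting bookkeeping (always done) from wiping the underlying buffers (done on request).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical toy memory: per-cell metadata plus a payload buffer.
struct toy_memory {
    std::vector<int>   seq_ids; // -1 marks a free cell
    std::vector<float> data;    // payload that may be expensive to wipe
};

// Clears the metadata; wipes the payload only when clear_data is true.
// A null pointer is accepted and treated as a no-op.
static void toy_memory_clear(toy_memory * mem, bool clear_data) {
    if (mem == nullptr) {
        return;
    }
    std::fill(mem->seq_ids.begin(), mem->seq_ids.end(), -1);
    if (clear_data) {
        std::fill(mem->data.begin(), mem->data.end(), 0.0f);
    }
}
```

Skipping the payload wipe is useful when the cells are about to be overwritten anyway, since marking them free in the metadata is enough to make them reusable.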