Releases: ggml-org/llama.cpp
b5614
cuda : fix device sync on buffer clear (#14033)
b5613
graph : fix geglu (#14077) ggml-ci
b5612
CANN: Simplify the environment variable setting (#13104)
* Simplify the environment variable setting used to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* Update CANN.md.
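The case-insensitive flag parsing described for GGML_CANN_ASYNC_MODE can be sketched as below. This is an illustration of the accepted values, not the actual CANN backend code; `env_flag_enabled` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Sketch: treat an environment variable as enabled when it is set to
// "yes", "enable", "1", or "on", compared case-insensitively.
// Illustrative helper, not the real CANN backend implementation.
static bool env_flag_enabled(const char * name) {
    const char * val = std::getenv(name);
    if (val == nullptr) {
        return false;
    }
    std::string s(val);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s == "yes" || s == "enable" || s == "1" || s == "on";
}
```

Note the `unsigned char` cast in the lambda: passing a negative `char` to `std::tolower` is undefined behavior, so the cast is needed for portability.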
b5610
server : fix LRU check (#14079) ggml-ci
b5609
sycl: Add reorder to Q6_K mmvq implementation (#13885)
* Add reorder to the Q6_K mmvq implementation.
* Address PR comments: clean up comments.
* Remove an unused parameter left over after refactoring q4_k.
* Add inline to a function and remove an unnecessary reference to int.

Signed-off-by: nscipione <[email protected]>
b5608
add geglu activation function (#14074)
Co-authored-by: dinhhuy <[email protected]>
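GEGLU is the GELU-gated linear unit from "GLU Variants Improve Transformer": the input row is split in half, GELU is applied to the gate half, and the result multiplies the other half element-wise. A minimal scalar sketch follows; the tanh-based GELU approximation and the gate-first split layout are assumptions for illustration, not the ggml kernel itself.

```cpp
#include <cmath>
#include <vector>

// tanh approximation of GELU, in the style commonly used in ggml-like code
static float gelu(float x) {
    const float c = 0.7978845608028654f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

// GEGLU sketch: split the row into two halves (gate, up), apply GELU to
// the gate half, and multiply element-wise with the up half.
static std::vector<float> geglu(const std::vector<float> & x) {
    const size_t n = x.size() / 2;
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = gelu(x[i]) * x[n + i];
    }
    return out;
}
```

Compared with a plain GELU MLP, the gated form halves the output width of the combined projection, which is why the two halves come from a single fused gate/up tensor.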
b5606
cuda : fix buffer type check with integrated GPUs (#14069)
b5604
SYCL: Implement a few same-quantized-type copy kernels (#13739)
* Implement copy kernels for tensors of the same quantized type.
* Use memcpy for copying contiguous tensors.
* Add contiguous tensor copy support and device checks: a memcpy path for contiguous tensors of the same type optimizes data transfer, and updated device support checks recognize contiguous tensor operations, improving compatibility and performance.
* Replace the specific block copy functions (e.g. cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function, cpy_blck_q_q. The generic template works for any block type and is instantiated with specific block types (e.g. block_q8_0) where needed, reducing duplication while preserving the same functionality.
* Exclude BF16 support for COPY tensors for now.
* Adjust the SYCL copy kernel block sizes for efficiency: use ceil_div to ensure full element coverage and update the nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization.
b5603
llama : fix llama_model_chat_template with template name (LLM_KV with…
b5602
llama : deprecate llama_kv_self_ API (#14030)
* Deprecate the llama_kv_self_ API.
* Allow llama_memory_(nullptr).
* Add a flag for an optional data clear in llama_memory_clear.