Releases: ggml-org/llama.cpp
b4411
fix: Vulkan shader gen binary path (#11037)
b4409
metal : avoid uint (#11019)
b4406
server : allow using LoRA adapters per-request (#10994)
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review (Co-authored-by: Georgi Gerganov <[email protected]>)
* lora_base
* remove redundant check

Co-authored-by: Georgi Gerganov <[email protected]>
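As a sketch of what per-request LoRA selection looks like from the client side: the request body gains a `lora` field listing adapter ids and scales to apply for that request only. The exact field names (`lora`, `id`, `scale`) and the `/completion` endpoint shape are assumptions based on the PR description, not verified against the server docs.

```python
import json

# Hypothetical request body for llama-server's /completion endpoint with a
# per-request LoRA adapter. Adapter ids refer to adapters loaded at server
# startup; "scale" weights the adapter for this request only (assumption).
payload = {
    "prompt": "Write a haiku about mountains.",
    "n_predict": 64,
    "lora": [
        {"id": 0, "scale": 0.75},  # apply adapter 0 at 0.75 strength
    ],
}

body = json.dumps(payload)
print(body)
```

A request without the `lora` field would run against the base model, which is what makes the adapter choice per-request rather than server-global.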
b4404
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review

Co-authored-by: slaren <[email protected]>
b4403
server : clean up built-in template detection (#11026)
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
b4402
server : add OAI compat for /v1/completions (#10974)
* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs
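For illustration, an OpenAI-compatible `/v1/completions` request body follows the standard OpenAI completions schema (`model`, `prompt`, `max_tokens`, `temperature`); a minimal sketch, assuming llama-server accepts these fields as this change describes:

```python
import json

# Sketch of an OpenAI-style /v1/completions request body. llama-server serves
# whatever model it was launched with, so the "model" value is nominal here.
payload = {
    "model": "llama",
    "prompt": "Say hello in French:",
    "max_tokens": 16,
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body)
```

The point of the compat layer is that existing OpenAI client code can point its base URL at llama-server without changing the request shape.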
b4400
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings w…
b4399
vulkan: optimize mul_mat for small values of N (#10991) Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.
b4398
android : fix llama_batch free (#11014)
b4397
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup