Releases: ngxson/llama.cpp
b4209
ggml : fix row condition for i8mm kernels (#10561) ggml-ci
b4206
kompute : improve backend to pass test_backend_ops (#10542)
* kompute: op_unary: reject unsupported parameters
* kompute: softmax: implement ALiBi support
* kompute: rope: implement neox and phi3 support
* kompute: op_mul_mat_q4_k permuted support
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
* kompute: op_mul_mat_f16 permuted support
* kompute: op_mul_mat_q6_k permuted support
Signed-off-by: Sergio Lopez <[email protected]>
b4205
CANN: Update cann.md to display correctly in CLion (#10538)
b4203
CANN: ROPE operator optimization (#10540)
Co-authored-by: noemotiovon <[email protected]>
b4202
common : fix duplicated file name with hf_repo and hf_file (#10550)
b4201
Add some minimal optimizations for CDNA (#10498)
* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too
b4200
ci : faster CUDA toolkit installation method and use ccache (#10537)
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
b4196
vulkan: define all quant data structures in types.comp (#10440)
b4177
speculative : simplify the implementation (#10504) ggml-ci
b4175
CANN: RoPE and CONCAT operator optimization (#10488)
Co-authored-by: noemotiovon <[email protected]>