Releases · ngxson/llama.cpp
b4137
sync : ggml
b4133
Add required ggml-base and backend libs to cmake pkg (#10407)
b4132
cuda : fix CUDA_FLAGS not being applied (#10403)
b4131
llama : add check for KV cache shifts (#10401) ggml-ci
b4130
llama : add OLMo November 2024 support (#10394)
* Add OLMo November 2024 constants
* Add OLMo November 2024 converter
* Add loading of OLMo November 2024 tensors and hyper parameters
* Add building of OLMo November 2024 model
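The converter mentioned in this entry is the usual HF-to-GGUF conversion path. A minimal sketch of how an OLMo November 2024 checkpoint might be converted and run against this release; the checkpoint directory, output file name, and prompt below are illustrative assumptions, not taken from the release notes:

```sh
# Hypothetical paths/names -- adjust to the checkpoint you actually downloaded.
python convert_hf_to_gguf.py ./OLMo-1124-7B --outfile olmo-1124-7b-f16.gguf
./build/bin/llama-cli -m olmo-1124-7b-f16.gguf -p "Hello from OLMo"
```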
b4129
sycl : Add option to set the SYCL architecture for all targets (#10266)
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve performance
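A minimal sketch of passing the new option at configure time. GGML_SYCL_ARCH comes from the entry above; the architecture value and the compiler choices are assumptions (valid values depend on your SYCL toolchain and device):

```sh
# "intel_gpu_pvc" is an illustrative target only -- substitute your device's architecture.
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_ARCH=intel_gpu_pvc \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release
```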
b4128
vulkan: Optimize soft_max (#10301)
* vulkan: Optimize soft_max
  Large soft_max could already saturate memory, but small/medium sizes were pretty slow. The bulk of the gains for them comes from using a smaller workgroup size, and making the workgroup size match the subgroup size also makes the barriers much cheaper. Cache some values in locals to avoid refetching/recomputing. And stamp out a few "template instantiations" so smaller cases will fully unroll. Add a missing early return for OOB rows. This happens when there are more than 512 rows and the dispatch is 512 x H.
* vulkan: Further soft_max optimizations
  Restore the workgroup size of 512 case, use it for >1024. Use unrollable loops for more iteration counts.
b4127
sycl: Revert MUL_MAT_OP support changes (#10385)
b4126
cuda : only use native when supported by cmake (#10389)
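"native" here refers to CMAKE_CUDA_ARCHITECTURES=native, which CMake only understands from version 3.24 onward; with this change the build avoids requesting it on older CMake. A hedged sketch of configuring either way (GGML_CUDA and CMAKE_CUDA_ARCHITECTURES are standard option names, but the explicit architecture list is just an example):

```sh
# CMake >= 3.24: the build can detect the local GPU ("native") automatically.
cmake -B build -DGGML_CUDA=ON

# Older CMake: pass explicit compute capabilities instead (values are examples).
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75;86"
```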
b4122
Vulkan: Fix device info output format specifiers (#10366)
* Vulkan: Fix device info output format specifiers
* Vulkan: Use zu printf specifier for size_t instead of ld