Releases: ngxson/llama.cpp

b4137

19 Nov 19:44
sync : ggml

b4133

19 Nov 17:54
2a11b6b
Add required ggml-base and backend libs to cmake pkg (#10407)

b4132

19 Nov 14:47
3ee6382
cuda : fix CUDA_FLAGS not being applied (#10403)

b4131

19 Nov 12:50
8e752a7
llama : add check for KV cache shifts (#10401)

ggml-ci

b4130

19 Nov 10:33
a88ad00
llama : add OLMo November 2024 support (#10394)

* Add OLMo November 2024 constants

* Add OLMo November 2024 converter

* Add loading of OLMo November 2024 tensors and hyperparameters

* Add building of OLMo November 2024 model

b4129

19 Nov 10:02
2a1507c
sycl : Add option to set the SYCL architecture for all targets (#10266)

* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance

b4128

19 Nov 08:29
b3e5859
vulkan: Optimize soft_max (#10301)

* vulkan: Optimize soft_max

Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.

Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

Restore the workgroup-size-512 case and use it for >1024.

Use unrollable loops for more iteration counts.
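
The out-of-bounds early return described above can be illustrated with a small CPU-side sketch. This is not the Vulkan shader from the commit: the function names, the constant `kGridWidth`, and the mapping `row = grid_y * 512 + grid_x` are assumptions made for illustration. It only shows why a 512 x H dispatch produces workgroups with no row to process once the row count is not a multiple of 512, and how caching the row maximum and the reciprocal sum in locals avoids recomputing them.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed grid width, taken from the "512 x H" dispatch in the commit message.
constexpr std::size_t kGridWidth = 512;

// Hypothetical stand-in for one workgroup: computes soft_max over one row
// in place. Returns false when the workgroup maps past the last row.
// Assumes ncols > 0.
bool soft_max_row(std::vector<float>& data, std::size_t nrows, std::size_t ncols,
                  std::size_t grid_x, std::size_t grid_y) {
    const std::size_t row = grid_y * kGridWidth + grid_x; // assumed mapping
    if (row >= nrows) {
        return false; // early return: this workgroup has no row to process
    }
    float* x = data.data() + row * ncols;
    // Cache the row maximum in a local instead of refetching it per element.
    const float max_val = *std::max_element(x, x + ncols);
    float sum = 0.0f;
    for (std::size_t i = 0; i < ncols; ++i) {
        x[i] = std::exp(x[i] - max_val); // subtract max for numerical stability
        sum += x[i];
    }
    const float inv_sum = 1.0f / sum; // one division, then multiplies
    for (std::size_t i = 0; i < ncols; ++i) {
        x[i] *= inv_sum;
    }
    return true;
}

// Simulates the 512 x H dispatch and returns how many rows were processed.
std::size_t soft_max_dispatch(std::vector<float>& data, std::size_t nrows,
                              std::size_t ncols) {
    const std::size_t h = (nrows + kGridWidth - 1) / kGridWidth; // ceil division
    std::size_t processed = 0;
    for (std::size_t gy = 0; gy < h; ++gy) {
        for (std::size_t gx = 0; gx < kGridWidth; ++gx) {
            if (soft_max_row(data, nrows, ncols, gx, gy)) {
                ++processed;
            }
        }
    }
    return processed;
}
```

With 600 rows, for example, the simulated dispatch is 512 x 2, so 424 of the 1024 workgroups take the early return and the remaining 600 each normalize one row.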

b4127

19 Nov 02:47
557924f
sycl: Revert MUL_MAT_OP support changes (#10385)

b4126

18 Nov 19:54
d3481e6
cuda : only use native when supported by cmake (#10389)

b4122

18 Nov 11:38
9b75f03
Vulkan: Fix device info output format specifiers (#10366)

* Vulkan: Fix device info output format specifiers

* Vulkan: Use zu printf specifier for size_t instead of ld
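
As a minimal sketch of the specifier fix named above: `%zu` is the standard printf conversion for `size_t`, whereas `%ld` expects a `long`, which is narrower than `size_t` on some 64-bit platforms. The helper name `format_bytes` and the "shared memory" label are hypothetical and are not taken from the Vulkan backend's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t value for a device-info style line.
std::string format_bytes(std::size_t bytes) {
    char buf[64];
    // %zu matches size_t on every conforming platform; %ld would be
    // undefined behavior wherever long and size_t differ in width.
    const int n = std::snprintf(buf, sizeof(buf), "shared memory: %zu", bytes);
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Calling `format_bytes(32768)` yields the string `shared memory: 32768`.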