Releases · ngxson/llama.cpp
b4137
sync : ggml
b4133
Add required ggml-base and backend libs to cmake pkg (#10407)
b4132
cuda : fix CUDA_FLAGS not being applied (#10403)
b4131
llama : add check for KV cache shifts (#10401) ggml-ci
b4130
llama : add OLMo November 2024 support (#10394)
* Add OLMo November 2024 constants
* Add OLMo November 2024 converter
* Add loading of OLMo November 2024 tensors and hyper parameters
* Add building of OLMo November 2024 model
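The converter mentioned in this entry is the usual HF-to-GGUF conversion path. A minimal sketch of how an OLMo November 2024 checkpoint might be converted and run against this release; the checkpoint directory, output file name, and prompt below are illustrative assumptions, not taken from the release notes:

```sh
# Hypothetical paths/names -- adjust to the checkpoint you actually downloaded.
python convert_hf_to_gguf.py ./OLMo-1124-7B --outfile olmo-1124-7b-f16.gguf
./build/bin/llama-cli -m olmo-1124-7b-f16.gguf -p "Hello from OLMo"
```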
b4129
sycl : Add option to set the SYCL architecture for all targets (#10266)
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve performance
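A minimal sketch of passing the new option at configure time. GGML_SYCL_ARCH comes from the entry above; the architecture value and the compiler choices are assumptions (valid values depend on your SYCL toolchain and device):

```sh
# "intel_gpu_pvc" is an illustrative target only -- substitute your device's architecture.
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_ARCH=intel_gpu_pvc \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release
```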
b4128
vulkan: Optimize soft_max (#10301)
* vulkan: Optimize soft_max
  Large soft_max could already saturate memory, but small/medium sizes were pretty slow. The bulk of the gains for them comes from using a smaller workgroup size, and making the workgroup size match the subgroup size also makes the barriers much cheaper. Cache some values in locals to avoid refetching/recomputing. And stamp out a few "template instantiations" so smaller cases will fully unroll. Add a missing early return for OOB rows. This happens when there are more than 512 rows and the dispatch is 512 x H.
* vulkan: Further soft_max optimizations
  Restore the workgroup size of 512 case, use it for >1024. Use unrollable loops for more iteration counts.
b4127
sycl: Revert MUL_MAT_OP support changes (#10385)
b4126
cuda : only use native when supported by cmake (#10389)
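"native" here refers to CMAKE_CUDA_ARCHITECTURES=native, which CMake only understands from version 3.24 onward; with this change the build avoids requesting it on older CMake. A hedged sketch of configuring either way (GGML_CUDA and CMAKE_CUDA_ARCHITECTURES are standard option names, but the explicit architecture list is just an example):

```sh
# CMake >= 3.24: the build can detect the local GPU ("native") automatically.
cmake -B build -DGGML_CUDA=ON

# Older CMake: pass explicit compute capabilities instead (values are examples).
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75;86"
```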
b4122
Vulkan: Fix device info output format specifiers (#10366)
* Vulkan: Fix device info output format specifiers
* Vulkan: Use zu printf specifier for size_t instead of ld