Releases: ngxson/llama.cpp
b5547
sched : avoid changing cur_copy when a graph is already allocated (#1…
b5546
parallel : increase the variability of the prompt lengths (#13927) ggml-ci
b5545
cuda : prevent using split buffers with 3d/4d matrices (#13919)
b5544
SYCL: Add mrope kernel (#13755)
* SYCL: Add mrope kernel
* feat: Optimize rope operations with vectorization
  Uses `sycl::vec` to load and store two elements at a time, significantly improving performance in `rope_norm`, `rope_neox`, and `rope_multi`. This reduces the number of memory accesses and leverages SIMD instructions for faster execution.
* Use ceil_div
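
The pairwise access pattern described in this entry can be illustrated in plain C++. This is only a sketch, not the SYCL kernel from the PR: `rope_norm_pairs` is a hypothetical name, the frequency schedule is the usual RoPE one and is assumed here, and the two scalar reads and writes stand in for what a two-element `sycl::vec` load and store would do on the device. `ceil_div` is shown as the common round-up division helper; its exact definition in the PR is not given in the release note.

```cpp
#include <cmath>
#include <vector>

// Round-up integer division: how many blocks of size b are needed to cover
// a items (assumed semantics; the release note only names the helper).
constexpr int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}

static_assert(ceil_div(10, 4) == 3, "10 items need 3 blocks of 4");

// Hypothetical scalar stand-in for a "norm"-style RoPE pass over one row.
// Adjacent elements (x[i], x[i+1]) form one rotation pair, so each loop
// iteration touches exactly two contiguous floats. A vectorized kernel can
// fetch and write back that pair with a single two-wide load and store.
void rope_norm_pairs(std::vector<float>& x, int pos, float theta_base) {
    const int n_dims = static_cast<int>(x.size());
    for (int i = 0; i + 1 < n_dims; i += 2) {
        // Assumed standard RoPE schedule: theta = pos * base^(-i / n_dims).
        const float theta =
            pos * std::pow(theta_base, -static_cast<float>(i) / n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);

        const float x0 = x[i];      // the pair is read together...
        const float x1 = x[i + 1];

        x[i]     = x0 * c - x1 * s; // ...rotated by theta...
        x[i + 1] = x0 * s + x1 * c; // ...and written back together
    }
}
```

Calling `rope_norm_pairs(row, pos, 10000.0f)` rotates a row in place. With a block size of, say, 256 threads handling one pair each, `ceil_div(n_dims / 2, 256)` would give the number of blocks needed to cover the row; both numbers are illustrative.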
b5543
sync : vendor (#13901)
* sync : vendor ggml-ci
* cont : fix httplib version ggml-ci
* cont : fix lint
* cont : fix lint
* vendor : move to common folder /vendor ggml-ci
* cont : fix lint
* cont : move httplib to /vendor + use json_fwd.hpp ggml-ci
* cont : fix server build ggml-ci
* cont : add missing headers ggml-ci
* cont : header clean-up ggml-ci
b5541
convert : allow partial update to the chkhsh pre-tokenizer list (#13847)
* convert : allow partial update to the chkhsh pre-tokenizer list
* code style
* update tokenizer out
* rm inp/out files for models not having gguf
* fixed hash for glm
* skip nomic-bert-moe test
* Update convert_hf_to_gguf_update.py
* fix minerva-7b hash
* rm redundant import
b5540
llama : add support for DistilBert (#13907)
* add distilbert
* small fixes
* add note for LLM_ARCH_DISTIL_BERT
* Use MODEL_ARCH.BERT for DistilBert
Co-authored-by: dinhhuy
b5539
llama : use llm_build_granite for minicpm (#13911)
b5538
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)
b5537
llama : add support for jina-reranker-v2 (#13900)