server: args for draft model cache types (#11200) #13782

aa956 · 2025-05-25T17:48:36Z

Should fix #11200 while keeping the default f16 from #10586.

New command line arguments:

| Argument | Explanation |
| --- | --- |
| `-ctkd, --cache-type-k-draft TYPE` | KV cache data type for K for the speculative decoding model. Allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1 (default: f16) (env: LLAMA_ARG_CACHE_TYPE_K_DRAFT) |
| `-ctvd, --cache-type-v-draft TYPE` | KV cache data type for V for the speculative decoding model. Allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1 (default: f16) (env: LLAMA_ARG_CACHE_TYPE_V_DRAFT) |
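The resolution described by the table (a flag value or its environment variable, checked against the allowed list, falling back to f16) can be sketched in Python as follows. This is a hypothetical illustration, not llama.cpp code: `resolve_draft_cache_type`, its `environ` parameter, and the precedence order (flag over environment variable over default) are assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Allowed values and default taken from the argument table above.
ALLOWED_CACHE_TYPES = ("f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "iq4_nl", "q5_0", "q5_1")
DEFAULT_DRAFT_CACHE_TYPE = "f16"


def resolve_draft_cache_type(
    cli_value: Optional[str],
    env_name: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick the KV cache type for the draft model.

    Assumed precedence (not taken from the llama.cpp source):
    explicit flag value > environment variable > default "f16".
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        value = cli_value
    else:
        value = environ.get(env_name, DEFAULT_DRAFT_CACHE_TYPE)
    # Reject anything outside the documented list of allowed values.
    if value not in ALLOWED_CACHE_TYPES:
        raise ValueError(
            f"unsupported cache type {value!r}; allowed: {', '.join(ALLOWED_CACHE_TYPES)}"
        )
    return value


if __name__ == "__main__":
    env = {"LLAMA_ARG_CACHE_TYPE_V_DRAFT": "q4_0"}
    # -ctkd q8_0 given as a flag; -ctvd omitted, so the env var is used.
    print(resolve_draft_cache_type("q8_0", "LLAMA_ARG_CACHE_TYPE_K_DRAFT", env))  # q8_0
    print(resolve_draft_cache_type(None, "LLAMA_ARG_CACHE_TYPE_V_DRAFT", env))  # q4_0
    # Neither flag nor env var set: falls back to the f16 default.
    print(resolve_draft_cache_type(None, "LLAMA_ARG_CACHE_TYPE_K_DRAFT", {}))  # f16
```

On the actual server these choices are made with `-ctkd`/`-ctvd` or the `LLAMA_ARG_CACHE_TYPE_K_DRAFT`/`LLAMA_ARG_CACHE_TYPE_V_DRAFT` environment variables; passing the environment as a plain dict here only keeps the sketch testable without touching `os.environ`.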


CISC · 2025-06-19T10:04:08Z

@ggerganov forgot to merge?

aa956 · 2025-06-19T11:20:58Z

> @ggerganov forgot to merge?

@ngxson was added automatically on pull request creation, so merging probably needs 2 approvals.

I'm not sure what I can do about the failed arm64/ppc64/riscv64 builds, as I have no access to that kind of hardware. Looking at the build logs, all of them failed during the initial apt operations (maybe the apt caches/mirrors for these architectures were down at the time of the check, or something similar):

```
E: Failed to fetch https://security.ubuntu.com/ubuntu/dists/noble-security/main/binary-riscv64/Packages  404  Not Found [IP: 52.252.163.49 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libcurl4t64:riscv64 : Depends: libgssapi-krb5-2:riscv64 (>= 1.17) but it is not going to be installed
 libssh-4:riscv64 : Depends: libgssapi-krb5-2:riscv64 (>= 1.17) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
```

CISC · 2025-06-19T11:34:13Z

> @ggerganov forgot to merge?
>
> @ngxson was added automatically on pull request creation, so merging probably needs 2 approvals.

No, this can be merged now, but I just want to make sure it's not on hold for some reason first. :)

> I'm not sure what I can do about the failed arm64/ppc64/riscv64 builds, as I have no access to that kind of hardware. Looking at the build logs, all of them failed during the initial apt operations (maybe the apt caches/mirrors for these architectures were down at the time of the check, or something similar).

Yes, these failures are unrelated to this PR; the issue has since been fixed in CI.

ggerganov · 2025-06-19T13:01:22Z

Yes, thanks for the reminder.

* mamba2-sync: (24 commits)
  * sync : ggml
  * Add `ggml_roll` (ggml/1274)
  * docs : fix the link to llama.h (ggml-org#14293)
  * CUDA: add conv_2d_transpose (ggml-org#14287)
  * lint : remove trailing whitepace (ggml-org#14304)
  * vocab : prevent tokenizer overflow (ggml-org#14301)
  * sycl: add usage of enqueue_functions extension (ggml-org#14244)
  * Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
  * llama : improve sep token handling (ggml-org#14272)
  * cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
  * ggml : fix repack work size for mul_mat_id (ggml-org#14292)
  * ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
  * model : more uniform output id handling (ggml-org#14275)
  * ubatch : new splitting logic (ggml-org#14217)
  * CUDA: add conv_2d_dw (ggml-org#14265)
  * ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
  * gguf-py : make sentencepiece optional (ggml-org#14200)
  * server : add server parameters for draft model cache type (ggml-org#13782)
  * build : suppress gcc15 compile warnings (ggml-org#14261)
  * sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
  * ...

Commit 0522270: Adds server parameters for draft model cache type. Fixes ggml-org/lla… …ma.cpp/ggml-org#11200

aa956 requested a review from ngxson as a code owner May 25, 2025 17:48

github-actions bot added examples server labels May 25, 2025

ggerganov approved these changes May 30, 2025


ggerganov merged commit d67341d into ggml-org:master Jun 19, 2025
41 of 46 checks passed
