server: args for draft model cache types (#11200) #13782

Merged (1 commit) on Jun 19, 2025

Contributor

@aa956 aa956 commented May 25, 2025

Should fix #11200 while keeping the default f16 from #10586.

New command line arguments:

| Argument | Explanation |
| --- | --- |
| `-ctkd, --cache-type-k-draft TYPE` | KV cache data type for K for speculative decoding model<br>allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1<br>(default: f16)<br>(env: LLAMA_ARG_CACHE_TYPE_K_DRAFT) |
| `-ctvd, --cache-type-v-draft TYPE` | KV cache data type for V for speculative decoding model<br>allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1<br>(default: f16)<br>(env: LLAMA_ARG_CACHE_TYPE_V_DRAFT) |
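
For illustration, here is a minimal sketch of how a launcher script might validate and assemble these options before starting the server. The helper names `draft_cache_args` and `draft_cache_env` are hypothetical and not part of llama.cpp; only the flag names, environment variable names, allowed values, and the f16 default are taken from the table above.

```python
# Allowed values and default copied from the argument table in this PR.
ALLOWED_CACHE_TYPES = (
    "f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "iq4_nl", "q5_0", "q5_1",
)
DEFAULT_CACHE_TYPE = "f16"


def draft_cache_args(k_type=DEFAULT_CACHE_TYPE, v_type=DEFAULT_CACHE_TYPE):
    """Hypothetical helper: CLI arguments selecting the draft model's KV cache types."""
    for name, value in (("K", k_type), ("V", v_type)):
        if value not in ALLOWED_CACHE_TYPES:
            raise ValueError(f"unsupported draft {name} cache type: {value!r}")
    return ["--cache-type-k-draft", k_type, "--cache-type-v-draft", v_type]


def draft_cache_env(k_type=DEFAULT_CACHE_TYPE, v_type=DEFAULT_CACHE_TYPE):
    """Hypothetical helper: the same settings expressed as environment variables."""
    draft_cache_args(k_type, v_type)  # reuse the validation above
    return {
        "LLAMA_ARG_CACHE_TYPE_K_DRAFT": k_type,
        "LLAMA_ARG_CACHE_TYPE_V_DRAFT": v_type,
    }


if __name__ == "__main__":
    # Example: quantize the draft model's K cache, keep V at the default.
    print(draft_cache_args(k_type="q8_0"))
    print(draft_cache_env(k_type="q8_0"))
```

The returned list can be appended to whatever command line already starts the server with a draft model; the dictionary can be merged into the process environment instead. Which cache types actually work for a given backend is not covered by this sketch.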

Collaborator

CISC commented Jun 19, 2025

@ggerganov forgot to merge?

Contributor Author

aa956 commented Jun 19, 2025

> @ggerganov forgot to merge?

@ngxson was added automatically on pull request creation, so merging probably needs 2 approvals.

I'm not sure what I can do about the failed arm64/ppc64/riscv64 builds, as I have no access to that kind of hardware. Looking at the build logs, all of them failed during the initial apt operations (maybe the apt caches/mirrors for these architectures were down at check time, or something similar).

E: Failed to fetch https://security.ubuntu.com/ubuntu/dists/noble-security/main/binary-riscv64/Packages  404  Not Found [IP: 52.252.163.49 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libcurl4t64:riscv64 : Depends: libgssapi-krb5-2:riscv64 (>= 1.17) but it is not going to be installed
 libssh-4:riscv64 : Depends: libgssapi-krb5-2:riscv64 (>= 1.17) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Collaborator

CISC commented Jun 19, 2025

> > @ggerganov forgot to merge?
>
> @ngxson was added automatically on pull request creation, so merging probably needs 2 approvals.

No, this can be merged now, but just want to make sure it's not on hold for some reason first. :)

> I'm not sure what I can do about the failed arm64/ppc64/riscv64 builds, as I have no access to that kind of hardware. Looking at the build logs, all of them failed during the initial apt operations (maybe the apt caches/mirrors for these architectures were down at check time, or something similar).

Yes, these failures are irrelevant; they have since been fixed in CI.

@ggerganov ggerganov merged commit d67341d into ggml-org:master Jun 19, 2025
41 of 46 checks passed
Member

ggerganov commented Jun 19, 2025

Yes, thanks for the reminder.

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 20, 2025
* mamba2-sync: (24 commits)
sync : ggml
Add `ggml_roll` (ggml/1274)
docs : fix the link to llama.h (ggml-org#14293)
CUDA: add conv_2d_transpose (ggml-org#14287)
lint : remove trailing whitepace (ggml-org#14304)
vocab : prevent tokenizer overflow (ggml-org#14301)
sycl: add usage of enqueue_functions extension (ggml-org#14244)
Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
llama : improve sep token handling (ggml-org#14272)
cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
ggml : fix repack work size for mul_mat_id (ggml-org#14292)
ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
model : more uniform output id handling (ggml-org#14275)
ubatch : new splitting logic (ggml-org#14217)
CUDA: add conv_2d_dw (ggml-org#14265)
ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
gguf-py : make sentencepiece optional (ggml-org#14200)
server : add server parameters for draft model cache type (ggml-org#13782)
build : suppress gcc15 compile warnings (ggml-org#14261)
sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
...

3 participants