Commit a5a915b

server : fix speculative decoding with context shift
ggml-ci
1 parent cc98896 commit a5a915b

1 file changed: +1 -1 lines changed

examples/server/server.cpp

Lines changed: 1 addition & 1 deletion
@@ -2325,7 +2325,7 @@ struct server_context {
 llama_token id = slot.sampled;
 
 struct common_speculative_params params_spec;
-params_spec.n_draft = slot.params.speculative.n_max;
+params_spec.n_draft = std::min(slot.params.speculative.n_max, slot.n_ctx - slot.n_past - 1);
 params_spec.n_reuse = llama_n_ctx(slot.ctx_dft) - slot.params.speculative.n_max;
 params_spec.p_min = slot.params.speculative.p_min;
 
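
The effect of the changed line can be illustrated in isolation. The following is a minimal sketch, not code from the commit: `clamp_n_draft` is a hypothetical helper name, and the plain `int` parameters stand in for `slot.params.speculative.n_max`, `slot.n_ctx`, and `slot.n_past`. It only reproduces the arithmetic of the new expression, which caps the draft length at the remaining context minus one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of server.cpp): mirrors the expression
// introduced by the commit, using plain ints in place of the slot fields.
//   n_max  ~ slot.params.speculative.n_max (requested maximum draft length)
//   n_ctx  ~ slot.n_ctx                    (context size of the slot)
//   n_past ~ slot.n_past                   (tokens already in the context)
int clamp_n_draft(int n_max, int n_ctx, int n_past) {
    // Sanity checks added for this sketch only; the commit does not include them.
    assert(n_ctx > 0);
    assert(n_past >= 0);

    // Never draft more tokens than the remaining context allows,
    // keeping one position in reserve.
    return std::min(n_max, n_ctx - n_past - 1);
}
```

With plenty of room left, for example `clamp_n_draft(16, 4096, 100)`, the requested `n_max` is used unchanged. Near the end of the context, for example `clamp_n_draft(16, 4096, 4090)`, the draft length shrinks to the 5 positions still available. Note that the expression can reach zero (or go negative if `n_past` exceeds `n_ctx - 1`); how the caller treats that case is outside what this diff shows.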
