Commit 4e3e894

slaren authored and mglambda committed
server : fix draft context not being released (ggml-org#11354)
1 parent 723c8cf commit 4e3e894

1 file changed: +3 -0 lines changed

examples/server/server.cpp

Lines changed: 3 additions & 0 deletions
@@ -1772,6 +1772,9 @@ struct server_context {
             // force F16 KV cache for the draft model for extra performance
             cparams_dft.type_k = GGML_TYPE_F16;
             cparams_dft.type_v = GGML_TYPE_F16;
+
+            // the context is not needed - we will create one for each slot
+            llama_init_dft.context.reset();
         }
 
         chat_templates = common_chat_templates_from_model(model, params_base.chat_template);
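
The added `reset()` call releases the draft context as soon as it is known to be unused, instead of keeping it alive for as long as its owner exists. The sketch below illustrates that pattern in isolation. It assumes `llama_init_dft.context` behaves like a `std::unique_ptr` (the `.reset()` call suggests this, but the diff does not show the type). The names `fake_context`, `fake_init_result`, `g_live`, `live_after_reset` and `live_without_reset` are hypothetical stand-ins for illustration and are not part of llama.cpp.

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter of contexts currently alive (illustration only).
static int g_live = 0;

// Hypothetical stand-in for a heavyweight context; not the llama.cpp type.
struct fake_context {
    fake_context()  { ++g_live; }
    ~fake_context() { --g_live; }
};

// Hypothetical stand-in for an init result that owns a context.
struct fake_init_result {
    std::unique_ptr<fake_context> context = std::make_unique<fake_context>();
};

// With reset(): the context is destroyed immediately,
// even though its owner is still in scope.
int live_after_reset() {
    fake_init_result init;   // creating the owner also creates one context
    init.context.reset();    // release it now; the pointer becomes null
    return g_live;           // no context is alive at this point
}

// Without reset(): the context stays alive until the owner is destroyed.
int live_without_reset() {
    fake_init_result init;
    return g_live;           // the context is still alive at this point
}
```

If the owner lives as long as the server, as `server_context` members presumably do, skipping the `reset()` would keep the unused context and its memory allocated for the whole run, which matches the problem named in the commit title.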
