Commit 6f02448

cont : revert n_outputs < n_tokens optimization
ggml-ci
1 parent: b8a2d10

2 files changed: +70, -62 lines


src/llama-graph.cpp

Lines changed: 8 additions & 0 deletions
@@ -874,6 +874,14 @@ ggml_tensor * llm_graph_context::build_inp_attn_scale() const {
 }
 
 ggml_tensor * llm_graph_context::build_inp_out_ids() const {
+    // note: when all tokens are output, we could skip this optimization to spare the ggml_get_rows() calls,
+    //       but this would make the graph topology depend on the number of output tokens, which can interfere with
+    //       features that require constant topology, such as pipeline parallelism
+    //       ref: https://github.com/ggml-org/llama.cpp/pull/14275#issuecomment-2987424471
+    //if (n_outputs < n_tokens) {
+    //    return nullptr;
+    //}
+
     auto inp = std::make_unique<llm_graph_input_out_ids>(hparams, cparams, n_outputs);
 
     auto & cur = inp->out_ids;
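
For context on the reverted branch: returning nullptr when n_outputs == n_tokens would drop the ggml_get_rows() gather from the graph, so the node set would differ between batches that output every token and batches that output only some. The following is a minimal, hypothetical sketch, not part of this commit, assuming a recent ggml with ggml_new_graph() and ggml_graph_n_nodes(), of the approach the revert keeps: the gather node is emitted unconditionally, so only the length of out_ids varies with the number of outputs, never the graph topology.

#include "ggml.h"
#include <cstdio>

int main() {
    // small scratch context; the size is arbitrary for this sketch
    struct ggml_init_params params = { /*mem_size*/ 16u*1024*1024, /*mem_buffer*/ nullptr, /*no_alloc*/ false };
    struct ggml_context * ctx = ggml_init(params);

    const int n_embd   = 8;
    const int n_tokens = 4;

    // per-token hidden states [n_embd, n_tokens]
    struct ggml_tensor * hidden  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_tokens);

    // indices of the rows to output; only its length would vary with n_outputs
    struct ggml_tensor * out_ids = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, n_tokens);

    // the gather is emitted unconditionally: the graph contains the same nodes
    // whether out_ids selects one row or all n_tokens rows
    struct ggml_tensor * out = ggml_get_rows(ctx, hidden, out_ids);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);

    printf("graph nodes: %d\n", ggml_graph_n_nodes(gf));

    ggml_free(ctx);
    return 0;
}

This unconditional form matches what build_inp_out_ids() does after the revert: the ggml_get_rows() cost is paid even when every token is an output, in exchange for a graph whose shape does not change between batches, which is what the cited comment above identifies as a requirement for features such as pipeline parallelism.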
