model : more uniform output id handling #14275

ggerganov · 2025-06-19T08:21:28Z

Move inp_out_ids creation to the beginning of the graph (needed for graph reuse later)
~~Minor optimization: do ggml_get_rows() only if n_outputs < n_tokens~~
Move ggml_get_rows() as early as possible after attention in some models
Avoid unnecessary ggml_get_rows() on unused tensors in some models
Decouple inp_out_ids logic from pooling type
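To illustrate what the row selection in the list above does, here is a minimal sketch in plain C++. It does not use ggml: `Activations` and this `get_rows` are hypothetical stand-ins that only mimic the idea of gathering the rows named by `inp_out_ids` from a row-major `[n_tokens x n_embd]` buffer. The point is that once the output rows are gathered, everything downstream operates on `n_outputs` rows instead of `n_tokens`, which is why doing it as early as possible after attention saves work.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a [n_tokens x n_embd] activation tensor.
// This is NOT the ggml tensor type; it only mimics row-major storage.
struct Activations {
    int64_t n_embd;
    std::vector<float> data; // n_tokens * n_embd values, row-major

    int64_t n_tokens() const {
        return n_embd == 0 ? 0 : (int64_t) data.size() / n_embd;
    }
};

// Gather the rows listed in out_ids, in the spirit of ggml_get_rows().
// The result has out_ids.size() rows, so later ops touch fewer rows.
Activations get_rows(const Activations & src, const std::vector<int32_t> & out_ids) {
    Activations dst{src.n_embd, {}};
    dst.data.reserve(out_ids.size() * src.n_embd);
    for (int32_t id : out_ids) {
        assert(id >= 0 && id < src.n_tokens());
        const float * row = src.data.data() + (int64_t) id * src.n_embd;
        dst.data.insert(dst.data.end(), row, row + src.n_embd);
    }
    return dst;
}
```

For example, with 4 tokens of width 2 and output ids `{1, 3}`, the gathered buffer holds only rows 1 and 3, so a following feed-forward block would process 2 rows rather than 4.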

slaren · 2025-06-19T09:36:14Z

> Minor optimization: do ggml_get_rows() only if n_outputs < n_tokens

If the graph topology changes, it may force a re-allocation in ggml-alloc, which can interfere with the pipeline parallelism implementation.
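The concern above can be pictured with a toy model. The sketch below is an assumption-laden illustration, not ggml-alloc's actual logic: a "graph" is reduced to an ordered list of op names, and `build_last_layer` with its `skip_when_all_outputs` flag is a hypothetical helper. It shows that making the gather conditional on `n_outputs < n_tokens` yields two different node lists depending on the batch, whereas always emitting it keeps the topology fixed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified "graph": just an ordered list of op names.
using Graph = std::vector<std::string>;

// Build the tail of a model graph. When skip_when_all_outputs is true,
// the get_rows node is omitted if every token is an output, which is the
// optimization that was proposed and then reverted.
Graph build_last_layer(int n_tokens, int n_outputs, bool skip_when_all_outputs) {
    assert(n_outputs <= n_tokens);
    Graph g = {"attn"};
    const bool need_gather = n_outputs < n_tokens;
    if (!skip_when_all_outputs || need_gather) {
        g.push_back("get_rows");
    }
    g.push_back("ffn");
    g.push_back("norm");
    return g;
}
```

With the conditional variant, a batch where all 8 tokens are outputs produces 3 nodes, while a batch with 1 output produces 4. With the unconditional variant both produce the same 4 nodes, so anything keyed on the graph's shape sees no change between batches.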

ggerganov · 2025-06-19T10:02:53Z

> > Minor optimization: do ggml_get_rows() only if n_outputs < n_tokens
>
> If the graph topology changes, it may force a re-allocation in ggml-alloc, which can interfere with the pipeline parallelism implementation.

Thanks for spotting - will revert this change.

ggml-ci

* mamba2-sync: (24 commits)
  sync : ggml
  Add `ggml_roll` (ggml/1274)
  docs : fix the link to llama.h (ggml-org#14293)
  CUDA: add conv_2d_transpose (ggml-org#14287)
  lint : remove trailing whitepace (ggml-org#14304)
  vocab : prevent tokenizer overflow (ggml-org#14301)
  sycl: add usage of enqueue_functions extension (ggml-org#14244)
  Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
  llama : improve sep token handling (ggml-org#14272)
  cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
  ggml : fix repack work size for mul_mat_id (ggml-org#14292)
  ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
  model : more uniform output id handling (ggml-org#14275)
  ubatch : new splitting logic (ggml-org#14217)
  CUDA: add conv_2d_dw (ggml-org#14265)
  ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
  gguf-py : make sentencepiece optional (ggml-org#14200)
  server : add server parameters for draft model cache type (ggml-org#13782)
  build : suppress gcc15 compile warnings (ggml-org#14261)
  sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
  ...

ggerganov force-pushed the gg/model-rework-out-ids branch 4 times, most recently from 78934b6 to 1e86597 June 19, 2025 12:38

ggerganov mentioned this pull request Jun 19, 2025

kv-cache : use ggml_set_rows #14285

Open

4 tasks

Base automatically changed from gg/ubatch-rework to master June 20, 2025 07:14

ggerganov requested a review from ngxson as a code owner June 20, 2025 07:14

ggerganov added 3 commits June 20, 2025 10:16

model : more uniform output id handling

b8a2d10

ggml-ci

cont : revert n_outputs < n_tokens optimization

6f02448

ggml-ci

cont : fix out_ids initialization

2b940c0

ggml-ci

ggerganov force-pushed the gg/model-rework-out-ids branch from 1e86597 to 2b940c0 June 20, 2025 07:16

ggerganov merged commit 812939a into master Jun 20, 2025
56 of 57 checks passed

ggerganov deleted the gg/model-rework-out-ids branch June 20, 2025 07:50
