model : more uniform output id handling #14275

Merged: ggerganov merged 3 commits into master from gg/model-rework-out-ids on Jun 20, 2025

Conversation

ggerganov
Member

@ggerganov ggerganov commented Jun 19, 2025

target #14217

  • Move inp_out_ids creation to the beginning of the graph (needed for graph reuse later)
  • Minor optimization: do ggml_get_rows() only if n_outputs < n_tokens
  • Move ggml_get_rows() as early as possible after attention in some models
  • Avoid unnecessary ggml_get_rows() on unused tensors in some models
  • Decouple inp_out_ids logic from pooling type
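
As a rough illustration of why gathering output rows early can help, here is a minimal, self-contained sketch. It does not use ggml: `Matrix`, `gather_rows`, and `toy_ffn` are hypothetical stand-ins invented for this example, and the row gather only mimics in spirit what `ggml_get_rows()` does with `inp_out_ids`. The point is that once only the rows listed in the output ids are kept, every later per-row operation processes `n_outputs` rows instead of `n_tokens`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a row-major [n_rows x n_cols] activation tensor
// (hypothetical type, not a ggml tensor).
struct Matrix {
    int64_t n_rows;
    int64_t n_cols;
    std::vector<float> data;
};

// Toy analogue of a row gather: copies the rows of `src` selected by `ids`.
Matrix gather_rows(const Matrix & src, const std::vector<int32_t> & ids) {
    Matrix dst{(int64_t) ids.size(), src.n_cols, {}};
    dst.data.reserve(ids.size() * src.n_cols);
    for (int32_t id : ids) {
        assert(id >= 0 && id < src.n_rows);
        for (int64_t c = 0; c < src.n_cols; ++c) {
            dst.data.push_back(src.data[id * src.n_cols + c]);
        }
    }
    return dst;
}

// Hypothetical per-row work that runs after attention.
// It doubles every value and returns the number of rows it processed.
int64_t toy_ffn(Matrix & x) {
    for (float & v : x.data) {
        v *= 2.0f;
    }
    return x.n_rows;
}
```

For example, with a 4-token input and output ids `{3}`, `gather_rows` returns a single row, so `toy_ffn` processes 1 row rather than 4.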

@slaren
Member

slaren commented Jun 19, 2025

> • Minor optimization: do ggml_get_rows() only if n_outputs < n_tokens

If the graph topology changes, it may force a re-allocation in ggml-alloc, which can interfere with the pipeline parallelism implementation.

@ggerganov
Member Author

> • Minor optimization: do ggml_get_rows() only if n_outputs < n_tokens
>
> If the graph topology changes, it may force a re-allocation in ggml-alloc, which can interfere with the pipeline parallelism implementation.

Thanks for spotting - will revert this change.
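
The concern in this exchange can be sketched with a toy model. The helper below is hypothetical and does not reflect how ggml-alloc or the real graph builder is implemented; it only lists op names. It shows that making the gather conditional on `n_outputs < n_tokens` produces a different op sequence depending on the batch, whereas always emitting the gather keeps the sequence identical across batches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the op names a toy graph builder would emit.
// When `conditional` is true, the gather is skipped if every token is an output.
std::vector<std::string> build_toy_graph(int n_tokens, int n_outputs, bool conditional) {
    assert(n_outputs <= n_tokens);
    std::vector<std::string> ops = {"attn"};
    if (!conditional || n_outputs < n_tokens) {
        ops.push_back("get_rows");
    }
    ops.push_back("ffn");
    return ops;
}
```

In this sketch, `build_toy_graph(8, 8, true)` and `build_toy_graph(8, 1, true)` differ in shape, while the unconditional variant returns the same op list for both batches.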

@ggerganov ggerganov force-pushed the gg/model-rework-out-ids branch 4 times, most recently from 78934b6 to 1e86597 Compare June 19, 2025 12:38
@ggerganov ggerganov mentioned this pull request Jun 19, 2025
Base automatically changed from gg/ubatch-rework to master June 20, 2025 07:14
@ggerganov ggerganov requested a review from ngxson as a code owner June 20, 2025 07:14
@ggerganov ggerganov force-pushed the gg/model-rework-out-ids branch from 1e86597 to 2b940c0 Compare June 20, 2025 07:16
@ggerganov ggerganov merged commit 812939a into master Jun 20, 2025
56 of 57 checks passed
@ggerganov ggerganov deleted the gg/model-rework-out-ids branch June 20, 2025 07:50
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 20, 2025
* mamba2-sync: (24 commits)
sync : ggml
Add `ggml_roll` (ggml/1274)
docs : fix the link to llama.h (ggml-org#14293)
CUDA: add conv_2d_transpose (ggml-org#14287)
lint : remove trailing whitepace (ggml-org#14304)
vocab : prevent tokenizer overflow (ggml-org#14301)
sycl: add usage of enqueue_functions extension (ggml-org#14244)
Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
llama : improve sep token handling (ggml-org#14272)
cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
ggml : fix repack work size for mul_mat_id (ggml-org#14292)
ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
model : more uniform output id handling (ggml-org#14275)
ubatch : new splitting logic (ggml-org#14217)
CUDA: add conv_2d_dw (ggml-org#14265)
ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
gguf-py : make sentencepiece optional (ggml-org#14200)
server : add server parameters for draft model cache type (ggml-org#13782)
build : suppress gcc15 compile warnings (ggml-org#14261)
sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
...