Early return for zero size calls to get_tensor. #5482

Merged: 6 commits, Feb 13, 2024

manyoso (Contributor) commented Feb 13, 2024

The llama_copy_state_data_internal function can sometimes call ggml_backend_tensor_get with a size of zero when the KV cache is empty. This change guards against an unnecessary device-to-host sync in that case.
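To illustrate the idea, here is a minimal sketch of a size guard in a backend-specific read path. It is not the code from this PR: `mock_device_buffer`, `sync_to_host`, and `mock_get_tensor` are hypothetical names, and the "device" is just a host vector with a counter standing in for an expensive device-to-host sync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device buffer; not a ggml type.
struct mock_device_buffer {
    std::vector<unsigned char> device_memory; // pretend this lives on the device
    int sync_count = 0;                       // how many device-to-host syncs ran

    // Simulates an expensive device-to-host synchronization.
    void sync_to_host() { ++sync_count; }
};

// Copies `size` bytes starting at `offset` into `dst`.
// Returns early, without syncing, when there is nothing to copy.
void mock_get_tensor(mock_device_buffer & buf, void * dst, size_t offset, size_t size) {
    if (size == 0) {
        return; // e.g. an empty KV cache: skip the sync entirely
    }
    assert(offset + size <= buf.device_memory.size());
    buf.sync_to_host();
    std::memcpy(dst, buf.device_memory.data() + offset, size);
}
```

Calling `mock_get_tensor(buf, out, 0, 0)` leaves `sync_count` untouched, while any non-empty read triggers exactly one sync before the copy.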

slaren (Member) commented Feb 13, 2024

It would also be ok to do this more generally in ggml-backend so that all backends benefit from it.
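A rough sketch of what a guard in the generic layer could look like, assuming a dispatch layer that forwards to per-backend function pointers. All names here (`mock_backend_iface`, `mock_tensor`, `mock_backend_tensor_get`, `mock_backend_tensor_set`, `counting_ctx`) are made up for illustration and are not the ggml-backend API. The early return sits after the bounds assertion, so an out-of-range zero-size call would still be caught.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical backend interface: each backend supplies its own get/set.
struct mock_backend_iface {
    void (*get_tensor)(void * ctx, void * dst, size_t offset, size_t size);
    void (*set_tensor)(void * ctx, const void * src, size_t offset, size_t size);
};

// Hypothetical tensor handle: interface, backend state, and size in bytes.
struct mock_tensor {
    mock_backend_iface iface;
    void * ctx;
    size_t nbytes;
};

// Generic read: validate first, then skip the backend call when size is zero.
void mock_backend_tensor_get(const mock_tensor & t, void * dst, size_t offset, size_t size) {
    assert(offset + size <= t.nbytes && "tensor read out of bounds");
    if (size == 0) {
        return; // no backend (and therefore no device sync) is touched
    }
    t.iface.get_tensor(t.ctx, dst, offset, size);
}

// Generic write: same pattern as the read path.
void mock_backend_tensor_set(const mock_tensor & t, const void * src, size_t offset, size_t size) {
    assert(offset + size <= t.nbytes && "tensor write out of bounds");
    if (size == 0) {
        return;
    }
    t.iface.set_tensor(t.ctx, src, offset, size);
}

// Toy backend that counts how often it is actually invoked.
struct counting_ctx {
    std::vector<unsigned char> data;
    int calls = 0;
};

static void counting_get(void * ctx, void * dst, size_t offset, size_t size) {
    auto * c = static_cast<counting_ctx *>(ctx);
    ++c->calls;
    std::memcpy(dst, c->data.data() + offset, size);
}

static void counting_set(void * ctx, const void * src, size_t offset, size_t size) {
    auto * c = static_cast<counting_ctx *>(ctx);
    ++c->calls;
    std::memcpy(c->data.data() + offset, src, size);
}
```

With the check in the shared wrappers, individual backends such as the toy `counting_*` one need no guard of their own, which matches the direction the later commits in this PR describe.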

cebtenzzre (Collaborator) commented

See also abde521

@slaren slaren merged commit f5ca054 into ggml-org:master Feb 13, 2024
cebtenzzre pushed a commit to nomic-ai/llama.cpp that referenced this pull request Feb 13, 2024
* Early return for zero size calls to get_tensor.

Signed-off-by: Adam Treat <[email protected]>

* Update ggml-kompute.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml-kompute.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Add an early return to the get/set tensor when the size is null.

Signed-off-by: Adam Treat <[email protected]>

* Early return after the assertions.

Signed-off-by: Adam Treat <[email protected]>

* Since we do the early return in the generic backend now no reason to do so here as well.

Signed-off-by: Adam Treat <[email protected]>

---------

Signed-off-by: Adam Treat <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024