Skip to content

Single allocation of encode_async block with non-ARC capture in ggml-metal.m #9747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 7, 2024

Conversation

ptsochantaris
Copy link
Collaborator

This PR hopefully addresses the issue raised in a comment in ggml-metal.m:

// TODO: how to avoid this allocation? I tried initializing it in ggml_backend_metal_set_n_cb but it crashes.

The reason that fails is because ObjC blocks in manual memory management mode are not retained upon creation, they need to be explicitly retained, or rather, copied after creation. Usually this is done by using a copyable property declaration on the owning object, but in this case the block is assigned to a plain C structure. In such cases we can reproduce the same behaviour by using the Block_copy function. Having done that, we also need to release the block once we are done with it, preferably just before setting that property to nil.

This PR makes this change, and seems to fix the crash mentioned, but please do give it a go to verify before merging, just in case :D I hope this helps!

Copy link
Member

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - thank you for the explanation!

Would Block_release make more sense to be in ggml_metal_free() instead?

diff --git a/ggml/src/ggml-metal.m b/ggml/src/ggml-metal.m
index 40835074..83bf658c 100644
--- a/ggml/src/ggml-metal.m
+++ b/ggml/src/ggml-metal.m
@@ -438,7 +438,6 @@ static struct ggml_backend_metal_context * ggml_metal_init(void) {
     ctx->capture_scope = nil;
 
     ctx->gf = nil;
-    Block_release(ctx->encode_async);
     ctx->encode_async = nil;
     for (int i = 0; i < GGML_METAL_MAX_COMMAND_BUFFERS; ++i) {
         ctx->command_buffers[i] = nil;
@@ -684,6 +683,8 @@ static void ggml_metal_free(struct ggml_backend_metal_context * ctx) {
         [ctx->kernels[i].pipeline release];
     }
 
+    Block_release(ctx->encode_async);
+
     [ctx->queue release];
     [ctx->device release];

In ggml_metal_init() we only initialize the pointer with nil and are never supposed to pass through that function a second time.

@ptsochantaris
Copy link
Collaborator Author

ptsochantaris commented Oct 7, 2024

Oh indeed, I had thought that was the place where things were being de-init because of the nil assignment. It definitely belongs in the deallocation place instead. Will update in a sec.

//ctx->encode_async = ^(size_t iter) {
// ...
//};
ctx->encode_async = Block_copy(^(size_t iter) {
Copy link
Member

@ggerganov ggerganov Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, we might want to call ggml_backend_metal_set_n_cb() multiple times in the future (although unlikely). So to make this more correct in this case, we should check if ctx->encode_async == nil here and if not, call Block_release() first

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that had crossed my mind indeed, but didn't want to add more logic there than needed for now - worried about scope creep. I agree this makes total sense. Will add 👍

@ggerganov ggerganov merged commit 96b6912 into master Oct 7, 2024
53 checks passed
@ggerganov ggerganov deleted the ggml-metal-fix-single-allocation-crash branch October 7, 2024 12:26
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants