-
Notifications
You must be signed in to change notification settings - Fork 12.3k
fix: use vm_allocate
to allocate CPU backend buffer on macOS
#9875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: use vm_allocate
to allocate CPU backend buffer on macOS
#9875
Conversation
ggml/src/ggml-backend.cpp
Outdated
@@ -770,12 +772,21 @@ static const char * ggml_backend_cpu_buffer_type_get_name(ggml_backend_buffer_ty | |||
} | |||
|
|||
static ggml_backend_buffer_t ggml_backend_cpu_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) { | |||
#if defined(GGML_USE_METAL) || defined(TARGET_OS_OSX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot rely on backend macros like GGML_USE_METAL
, and I don't see the point of enabling this specifically with Metal anyway, when the crash is happening in the CPU backend. Instead I would suggest using GGML_ALIGNED_MALLOC
everywhere like the TODO in the comment suggests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main focus here is on TARGET_OS_OSX
, but I figured that including GGML_USE_METAL
covers more cases where posix_memalign
is available and can be used instead of malloc
.
I'll move GGML_ALIGNED_MALLOC
to ggml-backend-impl.h
and use it instead then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still needs some changes:
- It should be in
ggml-impl.h
, notggml-backend-impl.h
- The implementation details need to be hidden in the source file, the header should only have the
ggml_aligned_malloc
andggml_aligned_free
functions, and the macros should be removed
I realize this is a lot more work than just fixing the bug, but we cannot add technical debt every time we fix a bug.
…t for `vm_allocate` on macOS
…ctions and add them to `ggml-impl.h`
posix_memalign
to allocate CPU backend buffer on macOSvm_allocate
to allocate CPU backend buffer on macOS
ggml/src/ggml.c
Outdated
#if defined(_MSC_VER) || defined(__MINGW32__) | ||
#define GGML_ALIGNED_MALLOC(size) _aligned_malloc(size, GGML_MEM_ALIGN) | ||
#define GGML_ALIGNED_FREE(ptr) _aligned_free(ptr) | ||
return _aligned_malloc(size, GGML_MEM_ALIGN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GGML_MEM_ALIGN
is 16, but the CPU backend expects an alignment of TENSOR_ALIGNMENT = 32
. I suspect that's why it is crashing sometimes. It would probably be ok to increase the alignment of this function to the page size in every case.
Do we really need
|
When loading I used We can limit the allocs to a smaller size, but I think that using |
The problem with reducing the value of |
Just seems like unnecessary complexity. ie if @giladgd It might worth trying to run your test with malloc debugging enabled |
@max-krasnyansky Thanks for the guidance for malloc debugging. To investigate it further, I've run some tests where I tried calling
Test 1const long alloc_size = 587218944;
fprintf(stderr, "alloc_size: %ld\n", alloc_size);
fflush(stderr);
void * aligned_memory = NULL;
void * aligned_memory2 = NULL;
int result = posix_memalign(&aligned_memory, sysconf(_SC_PAGESIZE), alloc_size);
int result2 = posix_memalign(&aligned_memory2, sysconf(_SC_PAGESIZE), alloc_size);
if (result != 0 || result2 != 0) {
fputs("failed to allocate memory\n", stderr);
fflush(stderr);
} else {
fputs("done\n", stderr);
fflush(stderr);
} Result:
Test 2const long alloc_size = 1073741824;
fprintf(stderr, "alloc_size: %ld\n", alloc_size);
fflush(stderr);
void * aligned_memory = NULL;
int result = posix_memalign(&aligned_memory, sysconf(_SC_PAGESIZE), alloc_size);
if (result != 0) {
fputs("failed to allocate memory\n", stderr);
fflush(stderr);
} else {
fputs("done\n", stderr);
fflush(stderr);
} Result:
(crashed) Test 3const long alloc_size = 1073741824;
fprintf(stderr, "alloc_size: %ld\n", alloc_size);
fflush(stderr);
void * aligned_memory = NULL;
kern_return_t alloc_status = vm_allocate((vm_map_t) mach_task_self(), (vm_address_t *) &aligned_memory, alloc_size, VM_FLAGS_ANYWHERE);
if (alloc_status != KERN_SUCCESS) {
fputs("failed to allocate memory\n", stderr);
fflush(stderr);
} else {
fputs("done\n", stderr);
fflush(stderr);
} Result:
Notice that the I've shared my findings on the Electron repo in the past for a similar issue, but I'll open another issue for this so they can investigate it further. Update: here's the issue on the Electron repo: link |
@giladgd |
…org#9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment
…org#9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment
…org#9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment
…org#9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment
Creating a context in an Electron app using
node-llama-cpp
on macOS without Metal crashes the process with some models (issue), so I've investigated what's happening and found that allocating a large memory block usingmalloc
is the culprit.For some reason, it happens only on Electron and not on Nodejs, but I couldn't figure out why.
To fix this issue, I switched to use
GGML_ALIGNED_MALLOC
instead ofmalloc
inggml_backend_cpu_buffer_type_alloc_buffer
, and added support forvm_allocate
inGGML_ALIGNED_MALLOC
on macOS.