-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-4627 Avoid unexpected crash with SIGBUS when trying to allocate a /dev/shm shared memory but there is not enough memory #1244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the thorough analysis and the fix. Left a suggested edit to fix build on macOS.
Also consider: the Performance counters are an optional feature and can be disabled with the cmake option -DENABLE_SHM_COUNTERS=OFF
.
@@ -184,10 +184,10 @@ mongoc_counters_alloc (size_t size) | |||
/* | |||
* NOTE: | |||
* | |||
* ftruncate() will cause reads to be zero. Therefore, we don't need to | |||
* posix_fallocate() will cause reads to be zero. Therefore, we don't need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
posix_fallocate() will cause reads to be zero.
Is (or was) this note correct? The call to memset (mem, 0, size);
below zeros the memory, suggesting writing zeros may be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is correct (it's made explicit in the man page of ftruncate, it's not so clear in the man page of posix_fallocate but the caveats about the emulation case written by glibc folks mentions explicitly this feature of zeroing-out the "new" memory region allocated by this).
Yet indeed, the rest of the code seems to memset things to zero anyway. So, I don't know if you just want to drop this comment entirely ? Or remove the memset ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not so clear in the man page of posix_fallocate but the caveats about the emulation case written by glibc folks mentions explicitly this feature of zeroing-out the "new" memory region allocated by this
I suggest removing the comment. If this is not documented behavior in the POSIX standard, I err towards keeping the memset
.
ca3d37e
to
25100f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the fix. Changes look good with the comment removal.
… shared memory but there is not enough memory. Indeed ftruncate doesn't ensure memory is properly allocated, it ony changes the file size from the VFS point of view, but doesn't actually allocate any memory. So ftruncate might work despite we have no memory left, and later when trying to zero-memset the mmapped buffer, we might actually get a SIGBUS signal crashing the whole process. Instead, make sure we can allocate the whole shared memory using posix_fallocate and gracefully handle allocation problems, without crashing. The chromium project faced a similar issue in the past: https://bugs.chromium.org/p/chromium/issues/detail?id=951431
25100f7
to
b65db34
Compare
Avoids an unexpected crash with SIGBUS when trying to allocate a /dev/shm shared memory but there is not enough memory. Indeed ftruncate doesn't ensure memory is properly allocated, it ony changes the file size from the VFS point of view, but doesn't actually allocate any memory. So ftruncate might work despite we have no memory left, and later when trying to zero-memset the mmapped buffer, we might actually get a SIGBUS signal crashing the whole process. Instead, make sure we can allocate the whole shared memory using posix_fallocate and gracefully handle allocation problems, without crashing. The chromium project faced a similar issue in the past: https://bugs.chromium.org/p/chromium/issues/detail?id=951431
Hi,
This patch tries to fix an unexpected crash we have been witnessing in one of our production environment as /dev/shm was somehow (almost) full.
Indeed
ftruncate
doesn't ensure memory is properly allocated, it only changes the file size from the VFS point of view, but doesn't actually allocate any memory. Softruncate
might work despite we have no memory left, and later when trying to zero-memset the mmapped buffer, we might actually get a SIGBUS signal crashing the whole process.Instead, make sure we can allocate the whole shared memory using
posix_fallocate
and gracefully handle allocation problems, without crashing.The chromium project faced a similar issue in the past: https://bugs.chromium.org/p/chromium/issues/detail?id=951431
As a complementary, here is how to reproduce this rather simply if you have an OCI engine like podman. I started a
fedora
container with podman and force using a very small /dev/shm, large enough to contain a single page (4k on x86_64):podman run -t -i --rm --shm-size=4096 fedora
Inside this container, we can install basic tools like this:
dnf install -y vim gcc gdb
and we can check the/dev/shm
size:Now compiling this simple test program:
We can try to run it with and without fallocate, for 4096 (always work) and 4097 (fails) bytes:
The first core indeed cores when trying to
memset
the allocated memory to zero, checking with gdb:Cheers,
Romain