-
Notifications
You must be signed in to change notification settings - Fork 3.4k
emmalloc: Fix allocations of high alignment #20704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
system/lib/emmalloc.c
Outdated
@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment) | |||
// Cannot perform allocations that are less than 4 byte aligned, because the Region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps replace 4 byte
with MALLOC_ALIGNMENT
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this might be accurate, as in wasm32 those data structures contain 32-bit pointers, and so they need 4-byte alignment (to avoid being unaligned). At least that's how I interpret the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow the code and comments have become out of sync then.
Why not just say "Also round up to minimum outputted alignment." since that is the only thing the code is doing. It is not specifically handling 4-byte alignment.
system/lib/emmalloc.c
Outdated
@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment) | |||
// Cannot perform allocations that are less than 4 byte aligned, because the Region | |||
// control structures need to be aligned. Also round up to minimum outputted alignment. | |||
alignment = MAX(alignment, MALLOC_ALIGNMENT); | |||
// Arbitrary upper limit on alignment - very likely a programming bug if alignment is higher than this. | |||
assert(alignment <= 1024*1024); | |||
return alignment; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return MAX(alignment, MALLOC_ALIGNMENT);
?
// MALLOC_ALIGNMENT), but it can matter if the alignment is very high. In that | ||
// case, not adding the alignment can lead to this sbrk not giving us enough | ||
// (in which case, the next attempt fails and will sbrk the same amount again, | ||
// potentially allocating a lot more memory than necessary). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the status quo.. does did this work at all without this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked, but it would do the recursion in the earlier code twice. So it allocated a lot more than it needed, until it finally found enough room for the right size with the right allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't iteratively doing this actually use less memory in some cases? i.e. you could get lucky and get a good alignmenty without quite as much over allocation?
With this change are are guaranteed to waste alignment bytes on each allocation aren't we? In the mimalloc case that would be ~4Mb for each allocation. Or we we return the over-allocated chunk to the free list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we potentially reuse the extra space, Here is where the early unused space is added to the region before us:
emscripten/system/lib/emmalloc.c
Lines 604 to 617 in 87b24d7
// Before we proceed further, fix up the boundary between this and the preceding region, | |
// so that the boundary between the two regions happens at a right spot for the payload to be aligned. | |
if (payloadStartPtr != payloadStartPtrAligned) | |
{ | |
Region *prevRegion = prev_region((Region*)freeRegion); | |
// We never have two free regions adjacent to each other, so the region before this free | |
// region should be in use. | |
assert(region_is_in_use(prevRegion)); | |
size_t regionBoundaryBumpAmount = payloadStartPtrAligned - payloadStartPtr; | |
size_t newThisRegionSize = freeRegion->size - regionBoundaryBumpAmount; | |
create_used_region(prevRegion, prevRegion->size + regionBoundaryBumpAmount); | |
freeRegion = (Region *)((uint8_t*)freeRegion + regionBoundaryBumpAmount); | |
freeRegion->size = newThisRegionSize; | |
} |
Overall emmalloc's goal is to be compact and not the most optimal allocator, so I think this is a reasonable compromise. It's true that we might get lucky without this, as you said, and just happen to be aligned, but in large alignments that is less and less likely, and it's better to do the more straightforward and less risky thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this we should wait for @juj to weigh in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking offline, @juj might not be available to review this for a bit and suggested it land for now if it is urgent, and be reviewed later. I think that makes sense as I'd like to get this in to avoid mimalloc using 2x the memory it needs to, in the first release with mimalloc. @sbc100 does that sound ok?
system/lib/emmalloc.c
Outdated
// case, not adding the alignment can lead to this sbrk not giving us enough | ||
// (in which case, the next attempt fails and will sbrk the same amount again, | ||
// potentially allocating a lot more memory than necessary). | ||
numBytesToClaim += alignment - MALLOC_ALIGNMENT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So alignment - MALLOC_ALIGNMENT
is the most we would ever need to add to the resulting allocation to achieve and an alignment of alignment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes. What I am not certain of is whether the 3*
figure a few lines above this include slack or if they are precise, which could matter here.
Another way to go could is if alignment > malloc alignment then add the alignment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing it that way would make the code more straight forwardly correct to a reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, done.
Code size changes here appear on |
I'll do a separate codesize update on the main branch if thats ok with you? I do these from time to time and push them directly to avoid mixing code size change with unrelated changes like this one. |
Makes sense, sounds good to update those expectations separately. I see you've done that already. |
Tests failed here in an interesting way temporarily:
It seems odd that
Based on that, I think this PR is actually a bugfix. However, I am confused as to why we do not combine regions after a first failed allocation - that seems wrong, so it's either a bug or a misunderstanding of mine. Still looking into that. |
#20759 fixes that issue: indeed an emmalloc bug caused us to fail to combined adjacent regions, due to an sbrk alignment issue. With that PR, the test passes in the middle case ( |
If we give it an unaligned size, then it will align it, and then memory will seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it returns 0 (the start of the allocation), and logically we reserved the range 0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the address to multiples of 8. So it seems like we have a region 0-4 and another 8-. Because of this emmalloc would generate separate root regions for them, that is, they would not get merged because of the 4 bytes of dead space in the middle. With this PR, the amount of new regions created in test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure if it's worth testing that, as it would make updating the test difficult, and this is such an internal detail of emmalloc. Also add some verbose logging that helps debug this. Noticed in #20704 (comment)
If we give it an unaligned size, then it will align it, and then memory will seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it returns 0 (the start of the allocation), and logically we reserved the range 0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the address to multiples of 8. So it seems like we have a region 0-4 and another 8-. Because of this emmalloc would generate separate root regions for them, that is, they would not get merged because of the 4 bytes of dead space in the middle. With this PR, the amount of new regions created in test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure if it's worth testing that, as it would make updating the test difficult, and this is such an internal detail of emmalloc. Also add some verbose logging that helps debug this. Noticed in emscripten-core#20704 (comment)
Take the alignment into account when requesting new memory from the OS. If the alignment is high then we need to ask for enough to ensure we end up aligned, or else we can end up allocating double the memory we need (see emscripten-core#20645). This only changes the amount we allocate from the OS if the alignment is non- standard, that is, this is NFC for normal calls to malloc. Also remove an assertion that limited the maximum alignment. mimalloc wants 4 MB alignment. Fixes emscripten-core#20654
#20704 removed the maximum alignment limit from emmalloc, so we no longer need it in the mimalloc port. Without it, we can properly use the 4GB-aligned pages that mimalloc requests.
Take the alignment into account when requesting new memory from the OS. If the
alignment is high then we need to ask for enough to ensure we end up aligned, or
else we can end up allocating double the memory we need (see #20645).
This only changes the amount we allocate from the OS if the alignment is non-
standard, that is, this is NFC for normal calls to
malloc
.Also remove an assertion that limited the maximum alignment. mimalloc wants
4 MB alignment.
Fixes #20654