
emmalloc: Fix allocations of high alignment #20704


Merged
merged 13 commits into from
Nov 22, 2023
25 changes: 19 additions & 6 deletions system/lib/emmalloc.c
@@ -671,12 +671,9 @@ static void *attempt_allocate(Region *freeRegion, size_t alignment, size_t size)
 
 static size_t validate_alloc_alignment(size_t alignment)
 {
-  // Cannot perform allocations that are less than 4 byte aligned, because the Region
-  // control structures need to be aligned. Also round up to minimum outputted alignment.
-  alignment = MAX(alignment, MALLOC_ALIGNMENT);
-  // Arbitrary upper limit on alignment - very likely a programming bug if alignment is higher than this.
-  assert(alignment <= 1024*1024);
-  return alignment;
+  // Cannot perform allocations that are less than our minimum alignment, because
+  // the Region control structures need to be aligned themselves.
+  return MAX(alignment, MALLOC_ALIGNMENT);
 }

static size_t validate_alloc_size(size_t size)
@@ -808,6 +805,22 @@ static void *allocate_memory(size_t alignment, size_t size)
 
   // We were unable to find a free memory region. Must sbrk() in more memory!
   size_t numBytesToClaim = size+sizeof(Region)*3;
+  // Take into account the alignment as well. For typical alignment we don't
+  // need to add anything here (so we do nothing if the alignment is equal to
+  // MALLOC_ALIGNMENT), but it can matter if the alignment is very high. In that
+  // case, not adding the alignment can lead to this sbrk not giving us enough
+  // (in which case, the next attempt fails and will sbrk the same amount again,
+  // potentially allocating a lot more memory than necessary).
Collaborator: I'm not sure I understand the status quo... did this work at all without this change?

Member Author: It worked, but it would do the recursion in the earlier code twice. So it allocated a lot more than it needed, until it finally found enough room for the right size with the right alignment.

Collaborator: Wouldn't iteratively doing this actually use less memory in some cases? i.e. you could get lucky and get a good alignment without quite as much over-allocation?

With this change we are guaranteed to waste alignment bytes on each allocation, aren't we? In the mimalloc case that would be ~4MB for each allocation. Or do we return the over-allocated chunk to the free list?

Member Author: I believe we potentially reuse the extra space. Here is where the early unused space is added to the region before us:

  // Before we proceed further, fix up the boundary between this and the preceding region,
  // so that the boundary between the two regions happens at a right spot for the payload to be aligned.
  if (payloadStartPtr != payloadStartPtrAligned)
  {
    Region *prevRegion = prev_region((Region*)freeRegion);
    // We never have two free regions adjacent to each other, so the region before this free
    // region should be in use.
    assert(region_is_in_use(prevRegion));
    size_t regionBoundaryBumpAmount = payloadStartPtrAligned - payloadStartPtr;
    size_t newThisRegionSize = freeRegion->size - regionBoundaryBumpAmount;
    create_used_region(prevRegion, prevRegion->size + regionBoundaryBumpAmount);
    freeRegion = (Region *)((uint8_t*)freeRegion + regionBoundaryBumpAmount);
    freeRegion->size = newThisRegionSize;
  }
However, that previous region is in use, so this means that region uses more than it needs. It might end up using that extra space if it reallocs, or the space would be freed entirely when the region is freed, but this could be better, that's true.

Overall, emmalloc's goal is to be compact, not the most optimal allocator, so I think this is a reasonable compromise. It's true that we might get lucky without this, as you said, and just happen to be aligned, but at large alignments that becomes less and less likely, and it's better to do the more straightforward and less risky thing.

Collaborator: I think we should wait for @juj to weigh in here.

Member Author: Talking offline, @juj might not be available to review this for a bit and suggested it land for now if it is urgent, and be reviewed later. I think that makes sense, as I'd like to get this in to avoid mimalloc using 2x the memory it needs in the first release with mimalloc. @sbc100, does that sound ok?

+  //
+  // Note that this is not necessarily optimal, as the extra allocation size for
+  // the alignment might not be needed (if we are lucky and already aligned),
+  // and even if it helps us allocate it will not immediately be ready for reuse
+  // (as it will be added to the currently-in-use region before us, so it is not
+  // in a free list). As a compromise however it seems reasonable in practice as
+  // a way to handle large aligned regions to avoid even worse waste.
+  if (alignment > MALLOC_ALIGNMENT) {
+    numBytesToClaim += alignment;
+  }
   assert(numBytesToClaim > size); // 32-bit wraparound should not happen here, allocation size has been validated above!
   bool success = claim_more_memory(numBytesToClaim);
   if (success)
8 changes: 4 additions & 4 deletions test/code_size/embind_val_wasm.json
@@ -3,8 +3,8 @@
   "a.html.gz": 431,
   "a.js": 7498,
   "a.js.gz": 3142,
-  "a.wasm": 11533,
-  "a.wasm.gz": 5767,
-  "total": 19704,
-  "total_gz": 9340
+  "a.wasm": 11515,
+  "a.wasm.gz": 5771,
+  "total": 19686,
+  "total_gz": 9344
 }
8 changes: 4 additions & 4 deletions test/code_size/hello_wasm_worker_wasm.json
@@ -3,8 +3,8 @@
   "a.html.gz": 433,
   "a.js": 667,
   "a.js.gz": 458,
-  "a.wasm": 1863,
-  "a.wasm.gz": 1051,
-  "total": 3267,
-  "total_gz": 1942
+  "a.wasm": 1858,
+  "a.wasm.gz": 1058,
+  "total": 3262,
+  "total_gz": 1949
 }
8 changes: 4 additions & 4 deletions test/code_size/hello_webgl2_wasm.json
@@ -3,8 +3,8 @@
   "a.html.gz": 379,
   "a.js": 4699,
   "a.js.gz": 2419,
-  "a.wasm": 10475,
-  "a.wasm.gz": 6710,
-  "total": 15743,
-  "total_gz": 9508
+  "a.wasm": 10468,
+  "a.wasm.gz": 6719,
+  "total": 15736,
+  "total_gz": 9517
 }
8 changes: 4 additions & 4 deletions test/code_size/hello_webgl2_wasm2js.json
@@ -1,10 +1,10 @@
 {
   "a.html": 567,
   "a.html.gz": 379,
-  "a.js": 17921,
-  "a.js.gz": 8079,
+  "a.js": 17841,
+  "a.js.gz": 8088,
   "a.mem": 3171,
   "a.mem.gz": 2713,
-  "total": 21659,
-  "total_gz": 11171
+  "total": 21579,
+  "total_gz": 11180
 }
8 changes: 4 additions & 4 deletions test/code_size/hello_webgl_wasm.json
@@ -3,8 +3,8 @@
   "a.html.gz": 379,
   "a.js": 4185,
   "a.js.gz": 2243,
-  "a.wasm": 10475,
-  "a.wasm.gz": 6710,
-  "total": 15229,
-  "total_gz": 9332
+  "a.wasm": 10468,
+  "a.wasm.gz": 6719,
+  "total": 15222,
+  "total_gz": 9341
 }
8 changes: 4 additions & 4 deletions test/code_size/hello_webgl_wasm2js.json
@@ -1,10 +1,10 @@
 {
   "a.html": 567,
   "a.html.gz": 379,
-  "a.js": 17399,
-  "a.js.gz": 7910,
+  "a.js": 17319,
+  "a.js.gz": 7908,
   "a.mem": 3171,
   "a.mem.gz": 2713,
-  "total": 21137,
-  "total_gz": 11002
+  "total": 21057,
+  "total_gz": 11000
 }
32 changes: 32 additions & 0 deletions test/other/test_emmalloc_high_align.c
@@ -0,0 +1,32 @@
+#include <assert.h>
+#include <emscripten/console.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+size_t sizeT(void* p) {
+  return (size_t)p;
+}
+
+int main() {
+  const size_t MB = 1024 * 1024;
+  const size_t ALIGN = 4 * MB;
+  const size_t SIZE = 32 * MB;
+
+  // Allocate a very large chunk of memory (32MB) with very high alignment (4
+  // MB). This is similar to what mimalloc does in practice.
+  void* before = sbrk(0);
+  void* p = aligned_alloc(ALIGN, SIZE);
+  void* after = sbrk(0);
+  emscripten_console_logf("before: %p after: %p p: %p\n", before, after, p);
+
+  // The allocation must be properly aligned.
+  assert(sizeT(p) % ALIGN == 0);
+
+  // We should only have sbrk'd a reasonable amount (this is a regression test
+  // for #20645 where we sbrk'd double the necessary amount). We expect at most
+  // 36 MB (the size of the allocation plus the alignment) plus a few bytes of
+  // general overhead.
+  assert(sizeT(after) - sizeT(before) < ALIGN + SIZE + 100);
+
+  emscripten_console_log("success");
+}
1 change: 1 addition & 0 deletions test/other/test_emmalloc_high_align.out
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
+success
4 changes: 4 additions & 0 deletions test/test_other.py
@@ -7494,6 +7494,10 @@ def test(args, text=None):
     test(['-sALLOW_MEMORY_GROWTH', '-sMAXIMUM_MEMORY=1GB'])
     test(['-sALLOW_MEMORY_GROWTH', '-sMAXIMUM_MEMORY=4GB'])
 
+  def test_emmalloc_high_align(self):
+    self.do_other_test('test_emmalloc_high_align.c',
+                       emcc_args=['-sMALLOC=emmalloc', '-sINITIAL_MEMORY=128MB'])
+
   def test_2GB_plus(self):
     # when the heap size can be over 2GB, we rewrite pointers to be unsigned
     def test(page_diff):