emmalloc: Fix allocations of high alignment #20704

kripken · 2023-11-14T00:05:25Z

Take the alignment into account when requesting new memory from the OS. If the
alignment is high then we need to ask for enough to ensure we end up aligned, or
else we can end up allocating double the memory we need (see #20645).

This only changes the amount we allocate from the OS if the alignment is non-
standard, that is, this is NFC for normal calls to malloc.

Also remove an assertion that limited the maximum alignment. mimalloc wants
4 MB alignment.

Fixes #20654

sbc100 · 2023-11-14T00:30:05Z

system/lib/emmalloc.c

@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment)
  // Cannot perform allocations that are less than 4 byte aligned, because the Region


Perhaps replace 4 byte with MALLOC_ALIGNMENT here

I actually think this might be accurate, as in wasm32 those data structures contain 32-bit pointers, and so they need 4-byte alignment (to avoid being unaligned). At least that's how I interpret the comment?

Somehow the code and comments have become out of sync then.

Why not just say "Also round up to minimum outputted alignment." since that is the only thing the code is doing. It is not specifically handling 4-byte alignment.

sbc100 · 2023-11-14T00:30:19Z

system/lib/emmalloc.c

@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment)
  // Cannot perform allocations that are less than 4 byte aligned, because the Region
  // control structures need to be aligned. Also round up to minimum outputted alignment.
  alignment = MAX(alignment, MALLOC_ALIGNMENT);
-  // Arbitrary upper limit on alignment - very likely a programming bug if alignment is higher than this.
-  assert(alignment <= 1024*1024);
  return alignment;


Just return MAX(alignment, MALLOC_ALIGNMENT);?

sbc100 · 2023-11-14T00:31:50Z

system/lib/emmalloc.c

+  // MALLOC_ALIGNMENT), but it can matter if the alignment is very high. In that
+  // case, not adding the alignment can lead to this sbrk not giving us enough
+  // (in which case, the next attempt fails and will sbrk the same amount again,
+  // potentially allocating a lot more memory than necessary).


I'm not sure I understand the status quo.. does did this work at all without this change?

It worked, but it would do the recursion in the earlier code twice. So it allocated a lot more than it needed, until it finally found enough room for the right size with the right allocation.

Wouldn't iteratively doing this actually use less memory in some cases? i.e. you could get lucky and get a good alignmenty without quite as much over allocation?

With this change are are guaranteed to waste alignment bytes on each allocation aren't we? In the mimalloc case that would be ~4Mb for each allocation. Or we we return the over-allocated chunk to the free list?

I believe we potentially reuse the extra space, Here is where the early unused space is added to the region before us:

emscripten/system/lib/emmalloc.c

Lines 604 to 617 in 87b24d7

// Before we proceed further, fix up the boundary between this and the preceding region,

// so that the boundary between the two regions happens at a right spot for the payload to be aligned.

if (payloadStartPtr != payloadStartPtrAligned)

{

Region *prevRegion = prev_region((Region*)freeRegion);

// We never have two free regions adjacent to each other, so the region before this free

// region should be in use.

assert(region_is_in_use(prevRegion));

size_t regionBoundaryBumpAmount = payloadStartPtrAligned - payloadStartPtr;

size_t newThisRegionSize = freeRegion->size - regionBoundaryBumpAmount;

create_used_region(prevRegion, prevRegion->size + regionBoundaryBumpAmount);

freeRegion = (Region *)((uint8_t*)freeRegion + regionBoundaryBumpAmount);

freeRegion->size = newThisRegionSize;

}

However, that previous region is in use, so this means that region uses more than it needs. It might end up using that if it reallocs, or it would be freed entirely if it is freed, but this could be better, that's true.

Overall emmalloc's goal is to be compact and not the most optimal allocator, so I think this is a reasonable compromise. It's true that we might get lucky without this, as you said, and just happen to be aligned, but in large alignments that is less and less likely, and it's better to do the more straightforward and less risky thing.

I think this we should wait for @juj to weigh in there.

Talking offline, @juj might not be available to review this for a bit and suggested it land for now if it is urgent, and be reviewed later. I think that makes sense as I'd like to get this in to avoid mimalloc using 2x the memory it needs to, in the first release with mimalloc. @sbc100 does that sound ok?

sbc100 · 2023-11-14T00:32:40Z

system/lib/emmalloc.c

+  // case, not adding the alignment can lead to this sbrk not giving us enough
+  // (in which case, the next attempt fails and will sbrk the same amount again,
+  // potentially allocating a lot more memory than necessary).
+  numBytesToClaim += alignment - MALLOC_ALIGNMENT;


So alignment - MALLOC_ALIGNMENT is the most we would ever need to add to the resulting allocation to achieve and an alignment of alignment?

I think so, yes. What I am not certain of is whether the 3* figure a few lines above this include slack or if they are precise, which could matter here.

Another way to go could is if alignment > malloc alignment then add the alignment?

I think doing it that way would make the code more straight forwardly correct to a reader.

Makes sense, done.

kripken · 2023-11-20T19:49:58Z

Code size changes here appear on main as well, so are unrelated.

sbc100 · 2023-11-20T19:53:14Z

I'll do a separate codesize update on the main branch if thats ok with you?

I do these from time to time and push them directly to avoid mixing code size change with unrelated changes like this one.

kripken · 2023-11-20T21:56:26Z

Makes sense, sounds good to update those expectations separately. I see you've done that already.

kripken · 2023-11-20T22:37:12Z

Tests failed here in an interesting way temporarily:

The old code (before this PR) worked, which does not allocate more free space based on alignment.
The typo code failed. The typo was to add +8 for alignment rather than the asked alignment.
The correct code worked. The correct code is to ask for +32 in the failing testcase, as that is what memalign is called with for alignment.

It seems odd that +0 worked while +8 failed and then +32 worked - the one in the middle is different from the sides. It turns out that:

In the old code we indeed did not allocate enough the first time, so we allocated a second time. That second allocation is not on top of the previous one, it stands by itself. That is, we allocate again, of the same size, and try to find enough room in that region. That second one happens to end up aligned, so it worked! But in general it doesn't have to.
In the typo code we hit an infinite loop: Adding +8 was just right to make the alignment stay the same all the time, that is, it never fell into the lucky spot the old code did. Eventually we'd OOM.
The correct code asked for enough the first time and succeeded, as expected.

Based on that, I think this PR is actually a bugfix.

However, I am confused as to why we do not combine regions after a first failed allocation - that seems wrong, so it's either a bug or a misunderstanding of mine. Still looking into that.

kripken · 2023-11-21T00:30:38Z

#20759 fixes that issue: indeed an emmalloc bug caused us to fail to combined adjacent regions, due to an sbrk alignment issue. With that PR, the test passes in the middle case (+8 from the last comment) as well. That bug is independent of this, however, so neither blocks the other.

If we give it an unaligned size, then it will align it, and then memory will seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it returns 0 (the start of the allocation), and logically we reserved the range 0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the address to multiples of 8. So it seems like we have a region 0-4 and another 8-. Because of this emmalloc would generate separate root regions for them, that is, they would not get merged because of the 4 bytes of dead space in the middle. With this PR, the amount of new regions created in test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure if it's worth testing that, as it would make updating the test difficult, and this is such an internal detail of emmalloc. Also add some verbose logging that helps debug this. Noticed in #20704 (comment)

If we give it an unaligned size, then it will align it, and then memory will seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it returns 0 (the start of the allocation), and logically we reserved the range 0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the address to multiples of 8. So it seems like we have a region 0-4 and another 8-. Because of this emmalloc would generate separate root regions for them, that is, they would not get merged because of the 4 bytes of dead space in the middle. With this PR, the amount of new regions created in test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure if it's worth testing that, as it would make updating the test difficult, and this is such an internal detail of emmalloc. Also add some verbose logging that helps debug this. Noticed in emscripten-core#20704 (comment)

Take the alignment into account when requesting new memory from the OS. If the alignment is high then we need to ask for enough to ensure we end up aligned, or else we can end up allocating double the memory we need (see emscripten-core#20645). This only changes the amount we allocate from the OS if the alignment is non- standard, that is, this is NFC for normal calls to malloc. Also remove an assertion that limited the maximum alignment. mimalloc wants 4 MB alignment. Fixes emscripten-core#20654

#20704 removed the maximum alignment limit from emmalloc, so we no longer need it in the mimalloc port. Without it, we can properly use the 4GB-aligned pages that mimalloc requests.

kripken added 2 commits November 13, 2023 15:57

fix

73b1eda

fix

cf5cb09

kripken requested review from juj and sbc100 November 14, 2023 00:05

kripken added 2 commits November 13, 2023 16:18

Merge remote-tracking branch 'origin/main' into emmalloc.high-align

5243c07

code size improvements

87b24d7

sbc100 reviewed Nov 14, 2023

View reviewed changes

kripken added 3 commits November 13, 2023 16:38

simplify

dfa2019

feedback

a6c39ad

feedback

c6aa7dc

kripken mentioned this pull request Nov 15, 2023

Add a port of mimalloc, a fast and scalable multithreaded allocator #20651

Merged

Merge remote-tracking branch 'origin/main' into emmalloc.high-align

630dfd1

oops

c295958

kripken added 2 commits November 20, 2023 14:27

Merge remote-tracking branch 'origin/main' into emmalloc.high-align

376ab0d

fix

3c800ab

rebaseline

1caf4c1

kripken mentioned this pull request Nov 21, 2023

emmalloc: Ask sbrk for aligned sizes #20759

Merged

update

4531bdd

sbc100 approved these changes Nov 22, 2023

View reviewed changes

kripken merged commit 990b754 into main Nov 22, 2023

kripken deleted the emmalloc.high-align branch November 22, 2023 16:44

juj mentioned this pull request Nov 27, 2023

sbrk_aligned() #20785

Closed

englercj mentioned this pull request Jan 15, 2024

emmalloc wastes memory with high alignments #20645

Closed

kripken mentioned this pull request Feb 6, 2024

emmalloc failing to allocate memory whereas dlmalloc can successfully allocate #21277

Closed

kripken mentioned this pull request Apr 25, 2024

Add an Emscripten/wasm implementation of primitives microsoft/mimalloc#822

Closed

This was referenced May 7, 2024

[mimalloc] Remove no-longer-needed limit on emmalloc alignment #21905

Merged

Emscripten: Remove no-longer-needed minimum emmalloc alignment microsoft/mimalloc#891

Merged

		@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment)
		// Cannot perform allocations that are less than 4 byte aligned, because the Region

	// Before we proceed further, fix up the boundary between this and the preceding region,
	// so that the boundary between the two regions happens at a right spot for the payload to be aligned.
	if (payloadStartPtr != payloadStartPtrAligned)
	{
	Region prevRegion = prev_region((Region)freeRegion);
	// We never have two free regions adjacent to each other, so the region before this free
	// region should be in use.
	assert(region_is_in_use(prevRegion));
	size_t regionBoundaryBumpAmount = payloadStartPtrAligned - payloadStartPtr;
	size_t newThisRegionSize = freeRegion->size - regionBoundaryBumpAmount;
	create_used_region(prevRegion, prevRegion->size + regionBoundaryBumpAmount);
	freeRegion = (Region )((uint8_t)freeRegion + regionBoundaryBumpAmount);
	freeRegion->size = newThisRegionSize;
	}

emmalloc: Fix allocations of high alignment #20704

emmalloc: Fix allocations of high alignment #20704

Uh oh!

Conversation

kripken commented Nov 14, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sbc100 Nov 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken Nov 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken commented Nov 20, 2023

Uh oh!

sbc100 commented Nov 20, 2023

Uh oh!

kripken commented Nov 20, 2023

Uh oh!

kripken commented Nov 20, 2023

Uh oh!

kripken commented Nov 21, 2023

Uh oh!

Uh oh!

sbc100 Nov 14, 2023 •

edited

Loading

kripken Nov 14, 2023 •

edited

Loading