Skip to content

emmalloc: Fix allocations of high alignment #20704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Nov 22, 2023
Merged

emmalloc: Fix allocations of high alignment #20704

merged 13 commits into from
Nov 22, 2023

Conversation

kripken
Copy link
Member

@kripken kripken commented Nov 14, 2023

Take the alignment into account when requesting new memory from the OS. If the
alignment is high then we need to ask for enough to ensure we end up aligned, or
else we can end up allocating double the memory we need (see #20645).

This only changes the amount we allocate from the OS if the alignment is non-
standard, that is, this is NFC for normal calls to malloc.

Also remove an assertion that limited the maximum alignment. mimalloc wants
4 MB alignment.

Fixes #20654

@kripken kripken requested review from juj and sbc100 November 14, 2023 00:05
@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment)
// Cannot perform allocations that are less than 4 byte aligned, because the Region
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps replace 4 byte with MALLOC_ALIGNMENT here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this might be accurate, as in wasm32 those data structures contain 32-bit pointers, and so they need 4-byte alignment (to avoid being unaligned). At least that's how I interpret the comment?

Copy link
Collaborator

@sbc100 sbc100 Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow the code and comments have become out of sync then.

Why not just say "Also round up to minimum outputted alignment." since that is the only thing the code is doing. It is not specifically handling 4-byte alignment.

@@ -657,8 +657,6 @@ static size_t validate_alloc_alignment(size_t alignment)
// Cannot perform allocations that are less than 4 byte aligned, because the Region
// control structures need to be aligned. Also round up to minimum outputted alignment.
alignment = MAX(alignment, MALLOC_ALIGNMENT);
// Arbitrary upper limit on alignment - very likely a programming bug if alignment is higher than this.
assert(alignment <= 1024*1024);
return alignment;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return MAX(alignment, MALLOC_ALIGNMENT);?

// MALLOC_ALIGNMENT), but it can matter if the alignment is very high. In that
// case, not adding the alignment can lead to this sbrk not giving us enough
// (in which case, the next attempt fails and will sbrk the same amount again,
// potentially allocating a lot more memory than necessary).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the status quo.. does did this work at all without this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked, but it would do the recursion in the earlier code twice. So it allocated a lot more than it needed, until it finally found enough room for the right size with the right allocation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't iteratively doing this actually use less memory in some cases? i.e. you could get lucky and get a good alignmenty without quite as much over allocation?

With this change are are guaranteed to waste alignment bytes on each allocation aren't we? In the mimalloc case that would be ~4Mb for each allocation. Or we we return the over-allocated chunk to the free list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we potentially reuse the extra space, Here is where the early unused space is added to the region before us:

// Before we proceed further, fix up the boundary between this and the preceding region,
// so that the boundary between the two regions happens at a right spot for the payload to be aligned.
if (payloadStartPtr != payloadStartPtrAligned)
{
Region *prevRegion = prev_region((Region*)freeRegion);
// We never have two free regions adjacent to each other, so the region before this free
// region should be in use.
assert(region_is_in_use(prevRegion));
size_t regionBoundaryBumpAmount = payloadStartPtrAligned - payloadStartPtr;
size_t newThisRegionSize = freeRegion->size - regionBoundaryBumpAmount;
create_used_region(prevRegion, prevRegion->size + regionBoundaryBumpAmount);
freeRegion = (Region *)((uint8_t*)freeRegion + regionBoundaryBumpAmount);
freeRegion->size = newThisRegionSize;
}
However, that previous region is in use, so this means that region uses more than it needs. It might end up using that if it reallocs, or it would be freed entirely if it is freed, but this could be better, that's true.

Overall emmalloc's goal is to be compact and not the most optimal allocator, so I think this is a reasonable compromise. It's true that we might get lucky without this, as you said, and just happen to be aligned, but in large alignments that is less and less likely, and it's better to do the more straightforward and less risky thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this we should wait for @juj to weigh in there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking offline, @juj might not be available to review this for a bit and suggested it land for now if it is urgent, and be reviewed later. I think that makes sense as I'd like to get this in to avoid mimalloc using 2x the memory it needs to, in the first release with mimalloc. @sbc100 does that sound ok?

// case, not adding the alignment can lead to this sbrk not giving us enough
// (in which case, the next attempt fails and will sbrk the same amount again,
// potentially allocating a lot more memory than necessary).
numBytesToClaim += alignment - MALLOC_ALIGNMENT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So alignment - MALLOC_ALIGNMENT is the most we would ever need to add to the resulting allocation to achieve and an alignment of alignment?

Copy link
Member Author

@kripken kripken Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes. What I am not certain of is whether the 3* figure a few lines above this include slack or if they are precise, which could matter here.

Another way to go could is if alignment > malloc alignment then add the alignment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing it that way would make the code more straight forwardly correct to a reader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done.

@kripken
Copy link
Member Author

kripken commented Nov 20, 2023

Code size changes here appear on main as well, so are unrelated.

@sbc100
Copy link
Collaborator

sbc100 commented Nov 20, 2023

I'll do a separate codesize update on the main branch if thats ok with you?

I do these from time to time and push them directly to avoid mixing code size change with unrelated changes like this one.

@kripken
Copy link
Member Author

kripken commented Nov 20, 2023

Makes sense, sounds good to update those expectations separately. I see you've done that already.

@kripken
Copy link
Member Author

kripken commented Nov 20, 2023

Tests failed here in an interesting way temporarily:

  • The old code (before this PR) worked, which does not allocate more free space based on alignment.
  • The typo code failed. The typo was to add +8 for alignment rather than the asked alignment.
  • The correct code worked. The correct code is to ask for +32 in the failing testcase, as that is what memalign is called with for alignment.

It seems odd that +0 worked while +8 failed and then +32 worked - the one in the middle is different from the sides. It turns out that:

  • In the old code we indeed did not allocate enough the first time, so we allocated a second time. That second allocation is not on top of the previous one, it stands by itself. That is, we allocate again, of the same size, and try to find enough room in that region. That second one happens to end up aligned, so it worked! But in general it doesn't have to.
  • In the typo code we hit an infinite loop: Adding +8 was just right to make the alignment stay the same all the time, that is, it never fell into the lucky spot the old code did. Eventually we'd OOM.
  • The correct code asked for enough the first time and succeeded, as expected.

Based on that, I think this PR is actually a bugfix.

However, I am confused as to why we do not combine regions after a first failed allocation - that seems wrong, so it's either a bug or a misunderstanding of mine. Still looking into that.

@kripken
Copy link
Member Author

kripken commented Nov 21, 2023

#20759 fixes that issue: indeed an emmalloc bug caused us to fail to combined adjacent regions, due to an sbrk alignment issue. With that PR, the test passes in the middle case (+8 from the last comment) as well. That bug is independent of this, however, so neither blocks the other.

kripken added a commit that referenced this pull request Nov 21, 2023
If we give it an unaligned size, then it will align it, and then memory will
seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it
returns 0 (the start of the allocation), and logically we reserved the range
0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the
address to multiples of 8. So it seems like we have a region 0-4 and another
8-. Because of this emmalloc would generate separate root regions for
them, that is, they would not get merged because of the 4 bytes of dead
space in the middle.

With this PR, the amount of new regions created in
test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure
if it's worth testing that, as it would make updating the test difficult, and this
is such an internal detail of emmalloc.

Also add some verbose logging that helps debug this.

Noticed in #20704 (comment)
@kripken kripken merged commit 990b754 into main Nov 22, 2023
@kripken kripken deleted the emmalloc.high-align branch November 22, 2023 16:44
@juj juj mentioned this pull request Nov 27, 2023
br0nk pushed a commit to br0nk/emscripten that referenced this pull request Nov 29, 2023
If we give it an unaligned size, then it will align it, and then memory will
seem non-contiguous. That is, if we sbrk 4 bytes from address 0 then it
returns 0 (the start of the allocation), and logically we reserved the range
0-4. Our next sbrk of any amount will return 8, because sbrk aligns up the
address to multiples of 8. So it seems like we have a region 0-4 and another
8-. Because of this emmalloc would generate separate root regions for
them, that is, they would not get merged because of the 4 bytes of dead
space in the middle.

With this PR, the amount of new regions created in
test_emmalloc_memvalidate_verbose goes from 1311 to 1234. I'm not sure
if it's worth testing that, as it would make updating the test difficult, and this
is such an internal detail of emmalloc.

Also add some verbose logging that helps debug this.

Noticed in emscripten-core#20704 (comment)
br0nk pushed a commit to br0nk/emscripten that referenced this pull request Nov 29, 2023
Take the alignment into account when requesting new memory from the OS. If the
alignment is high then we need to ask for enough to ensure we end up aligned, or
else we can end up allocating double the memory we need (see emscripten-core#20645).

This only changes the amount we allocate from the OS if the alignment is non-
standard, that is, this is NFC for normal calls to malloc.

Also remove an assertion that limited the maximum alignment. mimalloc wants
4 MB alignment.

Fixes emscripten-core#20654
kripken added a commit that referenced this pull request May 7, 2024
#20704 removed the maximum alignment limit from emmalloc, so we no longer
need it in the mimalloc port. Without it, we can properly use the 4GB-aligned pages
that mimalloc requests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants