-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[scudo] Allow to resize allocation ring buffer #82683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -174,7 +174,7 @@ class Allocator { | |
static_cast<uptr>(getFlags()->quarantine_size_kb << 10), | ||
static_cast<uptr>(getFlags()->thread_local_quarantine_size_kb << 10)); | ||
|
||
mapAndInitializeRingBuffer(); | ||
mapAndInitializeRingBuffer(getFlags()->allocation_ring_buffer_size); | ||
} | ||
|
||
void enableRingBuffer() { | ||
|
@@ -189,6 +189,8 @@ class Allocator { | |
RB->Depot->disable(); | ||
} | ||
|
||
bool resizeRingBuffer(int Size) { return mapAndInitializeRingBuffer(Size); } | ||
|
||
// Initialize the embedded GWP-ASan instance. Requires the main allocator to | ||
// be functional, best called from PostInitCallback. | ||
void initGwpAsan() { | ||
|
@@ -1535,11 +1537,15 @@ class Allocator { | |
RBEntryStart)[N]; | ||
} | ||
|
||
void mapAndInitializeRingBuffer() { | ||
if (getFlags()->allocation_ring_buffer_size <= 0) | ||
return; | ||
u32 AllocationRingBufferSize = | ||
static_cast<u32>(getFlags()->allocation_ring_buffer_size); | ||
bool mapAndInitializeRingBuffer(int RingBufferSize) { | ||
if (RingBufferSize < 0 || | ||
static_cast<unsigned int>(RingBufferSize) >= UINT32_MAX) | ||
return false; | ||
if (RingBufferSize == 0) { | ||
swapOutRingBuffer(nullptr); | ||
return true; | ||
} | ||
Comment on lines
+1544
to
+1547
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now I feel like we may want to hint when TrackAllocationStacks is set but the ring buffer is nullptr. |
||
u32 AllocationRingBufferSize = static_cast<u32>(RingBufferSize); | ||
|
||
// We store alloc and free stacks for each entry. | ||
constexpr u32 kStacksPerRingBufferEntry = 2; | ||
|
@@ -1564,11 +1570,11 @@ class Allocator { | |
UINTPTR_MAX); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please move this to the class definition of StackDepot There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't belong to StackDepot, this belongs here to make sure that even with the biggest possible There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's nothing that relates to any constexpr arguments or variables here. That's the constraint to the Besides, I'm not sure if the overflow is intended,
I guess the assertion will never be triggered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you read the comment, this assumes the max RingSize and TabSize, and the static_assert proves that we don't need to bother with overflow logic because it can never overflow There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (except of course it doesn't work, because that will also overflow. let me remove it for now). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean, from the statements,
They are checking things independent to The expression results in a number larger than uint32_t but it'll wrap-around and the value will always be in the range of uint32_t. When it compares with UINTPTR_MAX, it's almost always true on 32 bit and always true on 64 bit. static_assert doesn't help with this. In addition, even the `StackDepot meets the requirement, you still can't allocate ring buffer with UINT32_MAX. This seems to be an unreasonable assertion. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just did a quick glance on this, another problem in that expression is that we may want to change it from
to
which change the middle "*" to "+". But still, the UINT32_MAX may not be a good boundary. |
||
|
||
if (AllocationRingBufferSize > kMaxU32Pow2 / kStacksPerRingBufferEntry) | ||
return; | ||
return false; | ||
u32 TabSize = static_cast<u32>(roundUpPowerOfTwo(kStacksPerRingBufferEntry * | ||
AllocationRingBufferSize)); | ||
if (TabSize > UINT32_MAX / kFramesPerStack) | ||
return; | ||
return false; | ||
u32 RingSize = static_cast<u32>(TabSize * kFramesPerStack); | ||
DCHECK(isPowerOfTwo(RingSize)); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be a real check There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure (seems like clang knows to optimize this away anyway https://godbolt.org/z/GM6sKM5YE): #84255 |
||
|
||
|
@@ -1594,12 +1600,33 @@ class Allocator { | |
RB->StackDepotSize = StackDepotSize; | ||
RB->RawStackDepotMap = DepotMap; | ||
|
||
atomic_store(&RingBufferAddress, reinterpret_cast<uptr>(RB), | ||
memory_order_release); | ||
swapOutRingBuffer(RB); | ||
static_assert(sizeof(AllocationRingBuffer) % | ||
alignof(typename AllocationRingBuffer::Entry) == | ||
0, | ||
"invalid alignment"); | ||
Comment on lines
1604
to
1607
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please move this to the class definition of AllocationRingBuffer There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
return true; | ||
} | ||
|
||
void swapOutRingBuffer(AllocationRingBuffer *NewRB) { | ||
// To allow resizeRingBuffer to be called in a multi-threaded context by apps, | ||
// we do not actually unmap, but only madvise(DONTNEED) the pages. That way, | ||
// straggler threads will not crash. | ||
Comment on lines
+1612
to
+1614
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is weird. We are leaking memory because we never unmap the old AllocationRingBuffer (and we just leak the old mapping). I think the right way to do this is that we need to disable the AllocationRingBuffer (Like how we disable the allocator) and do all the transition works (map new buffer, copy the content, .etc) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't leak the memory, we DONTNEED it so the kernel releases the actual memory. We only leak a few pages in the case where other threads touch the pages after we unmapped it. We do leak the mapping, that is correct. We would need to lock protect all the lock-free operations in order to disable and re-enable and make sure no thread currently has operations in progress. In the case where we know we are single-threaded, we can unmap the buffers afterwards. This is to support the case where apps want to change the buffer (I forwarded you an email on why they might want that). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In addition, keeping the old buffer doesn't prevent the potential crash. Any access to the old buffer can cause problem. Like if some thread tries to read the data from the old buffer which has been cleaned. Not to say the DONTNEED may happen at the same time some thread is still accessing it, which seems to be some undefined behavior. And yes, the buffer swap is complicated under current design (Thanks for the email BTW). One easy way in my mind is, we only reset the available size of ring buffer and the ring buffer will be initialized with some value is big enough for all of their cases. As a result, in most cases, they still use small memory for the ring buffer. I suppose we have the assumption that once we reset the available size, all the writes happened or happening will be viewed as staled (i.e., the buffer contains zero element). This will avoid the need of synchronizing the writing position and the available buffer size. I'm not sure if we need to do some changes here and there in Android. What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That is not possible with the current design of the StackDepot as a hash table. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll take some time to review StackDepot and share the thought here later There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would think something like for apps don't need to change the buffer size, they adopt the default behavior. Otherwise, they need to set the buffer size and call Before the app starts, I think there some places we can call We need some changes for native processes as well (like a different place to call Overall, I would guess it's still doable but may require some additional works in Android. I can ask around to see other's opinions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An app developer explicitly asked for this to be a mallopt, not a manifest attribute. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This DONTNEED usage looks safe to me. A trailing access with either repopulate the page or substitute a zero page, according to the spec. Not completely sure, but the latter might be the only actual possibility in Linux. In any case, the worst that may happen is we waste (leak) a page or two. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The case I'm concerning is like, one thread is writing to the old buffer and another thread is calling DONTNEED on the old buffer. Both are writing ("either repopulate the page or substitute a zero page" is still a writing) to the same memory and they are not atomic operations. This is why I think it's an undefined behavior. About the wasting (leaking) pages, right, this is another concern I have. Even it's only for debugging, intentionally leaking memory in a memory allocator seems weird to me. In the perspective of maintenance, it's confusing people who are not familiar with this. I know it's unlikely to cause problem but I would prefer considering any approaches that are safer and reasonable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that the spec says that the data is either repopulated or reads back as zero. However, if multiple threads are accessing this at the same time, I would presume that the repopulate or read back zero is on a page by page basis. Is there something in the kernel that makes the operation atomic for the entire mapped data? The kernel might make this atomic by preventing all interrupts while executing the madvise. Honestly, if this is only really used for debugging purposes, it might be better to simply let it leak away and skip the madvise completely. Maybe even log that you are leaking away the old data. Even the data going to zero requires that anyone ever touching this code has to remember that is a possibility, and might result in undetected bugs in the future. The best way would have been to copy the data for callers rather than having pointers to it. But since this is really a debugging item, a simpler interface with some known issues is probably fine. |
||
AllocationRingBuffer *PrevRB = reinterpret_cast<AllocationRingBuffer *>( | ||
atomic_exchange(&RingBufferAddress, reinterpret_cast<uptr>(NewRB), | ||
memory_order_acq_rel)); | ||
if (PrevRB) { | ||
auto RawStackDepotMap = PrevRB->RawStackDepotMap; | ||
if (RawStackDepotMap.isAllocated()) { | ||
RawStackDepotMap.releaseAndZeroPagesToOS( | ||
RawStackDepotMap.getBase(), RawStackDepotMap.getCapacity()); | ||
} | ||
auto RawRingBufferMap = PrevRB->RawRingBufferMap; | ||
if (RawRingBufferMap.isAllocated()) { | ||
RawRingBufferMap.releaseAndZeroPagesToOS( | ||
RawRingBufferMap.getBase(), RawRingBufferMap.getCapacity()); | ||
} | ||
} | ||
} | ||
|
||
void unmapRingBuffer() { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -123,6 +123,8 @@ size_t __scudo_get_region_info_size(void); | |
const char *__scudo_get_ring_buffer_addr(void); | ||
size_t __scudo_get_ring_buffer_size(void); | ||
|
||
bool __scudo_resize_ring_buffer(int); | ||
|
||
Comment on lines
+126
to
+127
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I barely remember that we had discussion over the email about how we will implement this. I think it'll be set through mallopt, right? Besides, I'm not sure if we have agreed to implement it like this. For example, can we just grow the buffer when it meets certain condition? Or can't we use the env variable to set the size before the start of debugging app? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this will be used in the mallopt, but we need to do additional stuff in libc (see the AOSP draft CL linked in description) |
||
#ifndef M_DECAY_TIME | ||
#define M_DECAY_TIME -100 | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition won't be triggered. What's the limit of the
RingBufferSize
?