-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Huge speedup of RemoteMirror testing (in debug) #75415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
**Background:** Each RemoteMirror test runs two processes: A host process runs the RemoteMirror library and queries memory data from a target process. The host and target communicate by passing strings back and forth with requests to read particular information in memory. The target is pure Swift; the host is C/C++. **What this change does:** Without this, the host makes very many small requests to the target. Each of those requests has to be individually parsed by the target using String operations from the debug stdlib. Actually transferring the raw bytes is relatively quick. With this, the host requests and caches "pages" of about 4k and satisfies most requests from previously-fetched data. This dramatically reduces the total number of operations. **Performance notes:** The following notes only count the time to compile and run the 78 tests in the validation-tests/Reflection directory using `lit.py`; it does not include time to rebuild the project before running tests. **Performance with debug stdlib:** (That is: `build-script -ra --debug-swift-stdlib`) Before this change, I got tired of waiting after about 15 minutes and it was about 1/4 done. Some very simplistic profiling showed >99% of the time being spent in stdlib String operations in the target process. Activity Monitor shows that individual tests run for about 6 minutes of CPU time each. With this change, the tests run in about 70s of wall time. Almost all of the time seems to spent compiling the tests; the tests themselves run too quickly to even show up in the Activity Monitor. **Performance with release stdlib:** (That is: `build-script -ra`) The tests run in about 70s of wall time with or without this change.
typedef struct PipeMemoryReader { | ||
int to_child[2]; | ||
int from_child[2]; | ||
PipeMemoryReaderPage *Pages; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a simple linked list of "pages" to the PipeMemoryReader. This works surprisingly well: The RemoteMirror logic tends to make a lot of requests to the same area of memory, and this puts the most recent page at the top of the stack where it's found very quickly.
PipeMemoryReader *Reader = (PipeMemoryReader *)Context; | ||
|
||
PipeMemoryReaderPage *Page = Reader->Pages; | ||
while (Page != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to find an existing page
Page = Page->Next; | ||
} | ||
|
||
if (Page == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there isn't one, round the request to a multiple of aligned pages and request that.
void *Buf = malloc(Size); | ||
PipeMemoryReader_collectBytesFromPipe(Reader, Buf, Size); | ||
|
||
memcpy(Buf, Page->Data + Address - Page->BaseAddress, Size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, we have a page here, so we can copy data out of it.
@@ -288,6 +331,21 @@ PipeMemoryReader createPipeMemoryReader() { | |||
return Reader; | |||
} | |||
|
|||
static | |||
void flushPipeMemoryReader(PipeMemoryReader *Reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a way to clean up pages.
const PipeMemoryReader Pipe) { | ||
uintptr_t instance = PipeMemoryReader_receiveInstanceAddress(&Pipe); | ||
const PipeMemoryReader *Reader) { | ||
uintptr_t instance = PipeMemoryReader_receiveInstanceAddress(Reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed a bunch of functions to accept a pointer to the PipeMemoryReader
instead of passing it directly.
return doDumpHeapInstance(BinaryFilename); | ||
PipeMemoryReader Pipe = createPipeMemoryReader(); | ||
int ret = doDumpHeapInstance(BinaryFilename, &Pipe); | ||
destroyPipeMemoryReader(&Pipe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory management: Allocate the PipeMemoryReader
here on the stack and pass it into doDumpHeapInstance
. This was simpler than changing doDumpHeapInstance
to clean up its own PipeMemoryReader
(because doDumpHeapInstance
has a bunch of return
statements).
int doDumpHeapInstance(const char *BinaryFilename) { | ||
PipeMemoryReader Pipe = createPipeMemoryReader(); | ||
|
||
int doDumpHeapInstance(const char *BinaryFilename, PipeMemoryReader *Reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doDumpHeapInstance
now accepts a pointer to the PipeMemoryReader
instead of allocating one locally.
@swift-ci Please test |
#endif | ||
|
||
while (1) { | ||
InstanceKind Kind = PipeMemoryReader_receiveInstanceKind(&Pipe); | ||
// Flush the cache between every reflection operation | ||
flushPipeMemoryReader(Reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: We flush the cache between reflection operations. Otherwise, we can miss changes to target memory that occur between operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I suggest promoting your GitHub comments in PipeMemoryReader_readBytes
to comments in the source, they'd be good to have in there.
@swift-ci Please test |
Background:
Each RemoteMirror test runs two processes: A host process runs the RemoteMirror library and queries memory data from a target process. The host and target communicate by passing strings back and forth with requests to read particular information in memory. The target is pure Swift; the host is C/C++.
What this change does:
Without this, the host makes very many small requests to the target. Each of those requests has to be individually parsed by the target using String operations from the debug stdlib. Actually transferring the raw bytes is relatively quick.
With this, the host requests and caches "pages" of about 4k and satisfies most requests from previously-fetched data. This dramatically reduces the total number of operations.
Performance notes:
The following notes only count the time to compile and run the 78 tests in the validation-tests/Reflection directory using
lit.py
; it does not include time to rebuild the project before running tests.Performance with debug stdlib:
(That is:
build-script -ra --debug-swift-stdlib
) Before this change, I got tired of waiting after about 15 minutes and it was about 1/4 done. Some very simplistic profiling showed >99% of the time being spent in stdlib String operations in the target process. Activity Monitor shows that individual tests run for about 6 minutes of CPU time each.With this change, the tests run in about 70s of wall time. Almost all of the time seems to spent compiling the tests; the tests themselves run too quickly to even show up in the Activity Monitor.
Performance with release stdlib:
(That is:
build-script -ra
)The tests run in about 70s of wall time with or without this change.