Switch MetadataCache to use a global slab allocator. #7109
Conversation
@swift-ci Please smoke test.
@swift-ci Please benchmark.
Hmm, GCC doesn't emit cmpxchg16b for this. That's going to require some investigation.
We should chain the pools together (reserve the first or last word for the link to the next one, or something). Otherwise heap analysis tools will complain about leaks once there are three pools allocated.
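If it helps to picture the suggestion, here's a minimal sketch of the chaining idea, with hypothetical names (`PoolSlab`, `SlabChain`) rather than the runtime's actual types: reserve the first word of each slab for a link to the previously allocated one, so a single global root keeps every slab reachable.

```c++
#include <atomic>
#include <cstddef>
#include <cstdlib>

struct PoolSlab {
  PoolSlab *Next; // the reserved word: link to the previously allocated slab
  // allocation space follows the header
};

static std::atomic<PoolSlab *> SlabChain{nullptr};

static char *allocateSlab(size_t dataSize) {
  auto *slab =
      static_cast<PoolSlab *>(std::malloc(sizeof(PoolSlab) + dataSize));
  // Push onto the global chain so heap tools see every slab as reachable.
  slab->Next = SlabChain.load(std::memory_order_relaxed);
  while (!SlabChain.compare_exchange_weak(slab->Next, slab,
                                          std::memory_order_release,
                                          std::memory_order_relaxed)) {
    // compare_exchange_weak refreshes slab->Next on failure; just retry.
  }
  return reinterpret_cast<char *>(slab + 1);
}
```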
Do you have any measurements of pool space occupied by any current apps?
Build comment file:
Optimized (O): Regression (0), Improvement (0), No Changes (159)
Regression (0), Improvement (1), No Changes (158)
I saw similar Linux link errors indicating failure to use atomic instructions in #5282. I didn't investigate and instead added -latomic.
@gparker42 There should be references to the memory from each slab in the cache data structures; that should be sufficient to silence tools, assuming they're silenced today when those data structures refer to malloc'ed memory. I mean, I guess it's possible that an entire slab could be leaked from high-level contention, but I wouldn't stay up nights worrying about it.

I don't have concrete measurements of pool usage. Those would be good to get.

Yeah, I think adding -latomic is probably the easiest solution. It looks like the implementation is unnecessarily conditionalized (are there any x86_64 processors that don't support that instruction?), but it still does use the instruction. If people care about supporting this maximally efficiently there, they can add a target-specific customization, or better yet teach GCC/libstdc++ to just emit the optimal code pattern.
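For context, here is a minimal sketch of the kind of 16-byte atomic in play (the struct and names are illustrative, not the actual cache types). GCC on x86_64 conservatively lowers the compare-exchange to a libatomic call (__atomic_compare_exchange_16), hence the link error without -latomic, while Clang can inline cmpxchg16b.

```c++
#include <atomic>
#include <cstdint>

struct alignas(16) TwoWords {
  std::uintptr_t First;
  std::uintptr_t Second;
};

static std::atomic<TwoWords> Cell{};

bool tryReplace(TwoWords expected, TwoWords desired) {
  // A 16-byte CAS. Clang emits cmpxchg16b inline (given -mcx16 on x86_64);
  // GCC instead calls out to libatomic, so linking needs -latomic.
  return Cell.compare_exchange_strong(expected, desired);
}
```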
Okay, the cache references to the slabs ought to be good enough. It would be easy to tweak in response to feedback.
Force-pushed e8f0ced to b80aa14.
@swift-ci Please smoke test.
Force-pushed b80aa14 to e488413.
@swift-ci Please smoke test Linux.
Force-pushed e488413 to 4003b5c.
@swift-ci Please smoke test Linux.
Force-pushed 4003b5c to 254db8a.
@swift-ci Please smoke test Linux.
Hmm. That seems to be a real semantic failure. Okay, seriously, whose bright idea was it to give LLVM allocators these two Deallocate overloads? This is just beyond stupid.
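To illustrate the trap, here is a self-contained sketch with signatures modeled on LLVM's AllocatorBase (the exact LLVM code may differ): given both a byte-oriented overload and a templated element-count overload, passing a typed pointer plus a byte count silently selects the template, which rescales the second argument by sizeof(T).

```c++
#include <cstddef>
#include <cstdio>

struct SlabAllocator {
  // Byte-oriented overload.
  void Deallocate(const void *Ptr, std::size_t SizeInBytes) {
    std::printf("deallocating %zu bytes\n", SizeInBytes);
  }
  // Element-oriented overload: deduces T and scales by sizeof(T).
  template <typename T> void Deallocate(T *Ptr, std::size_t Num = 1) {
    Deallocate(static_cast<const void *>(Ptr), Num * sizeof(T));
  }
};

int main() {
  SlabAllocator A;
  long *p = nullptr;
  // Intent: free 64 bytes. But for a long*, the templated overload is a
  // better match than the conversion to const void*, so this actually
  // frees 64 * sizeof(long) bytes: a silent semantic change, not a
  // compile error.
  A.Deallocate(p, 64);
}
```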
Force-pushed 254db8a to ccbe5fc.
From the commit log:
Whose bright idea was it to allow this in a language?
Heh. Alright, let's try this again. @swift-ci Please smoke test Linux.
@swift-ci Please smoke test OS X and merge.
@swift-ci Please smoke test OS X.
Please?
This seems to more than fix a performance regression that we detected on a metadata-allocation microbenchmark.

A few months ago, I improved the metadata cache representation and changed the metadata allocation scheme to primarily use malloc. Previously, we'd been using malloc in the concurrent tree data structure but a per-cache slab allocator for the metadata itself. At the time, I was concerned about the overhead of per-cache allocators, since many metadata patterns see only a small number of instantiations. That's still an important factor, so in the new scheme we're using a global allocator; but instead of using malloc for individual allocations, we're using a slab allocator, which should have better peak single-thread performance, at the cost of not easily supporting deallocation. Deallocation is only used for metadata when there's contention on the cache, and specifically only when there's contention for the same key, so leaking a little isn't the worst thing in the world.

The initial slab is a 64K globally-allocated buffer. Successive slabs are 16K and allocated with malloc.

Resolves rdar://28189496.
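As a rough sketch of the scheme described above (illustrative code under assumed names, not the runtime's actual implementation; the real allocator presumably updates its state atomically, which is where the 16-byte compare-exchange discussed earlier would come in):

```c++
#include <cstddef>
#include <cstdlib>
#include <mutex>

namespace {
constexpr std::size_t InitialPoolSize = 64 * 1024; // 64K static first slab
constexpr std::size_t PoolSize = 16 * 1024;        // 16K malloc'ed successors

alignas(16) char InitialPool[InitialPoolSize];

std::mutex PoolLock; // stand-in for the lock-free fast path
char *PoolPtr = InitialPool;
std::size_t PoolRemaining = InitialPoolSize;
} // namespace

void *allocateMetadata(std::size_t size) {
  size = (size + 15) & ~std::size_t(15); // keep results 16-byte aligned
  std::lock_guard<std::mutex> guard(PoolLock);
  if (size > PoolRemaining) {
    // Start a fresh 16K slab; the tail of the old one is simply abandoned,
    // consistent with "leaking a little isn't the worst thing in the world".
    // (Requests larger than PoolSize are not handled here, for brevity.)
    PoolPtr = static_cast<char *>(std::malloc(PoolSize));
    PoolRemaining = PoolSize;
  }
  void *result = PoolPtr;
  PoolPtr += size;
  PoolRemaining -= size;
  return result;
}
```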