[libSyntax] Don't cache token nodes #36352

Merged: 1 commit, Mar 9, 2021

Conversation

@ahoppen (Member) commented Mar 8, 2021

It turns out that caching is actually more expensive than just creating new nodes. I assume this is because memory allocation has become fast through the bump allocator, while the cache lookup requires computing a hash value, which is comparatively expensive.
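The trade-off described above can be illustrated with a minimal sketch. This is not the actual SyntaxArena implementation; `BumpAllocator`, `TokenNode`, and `makeToken` are simplified stand-ins showing why a fresh allocation (a pointer bump plus field initialization) can beat a hash-based cache lookup:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Simplified stand-in for a SyntaxArena-style bump allocator:
// allocation is just an aligned pointer bump into a pre-sized slab.
struct BumpAllocator {
  std::vector<char> slab;
  std::size_t offset = 0;
  explicit BumpAllocator(std::size_t capacity) : slab(capacity) {}

  void *allocate(std::size_t size, std::size_t align) {
    std::size_t aligned = (offset + align - 1) & ~(align - 1);
    assert(aligned + size <= slab.size() && "slab exhausted");
    offset = aligned + size;
    return slab.data() + aligned;
  }
};

// Simplified token node; the real node also carries trivia, ranges, etc.
struct TokenNode {
  int kind;
  std::string text;
};

// Creating a fresh node is one pointer bump plus placement-new.
// (Arena caveat: destructors never run; nodes live as long as the arena.)
inline TokenNode *makeToken(BumpAllocator &arena, int kind,
                            const std::string &text) {
  void *mem = arena.allocate(sizeof(TokenNode), alignof(TokenNode));
  return new (mem) TokenNode{kind, text};
}
```

By contrast, a cache keyed on token kind plus text has to hash the text on every lookup, hit or miss, before it can hand back a node.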

In terms of memory, I measured the memory used by the SyntaxArena when parsing Alamofire: it increases from 1.07 MB to 1.24 MB (a 15% increase), which should be negligible compared to the >100 MB overall memory footprint of swift-frontend. I haven’t re-measured, but the performance increase when creating a libSyntax tree was significant (something like 15–40%; I can measure again if you’d like more accurate numbers).

Also, out of curiosity, I measured how many cache hits we were actually seeing: 75% of all tokens could be served from the cache. This tells me that the majority of the SyntaxArena's memory is used by layout nodes.

@ahoppen ahoppen requested review from akyrtzi and rintaro March 8, 2021 15:38
@ahoppen (Member, Author) commented Mar 8, 2021

@swift-ci Please smoke test

@akyrtzi (Contributor) commented Mar 8, 2021

IIRC I came to the same conclusion on the SwiftSyntax side and removed the caching there as well.

At the time I was thinking we could revisit caching by bulk-creating plain token nodes in advance and re-using them process-wide, but that might complicate memory management a bit, so I'm not sure how much it would be worth.

@ahoppen (Member, Author) commented Mar 8, 2021

> I was thinking at the time we could reconsider caching as bulk creating plain token nodes in advance and re-using them process-wide, but perhaps it would complicate memory management a bit so not sure how much it is worth.

As in, create tokens for the common keywords with common trivia (e.g. one trailing space) and have an efficient look-up structure for them?

@akyrtzi (Contributor) commented Mar 8, 2021

> > I was thinking at the time we could reconsider caching as bulk creating plain token nodes in advance and re-using them process-wide, but perhaps it would complicate memory management a bit so not sure how much it is worth.
>
> As in, create tokens for the common keywords with common trivia (e.g. one trailing space) and have an efficient look-up structure for them?

Yes, exactly.
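The idea discussed above could look something like the following sketch. All names here (`StaticToken`, `CommonTokenTable`, and the keyword list) are hypothetical illustrations, not actual libSyntax API; the point is a process-wide table of pre-created tokens for common keywords with their most common trivia (one trailing space), consulted before allocating anything:

```cpp
#include <initializer_list>
#include <string_view>
#include <unordered_map>

// Hypothetical pre-created token: text plus its common trailing trivia.
struct StaticToken {
  std::string_view text;     // e.g. "func"
  std::string_view trailing; // e.g. " " (one trailing space)
};

// Built once per process; lookups never allocate.
class CommonTokenTable {
  std::unordered_map<std::string_view, StaticToken> table;

public:
  CommonTokenTable() {
    // Illustrative subset, not the actual Swift keyword set.
    for (std::string_view kw :
         {"func", "var", "let", "return", "if", "else", "struct", "class"})
      table.emplace(kw, StaticToken{kw, " "});
  }

  // Returns the shared process-wide token, or nullptr if the token is
  // not common and must be created in the arena as usual.
  const StaticToken *lookup(std::string_view text) const {
    auto it = table.find(text);
    return it == table.end() ? nullptr : &it->second;
  }
};
```

As the comment above notes, the complication is memory management: process-wide nodes would have to outlive every arena that references them, which is exactly the part this sketch glosses over.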

Commit message: "It turns out that caching is actually more expensive than just creating new nodes."
@ahoppen force-pushed the pr/dont-cache-tokens branch from c0b2ab5 to 2e5c869 on March 9, 2021 08:55
@ahoppen (Member, Author) commented Mar 9, 2021

@swift-ci Please smoke test

@ahoppen ahoppen merged commit bdacb68 into swiftlang:main Mar 9, 2021
@ahoppen ahoppen deleted the pr/dont-cache-tokens branch March 9, 2021 12:09