
Make the demangler in the runtime use stack allocated memory. #22655


Merged: eeckstein merged 4 commits on Feb 18, 2019


eeckstein (Contributor):

A big part of the change is reducing the size of the demangler's Node. To do this, nodes that have children are no longer allowed to also carry an index or text payload.
In some cases those payloads were not needed anyway, because the information can be derived later.
In other cases the fix was to insert an additional child node with the index/text payload.
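To illustrate why making payloads and children mutually exclusive shrinks every node, here is a simplified sketch under assumed layouts. `FatNode` lets a node carry text, an index and children at the same time, while `CompactNode` overlaps the three in a tagged union. Both types, the `Payload` tag and all field names are hypothetical and do not mirror the real demangler `Node`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified layouts for illustration only; this is not the
// real demangler Node definition.
struct TextPayload { const char *data; std::size_t size; };
template <typename N> struct ChildArray { N **elems; std::uint32_t count, capacity; };

// "Before": any node may have text, an index AND children, so every node
// pays for all three fields.
struct FatNode {
  std::uint16_t kind;
  TextPayload text;
  std::uint64_t index;
  ChildArray<FatNode> children;
};

// "After": a node with children has no text/index payload (and vice versa),
// so the three can overlap in a union, discriminated by a small tag.
struct CompactNode {
  enum class Payload : std::uint8_t { None, Text, Index, Children };
  std::uint16_t kind = 0;
  Payload payload = Payload::None;
  union {
    TextPayload text;
    std::uint64_t index;
    ChildArray<CompactNode> children;
  };

  // Reading a payload is only valid when the tag says it is the active one.
  std::uint64_t getIndex() const {
    assert(payload == Payload::Index && "node has no index payload");
    return index;
  }
  bool hasChildren() const { return payload == Payload::Children; }
};

static_assert(sizeof(CompactNode) < sizeof(FatNode),
              "overlapping the payloads shrinks every node");
```

Where a parent previously needed both children and an index, the description says the index now moves into an additional child node, so the parent's union only ever has to hold its children.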

The demangler now supports stack-allocated memory for its allocator: it can be initialized with preallocated memory on the stack, and the bump-pointer allocator mallocs new memory only if that memory overflows.
A new demangler instance can also "borrow" the free memory of an existing demangler. This is useful because the runtime invokes the demangler recursively; with this feature, all nested demanglers can share a single stack-allocated space.
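The allocation scheme described above can be sketched as a bump-pointer allocator that starts out in a caller-supplied buffer. This is a minimal sketch under assumptions: the `BumpAllocator` class, its `borrowFreeMemory` method and the 4 KiB minimum slab size are hypothetical, not the runtime's actual API. Borrowing here simply means that a nested allocator starts at the lender's current free pointer, which is only safe if the lender does not allocate until the borrower is gone:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical sketch (not the Swift runtime's actual allocator): hands out
// memory from a caller-provided buffer (e.g. a stack array) and calls malloc
// only when that buffer overflows.
class BumpAllocator {
  struct Slab { Slab *prev; };  // header at the start of each malloc'd slab

  char *cur = nullptr;       // next free byte in the current region
  char *end = nullptr;       // one past the last byte of the current region
  Slab *lastSlab = nullptr;  // most recent heap slab owned by this allocator
  std::size_t numSlabs = 0;

public:
  BumpAllocator() = default;
  BumpAllocator(char *buffer, std::size_t size) : cur(buffer), end(buffer + size) {}
  BumpAllocator(const BumpAllocator &) = delete;
  BumpAllocator &operator=(const BumpAllocator &) = delete;

  ~BumpAllocator() {
    // Only heap slabs are freed; the preallocated buffer belongs to the caller.
    while (lastSlab) {
      Slab *prev = lastSlab->prev;
      std::free(lastSlab);
      lastSlab = prev;
    }
  }

  // Start allocating from the unused tail of the lender's current region.
  // Nothing is handed back: once the borrower is destroyed, the lender just
  // reuses that memory, so the borrower's allocations die with it.
  void borrowFreeMemory(const BumpAllocator &lender) {
    cur = lender.cur;
    end = lender.end;
  }

  void *allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
    if (cur) {
      std::uintptr_t p = (reinterpret_cast<std::uintptr_t>(cur) + align - 1) &
                         ~static_cast<std::uintptr_t>(align - 1);
      if (p + size <= reinterpret_cast<std::uintptr_t>(end)) {
        cur = reinterpret_cast<char *>(p + size);
        return reinterpret_cast<void *>(p);
      }
    }
    // Overflow: malloc a new slab big enough for this request (at least 4 KiB).
    std::size_t needed = sizeof(Slab) + size + align;
    std::size_t slabSize = needed > 4096 ? needed : 4096;
    char *mem = static_cast<char *>(std::malloc(slabSize));
    if (!mem) return nullptr;
    lastSlab = new (mem) Slab{lastSlab};
    ++numSlabs;
    cur = mem + sizeof(Slab);
    end = mem + slabSize;
    return allocate(size, align);  // now guaranteed to fit
  }

  std::size_t heapSlabCount() const { return numSlabs; }
};
```

In a recursive setting, the outermost demangler would own a buffer on its stack and each nested demangler would borrow from its parent, so the whole recursion runs out of one stack buffer and reaches `malloc` only on overflow.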

rdar://problem/47357709


Also, single or double children are now stored as "inline" children, which avoids needing a separate node vector for children.

All this reduces the memory needed for node trees by more than 2x.
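A minimal sketch of the "inline children" idea, assuming a node that keeps up to two child pointers directly inside itself and switches to a separate array only when a third child arrives. The `Node`, `ChildArray`, `usesArray` and `addChild` names are hypothetical, and `malloc`/`realloc` stand in for the demangler's own allocator only to keep the example self-contained:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch (not the actual demangler code): the first two children
// live inline in the node; only a third child forces out-of-line storage.
struct Node;
struct ChildArray { Node **elems; std::uint32_t capacity; };

struct Node {
  bool usesArray = false;
  std::uint32_t numChildren = 0;
  union {
    Node *inlineChildren[2];  // active while usesArray == false
    ChildArray array;         // active once a third child has been added
  };

  Node() : inlineChildren{nullptr, nullptr} {}
  ~Node() { if (usesArray) std::free(array.elems); }
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;

  void addChild(Node *child) {
    if (!usesArray) {
      if (numChildren < 2) {
        inlineChildren[numChildren++] = child;
        return;
      }
      // Switch to an array, copying the two inline children out first
      // (the array shares storage with them).
      Node **elems = static_cast<Node **>(std::malloc(4 * sizeof(Node *)));
      if (!elems) std::abort();
      elems[0] = inlineChildren[0];
      elems[1] = inlineChildren[1];
      array = ChildArray{elems, 4};
      usesArray = true;
    } else if (numChildren == array.capacity) {
      std::uint32_t newCapacity = array.capacity * 2;
      void *grown = std::realloc(array.elems, newCapacity * sizeof(Node *));
      if (!grown) std::abort();
      array = ChildArray{static_cast<Node **>(grown), newCapacity};
    }
    array.elems[numChildren++] = child;
  }

  Node *getChild(std::size_t i) const {
    assert(i < numChildren && "child index out of range");
    return usesArray ? array.elems[i] : inlineChildren[i];
  }
};
```

Because most demangler nodes have zero, one or two children, the common case never touches a separate allocation, and the inline slots reuse the same bytes that the array pointer would otherwise occupy.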
Log allocated memory and indent according to the nesting level
Initializing the demangler with preallocated stack memory, which nested demanglers can borrow, significantly reduces the number of mallocs.
eeckstein requested a review from DougGregor on February 15, 2019, 21:41
eeckstein (Contributor, Author):

@swift-ci test

eeckstein (Contributor, Author):

@swift-ci benchmark

eeckstein requested a review from jckarter on February 15, 2019, 21:42
swift-ci (Contributor):

Performance: -O

| TEST | OLD | NEW | DELTA | RATIO |
|---|---:|---:|---:|---:|
| Regression | | | | |
| StringAdder | 427 | 470 | +10.1% | 0.91x (?) |
| StringBuilderSmallReservingCapacity | 350 | 381 | +8.9% | 0.92x (?) |
| Improvement | | | | |
| SortStringsUnicode | 3565 | 3315 | -7.0% | 1.08x (?) |

Performance: -Osize

| TEST | OLD | NEW | DELTA | RATIO |
|---|---:|---:|---:|---:|
| Regression | | | | |
| StringBuilder | 327 | 370 | +13.1% | 0.88x |
| StringBuilderSmallReservingCapacity | 341 | 382 | +12.0% | 0.89x (?) |
| Improvement | | | | |
| SortStringsUnicode | 3560 | 3310 | -7.0% | 1.08x (?) |

Performance: -Onone

| TEST | OLD | NEW | DELTA | RATIO |
|---|---:|---:|---:|---:|
| Regression | | | | |
| StrComplexWalk | 6680 | 7330 | +9.7% | 0.91x (?) |
| Improvement | | | | |
| ArrayOfGenericPOD2 | 1179 | 1065 | -9.7% | 1.11x (?) |
| ArrayOfPOD | 855 | 775 | -9.4% | 1.10x (?) |
| Dictionary3 | 759 | 704 | -7.2% | 1.08x (?) |
| SortStringsUnicode | 5205 | 4840 | -7.0% | 1.08x (?) |

Code size: -swiftlibs

| TEST | OLD (bytes) | NEW (bytes) | DELTA | RATIO |
|---|---:|---:|---:|---:|
| Regression | | | | |
| libswiftRemoteMirror.dylib | 364544 | 368640 | +1.1% | 0.99x |
How to read the data: the tables contain differences in performance larger than 8% and differences in code size larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
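The DELTA and RATIO columns appear to be derived from OLD and NEW as shown below; this relationship is inferred from the rows in the tables rather than from swift-ci documentation. Lower is better for both run time and code size, so a positive DELTA (a RATIO below 1.00x) marks a regression:

```latex
\mathrm{DELTA} = \frac{\mathrm{NEW} - \mathrm{OLD}}{\mathrm{OLD}} \times 100\%,
\qquad
\mathrm{RATIO} = \frac{\mathrm{OLD}}{\mathrm{NEW}}

% Worked example, StringAdder at -O: OLD = 427, NEW = 470
\mathrm{DELTA} = \frac{470 - 427}{427} \times 100\% \approx +10.1\%,
\qquad
\mathrm{RATIO} = \frac{427}{470} \approx 0.91\mathrm{x}
```

The same formulas reproduce the code-size row: (368640 - 364544) / 364544 = 4096 / 364544, about +1.1%, and 364544 / 368640 is about 0.99x.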

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

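Review thread on this diff excerpt:

```cpp
if (CurrentSlab) {
#ifdef NODE_FACTORY_DEBUGGING
  std::cerr << indent() << "## clear: allocated memory = " << allocatedMemory << "\n";
#endif
  // ... (rest of the block not shown in the review excerpt)
}
```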
Contributor:

Would LLVM's DEBUG(...) macros work from the runtime?

eeckstein (Contributor, Author):

I didn't try, but we don't link LLVM into the runtime.
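Since the runtime doesn't link LLVM, LLVM's debug macros aren't available there. A self-contained alternative in the spirit of the `NODE_FACTORY_DEBUGGING` guard from the diff excerpt might look like the following sketch. The `NF_DEBUG` macro, the `NestedLogger` type and its `nestingLevel` field are hypothetical names, and indenting by two spaces per nesting level is an arbitrary assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sketch: compile-time-gated debug logging that needs no LLVM.
// Build with -DNODE_FACTORY_DEBUGGING to turn logging on; otherwise the
// macro arguments are not evaluated at all.
#ifdef NODE_FACTORY_DEBUGGING
#define NF_DEBUG(...) do { __VA_ARGS__; } while (false)
#else
#define NF_DEBUG(...) do { } while (false)
#endif

// Hypothetical helper: indents log lines by the nesting depth of the
// demangler that produced them (two spaces per level).
struct NestedLogger {
  int nestingLevel = 0;

  std::string indent() const {
    assert(nestingLevel >= 0 && "unbalanced nesting");
    return std::string(2 * static_cast<std::size_t>(nestingLevel), ' ');
  }

  void logAllocatedMemory(std::size_t bytes) const {
    (void)bytes;  // avoid unused-parameter warnings when logging is off
    NF_DEBUG(std::cerr << indent() << "## allocated memory = " << bytes << "\n");
  }
};
```

A nested demangler would bump `nestingLevel` when it borrows from its parent, so the log output of each recursion level is visibly indented under the level that invoked it.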

jckarter (Contributor) left a comment:

Looks good, thanks for doing this Erik!

eeckstein merged commit bf909ca into swiftlang:master on Feb 18, 2019.
eeckstein deleted the stack-allocated-demangler branch on February 18, 2019, 16:57.