Skip to content

Reduce the Stack Size of ConstraintSystem #24659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2019

Conversation

gmittert
Copy link
Contributor

@gmittert gmittert commented May 9, 2019

I've been looking at the sourcekit tests on Windows and have been running into stack overflows. The issue is not too much recursion, but rather one of huge stack frames in normal operation. For example, the ConstraintSystem class is on the order of 1000s of bytes in size on the stack. This becomes a problem with sourcekit which runs dispatch threads that have a hard coded 64k stack limit.

I don't see a single spot that's causing all the problems, I've seen it stack overflow so far with two entirely different stack traces. (Though one of them was probably caused because swift::Lexer::kindOfIdentifier uses 9992 bytes of stack on Windows and 12736 on Darwin.) I think most of this will be finding large classes and functions and either making them smaller, or instantiating them on the heap instead of the stack.

This commit changes ConstraintSystem from using Small data types which store data on the stack to non small heap based data types.

As numbers go, this reduces the stack frame size of ResolvedMemberResult swift::resolveValueMember(DeclContext &DC, Type BaseTy, DeclName Name) from 4824 bytes to 2584 bytes (measured on a Windows Release build). The numbers are taken from looking at the sub rsp instruction in the function's dissassembly. e.g:

    sourcekitdInProc!swift::resolveValueMember:
00007ff8`b66cc160 4157               push    r15
00007ff8`b66cc162 4156               push    r14
00007ff8`b66cc164 4155               push    r13
00007ff8`b66cc166 4154               push    r12
00007ff8`b66cc168 56                 push    rsi
00007ff8`b66cc169 57                 push    rdi
00007ff8`b66cc16a 55                 push    rbp
00007ff8`b66cc16b 53                 push    rbx
00007ff8`b66cc16c 4881ec180a0000     sub     rsp, 0A18h

I've only noticed the stack overflows on Windows so far. Windows in general does seem to allocate slightly more stack space than Darwin and I'm not sure why that is yet. For example, for one stack overflow I traced through, I counted up 48440 of used stack space on Windows and 41288 on Darwin at the same (non crashed) point.

@compnerd
Copy link
Member

compnerd commented May 9, 2019

CC: @DougGregor @jrose-apple

@jrose-apple jrose-apple requested review from xedin and DougGregor May 9, 2019 23:27
@jrose-apple
Copy link
Contributor

This seems unfortunate to me since a lot of these things probably are small. (Also, a lot of them would probably benefit from using copy-on-write data structures…) But @xedin can speak to this much more than I.

@swift-ci Please smoke test compiler performance

@xedin
Copy link
Contributor

xedin commented May 9, 2019

I'd be really interested to know what expressions did you catch this on? It has to be big literal arrays or something similar...

I think this is more of a workaround for actual problem which is related to the way we handle independent sub-components in the solver at the moment.

We shouldn't need to move data in/out to/from original TypeVariables to be able to solve sub-components, it should be possible to take a slice and use ArrayRef and ignore the out-of-scope type variables, this would reduce the stack space used considerably without needing to allocate data on heap. Would you be interested in tackling that, @gmittert?

@gmittert
Copy link
Contributor Author

gmittert commented May 10, 2019

@xedin This one was from cursor_info_container.swift, running S:\b\swift> &"S:\b\swift\bin\sourcekitd-test.EXE" "-req=cursor" "-pos=37:22" "S:\swift\test\SourceKit\CursorInfo\cursor_info_container.swift" "--" "S:\swift\test\SourceKit\CursorInfo\cursor_info_container.swift".

However, while it might be possible that the solver is allocating too much at run time, and I'd be happy (if a bit slow) to look at having the solver use ArrayRefs, I think a lot of the issue is just that a there are a lot of locals statically.

For example, here's the stack trace of the first overflow I ran into (not with the solver). It lists the size of just the locals used (the amount subtracted from rsp) in each frame. It uses about 2/3 of its stack space with just locals, not even counting any runtime stack allocations/pushing. Note that swift::Lexer::kindOfIdentifier is huge because it's a macro to create a comparison against each identifier which creates a ton of locals.

@slavapestov
Copy link
Contributor

One future improvement would be to investigate the places where we spin up new ConstraintSystems and see if they can be rewritten in a simpler way. I've seen various bits of code use a ConstraintSystem to check subtype and conformance relationships; this is overkill and should be avoided.

@jrose-apple
Copy link
Contributor

It is overkill, but it's probably not causing these stack overflows, since they're always simple systems.

@swift-ci
Copy link
Contributor

Summary for master smoketest

Unexpected test results, excluded stats for ReactiveCocoa

No regressions above thresholds

Debug

debug brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 100,861,048,556 100,741,313,865 -119,734,691 -0.12%
LLVM.NumLLVMBytesOutput 6,147,700 6,147,700 0 0.0%
time.swift-driver.wall 13.2s 13.1s -102.1ms -0.77%

debug detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (22)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 1,028 1,028 0 0.0%
AST.NumLoadedModules 828 828 0 0.0%
AST.NumTotalClangImportedEntities 4,282 4,282 0 0.0%
IRModule.NumIRBasicBlocks 18,032 18,032 0 0.0%
IRModule.NumIRFunctions 10,542 10,542 0 0.0%
IRModule.NumIRGlobals 8,019 8,019 0 0.0%
IRModule.NumIRInsts 312,849 312,849 0 0.0%
IRModule.NumIRValueSymbols 17,622 17,622 0 0.0%
LLVM.NumLLVMBytesOutput 6,147,700 6,147,700 0 0.0%
SILModule.NumSILGenFunctions 5,370 5,370 0 0.0%
SILModule.NumSILOptFunctions 7,016 7,016 0 0.0%
Sema.NumConformancesDeserialized 15,769 15,769 0 0.0%
Sema.NumConstraintScopes 39,756 39,756 0 0.0%
Sema.NumDeclsDeserialized 136,226 136,226 0 0.0%
Sema.NumDeclsValidated 11,208 11,208 0 0.0%
Sema.NumFunctionsTypechecked 2,516 2,516 0 0.0%
Sema.NumGenericSignatureBuilders 4,792 4,792 0 0.0%
Sema.NumLazyGenericEnvironments 30,541 30,541 0 0.0%
Sema.NumLazyGenericEnvironmentsLoaded 2,012 2,012 0 0.0%
Sema.NumLazyIterableDeclContexts 24,366 24,366 0 0.0%
Sema.NumTypesDeserialized 61,133 61,133 0 0.0%
Sema.NumTypesValidated 11,858 11,858 0 0.0%

Release

release brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 142,947,437,559 142,977,267,810 29,830,251 0.02%
LLVM.NumLLVMBytesOutput 7,156,192 7,156,180 -12 -0.0%
time.swift-driver.wall 23.7s 23.7s -8.2ms -0.03%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (22)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 398 398 0 0.0%
AST.NumLoadedModules 76 76 0 0.0%
AST.NumTotalClangImportedEntities 2,196 2,196 0 0.0%
IRModule.NumIRBasicBlocks 21,084 21,084 0 0.0%
IRModule.NumIRFunctions 10,011 10,011 0 0.0%
IRModule.NumIRGlobals 8,031 8,031 0 0.0%
IRModule.NumIRInsts 223,919 223,919 0 0.0%
IRModule.NumIRValueSymbols 17,214 17,214 0 0.0%
LLVM.NumLLVMBytesOutput 7,156,192 7,156,180 -12 -0.0%
SILModule.NumSILGenFunctions 4,176 4,176 0 0.0%
SILModule.NumSILOptFunctions 5,844 5,844 0 0.0%
Sema.NumConformancesDeserialized 12,469 12,469 0 0.0%
Sema.NumConstraintScopes 38,826 38,826 0 0.0%
Sema.NumDeclsDeserialized 32,733 32,733 0 0.0%
Sema.NumDeclsValidated 7,874 7,874 0 0.0%
Sema.NumFunctionsTypechecked 2,204 2,204 0 0.0%
Sema.NumGenericSignatureBuilders 1,726 1,726 0 0.0%
Sema.NumLazyGenericEnvironments 6,891 6,891 0 0.0%
Sema.NumLazyGenericEnvironmentsLoaded 130 130 0 0.0%
Sema.NumLazyIterableDeclContexts 4,375 4,375 0 0.0%
Sema.NumTypesDeserialized 18,197 18,197 0 0.0%
Sema.NumTypesValidated 6,942 6,942 0 0.0%

@slavapestov
Copy link
Contributor

@jrose-apple I suspect the stack overflow is not from the size of the constraints themselves but the large number of vectors and DenseMaps that are directly stored inside the instance.

@jrose-apple
Copy link
Contributor

I…guess we stopped trying to measure memory usage in the compiler perf tests. (It was never great anyway.)

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. I wouldn't expect performance to suffer here, because there's a lot of allocations already going on in the solver, and these particular bits of state are likely to either empty to blow through the small buffers anyway.

@@ -603,7 +603,7 @@ class TypeChecker final : public LazyResolver {
/// }
/// }

llvm::SmallDenseMap<AnyFunctionRef, SmallVector<AnyFunctionRef, 4>, 4>
llvm::DenseMap<AnyFunctionRef, SmallVector<AnyFunctionRef, 4>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular change probably doesn't do much, because there's only really one TypeChecker instance that lives forever.

@@ -549,7 +549,7 @@ class TypeChecker final : public LazyResolver {
std::vector<AbstractFunctionDecl *> definedFunctions;

/// Declarations that need their conformances checked.
llvm::SmallVector<Decl *, 8> ConformanceContexts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular change probably doesn't do much, because there's only really one TypeChecker instance that lives forever.

The ConstraintSystem class is on the order of 1000s of bytes in size on
the stacka nd is causing issues with dispatch's 64k stack limit.

This changes most Small data types which store data on the stack to non
small heap based data types.
@gmittert gmittert force-pushed the StackConstraints branch from ac19610 to e51b72b Compare May 13, 2019 20:10
@gmittert
Copy link
Contributor Author

Alright, removed the TypeChecker changes.

@swift-ci please test and merge

@gmittert gmittert merged commit a6d5830 into swiftlang:master May 13, 2019
@gmittert gmittert deleted the StackConstraints branch May 13, 2019 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants