-
Notifications
You must be signed in to change notification settings - Fork 314
Restart sourcekitd and clangd after they have crashed #219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
swiftlang/swift#29558 |
c8d0977
to
3cbd9ff
Compare
swiftlang/swift#29558 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also make sense to swap FileBuildSettings
to be a class since it's commonly shared/cached and immutable, not sure.
3cbd9ff
to
eac3dc2
Compare
@swift-ci Please test |
I don’t think so, we explicitly want the value semantics. Plus it’s not immutable, both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sourcekitd part is looking almost there. Mentioned a couple of issues. I still need to look at the clangd side.
@@ -131,6 +148,27 @@ extension SwiftLanguageServer { | |||
guard let self = self else { return } | |||
let notification = SKResponse(notification, sourcekitd: self.sourcekitd) | |||
|
|||
if notification.value?[self.keys.notification] == self.values.notification_sema_enabled { | |||
self.state = .connected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notification handler queue is not related to queue
for this object, so to access state
we need to synchronize. Similarly, addStateChangeHandler
needs to synchronize the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understood it, queue
is used to synchronise the requests and responses from sourcekitd
. Setting the state does not fall into this category, so I did not synchronise it on queue
.
Do you want to extend the definition of queue
? I don’t think that state
really needs to be synchronised, but it wouldn’t hurt either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on queue
is misleading and should be updated. We need to protect shared access to mutable state - e.g. we're doing this for currentDiagnostics
already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, in that case, it makes total sense to guard the access to state by queue
.
|
||
// We don't have any open documents anymore after sourcekitd crashed. | ||
// Reset the document manager to reflect that. | ||
self.documentManager = DocumentManager() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mutation also needs synchronization. It's kind of suspicious that you're doing this after you call reopenDocuments - couldn't that repopulate it at the wrong time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mutation also needs synchronization.
Good point. I wrapped it in queue.async
.
It's kind of suspicious that you're doing this after you call reopenDocuments - couldn't that repopulate it at the wrong time?
No, that ordering is correct. When sourcekitd
crashes, state
is .connected
. Hence the second if self.state == .connectionInterrupted
is skipped and the third if let error = notification.error, error.code == .connectionInterrupted
sets state = .connectionInterrupted
, because we receive a notification with error.code == .connectionInterrupted
.
When the next notification comes in, the first if self.state == .connectionInterrupted
is taken, which then sets state = .semanticFunctionalityDisabled
.
Finally, when semantic functionality is restored, we get a notification that satisfies the first if notification.value?[self.keys.notification] == self.values.notification_sema_enabled
, setting state
back to .connected
.
connection.close() | ||
if process.terminationStatus != 0 { | ||
if let self = self { | ||
self.state = .connectionInterrupted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar synchronization issues to the Swift side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as on the Swift side:
As I understood it, queue is used to synchronise the requests and responses from clangd
. Setting the state does not fall into this category, so I did not synchronise it on queue.
Do you want to extend the definition of queue? I don’t think that state really needs to be synchronised, but it wouldn’t hurt either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think that state really needs to be synchronised, but it wouldn’t hurt either.
Why not? It's being mutated and read concurrently.
if process.terminationStatus != 0 { | ||
if let self = self { | ||
self.state = .connectionInterrupted | ||
self.restartClangd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this could crash loop. For sourcekitd, XPC prevents us from re-spawning too fast in a loop. We should implement something here to delay the relaunch if we crash loop (e.g. after the second crash in a short time start delaying a few seconds before trying again).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW VSCode will restart an LSP for up to 5 times within a 3 minute interval. Really depends upon the type of crash - most of which I've seen are persistent or related to a single document.
Might make sense to do something similar here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I have added some super simple crash-loop prevention code: If clangd
has crashed in the last 30 seconds, delay any further restart by 10 seconds. I think this should be sufficient to prevent us from going wild im crash loop scenarios. I also added a test case that checks that clangd
isn’t restarted twice in a row within 5 seconds.
eac3dc2
to
0776790
Compare
@swift-ci Please test |
0776790
to
2b0cbfd
Compare
@benlangmuir I have aded 2b0cbfd which guards access to @swift-ci Please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running these test under tsan and I'm seeing races like the following
ThreadSanitizer: data race ClangLanguageServer.swift:234
WARNING: ThreadSanitizer: data race (pid=9454)
Read of size 8 at 0x7b5800030018 by thread T13:
#0 ClangLanguageServerShim.changeDocument(_:) ClangLanguageServer.swift:234 (SourceKitLSPPackageTests:x86_64+0xc76794)
#1 protocol witness for ToolchainLanguageServer.changeDocument(_:) in conformance ClangLanguageServerShim <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc766ca)
#2 SourceKitServer.changeDocument(_:workspace:) SourceKitServer.swift:490 (SourceKitLSPPackageTests:x86_64+0xc455d9)
#3 partial apply for SourceKitServer.changeDocument(_:workspace:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc63f92)
#4 closure #1 in SourceKitServer.registerWorkspaceNotfication<A>(_:) SourceKitServer.swift:130 (SourceKitLSPPackageTests:x86_64+0xc4c661)
#5 partial apply for closure #1 in SourceKitServer.registerWorkspaceNotfication<A>(_:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc59414)
#6 thunk for @escaping @callee_guaranteed (@guaranteed Notification<A>) -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xac9e8b)
#7 partial apply for thunk for @escaping @callee_guaranteed (@guaranteed Notification<A>) -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xacdd04)
#8 thunk for @escaping @callee_guaranteed (@in_guaranteed Notification<A>) -> (@out ()) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xacb1e0)
#9 partial apply for thunk for @escaping @callee_guaranteed (@in_guaranteed Notification<A>) -> (@out ()) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xacee40)
#10 closure #1 in LanguageServerEndpoint.handle<A>(_:from:) LanguageServer.swift:168 (SourceKitLSPPackageTests:x86_64+0xacb11a)
#11 partial apply for closure #1 in LanguageServerEndpoint.handle<A>(_:from:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xace185)
#12 thunk for @escaping @callee_guaranteed () -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0x17a43)
#13 __tsan::invoke_and_release_block(void*) <null>:5352656 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x71e5b)
#14 _dispatch_client_callout object.m:495 (libdispatch.dylib:x86_64+0x350d)
Previous write of size 8 at 0x7b5800030018 by thread T8:
#0 ClangLanguageServerShim.initialize() ClangLanguageServer.swift:91 (SourceKitLSPPackageTests:x86_64+0xc6cd87)
#1 closure #1 in ClangLanguageServerShim.restartClangd() ClangLanguageServer.swift:155 (SourceKitLSPPackageTests:x86_64+0xc6fce2)
#2 partial apply for closure #1 in ClangLanguageServerShim.restartClangd() <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc7fd65)
#3 thunk for @escaping @callee_guaranteed () -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0x17a43)
#4 __tsan::invoke_and_release_block(void*) <null>:5352656 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x71e5b)
#5 _dispatch_client_callout object.m:495 (libdispatch.dylib:x86_64+0x350d)
Location is heap block of size 736 at 0x7b5800030000 allocated by thread T7:
#0 malloc <null>:5352688 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x4f88a)
#1 swift_slowAlloc <null>:5352688 (libswiftCore.dylib:x86_64+0x2cb6d8)
#2 makeJSONRPCClangServer(client:toolchain:buildSettings:clangdOptions:reopenDocuments:) ClangLanguageServer.swift:339 (SourceKitLSPPackageTests:x86_64+0xc7d644)
#3 languageService(for:_:options:client:reopenDocuments:) SourceKitServer.swift:817 (SourceKitLSPPackageTests:x86_64+0xc52bf3)
#4 closure #1 in SourceKitServer.languageService(for:_:) SourceKitServer.swift:222 (SourceKitLSPPackageTests:x86_64+0xc5028c)
#5 partial apply for closure #1 in SourceKitServer.languageService(for:_:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc602a2)
#6 thunk for @callee_guaranteed () -> (@owned ToolchainLanguageServer?, @error @owned Error) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc54276)
#7 partial apply for thunk for @callee_guaranteed () -> (@owned ToolchainLanguageServer?, @error @owned Error) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc60327)
#8 orLog<A>(_:level:logger:_:) Logging.swift:58 (SourceKitLSPPackageTests:x86_64+0x590e27)
#9 SourceKitServer.languageService(for:_:) SourceKitServer.swift:221 (SourceKitLSPPackageTests:x86_64+0xc4fde4)
#10 SourceKitServer.languageService(for:_:in:) SourceKitServer.swift:256 (SourceKitLSPPackageTests:x86_64+0xc54775)
#11 SourceKitServer.openDocument(_:workspace:registerForBuildSystemNotifications:) SourceKitServer.swift:471 (SourceKitLSPPackageTests:x86_64+0xc4f679)
#12 SourceKitServer.openDocument(_:workspace:) SourceKitServer.swift:458 (SourceKitLSPPackageTests:x86_64+0xc44522)
#13 partial apply for SourceKitServer.openDocument(_:workspace:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc64052)
#14 closure #1 in SourceKitServer.registerWorkspaceNotfication<A>(_:) SourceKitServer.swift:130 (SourceKitLSPPackageTests:x86_64+0xc4c661)
#15 partial apply for closure #1 in SourceKitServer.registerWorkspaceNotfication<A>(_:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xc59414)
#16 thunk for @escaping @callee_guaranteed (@guaranteed Notification<A>) -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xac9e8b)
#17 partial apply for thunk for @escaping @callee_guaranteed (@guaranteed Notification<A>) -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xacdd04)
#18 thunk for @escaping @callee_guaranteed (@in_guaranteed Notification<A>) -> (@out ()) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xacb1e0)
#19 partial apply for thunk for @escaping @callee_guaranteed (@in_guaranteed Notification<A>) -> (@out ()) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xacee40)
#20 closure #1 in LanguageServerEndpoint.handle<A>(_:from:) LanguageServer.swift:168 (SourceKitLSPPackageTests:x86_64+0xacb11a)
#21 partial apply for closure #1 in LanguageServerEndpoint.handle<A>(_:from:) <compiler-generated> (SourceKitLSPPackageTests:x86_64+0xace185)
#22 thunk for @escaping @callee_guaranteed () -> () <compiler-generated> (SourceKitLSPPackageTests:x86_64+0x17a43)
#23 __tsan::invoke_and_release_block(void*) <null>:5352688 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x71e5b)
#24 _dispatch_client_callout object.m:495 (libdispatch.dylib:x86_64+0x350d)
Thread T13 (tid=4418908, running) is a GCD worker thread
Thread T8 (tid=4418902, running) is a GCD worker thread
Thread T7 (tid=4418847, running) is a GCD worker thread
SUMMARY: ThreadSanitizer: data race ClangLanguageServer.swift:234 in ClangLanguageServerShim.changeDocument(_:)
// Do a sanity check and verify that we get the expected result from a hover response before crashing clangd. | ||
|
||
let expectedHoverContent = """ | ||
Declared in global namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text seems to vary between recent clangd versions - I tried this test locally and it failed because "loc" wasn't included at the bottom. We generally try to keep our tests passing with both the latest toolchain and the most recent released toolchain. Let's change this only compare the range, or if you really want to check the markup content, just check a prefix like "Declared in".
Hi @benlangmuir, thanks for running the thread sanitizer. Looks like I missed a couple data races, but things should be fixed now. I also changed crash recovery tests to only verify that the range is correct. I think that should suffice for the sanity check that we have correctly re-opened the document. I have added then new changes as two new commits for an easier review. I’d like to squash them all together before merging the PR though. @swift-ci Please test |
Please regenerate linuxmain and squash as appropriate; otherwise LGTM. Thanks! |
Squashed and regenerated @swift-ci Please test |
Test failure seems to have been unrelated @swift-ci Please test |
|
… get better CI failure messages
@swift-ci Please test |
The Swift project moved the default branch to More detail about the branch update - https://forums.swift.org/t/updating-branch-names/40412 |
Recover from crashes in
sourcekitd
andclangd
by transparently restarting them from SourceKit-LSP.