Skip to content

Fixing race condition in server and partial stream handling in frontend. #2391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 4, 2023
Merged

Conversation

snichols
Copy link
Contributor

This PR fixes two problems:

  1. A race condition in server.cpp.
  2. completion.js didn't handle partial stream results.

The race condition was caused by unique_lock scope loss when processing streaming completion. My fix was to handle unlocking of the mutex manually in that case. This bug caused segfaults when handling multiple streaming requests on the completion endpoint.

Partial stream results happen regularly on slower connections, or when the server is generating messages very quickly. The fix was to handle leftover data in the llama generator function. This bug caused completion messages to be garbled in several cases.

@Azeirah
Copy link
Contributor

Azeirah commented Jul 31, 2023

Ah great! I had many segfaults, I will use this branch tomorrow to check if it's fixed.

@slaren
Copy link
Member

slaren commented Aug 4, 2023

From what I understand, the .hpp files will need to be regenerated to see the changes in completion.js in the binary. However, I think that fixing this is important enough to merge this now. Hopefully the .hpp files can be regenerated in one of the other server PRs.

@slaren slaren merged commit 5f631c2 into ggml-org:master Aug 4, 2023
crasm pushed a commit to crasm/llama.cpp that referenced this pull request Aug 9, 2023
…nd. (ggml-org#2391)

* Fixing race condition in server.cpp and partial stream handling in completion.js

* Reverting assert edits.

* Adding newline to eof
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[User] Unreliable response from server using Wireguard when tokens are generated too fast
3 participants