Skip to content

Limit SSL_ERROR_WANT_READ retries to 1 sec #957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 14, 2021
Merged

Conversation

qt1
Copy link
Contributor

@qt1 qt1 commented Jun 7, 2021

There is an unlimited loop in case open_ssl returns SSL_ERROR_WANT_READ or SSL_ERROR_SYSCALL

Currently the loop is unlimited and can potencially cause the main program to stuck.
Also, the retry loop can hog unlimited CPU.

retry with 1ms delays to prevent CPU hoggin
@yhirose
Copy link
Owner

yhirose commented Jun 7, 2021

@qt1, thanks for the pull request. Can I ask two questions before reviewing?

  • Did you actually hit the problem? If so, in what situation?
  • Where does 1000 come from? Is it just your heuristic, or does OpenSSL suggests?

Thanks!

@qt1
Copy link
Contributor Author

qt1 commented Jun 7, 2021

Hi,

I did see the SSL_ERROR_SYSCALL and also added the retry independently.

The time limit and count limit are good practice to reduce risk. I am writing a small extension to an existing application and an application stuck situation - even if rare - would be unacceptable.

I can run some more tests, maybe monitor the actual number of retries, and also look at the server side, and report here in a couple of days.

Regards

@yhirose
Copy link
Owner

yhirose commented Jun 7, 2021

@qt1, thanks for the reply.

I can run some more tests, maybe monitor the actual number of retries, and also look at the server side, and report here in a couple of days.

Yes, that would be very helpful. Thanks a lot!

@qt1
Copy link
Contributor Author

qt1 commented Jun 9, 2021

Looking at the server logs, it dose not look very good..

The retry (even with 1ms delay) seems to do cause multiple connection, sometimes with error indication and sometimes just double entry.
Note the log entries with the same time size etc.

The data is actually uploaded as needed but the communication seems wrong.
My guess is that there is something wrong with the activation of open-ssl

x.x.x.165 - - [09/Jun/2021:18:11:21 +0000] "POST /endpoint/ HTTP/1.1" 201 2519 "-" "cpp-httplib/0.7" "-"
x.x.x.165 - - [09/Jun/2021:18:11:21 +0000] "POST /endpoint/ HTTP/1.1" 201 2519 "-" "cpp-httplib/0.7" "-"
2021/06/09 18:11:21 [info] 39#39: *6799 recv() failed (104: Connection reset by peer) while sending to client, client: x.x.x.165, server: myserver.westeurope.cloudapp.azure.com, request: "POST /endpoint/ HTTP/1.1", upstream: "http://10.0.3.2:8000/endpoint/", host: "myserver.westeurope.cloudapp.azure.com"
x.x.x.234 - - [09/Jun/2021:18:21:59 +0000] "POST /endpoint/ HTTP/1.1" 201 2450 "-" "cpp-httplib/0.7" "-"
x.x.x.234 - - [09/Jun/2021:18:21:59 +0000] "POST /endpoint/ HTTP/1.1" 201 2450 "-" "cpp-httplib/0.7" "-"
2021/06/09 18:21:59 [info] 39#39: *6801 recv() failed (104: Connection reset by peer) while sending to client, client: x.x.x.234, server: myserver.westeurope.cloudapp.azure.com, request: "POST /endpoint/ HTTP/1.1", upstream: "http://10.0.3.2:8000/endpoint/", host: "myserver.westeurope.cloudapp.azure.com"
2021/06/09 18:23:07 [info] 39#39: *6804 client closed connection while waiting for request, client: x.x.x.46, server: 0.0.0.0:443
x.x.x.46 - - [09/Jun/2021:18:23:08 +0000] "POST /endpoint/ HTTP/1.1" 201 23680 "-" "cpp-httplib/0.7" "-"
x.x.x.46 - - [09/Jun/2021:18:23:08 +0000] "POST /endpoint/ HTTP/1.1" 201 23680 "-" "cpp-httplib/0.7" "-"
x.x.x.46 - - [09/Jun/2021:18:25:42 +0000] "POST /endpoint/ HTTP/1.1" 201 23680 "-" "cpp-httplib/0.7" "-"
x.x.x.46 - - [09/Jun/2021:18:25:42 +0000] "POST /endpoint/ HTTP/1.1" 201 23680 "-" "cpp-httplib/0.7" "-"
x.x.x.117 - - [09/Jun/2021:18:30:07 +0000] "POST /endpoint/ HTTP/1.1" 201 996 "-" "cpp-httplib/0.7" "-"
x.x.x.117 - - [09/Jun/2021:18:30:07 +0000] "POST /endpoint/ HTTP/1.1" 201 996 "-" "cpp-httplib/0.7" "-"

Regards

@qt1
Copy link
Contributor Author

qt1 commented Jun 14, 2021

Well, after running the app and the server for a few more days, it seems to be working.

There are occasional "client closed connection while waiting for request" but otherwise it works.

I think that for the moment this is as good as it gets. Not perfect - but works. I feel that the protection against getting stuck is important (application stuck means a user loosing work - not acceptable).

So my recommendation is to merge the PR for now.
Also, in future, dig deeper to understand how to prevent SSL_ERROR_SYSCALL .

Regards
Baruch

@yhirose yhirose merged commit b8dec12 into yhirose:master Jun 14, 2021
@yhirose
Copy link
Owner

yhirose commented Jun 14, 2021

@qt1, thanks for the excellent research. I just merged it. Thanks for the fine contribution!

ExclusiveOrange pushed a commit to ExclusiveOrange/cpp-httplib-exor that referenced this pull request May 2, 2023
retry with 1ms delays to prevent CPU hoggin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants