fix: stream connections longer than 5 minutes are dropped #244
This pull request has been linked to Shortcut Story #216849: Fix stream timing out after 5 minutes.
```diff
-    : level_(Defaults<AnySDK>::LogLevel()), tag_(Defaults<AnySDK>::LogTag()) {}
+    : level_(GetLogLevelEnum(std::getenv("LD_LOG_LEVEL"),
+                             Defaults<AnySDK>::LogLevel())),
+      tag_(Defaults<AnySDK>::LogTag()) {}
```
Ideally, we can just run `LD_LOG_LEVEL=debug ./binary`, but the Console Logger constructed in the client passed in a default value, so the environment variable was never inspected. Now it works like you'd want.
```cpp
 * The timestamp of the last read is initialized after receiving successful
 * headers, but before initiating the first body read. Future reads then
 * update the timestamp.
 */
```
Added this helper so we can more easily tell how fast / when data is being received. For example, this clearly shows heartbeats arriving every 180 seconds.
```diff
@@ -79,21 +79,49 @@ class FoxyClient : public Client,
       read_timeout_(read_timeout),
       write_timeout_(write_timeout),
       req_(std::move(req)),
```
Re-arranging to match declaration order.
```diff
@@ -263,8 +293,22 @@ class FoxyClient : public Client,
   void on_read_body(boost::system::error_code ec, std::size_t amount) {
     boost::ignore_unused(amount);
     if (ec == boost::asio::error::operation_aborted) {
       logger_("read HTTP response body aborted");
       return;
     }
```
Here's the key change: if the parser indicates that chunked encoding has finished, initiate backoff instead of attempting to keep reading the body.
```diff
@@ -276,6 +320,7 @@ class FoxyClient : public Client,
   }

   void do_shutdown(std::function<void()> completion) {
```
This was causing unnecessary hanging when the test service tells the client to close. By cancelling the backoff timer, that operation can complete immediately.
Force-pushed from 937b304 to ed5ff7f.
```diff
@@ -290,13 +299,13 @@ class FoxyClient : public Client,

   void on_read_body(boost::system::error_code ec, std::size_t amount) {
     boost::ignore_unused(amount);
-    if (ec == boost::asio::error::operation_aborted) {
+    if (ec == boost::asio::error::operation_aborted && shutting_down_) {
       logger_("read HTTP response body aborted");
```
If `on_read_body()` can ever be executed from a context different from the one where `do_shutdown()` executes, then we may have a race between modifying and accessing `shutting_down_`.

Edit: Never mind, I see this was removed in the last commit :)
This is a good callout, especially since I had to add the bool back.

Although the public `async_shutdown` method `post`'s to the `session_`'s executor, the `run` routine did not; I've fixed that. Since those are the only two public entrypoints, we should be safe.
Force-pushed from 01bdc8b to 5e28860.
Force-pushed from 6d9ca86 to 24fcdfc.
```diff
@@ -35,6 +35,11 @@ std::optional<std::string> EntityManager::create(ConfigParams const& params) {
         std::chrono::milliseconds(*params.readTimeoutMs));
   }
```
This builder option was missing, but it's already implemented. Easy win to speed up tests.
Force-pushed from 23262e3 to d8e7b1d.
```diff
@@ -72,11 +72,11 @@ class EventOutbox : public std::enable_shared_from_this<EventOutbox> {
   private:
```
Lints.
```diff
@@ -53,7 +59,7 @@ std::optional<std::string> EntityManager::create(ConfigParams const& params) {
     return std::nullopt;
   }

   client->run();
```
Renamed `run` to `async_connect` to make it clear this is kicking off an asynchronous operation.
```diff
@@ -14,8 +14,6 @@ auto const kOutboxCapacity = 1023;
 EventOutbox::EventOutbox(net::any_io_executor executor,
                          std::string callback_url)
```
Lints.
```diff
   RequestType req;

   req.set(http::field::host, callback_host_);
-  req.method(http::verb::get);
+  req.method(http::verb::post);
```
`get` was just plain wrong, but was accepted by the contract tests.
```diff
@@ -38,6 +38,7 @@ int main(int argc, char* argv[]) {
   srv.add_capability("report");
   srv.add_capability("post");
   srv.add_capability("reconnection");
+  srv.add_capability("read-timeout");
```
We were already passing the parameter along, just hadn't advertised the capability.
```diff
-      errors_(std::move(errors)) {
+      errors_(std::move(errors)),
+      backoff_timer_(std::move(executor)),
+      last_read_(std::nullopt),
+      shutting_down_(false) {
```
Session construction now takes place in `create_session()`, so the logic can be reused for each new connection.
```diff
   }

   void run() override {
     session_.async_connect(
```
After renaming `run`, I wrapped it in a `post` so we can ensure only one thread accesses `session_`. This is the same approach as `async_shutdown`.
🤖 I have created a release *beep* *boop*

---

<details><summary>launchdarkly-cpp-client: 3.0.8</summary>

## [3.0.8](launchdarkly-cpp-client-v3.0.7...launchdarkly-cpp-client-v3.0.8) (2023-09-13)

### Bug Fixes

* stream connections longer than 5 minutes are dropped ([#244](#244)) ([e12664f](e12664f))

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * launchdarkly-cpp-internal bumped from 0.1.8 to 0.1.9
    * launchdarkly-cpp-common bumped from 0.3.5 to 0.3.6
    * launchdarkly-cpp-sse-client bumped from 0.1.2 to 0.1.3
</details>

<details><summary>launchdarkly-cpp-common: 0.3.6</summary>

## [0.3.6](launchdarkly-cpp-common-v0.3.5...launchdarkly-cpp-common-v0.3.6) (2023-09-13)

### Bug Fixes

* stream connections longer than 5 minutes are dropped ([#244](#244)) ([e12664f](e12664f))
</details>

<details><summary>launchdarkly-cpp-internal: 0.1.9</summary>

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * launchdarkly-cpp-common bumped from 0.3.5 to 0.3.6
</details>

<details><summary>launchdarkly-cpp-sse-client: 0.1.3</summary>

## [0.1.3](launchdarkly-cpp-sse-client-v0.1.2...launchdarkly-cpp-sse-client-v0.1.3) (2023-09-13)

### Bug Fixes

* stream connections longer than 5 minutes are dropped ([#244](#244)) ([e12664f](e12664f))
</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This fixes streaming connections dropping after 5 minutes.

The problem was applying the 5-minute timeout to an `async_read` operation, when the actual intent was to apply it to the act of receiving any data. The timeout exists to prevent the client from hanging indefinitely if the server stops responding; LaunchDarkly sends heartbeats to clear the timer.

With `async_read`, the operation won't complete until the response body is finished. Since the body never finishes until the client shuts down (or the server interrupts it), it's not possible to institute a timeout. The solution is to use `async_read_some`, which completes whenever data arrives. This way, we can assign the 5-minute timeout as an upper bound on receiving any data, as originally intended.

A side effect of this change was breaking the `reconnection` SSE contract tests, which proved to be somewhat of a rabbit hole. The upshot is that we weren't handling the end of chunked encoding properly. The original code assumed an async read either terminates with an error or goes on forever; in fact, the client should perform the backoff algorithm when it detects the end of chunked encoding.

I've also included more comments, more debug logging, and renamed some symbols for clarity. This should make debugging issues like this more efficient.