Skip to content

fix infinite loop with kafka admin #20

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 19, 2021

Conversation

hackaugusto
Copy link

Port of dpkp#2194

An infinite loop may happen with the following pattern:

self._send_request_to_node(self._client.least_loaded_node(), request)

The problem happens when self._client's cluster metadata is out-of-date, and the
result of least_loaded_node() is a node that has been removed from the cluster but
the client is unware of it. When this happens _send_request_to_node will enter an
infinite loop waiting for the chosen node to become available, which won't happen,
resulting in an infinite loop.

This commit introduces a new method named _send_request_to_least_loaded_node which
handles the case above. This is done by regularly checking if the target node is
available in the cluster metadata, and if not, a new node is chosen.

Notes:

This does not yet cover every call site to _send_request_to_node, there are some
other places were similar race conditions may happen.
The code above does not guarantee that the request itself will be sucessful, since
it is still possible for the target node to exit, however, it does remove the
infinite loop which can render client code unusable.

An infinite loop may happen with the following pattern:

    self._send_request_to_node(self._client.least_loaded_node(), request)

The problem happens when `self._client`'s cluster metadata is out-of-date, and the
result of `least_loaded_node()` is a node that has been removed from the cluster but
the client is unware of it. When this happens `_send_request_to_node` will enter an
infinite loop waiting for the chosen node to become available, which won't happen,
resulting in an infinite loop.

This commit introduces a new method named `_send_request_to_least_loaded_node` which
handles the case above. This is done by regularly checking if the target node is
available in the cluster metadata, and if not, a new node is chosen.

Notes:

- This does not yet cover every call site to `_send_request_to_node`, there are some
  other places were similar race conditions may happen.
- The code above does not guarantee that the request itself will be sucessful, since
  it is still possible for the target node to exit, however, it does remove the
  infinite loop which can render client code unusable.
If the value `_controller_id` is out-of-date and the node was removed
from the cluster, `_send_request_to_node` would enter an infinite loop.
Copy link
Member

@ivanyu ivanyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to the best of my knowledge of the library.

@ivanyu ivanyu merged commit b9f2f78 into master Jan 19, 2021
@ivanyu ivanyu deleted the hacka/fix-infinite-loop-with-kafka-admin branch January 19, 2021 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants