
Redirect partition queues *after* reassignment. #107


Status: Closed (1 commit, not merged)

Conversation

shlevy (Contributor) commented Sep 4, 2019

This matches the example given in https://gitter.im/edenhill/librdkafka/archives/2017/12/12?at=5a2fad9dcc1d527f6b20114d, and fixes duplicate message issues we've seen.

AlexeyRaga (Member)

I think I tried it this way a long time ago and it was problematic: librdkafka was able to pull some messages from Kafka and put them into the default queue after the assignment happened but before the redirection happened. Those messages were lost, since we only expect messages from the specific queue.

Do you have any evidence that it won't happen anymore?

AlexeyRaga (Member)

Ping Magnus @edenhill, can you advise whether redirecting the partition queue should happen before or after acknowledging the assignment? Can doing it after assignment cause missed or lost messages?

edenhill commented Sep 4, 2019

If you redirect after assign(), it means some messages may be forwarded to the single consumer queue. So either do the redirect before assign(), or do: assign(); pause(); redirect; resume()
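The four-step sequence above can be sketched in a rebalance callback using librdkafka's C API. This is a minimal sketch, not code from this PR: `my_queue` (the queue the application actually polls) is an assumed name passed via the opaque pointer, and error handling is elided. It will not do anything useful without a running broker; it only illustrates the ordering edenhill recommends.

```c
#include <librdkafka/rdkafka.h>

/* Sketch of edenhill's assign(); pause(); redirect; resume() ordering.
 * `opaque` is assumed to carry the queue we poll (an assumption for
 * this example, not part of the PR). */
static void rebalance_cb(rd_kafka_t *rk, rd_kafka_resp_err_t err,
                         rd_kafka_topic_partition_list_t *partitions,
                         void *opaque) {
    rd_kafka_queue_t *my_queue = (rd_kafka_queue_t *)opaque;

    if (err == RD_KAFKA_RESP_ERR__ASSIGN_PARTITIONS) {
        /* 1. Acknowledge the assignment first... */
        rd_kafka_assign(rk, partitions);

        /* 2. ...but immediately pause, so nothing can land on the
         *    default consumer queue while we redirect. */
        rd_kafka_pause_partitions(rk, partitions);

        /* 3. Forward each partition's queue to the queue we poll. */
        for (int i = 0; i < partitions->cnt; i++) {
            rd_kafka_topic_partition_t *p = &partitions->elems[i];
            rd_kafka_queue_t *pq =
                rd_kafka_queue_get_partition(rk, p->topic, p->partition);
            if (pq) {
                rd_kafka_queue_forward(pq, my_queue);
                rd_kafka_queue_destroy(pq);
            }
        }

        /* 4. Resume delivery now that redirection is in place. */
        rd_kafka_resume_partitions(rk, partitions);
    } else {
        /* Revoke: drop the current assignment. */
        rd_kafka_assign(rk, NULL);
    }
}
```

The pause in step 2 is what closes the race AlexeyRaga describes: between assign() and the per-partition rd_kafka_queue_forward() calls, fetched messages would otherwise be routed to the default consumer queue and never seen by a reader of the forwarded queue.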

AlexeyRaga (Member)

@edenhill thanks!

@shlevy I think we can't just do the redirect after assign. We can try assign, pause, redirect, resume, but since our consumer is asynchronous with respect to message polling, we need to be sure that nothing "bad" can happen between assign and pause.

But just accepting this PR as is would introduce a change that loses messages, which I don't think we want.

shlevy (Contributor, Author) commented Sep 16, 2019

Hi @AlexeyRaga! Unfortunately, we've had an extremely hard time reproducing our issue in a satisfactory way. We have a Haskell reproduction which shows very high rates of commit errors (largely, but not exclusively, "no offset to commit" right after a fresh poll!) without this change, and a significantly reduced (but not zero!) error rate with either this or #108 applied. We also have a C reproduction that I believe matches the Haskell logic exactly (except for pthreads vs. GHC's green threads), and yet it does not exhibit the issue even once under any configuration we tried.

The most successful configuration for us has been bumping to librdkafka 1.1.0 and using #108. Any ideas for how we can help make sure we're actually doing the right thing?

AlexeyRaga (Member)

Parking it for now

AlexeyRaga closed this Oct 21, 2019