Handle cases where there are no partitions to fetch from #439


Merged — 4 commits merged into master from dasch/handle-no-partitions-to-fetch-from on Oct 23, 2017

Conversation


@dasch dasch commented Oct 20, 2017

Fixes #416.

dasch added 3 commits October 20, 2017 13:52
Rather than going into a busy-loop when e.g. all partitions are paused,
sleep for `max_wait_time` seconds before retrying.
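The behavior described in the commit message can be sketched as a tiny standalone helper (the method name here is illustrative, not ruby-kafka's internal API):

```ruby
# Illustrative sketch of the backoff rule from this PR: when there are no
# partitions to fetch from (e.g. all of them are paused), sleep for
# `max_wait_time` seconds before retrying, falling back to 1 second when
# `max_wait_time` is zero so the fetch loop still yields instead of
# busy-looping.
def no_partitions_backoff(max_wait_time)
  max_wait_time > 0 ? max_wait_time : 1
end
```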
    sleep max_wait_time
    backoff = max_wait_time > 0 ? max_wait_time : 1

    @logger.warn "There are no partitions to fetch from, sleeping for #{backoff}s"
Contributor:

Wouldn't it be better if this was info? I mean, having redundant processes isn't a bad thing; it can be by design, not by accident.

Contributor Author:

Hmm. I guess it sort of depends on your use case – I imagine most people will use more than 1 partition.

Contributor:

@dasch I know that, you know that, but you would be surprised :D 1:1 is pretty common, especially at the beginning.

Contributor Author:

Sure, I can make it info...

Contributor:

Thank you ❤️ :)

This could be expected behavior in cases where there are more consumers
than partitions on purpose.
@dasch dasch merged commit 9fb2ec2 into master Oct 23, 2017
@dasch dasch deleted the dasch/handle-no-partitions-to-fetch-from branch October 23, 2017 07:13

victorphamdeveloper commented Oct 27, 2017

Hi @dasch, I'm using karafka to run a consumer and I'm getting this error:

[2017-10-27T05:32:33.193691 #1] ERROR -- : Kafka::NoPartitionsAssignedError (Kafka::NoPartitionsAssignedError)
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:380:in `fetch_batches'
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:247:in `block in each_batch'
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:319:in `consumer_loop'
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:246:in `each_batch'
/app/bundle/ruby/2.3.0/gems/karafka-1.0.0/lib/karafka/connection/messages_consumer.rb:61:in `consume_each_batch'
/app/bundle/ruby/2.3.0/gems/karafka-1.0.0/lib/karafka/connection/messages_consumer.rb:21:in `fetch_loop'
/app/bundle/ruby/2.3.0/gems/karafka-1.0.0/lib/karafka/connection/listener.rb:34:in `fetch_loop'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'

Will this PR fix this issue?

@mensfeld (Contributor)

@victorphamdeveloper if you have more processes than partitions, you will see that error until both the new ruby-kafka and the new karafka are released.

@victorphamdeveloper

@mensfeld My topic has 10 partitions and I don't think we have more than 10 consumer processes :(

@mensfeld (Contributor)

@victorphamdeveloper please move your issue to the Karafka repository: https://github.com/karafka/karafka as it may not be ruby-kafka related. We will get back to it after the release of ruby-kafka 0.5 and Karafka 1.1.

@victorphamdeveloper

Sure, will do.


Soleone commented Dec 21, 2017

This change is listed as new in 0.5.0, but seems to already be included in 0.4.4.

We had to revert to 0.4.3 for now because we experienced performance problems and have so far traced them back to this upgrade from an old 0.3.x version of kafka-shopify to 0.4.4.

It might be the way we're using the Kafka consumer in a background job, though. In our case, the background job runs a fetch loop that is manually exited after a certain amount of time has passed, and it seems like the change in this PR caused our loop to get stuck. We haven't fully diagnosed this yet, but I thought it was worth mentioning. Cheers!

@mensfeld (Contributor)

@Soleone could you please provide a test example? Indeed there's a sleep, but apart from blocking it should not do much. @dasch maybe we should allow sleep(0) (i.e. disabling it if needed) for some cases? I'm OK with submitting a PR if that's the issue (though I still don't see how it could be related).


dasch commented Dec 21, 2017

I'd like to understand the actual problem first...

@Soleone I assume you're stopping the consumer using #stop?


dasch commented Dec 21, 2017

@mensfeld it could be that we need to do `sleep backoff if @running`.
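One hypothetical way to honor a stop request during the backoff is to sleep in short slices and re-check a running flag each time, so a concurrent #stop interrupts the wait promptly. None of the names below are ruby-kafka's actual API; this is just a sketch of the idea:

```ruby
# Hypothetical sketch: sleep in short slices and re-check a running flag,
# so that a concurrent #stop interrupts the backoff quickly instead of
# letting the consumer block for the full wait. Illustrative only, not
# ruby-kafka code.
class InterruptibleBackoff
  def initialize
    @running = true
  end

  def stop
    @running = false
  end

  # Sleeps for up to `backoff` seconds, checking the flag every `slice`
  # seconds. Returns the total time actually slept.
  def wait(backoff, slice: 0.1)
    slept = 0.0
    while @running && slept < backoff
      sleep(slice)
      slept += slice
    end
    slept
  end
end
```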


Soleone commented Jan 2, 2018

Sorry for the late reply!

Yes, we're using #stop on the consumer.

At this time I don't have any good example code I can provide. But I'm investigating this further soon and am enabling some additional logging to find out more.


dasch commented Jan 3, 2018

@Soleone I think #516 should fix it.
