Handle cases where there are no partitions to fetch from #439


Merged — 4 commits merged into master from dasch/handle-no-partitions-to-fetch-from on Oct 23, 2017

Conversation


@dasch dasch commented Oct 20, 2017

Fixes #416.

dasch added 3 commits October 20, 2017 13:52
Rather than going into a busy-loop when e.g. all partitions are paused,
sleep for `max_wait_time` seconds before retrying.
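The behavior described in the commit message can be sketched as a tiny standalone helper (the method name here is illustrative, not ruby-kafka's internal API):

```ruby
# Illustrative sketch of the backoff rule from this PR: when there are no
# partitions to fetch from (e.g. all of them are paused), sleep for
# `max_wait_time` seconds before retrying, falling back to 1 second when
# `max_wait_time` is zero so the fetch loop still yields instead of
# busy-looping.
def no_partitions_backoff(max_wait_time)
  max_wait_time > 0 ? max_wait_time : 1
end
```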
    sleep max_wait_time
    backoff = max_wait_time > 0 ? max_wait_time : 1

    @logger.warn "There are no partitions to fetch from, sleeping for #{backoff}s"
Contributor:

Wouldn't it be better if this was info? I mean, having redundant processes isn't a bad thing; it can be by design, not by accident.

Contributor Author:

Hmm. I guess it sort of depends on your use case – I imagine most people will use more than 1 partition.

Contributor:

@dasch I know that, you know that, but you would be surprised :D 1:1 is pretty common, especially at the beginning.

Contributor Author:

Sure, I can make it info...

Contributor:

Thank you ❤️ :)

This could be expected behavior in cases where there are more consumers
than partitions on purpose.
@dasch dasch merged commit 9fb2ec2 into master Oct 23, 2017
@dasch dasch deleted the dasch/handle-no-partitions-to-fetch-from branch October 23, 2017 07:13

victorphamdeveloper commented Oct 27, 2017

Hi @dasch, I'm using karafka to run a consumer and I'm getting this error:

[2017-10-27T05:32:33.193691 #1] ERROR -- : Kafka::NoPartitionsAssignedError (Kafka::NoPartitionsAssignedError)
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:380:in `fetch_batches'
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:247:in `block in each_batch'
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:319:in `consumer_loop'
/app/bundle/ruby/2.3.0/gems/ruby-kafka-0.4.2/lib/kafka/consumer.rb:246:in `each_batch'
/app/bundle/ruby/2.3.0/gems/karafka-1.0.0/lib/karafka/connection/messages_consumer.rb:61:in `consume_each_batch'
/app/bundle/ruby/2.3.0/gems/karafka-1.0.0/lib/karafka/connection/messages_consumer.rb:21:in `fetch_loop'
/app/bundle/ruby/2.3.0/gems/karafka-1.0.0/lib/karafka/connection/listener.rb:34:in `fetch_loop'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/app/bundle/ruby/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'

Will this PR fix this issue?

@mensfeld (Contributor)

@victorphamdeveloper if you have more processes than partitions, you will see that error until both the new ruby-kafka and the new karafka are released.

@victorphamdeveloper

@mensfeld My topic has 10 partitions and I don't think we have more than 10 consumer processes :(

@mensfeld (Contributor)

@victorphamdeveloper please move your issue to the Karafka repository: https://github.com/karafka/karafka as it may not be ruby-kafka related. We will get back to it after the release of ruby-kafka 0.5 and Karafka 1.1.

@victorphamdeveloper

Sure, will do.


Soleone commented Dec 21, 2017

This change is listed as new in 0.5.0, but seems to already be included in 0.4.4.

We had to revert to 0.4.3 for now because we experienced performance problems and have so far traced them back to this upgrade from an old 0.3.x version of kafka-shopify to 0.4.4.

It might be the way we're using the Kafka consumer in a background job, though. In our case, the background job runs a fetch loop that is manually exited after a certain amount of time has passed, and it seems like the change in this PR caused our loop to get stuck. We haven't fully diagnosed this yet, but I thought it was worth mentioning. Cheers!

@mensfeld (Contributor)

@Soleone could you please provide a test example? Indeed there's a sleep, but apart from blocking it should not do much. @dasch maybe we should allow sleep(0) (i.e. disabling it if needed) for some cases? I'm OK with submitting a PR if that's the issue (though I still don't see how it could be related).


dasch commented Dec 21, 2017

I'd like to understand the actual problem first...

@Soleone I assume you're stopping the consumer using #stop?


dasch commented Dec 21, 2017

@mensfeld it could be that we need to do `sleep backoff if @running`.
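One hypothetical way to honor a stop request during the backoff is to sleep in short slices and re-check a running flag each time, so a concurrent #stop interrupts the wait promptly. None of the names below are ruby-kafka's actual API; this is just a sketch of the idea:

```ruby
# Hypothetical sketch: sleep in short slices and re-check a running flag,
# so that a concurrent #stop interrupts the backoff quickly instead of
# letting the consumer block for the full wait. Illustrative only, not
# ruby-kafka code.
class InterruptibleBackoff
  def initialize
    @running = true
  end

  def stop
    @running = false
  end

  # Sleeps for up to `backoff` seconds, checking the flag every `slice`
  # seconds. Returns the total time actually slept.
  def wait(backoff, slice: 0.1)
    slept = 0.0
    while @running && slept < backoff
      sleep(slice)
      slept += slice
    end
    slept
  end
end
```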


Soleone commented Jan 2, 2018

Sorry for the late reply!

Yes, we're using #stop on the consumer.

At this time I don't have any good example code I can provide. But I'm investigating this further soon and am enabling some additional logging to find out more.


dasch commented Jan 3, 2018

@Soleone I think #516 should fix it.
