Wrap rd_kafka_consumer_poll into iterator (use librdkafka embedded backpressure) #158
Conversation
@felixschlegel, @FranzBusch could you give us an idea of whether these changes fit your vision for swift-kafka-client's future development, please?
@FranzBusch, @felixschlegel could you let us know if you have any thoughts on this PR, please?
Sources/Kafka/KafkaConsumer.swift
Outdated
// FIXME: there are two possibilities:
// 1. Create a GCD queue and wait on the blocking call client.consumerPoll() -> faster reaction to new messages
// 2. Sleep in case there are no messages -> easier to implement + no problems with GCD Sendability
I think both are not great, but the good thing is that we have a solution for this in the future: with https://github.com/apple/swift-evolution/blob/main/proposals/0417-task-executor-preference.md we are able to create our own custom task executor for a KafkaConsumer. This can be backed by a pthread in the end (we can use NIO here), and then we just do withTaskExecutorPreference in the next() call before calling the underlying rdkafka API. This makes sure we have one thread that we can freely block, and we get unblocked as soon as we have a message. It just comes with the overhead of a thread hop. Theoretically we could do this conditionally: try to poll -> no messages -> set executor preference -> blocking poll.

The problem is that this feature is only available on the latest nightly Swift versions. So what I would propose for now is going forward with the sleep-based implementation and creating an issue for adopting task executor preference here in the future.
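
For illustration, a minimal sketch of what that could look like once task executor preference is available (a sketch only: PollThreadExecutor and pollExecutor are hypothetical names, and it assumes a blocking variant of client.consumerPoll()):

import Dispatch

// A task executor backed by a single serial dispatch queue, i.e. one
// thread that the consumer is allowed to block on rd_kafka_consumer_poll.
final class PollThreadExecutor: TaskExecutor {
    private let queue = DispatchQueue(label: "kafka-consumer-poll")

    func enqueue(_ job: consuming ExecutorJob) {
        let unownedJob = UnownedJob(job)
        queue.async {
            unownedJob.runSynchronously(on: self.asUnownedTaskExecutor())
        }
    }
}

let pollExecutor = PollThreadExecutor()

// Inside the consumer's iterator:
func next() async throws -> KafkaConsumerMessage? {
    // Hop to the dedicated poll thread; blocking there does not
    // stall the cooperative thread pool.
    try await withTaskExecutorPreference(pollExecutor) {
        try client.consumerPoll()
    }
}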
Finally had some time to go through my changes and add a couple of minor ones:
/// A back pressure strategy based on high and low watermarks.
///
/// The consumer maintains a buffer size between a low watermark and a high watermark
/// to control the flow of incoming messages.
///
/// - Parameter low: The lower threshold for the buffer size (low watermark).
/// - Parameter high: The upper threshold for the buffer size (high watermark).
public static func watermark(low: Int, high: Int) -> BackPressureStrategy {
    return .init(backPressureStrategy: .watermark(low: low, high: high))
@available(*, deprecated, message: "Use MessageOptions to control backpressure")
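
For context on the semantics being deprecated here, a minimal usage sketch (the variable name and the threshold values are illustrative):

// Buffer up to 50 messages: once the buffer exceeds the high watermark
// the consumer stops polling, and it resumes polling once the buffer
// drains below the low watermark.
let strategy: BackPressureStrategy = .watermark(low: 10, high: 50)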
I don't think we need to deprecate stuff here. We aren't 1.0.0 yet, so let's just remove it!
Removed deprecated code.
// Currently use Task.sleep() if no new messages, should use task executor preference when implemented:
// https://github.com/apple/swift-evolution/blob/main/proposals/0417-task-executor-preference.md
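
A sketch of the pattern this comment describes (assuming, as in the snippet above, that client.consumerPoll() returns nil when librdkafka has nothing buffered; the 100 ms interval is illustrative):

func next() async throws -> KafkaConsumerMessage? {
    while !Task.isCancelled {
        // Non-blocking poll: librdkafka hands over a message if one is buffered.
        if let message = try client.consumerPoll() {
            return message
        }
        // Nothing buffered yet: back off briefly rather than busy-looping.
        try await Task.sleep(for: .milliseconds(100))
    }
    return nil
}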
Can we create an issue for this to track it?
Now we have one: #165!
LGTM, just two nits.
* Feature: expose librdkafka statistics as swift metrics (swift-server#92)
  * introduce statistics for producer
  * add statistics to new consumer with events
  * fix some artefacts
  * adjust to KeyRefreshAttempts
  * draft: statistics with metrics
  * make structures internal
  * Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift (Co-authored-by: Felix Schlegel <[email protected]>)
  * Update Sources/Kafka/Configuration/KafkaConsumerConfiguration.swift (Co-authored-by: Felix Schlegel <[email protected]>)
  * Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift (Co-authored-by: Felix Schlegel <[email protected]>)
  * Update Sources/Kafka/Configuration/KafkaConfiguration+Metrics.swift (Co-authored-by: Felix Schlegel <[email protected]>)
  * address review comments
  * formatting
  * map gauges in one place
  * move json mode as rd kafka statistics, misc renaming + docc
  * address review comments
  * remove import Metrics
  * divide producer/consumer configuration
  * apply swiftformat
  * fix code after conflicts
  * fix formatting
  Co-authored-by: Felix Schlegel <[email protected]>
* Add benchmark infrastructure without actual tests (swift-server#146)
  * add benchmark infrastructure without actual test
  * apply swiftformat
  * fix header in sh file
  * use new async seq methods
* Update to latest librdkafka & add a define for RAND_priv_bytes (swift-server#148)
  Co-authored-by: Franz Busch <[email protected]>
* exit from consumer batch loop when no more messages left (swift-server#153)
* Lower requirements for consumer state machine (swift-server#154)
  * lower requirements for kafka consumer
  * add twin test for kafka producer
* defer source.finish (swift-server#157)
* Add two consumer benchmarks (swift-server#149)
  * benchmark for consumer
  * attempt to speed up benchmarks
  * check CI works for one test
  * enable one more test
  * try to lower poll interval
  * adjust max duration of test
  * retain only manual commit test
  * check if commit is the reason for test delays
  * try all with scheduled commit
  * revert max test time to 5 seconds
  * dockerfiles
  * test set thresholds
  * create dummy thresholds from CI results
  * disable benchmark in CI
  * add header
  * add stable metrics
  * update thresholds to stable metrics only
  * try use '1' instead of 'true'
  * adjust thresholds to CI results (as a temporary measure)
  * set 20% threshold
  * move ARC to unstable metrics
  * try use 'true' in quotes for CI
  * try reduce number of messages for more reliable results
  * try upgrade bench
  * disable benchmark in CI
* Update librdkafka for BoringSSL (swift-server#162)
* chore(patch): [sc-8379] use returned error (swift-server#163)
* [producer message] Allow optional key for initializer (swift-server#164)
  Co-authored-by: Harish Yerra <[email protected]>
* Allow groupID to be specified when assigning partition (swift-server#161)
  Motivation: a consumer group can provide a lot of benefits even if the dynamic load-balancing features are not used.
  Modifications: allow for an optional GroupID when creating a partition consumer.
  Result: consumer groups can now be used when manual assignment is used.
  * fix format
  Co-authored-by: Ómar Kjartan Yasin <[email protected]>
  Co-authored-by: blindspotbounty <[email protected]>
  Co-authored-by: Franz Busch <[email protected]>
* Wrap rd_kafka_consumer_poll into iterator (use librdkafka embedded backpressure) (swift-server#158)
  * remove message sequence
  * test consumer with implicit rebalance
  * misc + format
  * remove artefact
  * don't check a lot of messages
  * fix typo
  * slow down the first consumer to lower the message count to fit the CI timeout
  * remove helpers
  * use exact benchmark version to avoid a missing-thresholds error (as there are no thresholds so far)
  * add deprecated marks for backpressure, change comment for future dev
  * address comments
  Co-authored-by: Felix Schlegel <[email protected]>
  Co-authored-by: Axel Andersson <[email protected]>
  Co-authored-by: Franz Busch <[email protected]>
  Co-authored-by: Samuel M <[email protected]>
  Co-authored-by: Harish Yerra <[email protected]>
  Co-authored-by: Harish Yerra <[email protected]>
  Co-authored-by: Omar Yasin <[email protected]>
  Co-authored-by: Ómar Kjartan Yasin <[email protected]>
This PR primarily addresses #136.

Since the changes in PR #139, there is no longer a need for an intermediate AsyncStream for messages. Instead, rd_kafka_consumer_poll (client.consumerPoll()) can be used directly from the iterator, as sketched below. That should primarily solve the problem of duplicated messages by leaving their handling to librdkafka. It also has some other benefits.
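
Schematically, the resulting iterator has roughly this shape (a sketch: type and member names are simplified relative to the actual sources, and the sleep fallback is the one discussed in the review above):

struct KafkaConsumerMessages: AsyncSequence {
    typealias Element = KafkaConsumerMessage

    let client: RDKafkaClient

    struct AsyncIterator: AsyncIteratorProtocol {
        let client: RDKafkaClient

        mutating func next() async throws -> KafkaConsumerMessage? {
            while !Task.isCancelled {
                // Poll librdkafka directly; its internal queue is the only
                // buffer, so backpressure is delegated to librdkafka itself.
                if let message = try client.consumerPoll() {
                    return message
                }
                try await Task.sleep(for: .milliseconds(100))
            }
            return nil
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        AsyncIterator(client: client)
    }
}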
For example, I've used the still-pending PR #149 to test the difference on my machine, with the following results: