
Memory leak on a Kafka Observation due to the metric "spring.kafka.listener.active" #3690


Closed
vivi2701 opened this issue Dec 20, 2024 · 1 comment · Fixed by #3694

@vivi2701

In what version(s) of Spring for Apache Kafka are you seeing this issue?
3.3.0 and 3.3.1

Describe the bug

We have an application using Spring Boot v3.3.6 and Spring Kafka v3.3.0, and we have seen that the metric spring.kafka.listener.active leads to an ever-increasing number of active task entries in Micrometer's DefaultLongTaskTimer, which are never garbage collected.
The screenshot below shows the memory and CPU usage of the process, which follows a classic memory-leak trend:
[screenshot: process memory and CPU usage over time]
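
To make the mechanism concrete, here is a minimal, self-contained sketch using Micrometer directly (not the Spring Kafka code): an observation backed by a long task timer registers an active sample when it starts, and that sample is only released when stop() is called, so unstopped observations accumulate forever.

    import io.micrometer.core.instrument.LongTaskTimer;
    import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

    public class ActiveTaskLeakSketch {

        public static void main(String[] args) {
            SimpleMeterRegistry registry = new SimpleMeterRegistry();
            LongTaskTimer timer = LongTaskTimer.builder("spring.kafka.listener.active").register(registry);

            // Every start() adds a sample to the timer's active set; the sample is
            // only removed when sample.stop() is called. Never stopping mimics the
            // unstopped observation described above.
            for (int i = 0; i < 1_000; i++) {
                timer.start();
            }

            System.out.println("active tasks: " + timer.activeTasks()); // prints 1000
        }
    }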

The issue does not appear in another application of ours that uses Spring Kafka v3.2.2.

The symptoms are similar to those in spring-projects/spring-security#14030, where the stop method is not called on the observation.
By debugging the doInvokeRecordListener method of KafkaMessageListenerContainer, we can see that the observation.stop() call in the finally block is skipped because the listener is an instance of RecordMessagingMessageListenerAdapter.

When we disable the spring.cloud.stream.kafka.binder.enableObservation property, system resource consumption seems to return to normal.
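
For reference, a minimal illustration of that workaround in application.properties (assuming observation had been enabled explicitly; this only hides the symptom, it does not fix the leak):

    spring.cloud.stream.kafka.binder.enableObservation=false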

Further compounding the issue, Prometheus regularly scrapes this metric, which uses even more CPU and leads to timeouts or broken pipes on the scraping endpoint (/actuator/prometheus).

See the thread dump associated with the scraping workload: threaddump-1734718231951.zip

To Reproduce

We have been able to create a minimal sample project that reproduces the issue.
It is a simple Kafka producer / consumer using the latest Spring Boot v3.4.1 and Spring Kafka v3.3.1.
We changed the producer rate to one message every millisecond (around 1000 messages per second) to speed up the phenomenon.

After about one hour of running the test, we see millions of instances (and growing) for the active tasks:
[screenshot: growing instance counts for the active tasks]

Sample

sample
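
For readers who do not open the attachment, here is a rough, hypothetical sketch of what such a reproducer can look like using the Spring Cloud Stream functional style; the class and function names are invented, and the 1 ms rate corresponds to a poller setting such as spring.cloud.stream.poller.fixed-delay=1 combined with spring.cloud.stream.kafka.binder.enableObservation=true.

    import java.util.function.Consumer;
    import java.util.function.Supplier;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    // Hypothetical reproducer: the supplier emits roughly one message per millisecond,
    // and on the affected versions each consumed record starts a
    // spring.kafka.listener.active observation that is never stopped.
    @SpringBootApplication
    public class ObservationLeakReproducer {

        public static void main(String[] args) {
            SpringApplication.run(ObservationLeakReproducer.class, args);
        }

        @Bean
        public Supplier<String> produce() {
            return () -> "ping";
        }

        @Bean
        public Consumer<String> consume() {
            return payload -> {
                // no-op: the leak comes from the observation lifecycle, not from the work done here
            };
        }
    }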

@artembilan
Member

I see where the problem is.
The KafkaMessageListenerContainer has logic like this:

    if (!(this.listener instanceof RecordMessagingMessageListenerAdapter<K, V>)) {
        observation.stop();
    }

where we assume that the invoke() of the super class of that RecordMessagingMessageListenerAdapter is invoked.
And that one has logic like this:

    Observation currentObservation = getCurrentObservation();
    ...
    finally {
        if (listenerError != null || result == null) {
            currentObservation.stop();
        }
    }

However, it turns out that the Spring Cloud Stream Kafka Binder uses the KafkaMessageDrivenChannelAdapter from Spring Integration, and that adapter has its own private class IntegrationRecordMessageListener extends RecordMessagingMessageListenerAdapter<K, V>, which does not call the mentioned invoke().

So, yeah, confirmed as a bug.
Not sure yet how to fix.

Thank you for such a simple reproducible sample!

artembilan added this to the 3.3.2 milestone on Dec 20, 2024
artembilan added a commit to artembilan/spring-kafka that referenced this issue Dec 23, 2024
…enerContainer`

Fixes: spring-projects#3690

When `this.listener` is an instance of `RecordMessagingMessageListenerAdapter`,
we rely on its logic to call `invoke()` from the super class to handle the observation
lifecycle one way or another.
However, Spring Integration's `KafkaMessageDrivenChannelAdapter` uses its own
`IntegrationRecordMessageListener` extension of the `RecordMessagingMessageListenerAdapter`
without calling the super `invoke()`.
The problem is apparent with the Spring Cloud Stream Kafka Binder, where observation is enabled.

* Fix `KafkaMessageListenerContainer` to check for the exact type of `this.listener`
before deciding whether to close the observation there, or propagate it down to the `RecordMessagingMessageListenerAdapter`
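
As a rough, self-contained illustration of the difference between the original subtype check and the exact-type check described in this commit (stand-in classes, not the real Spring Kafka code):

    public class ExactTypeCheckSketch {

        static class Adapter { }                   // stands in for RecordMessagingMessageListenerAdapter
        static class Subclass extends Adapter { }  // stands in for IntegrationRecordMessageListener

        public static void main(String[] args) {
            Object listener = new Subclass();

            // Subtype check: also true for the subclass, so the container would skip
            // stopping the observation even though the subclass never stops it.
            System.out.println(listener instanceof Adapter);               // true

            // Exact-type check: false for the subclass, so the container would stop
            // the observation itself, as the commit message above describes.
            System.out.println(listener.getClass().equals(Adapter.class)); // false
        }
    }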