-
Notifications
You must be signed in to change notification settings - Fork 1.6k
GH-2212: Log listener exception in retry topic flow #2213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
How the
|
Resolves spring-projects#2212 As in Retry Topic flow recovery is successful, the exception thrown by the listener was never being logged. Now it logs at DEBUG level for intermediate retries and ERROR level when retries are exhausted. Also some polishing in TimestampedException as it was cluttering the stacktrace with duplicated information from cause.getMessage(). Now it logs the time when the exception occurred.
public TimestampedException(Exception ex, Clock clock) { | ||
super(ex.getMessage(), ex); | ||
this.timestamp = Instant.now(clock).toEpochMilli(); | ||
} | ||
|
||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what is wrong with the convenient ctor taking the now()
for its timestamp?
Why do we need to deprecate this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I went back and forth with this a couple of times. In the end, the problem is that I can't use the exact same timestamp to build the message to be passed to the super
constructor and also to the timestamp
field.
I had another solution where instead of passing the error message to super I overrode the getMessage()
method using the time in the timestamp
field.
I went with what I thought would be a simpler solution - leaving the responsibility of providing the time to the caller and leaving the exception simpler.
But if the getMessage()
solution looks good enough, or if the message being out of sync with the field seems like not a problem, I can remove the deprecation and make the necessary adjustments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
public TimestampedException(Exception ex) {
this(ex, Instant.now().toEpochMilli());
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So simple, yet so distant when I was looking into this 😄
Sure, I'll add that, thanks.
spring-kafka/src/main/java/org/springframework/kafka/listener/TimestampedException.java
Show resolved
Hide resolved
spring-kafka/src/main/java/org/springframework/kafka/listener/TimestampedException.java
Show resolved
Hide resolved
/** | ||
* Creates an instance with the timestamp of when it was thrown and its cause. | ||
* @param ex the exception cause. | ||
* @param timestamp the timestamp of when the exception was thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@since
?
LOGGER.error(e, () -> getErrorMessage(cr) | ||
+ "Sending to DLT with name " + nextDestination.getDestinationName() + "."); | ||
} | ||
else if (nextDestination.isNoOpsTopic()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"no ops" ? In the public API?
Doesn't sound neutral and technical.
How about NONE
instead?
Not related, though, but I don't see any JavaDocs on those public APIs of the DestinationTopic
.
At least that Type
enums should have some explanations for their reason.
Maybe then I wouldn't complain about "no ops" 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I remember I had to figure out a way to mark end of line
, and at the time that seemed the best approach. I don't really remember which feature needed this instead of for example returning null
as the next destination for an input topic - I can probably figure it out by reviewing the code.
Also, TBH I never thought of this as public API
back then, though now of course I realize it is.
So I probably should write the javadocs to these methods and enums, and then we can evaluate if we need to change anything. But probably not as part of this PR, right?
...src/main/java/org/springframework/kafka/retrytopic/DeadLetterPublishingRecovererFactory.java
Outdated
Show resolved
Hide resolved
public TimestampedException(Exception ex, Clock clock) { | ||
super(ex.getMessage(), ex); | ||
this.timestamp = Instant.now(clock).toEpochMilli(); | ||
} | ||
|
||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
public TimestampedException(Exception ex) {
this(ex, Instant.now().toEpochMilli());
}
?
spring-kafka/src/main/java/org/springframework/kafka/listener/TimestampedException.java
Show resolved
Hide resolved
Add ListenerExceptionLoggingStrategy Add unit tests
@artembilan, some of our topics regarding the Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add some docs on the matter?
* Creates an instance with the timestamp of when it was thrown and its cause. | ||
* @param ex the exception cause. | ||
* @param timestamp the millis from epoch of when the exception was thrown. | ||
* @since 2.8.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, back-port to 2.7.x
label on the issue is wrong. Isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not sure really, I hadn't noticed that. If it's meant to be back-ported, what should I change this to? Maybe 2.7.x would be confusing since 2.8.0 - 2.8.4 doesn't have that? Not that proficient with some of these criteria yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this one @since 2.7.13
as the current point release for that branch.
There is no reason to mention since what 2.8.x
or 2.9.x
just because you won't see this API in the 2.8.0
anyway.
Therefore if you see it like @since 2.7.13
in your code base that means it is included into the release of 2.8.x
or 2.9.x
.
It might even not give any gain for end-users, unless there is a bug and it will be much easier to report the bug for the API which is marked with respective @since
.
We then on our side will figure out what branches the fix belongs too.
@@ -62,6 +73,10 @@ | |||
@SuppressWarnings({"unchecked", "rawtypes"}) | |||
class DeadLetterPublishingRecovererFactoryTests { | |||
|
|||
private static final boolean isDebugEnabled = | |||
((LogAccessor) ReflectionTestUtils.getField(DeadLetterPublishingRecovererFactory.class, "LOGGER")) | |||
.isDebugEnabled(); // NOSONAR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something wrong to rely on the external static value which we don't control from this unit test.
How about to use a @LogLevels
on this class to be sure that our logging configuration is explicit for the test logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add some docs on the matter?
About that, I'm not sure we should add this to the docs until we fix the configuration system for RT's components - might create more confusion than the problem it solves. For most users the default setting might be good enough, and if we solve the configuration for 2.8.5
we can add that later.
But, of course, I can add it now too. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something wrong to rely on the external static value which we don't control from this unit test.
How about to use a@LogLevels
on this class to be sure that our logging configuration is explicit for the test logic?
Oh, I didn't know about that annotation, cool! I'll use that then, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Just fix the test and @since
and I'll merge.
As I said on other your PR, the RT configuration refactoring is too drastic to be back-ported even to 2.8.x
.
It works some way over there already, so that's enough.
The new vision or hard refactoring is better to do in new version (I guess 2.9.0
is good enough). So, it is easy for you to implement the feature without a breaking changes fear and it is more comfortable for end-users to refactor their projects to new API from our new version in their own new versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, about the configuration, I thought we came to the conclusion that if we can come up with something that is an optional add-on that that users can just not use if there's a problem or they simply don't want or need to, we might merge it to 2.8.5. But I think we'll have a clearer picture when I have something more concrete we can look into.
TBH, I think the current configuration system is too bad, specially when we consider that the next version in sync with Boot might be 3.0.0 at end of year. So if we decide not to back port this new approach, I'll still try to come up with something better for 2.8.x, even if it's only for the main features - of course, if that's not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to be much more worried about this than you 😄 Maybe I'm overreacting. Probably @garyrussell can help settling this since he's been involved with this feature from the beginning. And we sure can take a better look after I work on the Support
-based configuration.
But to provide some more context, we didn't use to have many features that relied on this kind of configuration - most features would use the @RetryableTopic
annotation and RetryTopicConfigurationBuilder
, which are APIs thought from the start to be used by end-users. That's probably why we didn't have so much bad feedback on this.
In the past couple versions, we've introduced a few features - e.g. the ability to set framework level FATAL
exceptions, and the ability to combine blocking and non-blocking retries, that can only be used through this kind of configuration.
Those seem to be gathering user attention - I think it took a couple of days from Boot launching with this last feature and the issue popping on SO.
So, as you've probably already have seen, to configure that feature, currently users might have to do:
@Configuration
public static class SO71705876Configuration {
@Bean(name = RetryTopicInternalBeanNames.LISTENER_CONTAINER_FACTORY_CONFIGURER_NAME)
public ListenerContainerFactoryConfigurer lcfc(KafkaConsumerBackoffManager kafkaConsumerBackoffManager,
DeadLetterPublishingRecovererFactory deadLetterPublishingRecovererFactory) {
ListenerContainerFactoryConfigurer lcfc = new ListenerContainerFactoryConfigurer(kafkaConsumerBackoffManager, deadLetterPublishingRecovererFactory, Clock.systemUTC());
lcfc.setBlockingRetryableExceptions(IllegalArgumentException.class, IllegalStateException.class);
lcfc.setBlockingRetriesBackOff(new FixedBackOff(500, 5)); // Optional
return lcfc;
}
@Bean(name = RetryTopicInternalBeanNames.KAFKA_CONSUMER_BACKOFF_MANAGER)
public KafkaConsumerBackoffManager backOffManager(ApplicationContext context) {
PartitionPausingBackOffManagerFactory managerFactory =
new PartitionPausingBackOffManagerFactory();
managerFactory.setApplicationContext(context);
return managerFactory.create();
}
@Bean(name = RetryTopicInternalBeanNames.DEAD_LETTER_PUBLISHING_RECOVERER_FACTORY_BEAN_NAME)
public DeadLetterPublishingRecovererFactory dlprFactory(DestinationTopicResolver resolver) {
var DLPRF = new DeadLetterPublishingRecovererFactory(resolver);
dlprf.neverLogListenerException();
return new DeadLetterPublishingRecovererFactory(resolver);
}
@Bean(name = RetryTopicInternalBeanNames.DESTINATION_TOPIC_CONTAINER_NAME)
public DestinationTopicResolver destinationTopicResolver(ApplicationContext context) {
return new DefaultDestinationTopicResolver(Clock.systemUTC(), context);
}
I think it was bad enough when we thought users had do declare only one bean in such way, but I think having to do all this for any feature is really not acceptable in my book. And it tends to get worse as we release new features - albeit we could defer some new things to 2.9.0
. But the logging configuration from the other PR is an example of this.
Let alone users having to look into SO
for this workaround after being frustrated by errors - and I'd feel very bad if we actually documented this ugliness as an "official" workaround 😄
So that's to provide some more context. Again, maybe I'm overreacting. I should have something on the Support
configuration by Monday so we can take a better look.
I really appreciate you looking into all this. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are hijacking this PR for other matter 😄
Perhaps would be great if you are linking to this comment from the proper place.
Thanks
|
||
/** | ||
* Logs the listener exception at each attempt. | ||
* The default is logging only after retries are exhausted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@since
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have anything else to do for this PR?
Do I need to pay more attention to something?
Or may be you need @garyrussell 's opinion as well?
Thanks
I don't see anything else to be done - it seems to be on par with what was discussed here: #2211 If you don't see any problems code-wise, I think we can merge this, and if @garyrussell has something else to add we can always patch it early next week. That being said, I'm also ok with waiting for @garyrussell's input if you prefer. |
* GH-2212: Log listener ex. in retry topic flow Resolves #2212 As in Retry Topic flow recovery is successful, the exception thrown by the listener was never being logged. Now it logs at DEBUG level for intermediate retries and ERROR level when retries are exhausted. Also some polishing in TimestampedException as it was cluttering the stacktrace with duplicated information from cause.getMessage(). Now it logs the time when the exception occurred. * Address TimestampedException review suggestions Add ListenerExceptionLoggingStrategy Add unit tests * Address review comments
* GH-2212: Log listener ex. in retry topic flow Resolves #2212 As in Retry Topic flow recovery is successful, the exception thrown by the listener was never being logged. Now it logs at DEBUG level for intermediate retries and ERROR level when retries are exhausted. Also some polishing in TimestampedException as it was cluttering the stacktrace with duplicated information from cause.getMessage(). Now it logs the time when the exception occurred. * Address TimestampedException review suggestions Add ListenerExceptionLoggingStrategy Add unit tests * Address review comments
* GH-2212: Log listener ex. in retry topic flow Resolves #2212 As in Retry Topic flow recovery is successful, the exception thrown by the listener was never being logged. Now it logs at DEBUG level for intermediate retries and ERROR level when retries are exhausted. Also some polishing in TimestampedException as it was cluttering the stacktrace with duplicated information from cause.getMessage(). Now it logs the time when the exception occurred. * Address TimestampedException review suggestions Add ListenerExceptionLoggingStrategy Add unit tests * Address review comments
...and cherry-picked to 2.9.x, 2.8.x, and to 2.7.x after resolving conflicts. |
Resolves #2212
As in normal
Retry Topic
flow recovery is always successful, the exception thrown by the listener is never logged.Now such exception is logged at
DEBUG
level for intermediate retries andERROR
level when retries are exhausted.Produces messages such as:
For intermediate retries:
For messages heading to a DLT:
For messages that won't be forwarded, including those thrown in a DLT handler:
Also did some polishing in
TimestampedException
as it was cluttering the stacktrace with duplicated information from cause.getMessage(). Now it logs the time when the exception occurred.