-
Notifications
You must be signed in to change notification settings - Fork 41.2k
Improve CassandraHealthIndicator with more robust mechanism #23041
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
To expand on what I said in #22901 : the changes proposed here are isofunctional. Users shouldn't see the slightest change in the way health reports appear in monitoring systems. However we do expect the checks to be performed almost instantaneously: these should never again risk a timeout. |
// fill details with version of the first node (if the version is not null) | ||
nodes.stream().map(Node::getCassandraVersion).filter(Objects::nonNull).findFirst() | ||
.ifPresent((version) -> builder.withDetail("version", version)); | ||
return builder.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have this logic repeated 4 times. Maybe it's time to extract this to a helper class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, I was thinking about the same. I didn't extract it yet because it was duplicated for the current health check logic as well. So if the spring-boot team is ok with extracting this to a separate component, I am +1 on this.
When should I expect a review for this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I've made some suggestions. I'd also like we keep the Spring Data equivalent untouched as they are deprecated anyway.
|
||
// fill details with version of the first node (if the version is not null) | ||
nodes.stream().map(Node::getCassandraVersion).filter(Objects::nonNull).findFirst() | ||
.ifPresent((version) -> builder.withDetail("version", version)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the reasons for which a node wouldn't report a version? I think the first map operation should rather return the first node that's up and we should then use the information provided by that node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the reasons for which a node wouldn't report a version?
A healthy Cassandra installation would always report its version. But we've seen some edge cases in the past: firstly, the system.peers
table is subject to corruption (intentional or accidental); secondly, this may happen when a node is being upgraded. This is the reason why com.datastax.oss.driver.api.core.metadata.Node#getCassandraVersion
is annotated with @Nullable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @adutra. In the case we keep the node that states UP, wouldn't that be correct to assume its version is set? Can you have a node reporting UP and not providing a version at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have a node reporting UP and not providing a version at the same time?
It is technically possible, although very unlikely.
builder.withDetail("version", row.getString(0)); | ||
Collection<Node> nodes = this.session.getMetadata().getNodes().values(); | ||
boolean atLeastOneUp = nodes.stream().map(Node::getState).anyMatch((state) -> state == NodeState.UP); | ||
if (atLeastOneUp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if/else branch should be simplified with a ternary operator using builder.status
.
@@ -31,13 +36,11 @@ | |||
* | |||
* @author Julien Dubois | |||
* @author Alexandre Dutra | |||
* @author Tomasz Lelek |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've decided to deprecate this class. Please revert those changes.
import org.springframework.util.Assert; | ||
|
||
/** | ||
* A {@link ReactiveHealthIndicator} for Cassandra. | ||
* | ||
* @author Artsiom Yudovin | ||
* @author Tomasz Lelek |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've decided to deprecate this class. Please revert those changes.
@@ -45,30 +55,86 @@ void createWhenCqlSessionIsNullShouldThrowException() { | |||
assertThatIllegalArgumentException().isThrownBy(() -> new CassandraDriverHealthIndicator(null)); | |||
} | |||
|
|||
@ParameterizedTest | |||
@MethodSource | |||
void reportCassandraHealthCheck(Map<UUID, Node> nodes, Status expectedStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a big fan of @ParameterizedTest
for this and this looks inconsistent with all the other test. We have rather dedicated test where the expectations are described, potentially sharing a private method that setup the infrastructure.
Can you please give that a try? If you need inspiration, the other health indicator tests should help.
@@ -50,29 +56,79 @@ void createWhenCqlSessionIsNullShouldThrowException() { | |||
assertThatIllegalArgumentException().isThrownBy(() -> new CassandraDriverReactiveHealthIndicator(null)); | |||
} | |||
|
|||
@ParameterizedTest | |||
@MethodSource | |||
void reportCassandraHealthCheck(Map<UUID, Node> nodes, Status expectedStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
|
||
/** | ||
* Tests for {@link CassandraHealthIndicator}. | ||
* | ||
* @author Oleksii Bondar | ||
* @author Stephane Nicoll | ||
* @author Tomasz Lelek |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be reverted
|
||
/** | ||
* Tests for {@link CassandraReactiveHealthIndicator}. | ||
* | ||
* @author Artsiom Yudovin | ||
* @author Tomasz Lelek |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be reverted.
@tomekl007 do you have time to process feedback on this PR? |
Hello, unfortunately currently I don't have time to adapt it. Someone from the team should be able to take a look at it in the next week (that starts 28.09) |
@snicoll sorry for the delay, our team had to shift to a few urgent projects as Cassandra 4.0 GA approaches. I addressed all your comments and requests, let me know what you think. |
if (row != null && !row.isNull(0)) { | ||
builder.withDetail("version", row.getString(0)); | ||
} | ||
Collection<Node> nodes = this.session.getMetadata().getNodes().values(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is still duplicated twice. Do you think it's worth extracting it to a separate, shared component?
No worries, we still have time. Don't worry about reverting code, I can easily do that as part of merging this issue. |
@tomekl007 thank you for making your first contribution to Spring Boot. @adutra thank you for following up! |
No description provided.