-
Notifications
You must be signed in to change notification settings - Fork 356
Migration of Spectator to Micrometer #486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
case CANCEL: | ||
this.cancel.increment(); | ||
break; | ||
case PAYLOAD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a mistake to not track payload frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oversight here was actually including the enum
and doing nothing with it (ah, IDE's conveniently adding everything). Experimentally I noticed that PAYLOAD
frames are never transferred, instead being immediately translated into COMPLETE
, NEXT
and NEXT_COMPLETE
frames. I'll add in a counter for PAYLOAD
anyway (since it can't hurt), but I suspect it'll always be 0.
FrameType frameType = frame.getType(); | ||
|
||
switch (frameType) { | ||
case RESERVED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we let this fall to the default case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, same oversight as PAYLOAD
, letting the IDE get the better of me.
this.extension.increment(); | ||
break; | ||
default: | ||
this.logger.warn("Skipping count of unknown frame type: {}", frameType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest this be at debug level and still cause a counter to be incremented. It has always bit me in server development to decide to be verbose about any error that a misconfigured or newer client can externally trigger. Suddenly server performance is effected by excessive logging, while instead tracking a counter and then temporarily increasing logging in response to an alert is much safer IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable enough. Going with an UNKNOWN
frame type.
MeterRegistry meterRegistry, String interactionModel, SignalType signalType, Tag... tags) { | ||
|
||
return meterRegistry.timer( | ||
String.format("rsocket.%s", interactionModel), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: String.format here seem like overkill compare to "rsocket." + interactionModel, where interactionModel is just a String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistically, I prefer to use String.format()
consistently for readability, but happy to conform to project conventions.
Looks great to me generally, few nits. Would you normally split out different clients (either host:port target, or different backend services) by the tags passed to the constructor of MicrometerDuplexConnectionInterceptor? What is the rollup pattern with Micrometer (e.g. 1 user service backed by 5 different load balanced host:ports)? by ignoring the tag when querying the results from a shared registry? |
} | ||
|
||
private static final class InteractionTimers implements BiConsumer<Sample, SignalType> { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I was thinking about adding in something else I was working on was the idea of capturing the time between onNexts. Right now it's really hard to tell how quickly, or slowly a stream is emitting data. Having the time between when the stream starts and cancel/error/complete doesn't really tell that. Throwing this out there as a suggestion as you're refactoring the metrics...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - is that a https://micrometer.io/docs/concepts#_distribution_summaries
Can we iterate on improvements like this? I'd love to get this in sooner and start seeing what results look like from the CLI etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - something like that.
Can we iterate on improvements like this? I'd love to get this in sooner and start seeing what results look like from the CLI etc.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that's a really useful metric, but also that we should kick the can on it. At some level we're getting rates at the DuplexConnection
, but I'll confer with @jkschneider and work on #487 once I have a plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertroeser I think this onNext
timer would be potentially an interesting application of the IntervalTimer
we plan to introduce with Micrometer 1.1. It's purpose is to count/time the space between things. Conventionally I've seen it used as something to alert me when say a periodic batch job doesn't kick off in an expected amount of time, but the concept interestingly applies equally well here.
public Mono<Void> onClose() { | ||
return delegate.onClose(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are people's thoughts on counting the disposes and onCloses? Might help with debugging or alerts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have enough historical context to know what problem that's trying to solve. I'll defer to the rest of you for a definitive call on collecting them. Just let me know and I'll add it if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added onClose print statements to debug issues and I've had similar data about http before so I would like to capture it if no one has an objection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do @robertroeser. My suspicion is that this will only be 0/1. Is that correct?
Correct. We also see things like what stack (container image, VM id, etc.), region, etc. put in there. I'm toying around with a strategy so that a
Exactly. A dimensionally rolled up metric that might filter on some, but specifically not that one host-identifying tag. |
Just to be clear, this kind of behavior is a common feature of dimensional monitoring systems, not Micrometer itself. The primary purpose of Micrometer is to get your metrics to those kind of systems to enable this kind of rollup. |
@robertroeser Added metrics for |
This change migrates the metrics-gathering project from using Spectator to using Micrometer (https://micrometer.io). This change fully moves both the DuplexConnectionInterceptor and RSocketInterceptor to a Micrometer-based implementation, changing some of the metric names. The metric name changes were made at the advice of the project lead of Micrometer in order to get the most useful (and compatible) metrics and dimensions for monitoring an RSocket- based system. As part of writing the tests, some necessary changes were detected in the framing abstraction. These changes are included in this. [resolves #434]
This change migrates the metrics-gathering project from using Spectator to using Micrometer. This change fully moves both the
DuplexConnectionInterceptor
andRSocketInterceptor
to a Micrometer-based implementation, changing some of the metric names. The metric name changes were made at the advice of the project lead of Micrometer in order to get the most useful (and compatible) metrics and dimensions for monitoring an RSocket-based system.As part of writing the tests, some necessary changes were detected in the framing abstraction. These changes are included in this.
[resolves #434]