Skip to content

Migration of Spectator to Micrometer #486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 9, 2018
Merged

Migration of Spectator to Micrometer #486

merged 1 commit into from
Apr 9, 2018

Conversation

nebhale
Copy link
Member

@nebhale nebhale commented Apr 6, 2018

This change migrates the metrics-gathering project from using Spectator to using Micrometer. This change fully moves both the DuplexConnectionInterceptor and RSocketInterceptor to a Micrometer-based implementation, changing some of the metric names. The metric name changes were made at the advice of the project lead of Micrometer in order to get the most useful (and compatible) metrics and dimensions for monitoring an RSocket-based system.

As part of writing the tests, some necessary changes were detected in the framing abstraction. These changes are included in this.

[resolves #434]

case CANCEL:
this.cancel.increment();
break;
case PAYLOAD:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a mistake to not track payload frames.

Copy link
Member Author

@nebhale nebhale Apr 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The oversight here was actually including the enum and doing nothing with it (ah, IDE's conveniently adding everything). Experimentally I noticed that PAYLOAD frames are never transferred, instead being immediately translated into COMPLETE, NEXT and NEXT_COMPLETE frames. I'll add in a counter for PAYLOAD anyway (since it can't hurt), but I suspect it'll always be 0.

FrameType frameType = frame.getType();

switch (frameType) {
case RESERVED:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we let this fall to the default case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, same oversight as PAYLOAD, letting the IDE get the better of me.

this.extension.increment();
break;
default:
this.logger.warn("Skipping count of unknown frame type: {}", frameType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest this be at debug level and still cause a counter to be incremented. It has always bit me in server development to decide to be verbose about any error that a misconfigured or newer client can externally trigger. Suddenly server performance is effected by excessive logging, while instead tracking a counter and then temporarily increasing logging in response to an alert is much safer IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable enough. Going with an UNKNOWN frame type.

MeterRegistry meterRegistry, String interactionModel, SignalType signalType, Tag... tags) {

return meterRegistry.timer(
String.format("rsocket.%s", interactionModel),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: String.format here seem like overkill compare to "rsocket." + interactionModel, where interactionModel is just a String.

Copy link
Member Author

@nebhale nebhale Apr 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically, I prefer to use String.format() consistently for readability, but happy to conform to project conventions.

@yschimke
Copy link
Member

yschimke commented Apr 7, 2018

Looks great to me generally, few nits.

Would you normally split out different clients (either host:port target, or different backend services) by the tags passed to the constructor of MicrometerDuplexConnectionInterceptor?

What is the rollup pattern with Micrometer (e.g. 1 user service backed by 5 different load balanced host:ports)? by ignoring the tag when querying the results from a shared registry?

}

private static final class InteractionTimers implements BiConsumer<Sample, SignalType> {

Copy link
Member

@robertroeser robertroeser Apr 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I was thinking about adding in something else I was working on was the idea of capturing the time between onNexts. Right now it's really hard to tell how quickly, or slowly a stream is emitting data. Having the time between when the stream starts and cancel/error/complete doesn't really tell that. Throwing this out there as a suggestion as you're refactoring the metrics...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - is that a https://micrometer.io/docs/concepts#_distribution_summaries

Can we iterate on improvements like this? I'd love to get this in sooner and start seeing what results look like from the CLI etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - something like that.

Can we iterate on improvements like this? I'd love to get this in sooner and start seeing what results look like from the CLI etc.
Sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that's a really useful metric, but also that we should kick the can on it. At some level we're getting rates at the DuplexConnection, but I'll confer with @jkschneider and work on #487 once I have a plan.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertroeser I think this onNext timer would be potentially an interesting application of the IntervalTimer we plan to introduce with Micrometer 1.1. It's purpose is to count/time the space between things. Conventionally I've seen it used as something to alert me when say a periodic batch job doesn't kick off in an expected amount of time, but the concept interestingly applies equally well here.

public Mono<Void> onClose() {
return delegate.onClose();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are people's thoughts on counting the disposes and onCloses? Might help with debugging or alerts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have enough historical context to know what problem that's trying to solve. I'll defer to the rest of you for a definitive call on collecting them. Just let me know and I'll add it if desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added onClose print statements to debug issues and I've had similar data about http before so I would like to capture it if no one has an objection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do @robertroeser. My suspicion is that this will only be 0/1. Is that correct?

@nebhale
Copy link
Member Author

nebhale commented Apr 7, 2018

Would you normally split out different clients (either host:port target, or different backend services) by the tags passed to the constructor of MicrometerDuplexConnectionInterceptor?

Correct. We also see things like what stack (container image, VM id, etc.), region, etc. put in there. I'm toying around with a strategy so that a MicrometerDuplexConnection and MicrometerRSocket knows what it is connected to (other end of the socket) so there's enough tag information to draw the pretty network graphs without requiring another component to collect that information. Not sure it's possible, but we're very keen on getting that kind of graphing up and running for demos.

What is the rollup pattern with Micrometer (e.g. 1 user service backed by 5 different load balanced host:ports)? by ignoring the tag when querying the results from a shared registry?

Exactly. A dimensionally rolled up metric that might filter on some, but specifically not that one host-identifying tag.

@jkschneider
Copy link

What is the rollup pattern with Micrometer
A dimensionally rolled up metric that might filter on some, but specifically not that one host-identifying tag.

Just to be clear, this kind of behavior is a common feature of dimensional monitoring systems, not Micrometer itself. The primary purpose of Micrometer is to get your metrics to those kind of systems to enable this kind of rollup.

@nebhale
Copy link
Member Author

nebhale commented Apr 8, 2018

@robertroeser Added metrics for onClose() and dispose() and I believe all other comments have now been addressed.

This change migrates the metrics-gathering project from using Spectator to
using Micrometer (https://micrometer.io).  This change fully moves both the
DuplexConnectionInterceptor and RSocketInterceptor to a Micrometer-based
implementation, changing some of the metric names.  The metric name changes
were made at the advice of the project lead of Micrometer in order to get the
most useful (and compatible) metrics and dimensions for monitoring an RSocket-
based system.

As part of writing the tests, some necessary changes were detected in the
framing abstraction.  These changes are included in this.

[resolves #434]
@robertroeser robertroeser merged commit ff4647b into rsocket:1.0.x Apr 9, 2018
@nebhale nebhale deleted the micrometer branch April 11, 2018 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants