Skip to content

Add client span to SignalR .NET client #57101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 7, 2024
Merged

Conversation

JamesNK
Copy link
Member

@JamesNK JamesNK commented Jul 31, 2024

Addresses #51557. Specifically, the client activity source item.

Follow up to #55439 and builds on top of #57049.

Changes:

  • Invocations by the SignalR client now start a local activity.
  • Activities are started for regular invocations and streaming methods.
  • Activity starts with the invocation and ends when the client receives a completion message.

Not planned in this PR:

  • Tracing in other clients.

Nothing is blocking additional work in the future in the future.

@JamesNK JamesNK added the area-signalr Includes: SignalR clients and servers label Jul 31, 2024
@JamesNK
Copy link
Member Author

JamesNK commented Jul 31, 2024

@BrennanConroy A problem I found when implementing this is the SignalR client doesn't know the name of the hub on the server. The client just specifies a URL and then sends messages to a method at that URL.

Because the hub name isn't available, as a substitute I replaced it with the URL path, e.g. builder.WithUrl("https://localhost:5001/chat") will result in the URL path chat being recorded as the service name.

Is there a better option? If not, should the server record the URL path as the OTEL service name instead of the .NET type name? That will keep consistency between what the client and server record.

@JamesNK
Copy link
Member Author

JamesNK commented Jul 31, 2024

@tarekgh @noahfalk Do you know how the Activity name interacts with OTEL? Activity has OperationName and DisplayName. Right now, Activity.OperationName is set to the required OTEL span format. Is this the right thing to do?

It seems like opentelemetry-dotnet uses Activity.DisplayName (which defaults to OperationName if not set). And the docs for Activity.OperationName say its value should be a constant.

Would it be more correct to do this?

var activity = activitySource.CreateActivity("Microsoft.AspNetCore.SignalR.Client.InvocationOut", ...);
activity.DisplayName = $"{serviceName}/{serviceMethod}";

@tarekgh
Copy link
Member

tarekgh commented Jul 31, 2024

CC @cijothomas @CodeBlanch @reyang to answer the question in #57101 (comment). What @JamesNK propose is matching the runtime guidelines as OperationName used for grouping and filtering while DisplayName used for the UI.

Copy link
Member

@BrennanConroy BrennanConroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we see a screenshot of these changes in a UI like the Aspire Dashboard?

SendCoreAsync doesn't have an activity being created for it. It's for "fire and forget" calls. Do we just create an activity, send the message, and stop the activity immediately?

There should be a few more tests:
Multiple different invokes to make sure there is a different activity every time.
Creating an activity in "user code" and seeing that it's the parent of the activity created in an invoke.
Making sure the activity stopped.

@cijothomas
Copy link

CC @cijothomas @CodeBlanch @reyang to answer the question in #57101 (comment). What @JamesNK propose is matching the runtime guidelines as OperationName used for grouping and filtering while DisplayName used for the UI.

Do you know how the Activity name interacts with OTEL? Activity has OperationName and DisplayName.

OTel has a single concept of Span Name, which is expected to be the low cardinality Spanname, also used for UI. The name can be changed after span creation. Activity.DisplayName is what maps to this. (Operationname exists as legacy on Activity, not used by OTel).

@JamesNK
Copy link
Member Author

JamesNK commented Jul 31, 2024

@cijothomas Thanks!

@JamesNK
Copy link
Member Author

JamesNK commented Jul 31, 2024

SendCoreAsync doesn't have an activity being created for it. It's for "fire and forget" calls. Do we just create an activity, send the message, and stop the activity immediately?

Yes.

Multiple different invokes to make sure there is a different activity every time.

This is done in HubConnectionTests.InvokeAsync_SendTraceHeader.

Creating an activity in "user code" and seeing that it's the parent of the activity created in an invoke.

Done in HubConnectionTests.InvokeAsync_SendTraceHeader and StreamAsyncCore_SendTraceHeader

Making sure the activity stopped.

I'll add some asserts for Activity.IsStopped.

@BrennanConroy
Copy link
Member

Multiple different invokes to make sure there is a different activity every time.

This is done in HubConnectionTests.InvokeAsync_SendTraceHeader.

Creating an activity in "user code" and seeing that it's the parent of the activity created in an invoke.

Done in HubConnectionTests.InvokeAsync_SendTraceHeader and StreamAsyncCore_SendTraceHeader

Huh, maybe there is something weird happening here since it's merging into another PR, but those tests look like they're being deleted, which would be odd since they seem specific to this change.

@JamesNK
Copy link
Member Author

JamesNK commented Jul 31, 2024

They're deleted from their original location and moved to HubConnectionTests.Tracing.cs.

@BrennanConroy
Copy link
Member

🙈 The file was collapsed and when I looked for HubConnectionTests.Tracing.cs I only noticed one of them but there are two files with that name.

@JamesNK
Copy link
Member Author

JamesNK commented Aug 1, 2024

Updates:

  • Client activity for SendAsync added.
  • Changed OperationName to a constant. Will fix server activity in another PR.
  • Changed client activity to include acquiring an active connection.

I don't think there are any outstanding changes. There is still this open question: #57101 (comment). I think the server should use the path. ChatHub will almost always have a URL like /chat so it will be understandable. And it will be consistent with the client @BrennanConroy

@JamesNK
Copy link
Member Author

JamesNK commented Aug 1, 2024

Can we see a screenshot of these changes in a UI like the Aspire Dashboard?

image

😎 😎 😎

Copy link
Member

@BrennanConroy BrennanConroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go after my last questions/comments.

@JamesNK
Copy link
Member Author

JamesNK commented Aug 1, 2024

I think this is good to go after my last questions/comments.

This PR depends on the propagation PR being merged first

#57049

@JamesNK JamesNK force-pushed the jamesnk/signalr-distributed-tracing branch from eff6c7e to 2f4adcc Compare August 2, 2024 01:22
Base automatically changed from jamesnk/signalr-distributed-tracing to main August 2, 2024 07:14
@JamesNK JamesNK force-pushed the jamesnk/signalr-client-span branch from ab2ff3f to 691bf0f Compare August 2, 2024 08:16
@JamesNK JamesNK force-pushed the jamesnk/signalr-client-span branch from 691bf0f to a698667 Compare August 6, 2024 04:02
{
ConnectionState connectionState;
var connectionStateTask = _state.WaitForActiveConnectionAsync(sendingMethodName, token);
if (connectionStateTask.Status == TaskStatus.RanToCompletion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set any server tags if connectionStateTask throws. It throws when you're either not connected or the user canceled the passed in CTS. In either case it's not actually doing anything except creating and stopping an activity.

Plus, it'd clean up the code a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a problem starting the connection then it would be useful to know the server address being called

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the connection failing to start, or closing before the send calls, or not even calling start in the first place, is not reflected in SendAsync or InvokeAsync calls. All it knows is that there is no connection.

Copy link
Member Author

@JamesNK JamesNK Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With telemetry it's useful to view it from the perspective of the user. The mechanics of what happens internally that could cause the call to fail are invisible to them. They're focused on the end result: the call they want to make failed and there was an error. Having information about what they were trying to do - make a call to a hub with the specified rpc.service, rpc.method, server.address, server.port - is what they care about. Then the error reason would be in error.type.

For an example of prior art, HttpClient always reports server.address and server.port when making a HTTP request. It does this even if the server connection couldn't be established, or the request was canceled before sending any data.

If you want to make it clearer that the call failed because of the connection not being available, there is room to do that by providing good values to error.type. By default, it is the exception type name, but it can be customized in known scenarios. For example, if there is a problem with the connection, error.type could be something like: negotiate-failed, hub-connection-not-started, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, server.address and server.port are considered required attributes according to the spec: https://github.com/open-telemetry/semantic-conventions/blob/1e34b57b9f73b08b109cdc0e8841e857e5f5c205/docs/rpc/rpc-spans.md#client-attributes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks, I wanted to know some concrete reason we were including it.

@JamesNK
Copy link
Member Author

JamesNK commented Aug 7, 2024

/azp run

@JamesNK JamesNK enabled auto-merge (squash) August 7, 2024 05:18
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@JamesNK
Copy link
Member Author

JamesNK commented Aug 7, 2024

/azp run

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@JamesNK JamesNK merged commit d96d272 into main Aug 7, 2024
26 checks passed
@JamesNK JamesNK deleted the jamesnk/signalr-client-span branch August 7, 2024 14:25
@dotnet-policy-service dotnet-policy-service bot added this to the 9.0-rc1 milestone Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-signalr Includes: SignalR clients and servers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants