Skip to content

Websocket handshake perf #12386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 1, 2019
Merged

Websocket handshake perf #12386

merged 10 commits into from
Nov 1, 2019

Conversation

BrennanConroy
Copy link
Member

@BrennanConroy BrennanConroy commented Jul 20, 2019

Found a couple spots we could avoid allocating in the websocket handshake.

Would like @blowdart and @Tratcher to take a look to make sure I'm not doing something completely wrong here.

Before:

Method Mean Error StdDev Op/s Gen 0 Allocated
CreateResponseKey 1,118.31 ns 20.799 ns 19.455 ns 894,207.5 0.1373 5360 B
IsRequestKeyValid 92.56 ns 1.761 ns 2.163 ns 10,803,358.7 0.0095 400 B

After:

Method Mean Error StdDev Op/s Gen 0 Allocated
CreateResponseKey 966.29 ns 18.022 ns 15.976 ns 1,034,883.1 0.0763 3520 B
IsRequestKeyValid 54.61 ns 1.089 ns 1.878 ns 18,311,386.7 - 0 B

With caching SHA1:

Method Mean Error StdDev Op/s Gen 0 Allocated
CreateResponseKey 497.52 ns 9.885 ns 10.576 ns 2,009,951.4 0.0153 800 B

@@ -146,7 +145,7 @@ public async Task<WebSocket> AcceptAsync(WebSocketAcceptContext acceptContext)
}
}

string key = string.Join(", ", _context.Request.Headers[HeaderNames.SecWebSocketKey]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was odd, nothing in the RFC says this is valid.
Should we check that there is only one value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're now using the implicit string converter that does the same thing here.

A multi-value header will fail the normal parse if there are multiple values anyways, don't worry about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it allocating right now even for a single value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it allocating right now even for a single value?

No.

byte[] data = Convert.FromBase64String(value);
return data.Length == 16;
Span<byte> temp = stackalloc byte[16];
var success = Convert.TryFromBase64String(value, temp, out var written);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the return value if temp is too small?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false

@@ -146,7 +145,7 @@ public async Task<WebSocket> AcceptAsync(WebSocketAcceptContext acceptContext)
}
}

string key = string.Join(", ", _context.Request.Headers[HeaderNames.SecWebSocketKey]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're now using the implicit string converter that does the same thing here.

A multi-value header will fail the normal parse if there are multiple values anyways, don't worry about it.

Debug.Assert(written == 20);
if (!success || written != 20)
{
throw new InvalidOperationException("Could not compute the hash for the 'Sec-WebSocket-Accept' header.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right behavior? a 500?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. We don't know why it would fail and would have to debug it when it did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a 400 but we can wait until it happens 😄

};

[ThreadStatic]
private static SHA1 _algorithm;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blowdart Is caching SHA1 ok? I know it's reusable, I'm just wondering if the OS resource usage is ok to keep around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this ends up allocating one for every threadpool thread and they live for the lifetime of the app? That's not great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartonjs for his thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHA-1 state is 96 bytes. Even if you assume a couple of layers of indirection and metadata it shouldn't be all that large.

The other main alternative would be a static ConcurrentBag for pooling (or going back to a new one per request).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already pool to ton of stuff like this and the savings seem significant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pooling this gave a 2x gain, so the layers of indirection seem pretty heavy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we want to do here? Go back to the slow path and create a new SHA1 every request, keep the ThreadStatic, or maintain a small pool using something like ConcurrentBag and creating a temporary new one if the pool is empty?

@BrennanConroy
Copy link
Member Author

New perf numbers in original comment :D

// so this can be hardcoded to 60 bytes for the requestKey + static websocket string
Span<byte> mergedBytes = stackalloc byte[60];
Encoding.UTF8.GetBytes(requestKey, mergedBytes);
_encodedWebSocketKey.CopyTo(mergedBytes.Slice(24));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At 24 + 36 bytes I don't know if it would be faster to use IncrementalHash with two calls to AppendData or to do the copy (probably doing this copy).

You might see perf improvement from using IncrementalHash anyways, if you happen to be at a level where you can notice a 3us difference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it out and it looked slower

@analogrelay analogrelay added the tell-mode Indicates a PR which is being merged during tell-mode label Jul 23, 2019
@analogrelay analogrelay added this to the 3.0.0-preview8 milestone Jul 23, 2019
return data.Length == 16;
Span<byte> temp = stackalloc byte[16];
var success = Convert.TryFromBase64String(value, temp, out var written);
return written == 16 && success;
}
catch (Exception)
Copy link
Member

@gfoidl gfoidl Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance for an exception left in this code-path?

Convert.TryFromBase64String throws if value-argument is null, but this case is already eliminated in L98 here.
So you could remove the try-catch, resulting in smaller code.

Actually I would write this method as

public static bool IsRequestKeyValid(string value)
{
    ReadOnlySpan<char> chars = value.AsSpan();

    if (chars.IsEmpty)
    {
        return false;
    }

    Span<byte> temp = stackalloc byte[16];
    var success = Convert.TryFromBase64Chars(chars, temp, out int written);

    // RyuJIT produces better code than the other way round: success && written == 16
    return written == 16 && success;
}

Maybe: to get even more perf, one could skip the base64-decoding step, and just check if the chars.Length == 16 and if the chars are in the base64-alphabet, so no need to perform the actual decoding. This step could also be vectorized. But maybe that's a bit of over-engineering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If written == 16, so for base64 the input-length must be 24. Hence one could write

ReadOnlySpan<char> chars = value.AsSpan();

if (chars.Length != 24)
{
    return false;
}

So the entire decoding-step is omitted if the result (written) can't be 16.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe: to get even more perf, one could skip the base64-decoding step, and just check if the chars.Length == 16 and if the chars are in the base64-alphabet, so no need to perform the actual decoding.

I'm assuming you meant chars.Length == 24. Can't do this because a 17 and 18 byte string will also be 24 bytes when encoded. Unless you also want to check for the trailing "==", but I think at this point we're trying too hard.

I'm going to stick with Convert.TryFromBase64Chars for now, but I'll look at the other feedback :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you meant chars.Length == 24

Yep, typed wrong in the first comment.

Can't do this because a 17 and 18 byte string will also be 24 bytes when encoded.

Maybe I missunderstand something now, but the string is encoded and we try to decode it, then check if the decoded data (byte) is of size 16.
So the other way round this means, 16 data bytes -> 24 base64 encoded utf8-bytes.
And for base64 the encoded length is always a multiple of 4.

Thus if chars.Length != 24 we know that the decoded data won't be 16 bytes, so we can short-circuit the method and return false. No need to try to decode the input.

If chars.Length == 24 we can try to decode, then check if written == 16.

Ahhhhh, now I see that in my previous comment it may look like the entire method. It's just a fragment. Sorry.

So I mean:

public static bool IsRequestKeyValid(string value)
{
    ReadOnlySpan<char> chars = value.AsSpan();

    if (chars.Length != 24)
    {
        return false;
    }

    Span<byte> temp = stackalloc byte[16];
    var success = Convert.TryFromBase64Chars(chars, temp, out int written);

    return written == 16 && success;
}

If you want I could submit a PR for the vectorized version (next week).

@@ -110,13 +123,25 @@ public static string CreateResponseKey(string requestKey)
throw new ArgumentNullException(nameof(requestKey));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these exceptions out of the hot-path into ThrowHelper-methods?

@BrennanConroy BrennanConroy removed this from the 3.0.0-preview8 milestone Aug 13, 2019
@BrennanConroy BrennanConroy removed the tell-mode Indicates a PR which is being merged during tell-mode label Aug 13, 2019
@BrennanConroy
Copy link
Member Author

🆙 📅
Removed SHA1 caching to make the PR more enticing :D

// so this can be hardcoded to 60 bytes for the requestKey + static websocket string
Span<byte> mergedBytes = stackalloc byte[60];
Encoding.UTF8.GetBytes(requestKey, mergedBytes);
_encodedWebSocketKey.CopyTo(mergedBytes.Slice(24));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this is called with a requestKey that isn't 24 bytes? Should we check the return value of GetBytes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"requestKey is already verified to be small (24 bytes) by 'IsRequestKeyValid()' and everything is 1:1 mapping to UTF8 bytes"

@BrennanConroy
Copy link
Member Author

@aspnet-hello
Copy link

This comment was made automatically. If there is a problem contact [email protected].

I've triaged the above build. I've created/commented on the following issue(s)
https://github.com/aspnet/AspNetCore-Internal/issues/3128
https://github.com/aspnet/AspNetCore-Internal/issues/3168
https://github.com/aspnet/AspNetCore-Internal/issues/2483

@analogrelay analogrelay added this to the 5.0.0-preview1 milestone Oct 17, 2019
@jkotalik
Copy link
Contributor

@BrennanConroy ping

@BrennanConroy BrennanConroy merged commit 0a61879 into master Nov 1, 2019
@BrennanConroy BrennanConroy deleted the brecon/handshake branch November 1, 2019 00:43
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions area-middleware Includes: URL rewrite, redirect, response cache/compression, session, and other general middlewares labels Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-middleware Includes: URL rewrite, redirect, response cache/compression, session, and other general middlewares area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-websockets Includes: WebSockets Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.