-
Notifications
You must be signed in to change notification settings - Fork 10.4k
Websocket handshake perf #12386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Websocket handshake perf #12386
Conversation
@@ -146,7 +145,7 @@ public async Task<WebSocket> AcceptAsync(WebSocketAcceptContext acceptContext) | |||
} | |||
} | |||
|
|||
string key = string.Join(", ", _context.Request.Headers[HeaderNames.SecWebSocketKey]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was odd, nothing in the RFC says this is valid.
Should we check that there is only one value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're now using the implicit string converter that does the same thing here.
A multi-value header will fail the normal parse if there are multiple values anyways, don't worry about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it allocating right now even for a single value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it allocating right now even for a single value?
byte[] data = Convert.FromBase64String(value); | ||
return data.Length == 16; | ||
Span<byte> temp = stackalloc byte[16]; | ||
var success = Convert.TryFromBase64String(value, temp, out var written); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the return value if temp is too small?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false
@@ -146,7 +145,7 @@ public async Task<WebSocket> AcceptAsync(WebSocketAcceptContext acceptContext) | |||
} | |||
} | |||
|
|||
string key = string.Join(", ", _context.Request.Headers[HeaderNames.SecWebSocketKey]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're now using the implicit string converter that does the same thing here.
A multi-value header will fail the normal parse if there are multiple values anyways, don't worry about it.
Debug.Assert(written == 20); | ||
if (!success || written != 20) | ||
{ | ||
throw new InvalidOperationException("Could not compute the hash for the 'Sec-WebSocket-Accept' header."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right behavior? a 500?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. We don't know why it would fail and would have to debug it when it did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a 400 but we can wait until it happens 😄
}; | ||
|
||
[ThreadStatic] | ||
private static SHA1 _algorithm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blowdart Is caching SHA1 ok? I know it's reusable, I'm just wondering if the OS resource usage is ok to keep around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this ends up allocating one for every threadpool thread and they live for the lifetime of the app? That's not great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartonjs for his thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHA-1 state is 96 bytes. Even if you assume a couple of layers of indirection and metadata it shouldn't be all that large.
The other main alternative would be a static ConcurrentBag for pooling (or going back to a new one per request).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already pool to ton of stuff like this and the savings seem significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pooling this gave a 2x gain, so the layers of indirection seem pretty heavy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we want to do here? Go back to the slow path and create a new SHA1
every request, keep the ThreadStatic
, or maintain a small pool using something like ConcurrentBag
and creating a temporary new one if the pool is empty?
New perf numbers in original comment :D |
// so this can be hardcoded to 60 bytes for the requestKey + static websocket string | ||
Span<byte> mergedBytes = stackalloc byte[60]; | ||
Encoding.UTF8.GetBytes(requestKey, mergedBytes); | ||
_encodedWebSocketKey.CopyTo(mergedBytes.Slice(24)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At 24 + 36 bytes I don't know if it would be faster to use IncrementalHash with two calls to AppendData or to do the copy (probably doing this copy).
You might see perf improvement from using IncrementalHash anyways, if you happen to be at a level where you can notice a 3us difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check this out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried it out and it looked slower
return data.Length == 16; | ||
Span<byte> temp = stackalloc byte[16]; | ||
var success = Convert.TryFromBase64String(value, temp, out var written); | ||
return written == 16 && success; | ||
} | ||
catch (Exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any chance for an exception left in this code-path?
Convert.TryFromBase64String
throws if value
-argument is null
, but this case is already eliminated in L98 here.
So you could remove the try-catch
, resulting in smaller code.
Actually I would write this method as
public static bool IsRequestKeyValid(string value)
{
ReadOnlySpan<char> chars = value.AsSpan();
if (chars.IsEmpty)
{
return false;
}
Span<byte> temp = stackalloc byte[16];
var success = Convert.TryFromBase64Chars(chars, temp, out int written);
// RyuJIT produces better code than the other way round: success && written == 16
return written == 16 && success;
}
Maybe: to get even more perf, one could skip the base64-decoding step, and just check if the chars.Length == 16
and if the chars
are in the base64-alphabet, so no need to perform the actual decoding. This step could also be vectorized. But maybe that's a bit of over-engineering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If written == 16
, so for base64 the input-length must be 24
. Hence one could write
ReadOnlySpan<char> chars = value.AsSpan();
if (chars.Length != 24)
{
return false;
}
So the entire decoding-step is omitted if the result (written) can't be 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe: to get even more perf, one could skip the base64-decoding step, and just check if the
chars.Length == 16
and if thechars
are in the base64-alphabet, so no need to perform the actual decoding.
I'm assuming you meant chars.Length == 24
. Can't do this because a 17 and 18 byte string will also be 24 bytes when encoded. Unless you also want to check for the trailing "==", but I think at this point we're trying too hard.
I'm going to stick with Convert.TryFromBase64Chars
for now, but I'll look at the other feedback :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming you meant
chars.Length == 24
Yep, typed wrong in the first comment.
Can't do this because a 17 and 18 byte string will also be 24 bytes when encoded.
Maybe I missunderstand something now, but the string is encoded and we try to decode it, then check if the decoded data (byte) is of size 16.
So the other way round this means, 16 data bytes -> 24 base64 encoded utf8-bytes.
And for base64 the encoded length is always a multiple of 4.
Thus if chars.Length != 24
we know that the decoded data won't be 16 bytes, so we can short-circuit the method and return false
. No need to try to decode the input.
If chars.Length == 24
we can try to decode, then check if written == 16
.
Ahhhhh, now I see that in my previous comment it may look like the entire method. It's just a fragment. Sorry.
So I mean:
public static bool IsRequestKeyValid(string value)
{
ReadOnlySpan<char> chars = value.AsSpan();
if (chars.Length != 24)
{
return false;
}
Span<byte> temp = stackalloc byte[16];
var success = Convert.TryFromBase64Chars(chars, temp, out int written);
return written == 16 && success;
}
If you want I could submit a PR for the vectorized version (next week).
@@ -110,13 +123,25 @@ public static string CreateResponseKey(string requestKey) | |||
throw new ArgumentNullException(nameof(requestKey)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these exceptions out of the hot-path into ThrowHelper-methods?
dc65f17
to
9610426
Compare
🆙 📅 |
9610426
to
9ec7ce5
Compare
// so this can be hardcoded to 60 bytes for the requestKey + static websocket string | ||
Span<byte> mergedBytes = stackalloc byte[60]; | ||
Encoding.UTF8.GetBytes(requestKey, mergedBytes); | ||
_encodedWebSocketKey.CopyTo(mergedBytes.Slice(24)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this is called with a requestKey that isn't 24 bytes? Should we check the return value of GetBytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"requestKey is already verified to be small (24 bytes) by 'IsRequestKeyValid()' and everything is 1:1 mapping to UTF8 bytes"
This comment was made automatically. If there is a problem contact [email protected]. I've triaged the above build. I've created/commented on the following issue(s) |
@BrennanConroy ping |
Found a couple spots we could avoid allocating in the websocket handshake.
Would like @blowdart and @Tratcher to take a look to make sure I'm not doing something completely wrong here.
Before:
After:
With caching SHA1: