-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Dynamic type lookup string compare optimization #42390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7d0915e
to
e24b433
Compare
@swift-ci please benchmark |
Seems to be within the noise, if I'm understanding the report. Does this need a smoke test as well? (I don't have commit access) |
Yeah, looks like no measurable difference (no surprise there) but still seems worth doing. I'll go ahead and run a full test. |
@swift-ci please test |
Thanks for running the tests. Anything else to do here? |
If this is a real win, it might be nice to upstream it to LLVM as an additional overload for |
size_t length = s1.size(); | ||
// It may be possible for s1 to contain embedded NULL characters | ||
// so additionally validate that the lengths match | ||
return strncmp(s1.data(), s2, length) == 0 && strlen(s2) == length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate to have to iterate through s2 twice here. What does performance look like if we use an inline loop here instead of strncmp
to do the entire comparison in one pass? It'd be interesting to see if strncmp and/or strlen are optimized enough to offset the double iteration through s2
compared to a loop that precisely accounted for s1's fixed length and s2's null termination in one pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A naive inline loop seems to be ≈3x slower than platform strncmp in my tests with a 65 byte long sample string. I suspect this is due to the optimized ARM assembly implementations in llvm-project/libc/AOR_v20.02/string/aarch64
+ the compiler's optimization of built-ins. That said, my assembly skills are a bit rusty.
I tried something like this with -O3
:
inline static bool testcmp(const char *left, const char *right, size_t n) {
for (int i=0; i < n; i++) {
char lc = left[i];
char rc = right[i];
if (lc != rc || rc == '\0') {
return false;
}
}
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, it looks good as is. Thanks for trying it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries -- at this level of the code, I think you could only do better if you already knew the length of s2
, or knew that s1
contained no embedded NUL
characters.
@jckarter Are you good with merging now? Getting it into LLVM sounds nice but as a separate thing. |
@mikeash Yeah, I'm fine merging this now. |
This is a small optimization for a specific path of dynamic type lookup, as profiled below in our iOS application. In a small synthetic benchmark, using
strncmp
is roughly twice as fast as usingstrlen
+memcmp
viaStringRef
in the case where the two mangled type strings match.(missing frames due to inlining?)
Other ideas for optimizing this path welcome!