[libc++] Optimizing is_permutation #129565
@llvm/pr-subscribers-libcxx
Author: Imad Aldij (imdj)
Changes: Optimize is_permutation by using std::find_if, std::count_if, and std::mismatch to replace the hand-written loops. Solve: #129324
Full diff: https://github.com/llvm/llvm-project/pull/129565.diff
1 Files Affected:
diff --git a/libcxx/include/__algorithm/is_permutation.h b/libcxx/include/__algorithm/is_permutation.h
index 1afb11596bc6b..c6cc947c75714 100644
--- a/libcxx/include/__algorithm/is_permutation.h
+++ b/libcxx/include/__algorithm/is_permutation.h
@@ -11,7 +11,10 @@
#define _LIBCPP___ALGORITHM_IS_PERMUTATION_H
#include <__algorithm/comp.h>
+#include <__algorithm/count_if.h>
+#include <__algorithm/find_if.h>
#include <__algorithm/iterator_operations.h>
+#include <__algorithm/mismatch.h>
#include <__config>
#include <__functional/identity.h>
#include <__iterator/concepts.h>
@@ -82,28 +85,29 @@ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __is_permutation_impl(
for (auto __i = __first1; __i != __last1; ++__i) {
// Have we already counted the number of *__i in [f1, l1)?
- auto __match = __first1;
- for (; __match != __i; ++__match) {
- if (std::__invoke(__pred, std::__invoke(__proj1, *__match), std::__invoke(__proj1, *__i)))
- break;
- }
+ auto __match = std::find_if(__first1, __i, [&](const auto& __x) {
+ return bool(std::__invoke(__pred, std::__invoke(__proj1, __x), std::__invoke(__proj1, *__i)));
+ });
if (__match == __i) {
+ auto __proj = std::__identity();
+
// Count number of *__i in [f2, l2)
- _D1 __c2 = 0;
- for (auto __j = __first2; __j != __last2; ++__j) {
- if (std::__invoke(__pred, std::__invoke(__proj1, *__i), std::__invoke(__proj2, *__j)))
- ++__c2;
- }
+ auto __predicate2 = [&](const auto& __x) {
+ return bool(std::__invoke(__pred, std::__invoke(__proj1, *__i), std::__invoke(__proj2, __x)));
+ };
+ _D1 __c2 = std::__count_if<_AlgPolicy>(__first2, __last2, __predicate2, __proj);
+
if (__c2 == 0)
return false;
- // Count number of *__i in [__i, l1) (we can start with 1)
- _D1 __c1 = 1;
- for (auto __j = _IterOps<_AlgPolicy>::next(__i); __j != __last1; ++__j) {
- if (std::__invoke(__pred, std::__invoke(__proj1, *__i), std::__invoke(__proj1, *__j)))
- ++__c1;
- }
+ // Count number of *__i in [__i, l1)
+ auto __predicate1 = [&](const auto& __x) {
+ return bool(std::__invoke(__pred, std::__invoke(__proj1, *__i), std::__invoke(__proj1, __x)));
+ };
+ _D1 __c1 = std::__count_if<_AlgPolicy>(_IterOps<_AlgPolicy>::next(__i), __last1, __predicate1, __proj);
+ ++__c1; // Add 1 for *__i itself
+
if (__c1 != __c2)
return false;
}
@@ -117,10 +121,9 @@ template <class _AlgPolicy, class _ForwardIterator1, class _Sentinel1, class _Fo
[[__nodiscard__]] _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __is_permutation(
_ForwardIterator1 __first1, _Sentinel1 __last1, _ForwardIterator2 __first2, _BinaryPredicate&& __pred) {
// Shorten sequences as much as possible by lopping of any equal prefix.
- for (; __first1 != __last1; ++__first1, (void)++__first2) {
- if (!__pred(*__first1, *__first2))
- break;
- }
+ auto __result = std::mismatch(__first1, __last1, __first2, __pred);
+ __first1 = __result.first;
+ __first2 = __result.second;
if (__first1 == __last1)
return true;
|
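To make the shape of the change easier to follow, here is a minimal sketch of the same structure written against the public std:: algorithms only. It is not the libc++ implementation: the name is_permutation_sketch is hypothetical, projections and sentinels are left out, elements are compared with operator==, and the second range is assumed to be at least as long as the first (the 3-leg form).

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper, not part of libc++: the patched control flow restated
// with std::mismatch, std::find_if, and std::count_if.
template <class It1, class It2>
bool is_permutation_sketch(It1 first1, It1 last1, It2 first2) {
  // Skip the common prefix instead of looping by hand.
  auto prefix = std::mismatch(first1, last1, first2);
  first1 = prefix.first;
  first2 = prefix.second;
  if (first1 == last1)
    return true;

  // Assumption: the second range has at least as many elements as the first.
  It2 last2 = std::next(first2, std::distance(first1, last1));

  for (It1 i = first1; i != last1; ++i) {
    const auto& value = *i; // dereference once per outer iteration
    auto same = [&](const auto& x) { return x == value; };

    // Have we already counted this value earlier in [first1, i)?
    if (std::find_if(first1, i, same) != i)
      continue;

    // Count occurrences in the second range.
    auto c2 = std::count_if(first2, last2, same);
    if (c2 == 0)
      return false;

    // Count occurrences in (i, last1) and add 1 for *i itself.
    auto c1 = 1 + std::count_if(std::next(i), last1, same);
    if (c1 != c2)
      return false;
  }
  return true;
}
```

The algorithm is still quadratic in the worst case; the only thing that changes is which loops are delegated to library algorithms that may have optimized paths.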
Based on initial benchmark results, this implementation doesn't (yet) offer better performance. I will try to profile it and tweak a few things.
Any feedback, suggestions, or ideas to further boost the performance and improve the PR are welcome.
Can you post the benchmarks for before and after your change? |
I noticed the major difference is during Result comparison:
|
When I run the new benchmarks locally and compare before/after your patch, this is what I get:
Results
Comparing build/default/libcxx/test/benchmarks/algorithms/nonmodifying/Output/is_permutation.bench.cpp.dir/benchmark-result.json to build/candidate/libcxx/test/benchmarks/algorithms/nonmodifying/Output/is_permutation.bench.cpp.dir/benchmark-result.json
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
std::is_permutation(vector<int>) (3leg) (common prefix)/8 +0.0328 +0.0303 4 4 4 4
std::is_permutation(vector<int>) (3leg) (common prefix)/1024 -0.1053 -0.1055 548 490 547 490
std::is_permutation(vector<int>) (3leg) (common prefix)/8192 +0.0017 +0.0005 3925 3931 3918 3919
std::is_permutation(deque<int>) (3leg) (common prefix)/8 +0.0026 +0.0026 7 7 7 7
std::is_permutation(deque<int>) (3leg) (common prefix)/1024 +0.0149 +0.0105 657 667 657 664
std::is_permutation(deque<int>) (3leg) (common prefix)/8192 +0.0104 +0.0070 5223 5277 5219 5256
std::is_permutation(list<int>) (3leg) (common prefix)/8 +0.0475 +0.0485 5 5 5 5
std::is_permutation(list<int>) (3leg) (common prefix)/1024 +0.4774 +0.4798 1095 1618 1092 1615
std::is_permutation(list<int>) (3leg) (common prefix)/8192 +0.0071 +0.0071 13884 13983 13860 13959
std::is_permutation(vector<int>) (3leg, pred) (common prefix)/8 +0.0399 +0.0389 4 4 4 4
std::is_permutation(vector<int>) (3leg, pred) (common prefix)/1024 -0.0139 -0.0111 461 454 459 454
std::is_permutation(vector<int>) (3leg, pred) (common prefix)/8192 +0.0212 +0.0181 3528 3603 3527 3591
std::is_permutation(deque<int>) (3leg, pred) (common prefix)/8 +0.0034 +0.0032 8 8 8 8
std::is_permutation(deque<int>) (3leg, pred) (common prefix)/1024 +0.0219 +0.0197 769 786 769 784
std::is_permutation(deque<int>) (3leg, pred) (common prefix)/8192 +0.0147 +0.0138 6121 6211 6121 6205
std::is_permutation(list<int>) (3leg, pred) (common prefix)/8 +0.0153 +0.0153 6 6 6 6
std::is_permutation(list<int>) (3leg, pred) (common prefix)/1024 +0.0654 +0.0628 1111 1184 1111 1181
std::is_permutation(list<int>) (3leg, pred) (common prefix)/8192 +0.3435 +0.3437 11485 15431 11483 15429
std::is_permutation(vector<int>) (4leg) (common prefix)/8 +0.0604 +0.0598 6 7 6 7
std::is_permutation(vector<int>) (4leg) (common prefix)/1024 +0.0248 +0.0243 572 586 572 586
std::is_permutation(vector<int>) (4leg) (common prefix)/8192 +0.0091 +0.0091 4520 4561 4520 4561
std::is_permutation(deque<int>) (4leg) (common prefix)/8 +0.0566 +0.0564 12 12 12 12
std::is_permutation(deque<int>) (4leg) (common prefix)/1024 +0.0007 +0.0004 811 811 811 811
std::is_permutation(deque<int>) (4leg) (common prefix)/8192 +0.0008 +0.0007 6440 6445 6439 6444
std::is_permutation(list<int>) (4leg) (common prefix)/8 -0.0102 -0.0102 7 7 7 7
std::is_permutation(list<int>) (4leg) (common prefix)/1024 +0.0079 +0.0079 1070 1079 1070 1078
std::is_permutation(list<int>) (4leg) (common prefix)/8192 +0.0485 +0.0481 12583 13194 12583 13189
rng::is_permutation(vector<int>) (4leg) (common prefix)/8 +0.1462 +0.1173 7 7 7 7
rng::is_permutation(vector<int>) (4leg) (common prefix)/1024 +0.0298 +0.0246 583 600 583 597
rng::is_permutation(vector<int>) (4leg) (common prefix)/8192 -0.0308 -0.0206 4738 4592 4684 4588
rng::is_permutation(deque<int>) (4leg) (common prefix)/8 +0.2257 +0.2283 10 13 10 13
rng::is_permutation(deque<int>) (4leg) (common prefix)/1024 -0.0090 -0.0062 825 817 821 816
rng::is_permutation(deque<int>) (4leg) (common prefix)/8192 +0.0046 +0.0050 6465 6495 6454 6486
rng::is_permutation(list<int>) (4leg) (common prefix)/8 +0.0054 +0.0048 7 7 7 7
rng::is_permutation(list<int>) (4leg) (common prefix)/1024 +0.0921 +0.0527 1074 1173 1074 1130
rng::is_permutation(list<int>) (4leg) (common prefix)/8192 -0.0457 -0.0489 13452 12837 13452 12795
std::is_permutation(vector<int>) (4leg, pred) (common prefix)/8 +0.1036 +0.1025 6 7 6 7
std::is_permutation(vector<int>) (4leg, pred) (common prefix)/1024 -0.0533 -0.0534 607 574 607 574
std::is_permutation(vector<int>) (4leg, pred) (common prefix)/8192 -0.0726 -0.0705 4820 4470 4808 4469
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/8 +0.0179 +0.0174 11 12 11 12
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/1024 -0.0014 +0.0016 854 853 851 852
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/8192 +0.0032 +0.0044 6766 6788 6755 6785
std::is_permutation(list<int>) (4leg, pred) (common prefix)/8 -0.0140 -0.0148 8 8 8 8
std::is_permutation(list<int>) (4leg, pred) (common prefix)/1024 +0.0163 +0.0147 1179 1199 1179 1197
std::is_permutation(list<int>) (4leg, pred) (common prefix)/8192 -0.0326 -0.0329 14402 13932 14400 13926
rng::is_permutation(vector<int>) (4leg, pred) (common prefix)/8 +0.1177 +0.1137 6 7 6 7
rng::is_permutation(vector<int>) (4leg, pred) (common prefix)/1024 -0.0608 -0.0626 609 572 609 570
rng::is_permutation(vector<int>) (4leg, pred) (common prefix)/8192 -0.0641 -0.0643 4811 4503 4808 4499
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/8 +0.0350 +0.0347 11 12 11 12
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/1024 +0.0221 +0.0180 849 868 849 864
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/8192 +0.0028 +0.0022 6775 6794 6775 6790
rng::is_permutation(list<int>) (4leg, pred) (common prefix)/8 -0.0094 -0.0094 8 8 8 8
rng::is_permutation(list<int>) (4leg, pred) (common prefix)/1024 -0.0004 -0.0003 1182 1182 1182 1182
rng::is_permutation(list<int>) (4leg, pred) (common prefix)/8192 -0.1109 -0.1109 12824 11402 12823 11401
std::is_permutation(vector<int>) (3leg) (shuffled)/8 +0.1544 +0.1540 49 56 49 56
std::is_permutation(vector<int>) (3leg) (shuffled)/1024 -0.0041 -0.0042 417690 415998 417613 415878
std::is_permutation(deque<int>) (3leg) (shuffled)/8 +0.0312 +0.0311 72 74 72 74
std::is_permutation(deque<int>) (3leg) (shuffled)/1024 -0.0039 -0.0038 976937 973173 976905 973173
std::is_permutation(list<int>) (3leg) (shuffled)/8 +0.0698 +0.0696 60 64 60 64
std::is_permutation(list<int>) (3leg) (shuffled)/1024 +0.0014 +0.0008 2016154 2019032 2016155 2017847
std::is_permutation(vector<int>) (3leg, pred) (shuffled)/8 +0.0055 +0.0066 62 62 61 62
std::is_permutation(vector<int>) (3leg, pred) (shuffled)/1024 -0.0142 -0.0122 1070694 1055456 1068171 1055100
std::is_permutation(deque<int>) (3leg, pred) (shuffled)/8 -0.0045 -0.0024 81 81 81 81
std::is_permutation(deque<int>) (3leg, pred) (shuffled)/1024 -0.0031 -0.0026 1167616 1164036 1166812 1163794
std::is_permutation(list<int>) (3leg, pred) (shuffled)/8 -0.1736 -0.1734 91 75 91 75
std::is_permutation(list<int>) (3leg, pred) (shuffled)/1024 +0.0245 +0.0251 2229788 2284486 2228285 2284297
std::is_permutation(vector<int>) (4leg) (shuffled)/8 +0.1556 +0.1562 49 57 49 57
std::is_permutation(vector<int>) (4leg) (shuffled)/1024 -0.0037 -0.0024 413315 411780 412744 411745
std::is_permutation(deque<int>) (4leg) (shuffled)/8 +0.0179 +0.0185 78 80 78 80
std::is_permutation(deque<int>) (4leg) (shuffled)/1024 +0.0060 +0.0060 975699 981570 975697 981572
std::is_permutation(list<int>) (4leg) (shuffled)/8 +0.0390 +0.0391 60 63 60 63
std::is_permutation(list<int>) (4leg) (shuffled)/1024 -0.0042 -0.0034 2018535 2009991 2016729 2009779
rng::is_permutation(vector<int>) (4leg) (shuffled)/8 +0.1454 +0.1470 49 56 49 56
rng::is_permutation(vector<int>) (4leg) (shuffled)/1024 -0.0226 -0.0223 421068 411541 420894 411504
rng::is_permutation(deque<int>) (4leg) (shuffled)/8 +0.0739 +0.0734 76 81 76 81
rng::is_permutation(deque<int>) (4leg) (shuffled)/1024 -0.0013 -0.0017 991798 990556 990445 988751
rng::is_permutation(list<int>) (4leg) (shuffled)/8 +0.0279 +0.0286 61 63 61 63
rng::is_permutation(list<int>) (4leg) (shuffled)/1024 +0.0008 +0.0011 2013143 2014835 2011920 2014227
std::is_permutation(vector<int>) (4leg, pred) (shuffled)/8 +0.0090 +0.0091 61 61 61 61
std::is_permutation(vector<int>) (4leg, pred) (shuffled)/1024 -0.2018 -0.2012 1058702 845087 1057847 845031
std::is_permutation(deque<int>) (4leg, pred) (shuffled)/8 +0.1386 +0.1383 81 92 81 92
std::is_permutation(deque<int>) (4leg, pred) (shuffled)/1024 +0.0404 +0.0405 1164023 1211074 1163886 1211007
std::is_permutation(list<int>) (4leg, pred) (shuffled)/8 -0.0203 -0.0199 77 75 77 75
std::is_permutation(list<int>) (4leg, pred) (shuffled)/1024 -0.0073 -0.0065 2302394 2285626 2300674 2285627
rng::is_permutation(vector<int>) (4leg, pred) (shuffled)/8 -0.0040 -0.0024 61 61 61 61
rng::is_permutation(vector<int>) (4leg, pred) (shuffled)/1024 -0.1933 -0.1933 1047648 845138 1047652 845093
rng::is_permutation(deque<int>) (4leg, pred) (shuffled)/8 +0.1698 +0.1701 79 92 79 92
rng::is_permutation(deque<int>) (4leg, pred) (shuffled)/1024 +0.0406 +0.0407 1163094 1210313 1162968 1210314
rng::is_permutation(list<int>) (4leg, pred) (shuffled)/8 -0.0023 -0.0031 76 76 76 76
rng::is_permutation(list<int>) (4leg, pred) (shuffled)/1024 +0.0127 +0.0113 2294352 2323538 2292282 2318209
OVERALL_GEOMEAN +0.0188 +0.0180 0 0 0 0
I think that's really interesting. Observations:
- We're doing much worse with the new implementation on std::list and std::deque. I don't understand that; it needs investigation.
- We're not doing better on std::vector like we would assume since std::mismatch is vectorized. The root cause here seems to be that we don't properly forward the knowledge that the predicate is std::equal_to to the call to std::mismatch. I think that might be due to the use of reference_wrapper, which might inhibit this check. If that's the case, we could avoid using std::ref when we call mismatch, but we should probably fix the underlying issue by making sure that __desugars_to<__equal_tag, ...> understands when it gets passed a reference_wrapper. That seems like a general thing we should fix if it's broken, and that's actually the target of #129312.
I think those are two good directions for investigating, please let me know if you have questions!
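As an illustration of the reference_wrapper hypothesis above, the following sketch uses two hypothetical traits (naive_is_equality and unwrapping_is_equality) as stand-ins for libc++'s internal __desugars_to check. They are not the real mechanism; they only show how a trait that matches std::equal_to directly can fail to see through std::reference_wrapper, and how an extra specialization can unwrap it.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical trait: recognizes std::equal_to<T> only when passed directly.
template <class Pred>
struct naive_is_equality : std::false_type {};

template <class T>
struct naive_is_equality<std::equal_to<T>> : std::true_type {};

// Hypothetical trait: same check, but it also looks through reference_wrapper.
template <class Pred>
struct unwrapping_is_equality : naive_is_equality<Pred> {};

template <class Pred>
struct unwrapping_is_equality<std::reference_wrapper<Pred>>
    : unwrapping_is_equality<std::remove_cv_t<Pred>> {};

// A directly passed std::equal_to is recognized by the naive trait...
static_assert(naive_is_equality<std::equal_to<int>>::value, "");
// ...but wrapping it (as std::ref does) hides it from that trait.
static_assert(!naive_is_equality<std::reference_wrapper<std::equal_to<int>>>::value, "");
// The unwrapping trait still recognizes it.
static_assert(unwrapping_is_equality<std::reference_wrapper<std::equal_to<int>>>::value, "");
```

Whether this is actually what prevents the vectorized path in the PR is the reviewer's conjecture, not something the sketch establishes.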
What platform are you running your benchmarks on? I also just discovered that we were not enabling vectorization in
Edit: The AppleClang issue should be solved by #132090. |
Thank you for the feedback. I'll update the repo, incorporate those changes, and try to investigate further accordingly.
I'm using Linux (openSUSE). |
Replace hand-written loops with vectorized std::mismatch, std::find_if, and std::count_if
These are the benchmark results I'm getting now after adding the second mismatch and incorporating #132090 and #132090 locally. Results:
|
I'm missing some piece of the puzzle. I couldn't reach those numbers. As mentioned above in the benchmarks, I'm currently getting:
Just dropping
I also tried using
__identity __ident;
auto __result = std::__mismatch(__first1, __last1, __first2, __pred, __ident, __ident);
but that also ended up with worse results.
What am I doing wrong :(
@imdj What is your Per https://libcxx.llvm.org/TestingLibcxx.html#benchmarks:
|
Yep, the exact workflow. Then I compare my PR build against a fairly up-to-date llvm main repo. using I'm noticing a lot of fluctuation though between runs. Something is probably off in my setup. I'll double check with the tips at: https://llvm.org/docs/Benchmarking.html. |
I also notice a bit of fluctuation between runs for this algorithm, especially for |
So, I compared a few of the benchmark results in pairs and ranked the top 10 with the largest relative changes. The results overall match your findings, @ldionne:
For reference, here's my build config: cmake -G Ninja -S runtimes -B build -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_LINKER=lld -DLLVM_BUILD_STATIC=ON -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" |
✅ With the latest revision this PR passed the C/C++ code formatter. |
I tried to do some more experiments, this time also comparing Let me know what you think and if there are any further suggestions. Here are the results after the change: GCC v14 results (Mean Time: -0.1017)
Clang v19 results (Mean Time: -0.0432)
|
Could the diff in results be originating from using a different commit as base for old benchmarks? I'm using 9b1f905 as base |
First, make sure to benchmark on the latest main since I recently fixed two issues where we wouldn't vectorize properly inside mismatch. I pulled your branch and rebased it onto main just now, and the algorithms I get that do worse are the following (I dropped all the lines where your patch was an improvement):
Comparing build/default/libcxx/test/benchmarks/algorithms/nonmodifying/Output/is_permutation.bench.cpp.dir/benchmark-result.json to build/candidate/libcxx/test/benchmarks/algorithms/nonmodifying/Output/is_permutation.bench.cpp.dir/benchmark-result.json
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
std::is_permutation(list<int>) (3leg) (common prefix)/8 +0.0801 +0.0824 5 5 5 5
std::is_permutation(list<int>) (3leg) (common prefix)/1024 +0.4706 +0.4731 1088 1599 1086 1599
std::is_permutation(list<int>) (3leg) (common prefix)/8192 +0.1887 +0.1899 11456 13618 11445 13618
std::is_permutation(list<int>) (3leg, pred) (common prefix)/1024 +0.0336 +0.0338 1137 1176 1137 1175
rng::is_permutation(list<int>) (4leg, pred) (common prefix)/8192 +0.1133 +0.1139 12519 13938 12512 13937
std::is_permutation(list<int>) (3leg) (shuffled)/8 +0.0699 +0.0701 61 65 61 65
std::is_permutation(list<int>) (3leg, pred) (shuffled)/1024 +0.0281 +0.0289 2234288 2297102 2232489 2296954
std::is_permutation(list<int>) (4leg) (shuffled)/8 +0.0703 +0.0723 61 65 61 65
rng::is_permutation(list<int>) (4leg) (shuffled)/8 +0.0819 +0.0818 60 65 60 65
std::is_permutation(list<int>) (4leg, pred) (shuffled)/8 +0.2659 +0.2659 76 96 76 96
rng::is_permutation(list<int>) (4leg, pred) (shuffled)/8 +0.2426 +0.2472 77 96 77 96
std::is_permutation(deque<int>) (4leg) (common prefix)/8 +0.2592 +0.2600 12 15 12 15
std::is_permutation(deque<int>) (4leg) (common prefix)/1024 +0.5906 +0.5919 818 1301 817 1301
std::is_permutation(deque<int>) (4leg) (common prefix)/8192 +0.5919 +0.5925 6456 10277 6453 10276
rng::is_permutation(deque<int>) (4leg) (common prefix)/8 +0.4652 +0.4659 10 15 10 15
rng::is_permutation(deque<int>) (4leg) (common prefix)/1024 +0.5856 +0.5861 812 1288 812 1288
rng::is_permutation(deque<int>) (4leg) (common prefix)/8192 +0.5919 +0.5921 6443 10256 6441 10255
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/8 +0.3581 +0.3591 11 16 11 16
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/1024 +0.4977 +0.4986 862 1291 861 1291
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/8192 +0.5084 +0.5102 6826 10296 6817 10295
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/8 +0.3754 +0.3763 11 16 11 16
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/1024 +0.5137 +0.5142 852 1290 852 1290
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/8192 +0.5136 +0.5144 6771 10248 6767 10248
std::is_permutation(deque<int>) (3leg) (shuffled)/8 +0.0634 +0.0639 73 78 73 78
std::is_permutation(deque<int>) (3leg, pred) (shuffled)/8 +0.0260 +0.0275 81 83 81 83
rng::is_permutation(deque<int>) (4leg, pred) (shuffled)/8 +0.3511 +0.3530 81 109 81 109
std::is_permutation(vector<int>) (3leg, pred) (common prefix)/8 +0.0183 +0.0172 4 4 4 4
std::is_permutation(vector<int>) (3leg) (shuffled)/8 +0.1508 +0.1512 49 56 49 56
std::is_permutation(vector<int>) (3leg, pred) (shuffled)/8 +0.0585 +0.0604 62 65 62 65
std::is_permutation(vector<int>) (4leg) (shuffled)/8 +0.0977 +0.0989 49 54 49 54
rng::is_permutation(vector<int>) (4leg) (shuffled)/8 +0.1512 +0.1525 49 56 49 56
std::is_permutation(vector<int>) (4leg, pred) (shuffled)/8 +0.0705 +0.0721 61 66 61 66
- First, we can observe that vector<int> is only doing worse on very small sequences. That's actually a particularity of this benchmark: it operates on pretty small sequences since is_permutation is so expensive. I think we can mostly disregard the slowdown for vector<int> since it only affects 8-element sequences. I suspect that making our vectorized mismatch faster on small sequences would solve the problem here.
- Second, we can see that we're doing worse on several benchmarks that check the common prefix pattern. But with that data pattern, the algorithm should be dominated by mismatch. So I think we need to understand why our current std::mismatch behaves worse on std::deque than the hand-written loop that existed in std::is_permutation before your patch. I think you could also validate that switching from the hand-written loop to std::mismatch is the cause of the slowdown by locally reverting just that part of the change and seeing if the before/after benchmarks are better for std::deque on common prefix. BTW you can locally edit the benchmark to only run a subset of all the combinations in order to iterate more quickly.
- Last, we are also doing worse on list with the common prefix pattern. I suspect we might be hitting the same issue as deque.
So TLDR, I'd focus on confirming that std::mismatch is slower on deque and list than a naive hand-written loop, and go from there.
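One rough way to check that claim outside the libc++ benchmark suite is a small standalone timing harness like the sketch below. Everything in it is an assumption made for illustration: naive_mismatch imitates the loop the patch removed, make_common_prefix builds two deques that differ only in their last element, and time_ns is a crude steady_clock timer. It is a sanity check, not a substitute for the real benchmarks.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <deque>
#include <utility>

// Hand-written prefix scan, modelled on the loop the patch replaced.
template <class It1, class It2>
std::pair<It1, It2> naive_mismatch(It1 first1, It1 last1, It2 first2) {
  for (; first1 != last1; ++first1, (void)++first2) {
    if (!(*first1 == *first2))
      break;
  }
  return std::make_pair(first1, first2);
}

// Hypothetical input generator: equal deques except for the last element.
inline std::pair<std::deque<int>, std::deque<int>> make_common_prefix(std::size_t n) {
  std::deque<int> a(n, 1), b(n, 1);
  if (n != 0)
    b.back() = 2;
  return std::make_pair(std::move(a), std::move(b));
}

// Calls f() reps times and returns the average nanoseconds per call.
// f must return an integer; results go to a volatile sink so the calls
// are not optimized away.
template <class F>
double time_ns(F f, int reps) {
  volatile std::ptrdiff_t sink = 0;
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r)
    sink = sink + f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

A comparison would time a lambda that returns `naive_mismatch(a.begin(), a.end(), b.begin()).first - a.begin()` against one that does the same with std::mismatch, on both std::deque and std::list inputs, with optimizations enabled and several repetitions to smooth out the run-to-run noise mentioned earlier in the thread.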
if (std::__invoke(__pred, std::__invoke(__proj1, *__match), std::__invoke(__proj1, *__i)))
break;
}
auto __match = std::find_if(__first1, __i, [&](_Ref1 __x) -> bool {
We should probably hold on to the result of *__i. Something like _Ref1 __va = *__i. This simplifies the code a bit and might be a bit faster depending on the kind of iterator.
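The suggestion can be sketched with public names as follows. The helper seen_before is hypothetical, and it assumes that _Ref1 in the patch is the iterator's reference type, which this excerpt does not show. The point is only that the iterator is dereferenced once, outside the lambda, instead of on every predicate call.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: returns true if *it already occurs in [first, it)
// according to pred.
template <class It, class Pred>
bool seen_before(It first, It it, Pred pred) {
  using Ref = typename std::iterator_traits<It>::reference;
  // Dereference once and reuse the result in every predicate call.
  Ref value = *it;
  return std::find_if(first, it, [&](Ref x) -> bool { return pred(x, value); }) != it;
}
```

For iterators whose operator* does real work (for example, proxy or transforming iterators), caching the result avoids repeating that work inside the inner loop.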
Optimize is_permutation by using std::find_if, std::count_if, and std::mismatch to replace the hand-written loops.
Solve: std::is_permutation #129324