Skip to content

perf(io): excise last use of std::regex #515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 17, 2023
Merged

Conversation

JohelEGP
Copy link
Contributor

@JohelEGP JohelEGP commented Jun 17, 2023

On my system, with a GCC 13 Debug build of Cppfront,
lowering passthrough-tests/clang-12-libstdc++-e.cpp2
is reduced from 24 s to 0.4 s.
Lowering the other passthrough-tests also takes 0.4 s each.

The passthrough-tests pass, as do the regression-tests:

100% tests passed, 0 tests failed out of 660

Total Test time (real) =  85.42 sec

@hsutter
Copy link
Owner

hsutter commented Jun 17, 2023

Holy carruthers, Batman, that's impressive. Between this and the other regex excision you did yesterday, that's a 95+% speedup in cppfront all-up... simply by not using <regex>.

I am quickly gaining sympathy with the people who've been complaining about regex perf for many years now and recommend against using it. I've never felt the pain because I haven't been a big regex user, and was I trying regex here in cppfront because I felt I should. Now I won't. Thanks.

One small fix though: You need to change

*line.end()

to

line.data()+line.length()

in two places, because the first is UB (and the Microsoft STL detects it and fires an assertion on it).

@hsutter
Copy link
Owner

hsutter commented Jun 17, 2023

Ah, looks like I can just apply the fix right here in the GitHub UI... done.

Thanks, Johel! You get the prize for easily the most dramatic performance improvement in cppfront.

@hsutter hsutter merged commit ba1cab6 into hsutter:main Jun 17, 2023
@hsutter
Copy link
Owner

hsutter commented Jun 17, 2023

For perspective, I knew compiling the /passthrough-tests was kind of slow, because they're huge files that contain entire preprocessed commercial STL implementations. I hadn't bothered to try to optimize for large files until seeing whether the project gets to a point where there’s actual demand to process huge files. And even then, I suspected the culprit was probably iostreams. Nope, iostreams perf is just fine. It was regex all along.

Similarly, I always guessed that regex perf wouldn't be a major factor because cppfront is dominated (I thought) with lots of little heap allocations and also some cache-unfriendly pointer-chasing where I don’t use vectors/deques. Nope, the pointer-chasing is just fine (at least so far). It was regex that totally dominated everything.

Which just goes to show again to not trust intuition. As always: Never guess, measure.

Incidentally, and pretty remarkably, compiling a file with cppfront is now similar to the speed of doing a file system copy of the file. For example, the second part of my regression-tests/test-results/run-tests script that compiles all the test .cpp2 files is visually nearly as fast as the first part that just copies the files to the test directory.

Thanks again.

@JohelEGP JohelEGP deleted the perf_io_regex branch June 17, 2023 18:44
@JohelEGP
Copy link
Contributor Author

JohelEGP commented Jun 17, 2023

because the first is UB

That makes sense: https://compiler-explorer.com/z/ffGxxPYTx.
data() + size() is guaranteed to be char{},
so I thought it'd be OK to dereference the past-the-end iterator.
Thank you.

@JohelEGP
Copy link
Contributor Author

It occurred to me that this could be refactored to behave like do_is_keyword.
To do a single lazy pass and then check that a space follows the import keyword.
Unfortunately, I was occupied and you were faster to merge.

@hsutter
Copy link
Owner

hsutter commented Jun 18, 2023

No worries, feel free to do a new PR to do it that way instead.

zaucy pushed a commit to zaucy/cppfront that referenced this pull request Dec 5, 2023
* perf(io): excise last use of `std::regex`

* Change `&*x.end()` to `x.data()+x.length()` to avoid UB

Signed-off-by: Herb Sutter <[email protected]>

---------

Signed-off-by: Herb Sutter <[email protected]>
Co-authored-by: Herb Sutter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants