This repository was archived by the owner on May 28, 2025. It is now read-only.

Commit c0639b8

Rewrite UTF-8 validation in shift-based DFA
This gives a substantial performance increase when validating strings with many non-ASCII codepoints, which is the normal case for almost all non-English content. The shift-based DFA algorithm uses no SIMD instructions and does not rely on the branch predictor for good performance, so it is well suited as a general, default, architecture-agnostic implementation. ASCII-only input still has a fast path, so that it benefits from auto-vectorization where the target supports it.
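To illustrate the technique the message describes, here is a minimal Rust sketch of a shift-based UTF-8 validation DFA. It is not the code from this commit. The state layout (nine states stored as multiples of 6 bits in a 64-bit row per input byte), the names `TABLE`, `step` and `validate_utf8`, and the 16-byte chunk size for the ASCII fast path are all assumptions made for this example. Each DFA step is a table load, a shift and a mask, with no data-dependent branch. ASCII-only chunks are detected by OR-ing their bytes together, a loop the compiler can typically auto-vectorize.

```rust
// Shift-based UTF-8 validation DFA (illustrative sketch, not the commit's code).
// Each state is a bit offset (a multiple of 6) into a 64-bit table row; the
// 6-bit field at that offset in TABLE[byte] holds the offset of the next state.
// ERROR is offset 0, so every transition left unset in a row leads to ERROR,
// and ERROR transitions to itself on every byte.
const ERROR: u64 = 0;
const ACCEPT: u64 = 6; // at a character boundary
const CONT1: u64 = 12; // 1 continuation byte (80..=BF) left
const CONT2: u64 = 18; // 2 continuation bytes left
const CONT3: u64 = 24; // 3 continuation bytes left
const E0: u64 = 30; // after E0: next must be A0..=BF (no overlong forms)
const ED: u64 = 36; // after ED: next must be 80..=9F (no surrogates)
const F0: u64 = 42; // after F0: next must be 90..=BF (no overlong forms)
const F4: u64 = 48; // after F4: next must be 80..=8F (max U+10FFFF)

const fn build_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut i = 0;
    while i < 256 {
        let byte = i as u8;
        // Transition out of ACCEPT: the lead byte picks the next state.
        let from_accept = match byte {
            0x00..=0x7F => ACCEPT,
            0xC2..=0xDF => CONT1,
            0xE0 => E0,
            0xE1..=0xEC | 0xEE..=0xEF => CONT2,
            0xED => ED,
            0xF0 => F0,
            0xF1..=0xF3 => CONT3,
            0xF4 => F4,
            _ => ERROR, // 80..=C1 and F5..=FF never start a character
        };
        let mut row = from_accept << ACCEPT;
        if matches!(byte, 0x80..=0xBF) {
            row |= (ACCEPT << CONT1) | (CONT1 << CONT2) | (CONT2 << CONT3);
        }
        if matches!(byte, 0xA0..=0xBF) {
            row |= CONT1 << E0;
        }
        if matches!(byte, 0x80..=0x9F) {
            row |= CONT1 << ED;
        }
        if matches!(byte, 0x90..=0xBF) {
            row |= CONT2 << F0;
        }
        if matches!(byte, 0x80..=0x8F) {
            row |= CONT2 << F4;
        }
        table[i] = row;
        i += 1;
    }
    table
}

static TABLE: [u64; 256] = build_table();

/// One DFA step: a load, a shift and a mask, with no data-dependent branch.
#[inline]
fn step(state: u64, byte: u8) -> u64 {
    (TABLE[byte as usize] >> state) & 63
}

/// Chunk size for the ASCII fast path (an arbitrary choice for this sketch).
const CHUNK: usize = 16;

pub fn validate_utf8(bytes: &[u8]) -> bool {
    let mut state = ACCEPT;
    let mut chunks = bytes.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // OR-ing the bytes together is a simple loop the compiler can
        // auto-vectorize; a clear high bit means the chunk is pure ASCII.
        let all_ascii = chunk.iter().fold(0u8, |acc, &b| acc | b) < 0x80;
        if state == ACCEPT && all_ascii {
            continue; // ASCII bytes from ACCEPT always stay in ACCEPT
        }
        for &b in chunk {
            state = step(state, b);
        }
        if state == ERROR {
            return false; // checked once per chunk, not once per byte
        }
    }
    for &b in chunks.remainder() {
        state = step(state, b);
    }
    state == ACCEPT
}
```

For example, `validate_utf8("日本語".as_bytes())` returns `true`, while `validate_utf8(b"\xED\xA0\x80")` (an encoded surrogate) returns `false`. Packing all nine 6-bit next-state fields into one `u64` keeps the per-byte work to a single dependent shift; the `ERROR` check once per chunk trades a slightly later exit on invalid input for keeping branches out of the inner loop.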
1 parent 01e4f19 commit c0639b8

1 file changed (+274, -145 lines)