Speed up JSON Parsing #181
Conversation
This reverts commit 9b69452.
This reverts commit 000cd70.
JavaScript numbers are doubles, so we should use that as the basis for validation.
Only supports UTF-8. This is an experiment to see if it is the right solution.
Other encodings still have undefined behaviour and error locations are broken for input containing any multi-byte characters.
A character with non-zero higher-order bytes would previously be interpreted as ASCII and decoding would break.
One niggling use remains in `takeString`
Needs to reference code unit locations instead of specific character locations
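As context for the commits above about multi-byte characters and code-unit locations, here is a minimal illustration (not code from this PR) of why character positions and UTF-8 code-unit positions diverge for multi-byte input:

```swift
// "é" is one Character but two UTF-8 code units (0xC3, 0xA9). Reading each byte
// as if it were ASCII misreads the content, and reporting error positions as
// character offsets instead of code-unit offsets shifts them for such input.
let source = "{\"key\": \"é\"}"
print(source.count)              // 12 characters
print(Array(source.utf8).count)  // 13 UTF-8 code units
```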
From a cursory glance this looks like a reasonable change; builds pass on Ubuntu. Do you happen to have the performance benchmarks handy? That might be something interesting to check in so that we could potentially build up a performance test suite. Additionally, what other hotspots did you find in Foundation through this?
The performance "benchmark" wasn't automated in any way, unfortunately. I simply created a […]. The main hotspots in this code now relate to […].
In general, String parsing for the sf-city-lots-json input bears a huge time cost (~5000 ms vs 250 ms) when compared with Darwin Foundation.
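For reference, a hedged sketch of what an automated version of that kind of manual timing run might look like (the file path and the use of `JSONSerialization` here are assumptions, not details from this thread):

```swift
import Foundation

// Time a single JSONSerialization decode of a large input file.
// The path below is illustrative only.
do {
    let data = try Data(contentsOf: URL(fileURLWithPath: "/tmp/citylots.json"))
    let start = Date()
    _ = try JSONSerialization.jsonObject(with: data, options: [])
    let elapsed = Date().timeIntervalSince(start)
    print("Decoded \(data.count) bytes in \(elapsed) s")
} catch {
    print("Benchmark failed: \(error)")
}
```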
I ran 2 benchmarks against the current JSON parser implementation on `master`, comparing it to Darwin Foundation for large inputs, and found it was 5-10x slower (72 s vs 8 s for a single decode). This new code reduces that to a factor of 2-4x (17 s).

While JSON is a text-based serialization format, all of its control characters are ASCII, so the data can be parsed most efficiently as a stream of bytes rather than paying the upfront cost of converting the bytes to a String.
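A minimal sketch of that byte-stream idea, assuming a plain `[UInt8]` buffer (the names here are hypothetical, not the PR's actual scanner):

```swift
// JSON's structural characters ({ } [ ] : , ") and its whitespace are all
// single ASCII bytes, so a scanner can walk UTF-8 bytes directly instead of
// decoding the input to a String first.
let jsonWhitespace: Set<UInt8> = [0x20, 0x09, 0x0A, 0x0D]  // space, tab, LF, CR

// Returns the offset and value of the first non-whitespace byte, if any.
func firstNonWhitespaceByte(in bytes: [UInt8]) -> (offset: Int, byte: UInt8)? {
    for (offset, byte) in bytes.enumerated() where !jsonWhitespace.contains(byte) {
        return (offset, byte)
    }
    return nil
}

let input = Array("  {\"name\": \"café\"}".utf8)
if let first = firstNonWhitespaceByte(in: input) {
    print(first.offset, UnicodeScalar(first.byte))  // 2 {
}
```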
The byte -> String conversion in the current code only accounts for 3 s of the time, but parsing numbers is incredibly expensive (`Double.init?(_ text: String)` converts the string back to bytes in order to use `strtod()`). This technique uses `strtod` directly on the byte array. There is a penalty for non-UTF-8 encoded data, which is in line with the Darwin implementation.
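A hedged sketch of that number-parsing technique, assuming the digits are available as a slice of UTF-8 bytes (the helper name and types are illustrative, not the PR's actual code):

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// Parse a Double straight from UTF-8 bytes with strtod(), skipping the
// intermediate String that Double.init?(_: String) would otherwise build
// and then convert back to bytes.
func parseDouble(from bytes: ArraySlice<UInt8>) -> (value: Double, consumed: Int)? {
    // strtod expects a NUL-terminated C string, so copy the slice and terminate it.
    var buffer = Array(bytes)
    buffer.append(0)
    return buffer.withUnsafeBufferPointer { buf -> (value: Double, consumed: Int)? in
        buf.baseAddress!.withMemoryRebound(to: CChar.self, capacity: buf.count) { start in
            var end: UnsafeMutablePointer<CChar>? = nil
            let value = strtod(start, &end)
            guard let endPointer = end else { return nil }
            let consumed = start.distance(to: UnsafePointer(endPointer))
            // nil when strtod consumed no digits at all
            return consumed > 0 ? (value: value, consumed: consumed) : nil
        }
    }
}

// Example: the bytes of "12.5e3," parse to 12500.0, consuming 6 bytes.
if let result = parseDouble(from: Array("12.5e3,".utf8)[...]) {
    print(result.value, result.consumed)  // 12500.0 6
}
```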
Another modification removes the intermediate `parser` values, as the heap allocation for each intermediate parser was adding significant overhead.

This also includes a commit to implement `.AllowFragments`.
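For completeness, a small usage illustration of fragment decoding (this is an example, not code from the PR; modern Swift spells the option `.allowFragments`, while this PR uses the older `.AllowFragments` name):

```swift
import Foundation

// A JSON "fragment" is a bare top-level value such as a number or string,
// rather than an object or array. With the reading option enabled, such
// input decodes instead of being rejected.
let fragment = Data("42".utf8)
if let value = try? JSONSerialization.jsonObject(with: fragment, options: [.allowFragments]) {
    print(value)  // 42
}
```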