Speed up JSON Parsing #181


Merged
merged 22 commits on Dec 29, 2015
Conversation

argon
Contributor

@argon argon commented Dec 29, 2015

I ran 2 benchmarks against the current JSON parser implementation on master, comparing it to Darwin Foundation on large inputs, and found it was 5-10x slower (72s vs 8s for a single decode). This new code reduces that to a factor of 2-4x (17s).

While JSON is a text-based serialization, all of its control characters are ASCII characters. The data can therefore be parsed most efficiently as a stream of bytes, avoiding the initial overhead of converting the bytes to a String.
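To illustrate the idea (this is a minimal sketch, not the PR's actual parser; the function and constant names here are hypothetical), a scanner can skip whitespace and classify the next token by inspecting raw UTF-8 bytes, since every JSON structural character is a single ASCII byte:

```swift
// Hypothetical sketch: JSON structural characters are all ASCII, so a
// tokenizer can work on UInt8 values without ever building a String.
let openBrace: UInt8 = 0x7B   // '{'
let openBracket: UInt8 = 0x5B // '['

/// Returns the first non-whitespace byte, or nil if the input is all
/// whitespace. JSON whitespace is space, tab, newline, carriage return.
func firstStructuralByte(_ bytes: [UInt8]) -> UInt8? {
    let whitespace: Set<UInt8> = [0x20, 0x09, 0x0A, 0x0D]
    for byte in bytes where !whitespace.contains(byte) {
        return byte
    }
    return nil
}
```

Because the comparison is a byte equality test rather than a Unicode-aware character comparison, the hot loop never touches String machinery.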

The byte -> string conversion in the current code accounts for only 3s of the time, but parsing numbers is incredibly expensive: Double.init?(_ text: String) converts the string back to bytes in order to call strtod(). This technique uses strtod() directly on the byte array. There is a penalty for non-UTF-8 encoded data, which is in line with the Darwin implementation.
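A sketch of the strtod-on-bytes technique (the function name `parseDouble` is hypothetical, and this is a simplified illustration, not the PR's code): copy the digits into a NUL-terminated buffer and hand strtod a pointer into it, skipping the String round-trip entirely:

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical sketch: parse a Double directly from UTF-8 bytes using
/// strtod(), returning the value and how many bytes were consumed.
/// Returns nil if no number could be parsed at the start of the input.
func parseDouble(_ bytes: [UInt8]) -> (value: Double, consumed: Int)? {
    var buffer = bytes
    buffer.append(0) // strtod requires NUL termination
    return buffer.withUnsafeBufferPointer { buf -> (Double, Int)? in
        buf.baseAddress!.withMemoryRebound(to: CChar.self, capacity: buf.count) { start in
            var end: UnsafeMutablePointer<CChar>? = nil
            let value = strtod(start, &end)
            guard let endPointer = end, endPointer != start else {
                return nil // strtod consumed nothing: not a number
            }
            return (value, start.distance(to: UnsafePointer(endPointer)))
        }
    }
}
```

The `consumed` count lets the caller advance its byte index past the number, which is exactly what a single-pass byte parser needs.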

The most efficient encoding to use for parsing is UTF-8, so if you have a choice in encoding the data passed to this method, use UTF-8.

Another modification removes the intermediate parser values, since the heap allocation for each intermediate parser added significant overhead.
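The allocation-avoidance idea can be sketched as follows (a hypothetical illustration, assuming a `ByteParser` type that is not the PR's actual implementation): instead of allocating a new parser object for each nested value, a single value-type parser mutates its own index as it consumes input:

```swift
/// Hypothetical sketch: a value-type parser that advances an index over
/// a byte array in place, so parsing nested values requires no per-value
/// heap allocation for intermediate parser objects.
struct ByteParser {
    let bytes: [UInt8]
    var index: Int = 0

    /// Consumes a run of ASCII digits and returns how many were read.
    mutating func consumeDigits() -> Int {
        let start = index
        while index < bytes.count, (0x30...0x39).contains(bytes[index]) {
            index += 1
        }
        return index - start
    }
}
```

Because `ByteParser` is a struct living on the stack, recursing into arrays and objects only passes an index around rather than allocating and releasing parser instances on the heap.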

This also includes a commit to implement .AllowFragments.

argon and others added 22 commits December 22, 2015 13:09

  • JavaScript numbers are doubles, so we should use that as the basis for validation.
  • Only supports UTF-8. This is an experiment to see if it is the right solution.
  • Other encodings still have undefined behaviour, and error locations are broken for input containing any multi-byte characters.
  • A character with non-zero higher-order bytes would previously be interpreted as ASCII and decoding would break.
  • One niggling use remains in `takeString`.
  • Needs to reference code unit locations instead of specific character locations.
@phausler
Contributor

From a cursory glance this looks like a reasonable change; builds pass on Ubuntu.

Do you happen to have the performance benchmarks handy? That might be something interesting to check in so that we could potentially build up a performance test suite.

Additionally, what other hotspots in Foundation did you find through this?

phausler added a commit that referenced this pull request Dec 29, 2015
@phausler phausler merged commit db4b395 into swiftlang:master Dec 29, 2015
@argon
Contributor Author

argon commented Dec 29, 2015

The performance "benchmark" wasn't automated in any way, unfortunately. I simply created a main.swift that reads the JSON file from disk and decodes it, then profiled it with Instruments.

The main hotspots in this code now are:

  • Array and Dictionary size increases, leading to a large number of copies
  • Bridging from NSString -> String in parseString

In general, String parsing for sf-city-lots-json carries a huge time cost (~5000ms vs 250ms) compared with Darwin Foundation.

@argon argon deleted the experiment/JSONParseBuffer branch December 29, 2015 13:01
atrick pushed a commit to atrick/swift-corelibs-foundation that referenced this pull request Jan 12, 2021
Improve BuildServerBuildSystemTests error handling
kateinoigakukun pushed a commit to kateinoigakukun/swift-corelibs-foundation that referenced this pull request Oct 11, 2023