Parallel mode was crashing with bad memory access due to data races
when accessing `Frontend.configurationLoader`, more specifically its
`cache`.
After serializing access (initially with an `NSLock`), I observed that
although it no longer crashed, performance was surprisingly *worse*
than in single-threaded mode.
That led me to investigate why this was happening, and after some
profiling I discovered that `Rule`'s `nameCache` was the main source
of contention, accounting for around 30% of total time spent waiting
on locks (`ulock_wait`) in my tests.
After replacing the synchronization mechanism on `nameCache` with a
more efficient `os_unfair_lock_t` (`pthread_mutex_t` on Linux), the
contention dropped significantly, and parallel mode outperformed
single-threaded mode as expected. As a bonus, these changes also
improved single-threaded performance, because a lock has less
overhead than a queue.
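For illustration, a lock of the kind described above could look roughly like the sketch below. This is a minimal sketch under assumptions, not the `Lock` type from this change: the `withLock` method and the `demoCounter` helper are hypothetical names, and only the Darwin (`os_unfair_lock`) and POSIX (`pthread_mutex_t`) paths are covered. The primitive is heap-allocated so that its address stays stable for the lifetime of the lock.

```swift
import Foundation

/// Hypothetical sketch of a small lock wrapper (not swift-format's actual type).
final class Lock {
  #if canImport(Darwin)
  private let primitive: os_unfair_lock_t
  #else
  private let primitive: UnsafeMutablePointer<pthread_mutex_t>
  #endif

  init() {
    // Heap-allocate the primitive so its address never changes.
    primitive = .allocate(capacity: 1)
    #if canImport(Darwin)
    primitive.initialize(to: os_unfair_lock())
    #else
    primitive.initialize(to: pthread_mutex_t())
    let status = pthread_mutex_init(primitive, nil)
    precondition(status == 0, "pthread_mutex_init failed")
    #endif
  }

  deinit {
    #if !canImport(Darwin)
    pthread_mutex_destroy(primitive)
    #endif
    primitive.deinitialize(count: 1)
    primitive.deallocate()
  }

  /// Runs `body` while holding the lock and releases it afterwards,
  /// even if `body` throws.
  func withLock<T>(_ body: () throws -> T) rethrows -> T {
    #if canImport(Darwin)
    os_unfair_lock_lock(primitive)
    defer { os_unfair_lock_unlock(primitive) }
    #else
    pthread_mutex_lock(primitive)
    defer { pthread_mutex_unlock(primitive) }
    #endif
    return try body()
  }
}

/// Hypothetical usage: many concurrent increments guarded by one lock.
func demoCounter() -> Int {
  let lock = Lock()
  var counter = 0
  DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    lock.withLock { counter += 1 }
  }
  return counter
}
```

Unlike a serial `DispatchQueue`, acquiring an uncontended lock like this involves no block dispatch, which is consistent with the reduced overhead noted above.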
I then used the same `Lock` approach to serialize access to
`Frontend.configurationLoader`, which widened the performance gap
even further.
After these improvements, I was able to obtain quite significant
performance gains using `Lock`:
- parallel (optimized) vs serial (non optimized): ~5.4x (13.5s vs 74s)
- serial (optimized) vs serial (non optimized): ~1.6x (44s vs 74s)
- parallel (optimized) vs serial (optimized): ~3.2x (13.5s vs 44s)
Sadly, a custom `Lock` implementation is not ideal for `swift-format`
to maintain, and it did not provide Windows support. As such, `NSLock`
was used instead, since it is part of `Foundation` and supported on
all major platforms.
With `NSLock`, the improvements were smaller, unfortunately:
- parallel (NSLock) vs serial (non optimized): ~1.9x (38s vs 74s)
- serial (NSLock) vs serial (non optimized): ~1.4x (52s vs 74s)
- parallel (NSLock) vs serial (NSLock): ~1.3x (38s vs 52s)
Tests were run on my `MacBookPro16,1` (8-core [email protected]), on a project
with 2135 Swift files, with `swift-format` compiled in Release mode.
## Changes
- Use an `NSLock` to serialize access to `Frontend.configurationLoader`.
- Use an `NSLock` to serialize access to `Rule`'s `nameCache`
(replacing `nameCacheQueue`).
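
As a rough illustration of the second change, an `NSLock`-guarded cache might look like the sketch below. It is a sketch under assumptions rather than the actual `Rule` code: `NameCache`, `storage`, and `name(for:)` are hypothetical names, and deriving the name with `String(describing:)` is assumed purely for the example.

```swift
import Foundation

/// Hypothetical stand-in for a type-name cache shared across threads.
final class NameCache {
  private let lock = NSLock()
  private var storage: [ObjectIdentifier: String] = [:]

  /// Returns the cached name for `type`, computing and storing it on a miss.
  func name(for type: Any.Type) -> String {
    let key = ObjectIdentifier(type)
    // Hold the lock for both the lookup and the insert so that
    // concurrent callers never mutate the dictionary at the same time.
    lock.lock()
    defer { lock.unlock() }
    if let cached = storage[key] {
      return cached
    }
    let computed = String(describing: type)
    storage[key] = computed
    return computed
  }
}
```

Taking the lock once around the whole lookup-or-insert keeps the critical section short while avoiding the block dispatch that a queue-based approach requires.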