-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Package collections: improve search performance #3108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -85,6 +97,69 @@ final class SQLitePackageCollectionsStorage: PackageCollectionsStorage, Closable | |||
try statement.bind(bindings) | |||
try statement.step() | |||
} | |||
|
|||
// Update search indices | |||
try self.ftsLock.withLock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this lock used / what is it protecting from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explain it in the code comment below but this prevents concurrency errors with BEGIN TRANSACTION
since there is only one db connection and each connection can have just one transaction. e.g., I ran into this problem in the test for refreshCollections
.
|
||
try package.versions.forEach { version in | ||
let packagesBindings: [SQLite.SQLiteValue] = [ | ||
.string(try self.encoder.encode(collection.identifier).base64EncodedString()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
Sources/PackageCollections/Storage/SQLitePackageCollectionsStorage.swift
Show resolved
Hide resolved
Sources/PackageCollections/Storage/SQLitePackageCollectionsStorage.swift
Show resolved
Hide resolved
@@ -48,6 +58,8 @@ final class SQLitePackageCollectionsStorage: PackageCollectionsStorage, Closable | |||
self.diagnosticsEngine = diagnosticsEngine | |||
self.encoder = JSONEncoder.makeWithDefaults() | |||
self.decoder = JSONDecoder.makeWithDefaults() | |||
|
|||
self.populateTargetTrie() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be moved to the search function, such that if the trie is not loaded load it and then return results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move this to search, we might be inserting the same data to the trie more than once. Imagine between init
and searchTargets
collections are added/refreshed, which means FTS and trie are updated also. When searchTargets
is called, we would attempt to populate the trie using data from FTS, but the trie already has some if not all of the data in the FTS, so we'd be wasting time and resources to insert the same data again.
And ideally, we'd call populate only once. I think moving this to searchTargets
would require more work to restrict that.
Sources/PackageCollections/Storage/SQLitePackageCollectionsStorage.swift
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fantastic! some small suggestions
@swift-ci please smoke test |
@yim-lee this looks great. we are going into a merge freeze for non-critical bug fixes until after the 5.4 branch is created on 1/7 (see https://forums.swift.org/t/swift-5-4-release-process/), so will merge it then |
@swift-ci please smoke test |
@swift-ci please smoke test |
1 similar comment
@swift-ci please smoke test |
@swift-ci please smoke test |
1 similar comment
@swift-ci please smoke test |
@swift-ci please smoke test |
@swift-ci please smoke test |
This is an alternate to swiftlang#3090 but is a complete solution. Motivation: Currently to support search for package collections API we read and deserialize collection blobs from SQLite then perform string matchings on individual properties in memory (e.g., `package.summary.contains("foobar")`). This can be optimized. Modifications: Use SQLite FTS--define FTS virtual tables for packages and targets, and update implementation for `findPackage`, `searchPackages`, and `searchTargets` methods of `SQLitePackageCollectionsStorage`. Without optimization, `PackageCollectionsTests.testPackageSearchPerformance` and `testTargetsSearchPerformance` take about ~400ms to run on my local machine. With FTS, `testPackageSearchPerformance` takes ~40ms and `testTargetsSearchPerformance` ~50ms. The `testSearchTargetsPerformance` in `InMemoryPackageCollectionsSearchTests` (swiftlang#3090) yields result in ~10ms, though it queries the trie directly without going through the PackageCollections API layer. Since target search is either exact or prefix match and doesn't tokenize the query, and given the good results I saw in `InMemoryPackageCollectionsSearchTests`, this implementation includes a trie on top of SQLite FTS for target search. The trie is in-memory and loads from SQLite FTS during initialization. `put`/`remove` updates both the SQLite FTS and trie. The improvement with using trie varies--`testTargetsSearchPerformance` takes between ~15-50ms to complete. `put` now takes longer to complete because of the search index updates. Result: Better search performance.
@swift-ci please smoke test |
This is an alternative to #3090 but is a complete solution.
Motivation:
Currently to support search for package collections API we read and deserialize collection blobs from SQLite then perform string matchings on individual properties in memory (e.g.,
package.summary.contains("foobar")
). This can be optimized.Modifications:
Use SQLite FTS--define FTS virtual tables for packages and targets, and update implementation for
findPackage
,searchPackages
, andsearchTargets
methods ofSQLitePackageCollectionsStorage
.Without optimization,
PackageCollectionsTests.testPackageSearchPerformance
andtestTargetsSearchPerformance
take about ~400ms to run on my local machine.With FTS,
testPackageSearchPerformance
takes ~40ms andtestTargetsSearchPerformance
~50ms.The
testSearchTargetsPerformance
inInMemoryPackageCollectionsSearchTests
(#3090) yields result in ~10ms, though it queries the trie directly without going through the PackageCollections API layer.Since target search is either exact or prefix match and doesn't tokenize the query, and given the good results I saw in
InMemoryPackageCollectionsSearchTests
, this implementation includes a trie on top of SQLite FTS for target search. The trie is in-memory and loads from SQLite FTS during initialization.put
/remove
updates both the SQLite FTS and trie. The improvement with using trie varies--testTargetsSearchPerformance
takes between ~15-50ms to complete.put
now takes longer to complete because of the search index updates.Result:
Better search performance.