Skip to content

Package collections: improve search performance #3108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 20, 2021

Conversation

yim-lee
Copy link
Contributor

@yim-lee yim-lee commented Dec 11, 2020

This is an alternative to #3090 but is a complete solution.

Motivation:
Currently to support search for package collections API we read and deserialize collection blobs from SQLite then perform string matchings on individual properties in memory (e.g., package.summary.contains("foobar")). This can be optimized.

Modifications:
Use SQLite FTS--define FTS virtual tables for packages and targets, and update implementation for findPackage, searchPackages, and searchTargets methods of SQLitePackageCollectionsStorage.

Without optimization, PackageCollectionsTests.testPackageSearchPerformance and testTargetsSearchPerformance take about ~400ms to run on my local machine.

With FTS, testPackageSearchPerformance takes ~40ms and testTargetsSearchPerformance ~50ms.

The testSearchTargetsPerformance in InMemoryPackageCollectionsSearchTests (#3090) yields result in ~10ms, though it queries the trie directly without going through the PackageCollections API layer.

Since target search is either exact or prefix match and doesn't tokenize the query, and given the good results I saw in InMemoryPackageCollectionsSearchTests, this implementation includes a trie on top of SQLite FTS for target search. The trie is in-memory and loads from SQLite FTS during initialization. put/remove updates both the SQLite FTS and trie. The improvement with using trie varies--testTargetsSearchPerformance takes between ~15-50ms to complete.

put now takes longer to complete because of the search index updates.

Result:
Better search performance.

@@ -85,6 +97,69 @@ final class SQLitePackageCollectionsStorage: PackageCollectionsStorage, Closable
try statement.bind(bindings)
try statement.step()
}

// Update search indices
try self.ftsLock.withLock {
Copy link
Contributor

@tomerd tomerd Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this lock used / what is it protecting from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explain it in the code comment below but this prevents concurrency errors with BEGIN TRANSACTION since there is only one db connection and each connection can have just one transaction. e.g., I ran into this problem in the test for refreshCollections.


try package.versions.forEach { version in
let packagesBindings: [SQLite.SQLiteValue] = [
.string(try self.encoder.encode(collection.identifier).base64EncodedString()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

@@ -48,6 +58,8 @@ final class SQLitePackageCollectionsStorage: PackageCollectionsStorage, Closable
self.diagnosticsEngine = diagnosticsEngine
self.encoder = JSONEncoder.makeWithDefaults()
self.decoder = JSONDecoder.makeWithDefaults()

self.populateTargetTrie()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be moved to the search function, such that if the trie is not loaded load it and then return results

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move this to search, we might be inserting the same data to the trie more than once. Imagine between init and searchTargets collections are added/refreshed, which means FTS and trie are updated also. When searchTargets is called, we would attempt to populate the trie using data from FTS, but the trie already has some if not all of the data in the FTS, so we'd be wasting time and resources to insert the same data again.

And ideally, we'd call populate only once. I think moving this to searchTargets would require more work to restrict that.

Copy link
Contributor

@tomerd tomerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fantastic! some small suggestions

@yim-lee
Copy link
Contributor Author

yim-lee commented Dec 15, 2020

@swift-ci please smoke test

@tomerd
Copy link
Contributor

tomerd commented Dec 15, 2020

@yim-lee this looks great. we are going into a merge freeze for non-critical bug fixes until after the 5.4 branch is created on 1/7 (see https://forums.swift.org/t/swift-5-4-release-process/), so will merge it then

@tomerd tomerd added the next waiting for next merge window label Jan 5, 2021
@tomerd
Copy link
Contributor

tomerd commented Jan 5, 2021

@swift-ci please smoke test

@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 6, 2021

@swift-ci please smoke test

1 similar comment
@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 7, 2021

@swift-ci please smoke test

@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 8, 2021

@swift-ci please smoke test

1 similar comment
@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 9, 2021

@swift-ci please smoke test

@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 11, 2021

@swift-ci please smoke test

@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 19, 2021

@swift-ci please smoke test

This is an alternate to swiftlang#3090 but is a complete solution.

Motivation:
Currently to support search for package collections API we read and deserialize collection blobs from SQLite then perform string matchings on individual properties in memory (e.g., `package.summary.contains("foobar")`). This can be optimized.

Modifications:
Use SQLite FTS--define FTS virtual tables for packages and targets, and update implementation for `findPackage`, `searchPackages`, and `searchTargets` methods of `SQLitePackageCollectionsStorage`.

Without optimization, `PackageCollectionsTests.testPackageSearchPerformance` and `testTargetsSearchPerformance` take about ~400ms to run on my local machine.

With FTS, `testPackageSearchPerformance` takes ~40ms and `testTargetsSearchPerformance` ~50ms.

The `testSearchTargetsPerformance` in `InMemoryPackageCollectionsSearchTests` (swiftlang#3090) yields result in ~10ms, though it queries the trie directly without going through the PackageCollections API layer.

Since target search is either exact or prefix match and doesn't tokenize the query, and given the good results I saw in `InMemoryPackageCollectionsSearchTests`, this implementation includes a trie on top of SQLite FTS for target search. The trie is in-memory and loads from SQLite FTS during initialization. `put`/`remove` updates both the SQLite FTS and trie. The improvement with using trie varies--`testTargetsSearchPerformance` takes between ~15-50ms to complete.

`put` now takes longer to complete because of the search index updates.

Result:
Better search performance.
@yim-lee
Copy link
Contributor Author

yim-lee commented Jan 20, 2021

@swift-ci please smoke test

@yim-lee yim-lee merged commit 1eb1342 into swiftlang:main Jan 20, 2021
@yim-lee yim-lee deleted the search-fts branch January 20, 2021 04:57
@yim-lee yim-lee restored the search-fts branch January 20, 2021 23:35
yim-lee added a commit that referenced this pull request Jan 20, 2021
yim-lee added a commit that referenced this pull request Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next waiting for next merge window
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants