Skip to content

Significantly improve prefix trie performance and memory usage #109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 23, 2020

Conversation

harlanhaskins
Copy link
Contributor

The previous prefix trie made a new Node for each character along the
input string. In addition to being wasteful, this would introduce many
more hops to match a given string.

Instead, store common Substrings in the Nodes, and use a sorted
array/binary search through children to avoid extra time spend hashing
and storing in hashed collections.

Doing this reduces the number of nodes in the trie for the driver's
options by an order of magnitude -- from 6601 nodes to 660.

(This is actually a port of LLVM's Trie, but cleaned up and made more
idiomatic in Swift)

The previous prefix trie made a new `Node` for each character along the
input string. In addition to being wasteful, this would introduce many
more hops to match a given string.

Instead, store common `Substring`s in the `Node`s, and use a sorted
array/binary search through children to avoid extra time spend hashing
and storing in hashed collections.

Doing this reduces the number of nodes in the trie for the driver's
options by an order of magnitude -- from 6601 nodes to 660.

(This is actually a port of LLVM's Trie, but cleaned up and made more
idiomatic in Swift)
@harlanhaskins
Copy link
Contributor Author

@swift-ci please test

@harlanhaskins
Copy link
Contributor Author

@swift-ci please test

@harlanhaskins
Copy link
Contributor Author

@harlanhaskins
Copy link
Contributor Author

@swift-ci please test

@harlanhaskins
Copy link
Contributor Author

It shaves off 3MB of memory, too. But that's not really that much.

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice optimization!

@DougGregor
Copy link
Member

Woo this go into Swift-argument-parser too?

@harlanhaskins
Copy link
Contributor Author

harlanhaskins commented May 23, 2020

I tried to work it in, but the swift-argument-parser doesn’t do the kind of ‘best prefix’ matching we need to do in the driver. I was planning to follow up with @natecook1000 about this specifically!

@harlanhaskins harlanhaskins merged commit 72baf02 into master May 23, 2020
Copy link
Contributor

@cltnschlosser cltnschlosser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool :) Seems I took too long to review, haha

@@ -53,34 +53,23 @@ final class PrefixTrieTests: XCTestCase {

trie["garbage"] = nil
XCTAssertNil(trie["garbage"])
XCTAssertEqual(trie.nodeCount, 0)
XCTAssertEqual(trie.nodeCount, 2)
// Removing a node leaves the entry in the trie

trie["12345"] = 12345 // 5 nodes
trie["12367"] = 12367 // 3 common nodes, 2 new nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably update these comments.


/// There's a common part of the string, but neither the label nor the
/// query string were matched entirely.
case haveCommonPart(Int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: label the parameter length or document what the int means.

/// match made.
func query<S: StringProtocol>(_ s: S) -> QueryResult {
let stringCount = s.count
let labelCount = label.count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it would be easier to follow if you kept consistent naming.
query, queryCount, queryIndex
label, labelCount, labelIndex

}
case .dontMatch:
fatalError("Impossible because we found a child with id \(id)")
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly declare the last 2 cases, so the compiler catches if more are added.

/// Each node is identified in the child list by the first character in its
/// label.
var id: Character {
label.first!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally have always liked label[0] to avoid the implicitly unwrapped optional, but do you know if there are any performance differences between the 2? Maybe the optimize the same, I'll try it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the label is a Substring which doesn’t have an int subscript.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess you could use https://developer.apple.com/documentation/swift/substring/2894052-subscript

But label[label.startIndex] looks worse to me than label.first!.

Also, I tried tested it out and at least for an [Int] they are the same after optimizations.

@harlanhaskins
Copy link
Contributor Author

I’ll get these in a follow-up, thanks @cltnschlosser!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants