-
Notifications
You must be signed in to change notification settings - Fork 207
Significantly improve prefix trie performance and memory usage #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The previous prefix trie made a new `Node` for each character along the input string. In addition to being wasteful, this would introduce many more hops to match a given string. Instead, store common `Substring`s in the `Node`s, and use a sorted array/binary search through children to avoid extra time spend hashing and storing in hashed collections. Doing this reduces the number of nodes in the trie for the driver's options by an order of magnitude -- from 6601 nodes to 660. (This is actually a port of LLVM's Trie, but cleaned up and made more idiomatic in Swift)
bb0a222
to
a184a67
Compare
@swift-ci please test |
@swift-ci please test |
@swift-ci please test |
It shaves off 3MB of memory, too. But that's not really that much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice optimization!
Woo this go into Swift-argument-parser too? |
I tried to work it in, but the swift-argument-parser doesn’t do the kind of ‘best prefix’ matching we need to do in the driver. I was planning to follow up with @natecook1000 about this specifically! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks cool :) Seems I took too long to review, haha
@@ -53,34 +53,23 @@ final class PrefixTrieTests: XCTestCase { | |||
|
|||
trie["garbage"] = nil | |||
XCTAssertNil(trie["garbage"]) | |||
XCTAssertEqual(trie.nodeCount, 0) | |||
XCTAssertEqual(trie.nodeCount, 2) | |||
// Removing a node leaves the entry in the trie | |||
|
|||
trie["12345"] = 12345 // 5 nodes | |||
trie["12367"] = 12367 // 3 common nodes, 2 new nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably update these comments.
|
||
/// There's a common part of the string, but neither the label nor the | ||
/// query string were matched entirely. | ||
case haveCommonPart(Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: label the parameter length
or document what the int means.
/// match made. | ||
func query<S: StringProtocol>(_ s: S) -> QueryResult { | ||
let stringCount = s.count | ||
let labelCount = label.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it would be easier to follow if you kept consistent naming.
query, queryCount, queryIndex
label, labelCount, labelIndex
} | ||
case .dontMatch: | ||
fatalError("Impossible because we found a child with id \(id)") | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly declare the last 2 cases, so the compiler catches if more are added.
/// Each node is identified in the child list by the first character in its | ||
/// label. | ||
var id: Character { | ||
label.first! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally have always liked label[0]
to avoid the implicitly unwrapped optional, but do you know if there are any performance differences between the 2? Maybe the optimize the same, I'll try it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the label is a Substring
which doesn’t have an int subscript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess you could use https://developer.apple.com/documentation/swift/substring/2894052-subscript
But label[label.startIndex]
looks worse to me than label.first!
.
Also, I tried tested it out and at least for an [Int]
they are the same after optimizations.
I’ll get these in a follow-up, thanks @cltnschlosser! |
The previous prefix trie made a new
Node
for each character along theinput string. In addition to being wasteful, this would introduce many
more hops to match a given string.
Instead, store common
Substring
s in theNode
s, and use a sortedarray/binary search through children to avoid extra time spend hashing
and storing in hashed collections.
Doing this reduces the number of nodes in the trie for the driver's
options by an order of magnitude -- from 6601 nodes to 660.
(This is actually a port of LLVM's Trie, but cleaned up and made more
idiomatic in Swift)