Skip to content

[WIP][stdlib] Dictionary: Add unsafe bulk-loading initializer #21208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Dec 11, 2018

This in an experimental Dictionary initializer that is designed to work better with -[NSDictionary getObjects:andKeys:count:]. This is primarily useful during bridging.

(Interface inspired by @natecook1000's Array initializer proposal.)

@lorentey
Copy link
Member Author

cc @Catfish-Man

@lorentey
Copy link
Member Author

@swift-ci please test

} else {
// Swap into place, then try again with the swapped-in value.
native.swapEntry(target, with: bucket)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool algorithm, @lorentey 👏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to work it out myself because I couldn't picture how to hash-in-place like this, but I think I got it. Nice work. (Did you adapt it from somewhere or come up with it yourself?)

Does this obsolete DictionaryBuilder?

For my own understanding:

  • The storage is separated into three regions: "before bucket", "bucket up to count", and "count and after".
  • Everything in "before bucket" and "count and after" is either uninitialized or in use. Everything in "bucket up to count" is either unprocessed or in use.
  • "in use" is tracked by the bitmap in native.hashTable, the same way it would be for a working Dictionary. Whenever we see such a bucket in the middle section, we skip over it, and native.hashTable.insertNew will never return an in-use bucket. (This is the key thing I was missing when trying to picture the algorithm on my own.)
  • When processing an item, we're basically deciding how to move it from the middle section. Either we're moving it elsewhere in the middle section, in which case we do a swap so that we don't lose the unprocessed item that was there, or we're moving it outside the middle section, in which case it's okay to leave behind an uninitialized hole (because we're about to advance bucket).

Copy link
Member Author

@lorentey lorentey Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice analysis & it's exactly right -- in fact, I'll steal it to document the invariants!

This is unlikely to be a new algorithm -- I can swear I saw something like this done before, although maybe not in these exact circumstances. (Knuth, maybe? This feels like the sort of thing he may include as an exercise.)

We're experimenting with switching some forms of bridging from _DictionaryBuilder to this, but it's only an experiment at this point. This is designed specifically to work with the narrow use case of -[NSDictionary objects:keys:count:].

It's quite more fiddly than _DictionaryBuilder, so if the input comes from a sequence of unique key/value pairs, then I expect the builder will likely still be faster. (Although benchmarks may surprise us!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a tiny simplification in going backwards instead of forwards—indeed, @Catfish-Man described it to me that way at first. That way you only have two regions to think about (and for a concrete benefit, only have to make one comparison).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea; applied!

@lorentey
Copy link
Member Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 4e61394

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 4e61394

// Discard existing entry, then move the current entry in place of it.
uncheckedDestroy(at: b)
_storage._count -= 1
moveEntry(from: bucket, to: b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be cheaper to leave the existing entry where it is? Why does the first one win?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the existing bridging code works for String keys. Arguably the behavior is observable -- although you have to really work for it, so it probably isn't worth worrying about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is already breaking first/last ordering anyway because of all the swapping it does. I'll change it to discard the current entry.

@lorentey
Copy link
Member Author

@swift-ci smoke test

@lorentey
Copy link
Member Author

Closing; this has landed in #21235.

@lorentey lorentey closed this Jul 29, 2019
@lorentey lorentey deleted the dictionary-bulk-loading branch July 29, 2019 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants