Skip to content

Use a better hash seed when doing string hashing in the compiler #26685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 29, 2019

Conversation

jrose-apple
Copy link
Contributor

  • Remove explicit seeds from string hashes that are compiler-version-dependent already.

  • Comment that we can't change the seed for swiftdoc files (@nkcsgexi).

  • Switch to a "better" explicit seed for serialization lookup tables. This last one changed the order of SIL entity deserialization (as @adrian-prantl noticed in the original Force the DJB hash seed to 0 for compatibility with HashString. #14890), but we shouldn't have been relying on that anyway (@gottesmm). It's actually rather scary that that was matching source order at the time.

@jrose-apple
Copy link
Contributor Author

@swift-ci Please test

///
/// This is the same as the default used by llvm::djbHash, just provided
/// explicitly here to note that it's part of the format.
const uint32_t SWIFTMODULE_HASH_SEED = 5381;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: Is the fact that the format depends on the hash seed a bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's an OnDiskHashTable, so the hash used is inherently part of the format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

@@ -52,7 +52,13 @@ const uint16_t SWIFTMODULE_VERSION_MAJOR = 0;
/// describe what change you made. The content of this comment isn't important;
/// it just ensures a conflict if two people change the module format.
/// Don't worry about adhering to the 80-column limit for this line.
const uint16_t SWIFTMODULE_VERSION_MINOR = 509; // [dynamic_lifetime] sil flag
const uint16_t SWIFTMODULE_VERSION_MINOR = 510; // better string hash seed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the end-of-line comment both on the LHS and the RHS of the patch? Seems like neither one belongs on this line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. it's the last change that causes us to bump the version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, see the comment above.

@@ -388,8 +388,7 @@ class ModuleFile::DeclTableInfo {

hash_value_type ComputeHash(internal_key_type key) {
if (key.first == DeclBaseName::Kind::Normal) {
// FIXME: DJB seed=0, audit whether the default seed could be used.
return llvm::djbHash(key.second, 0);
return llvm::djbHash(key.second, SWIFTMODULE_HASH_SEED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps factor this out into a ModuleFile::hashFunction(). I'm not sure whether this is much better, I'll leave it up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered that, but figured this was slightly more likely to keep people from doing the wrong thing in the future. I'm not 100% sure of that though.

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 3142ce8d00ed39f4153da260f05fe131f907aceb

@jrose-apple
Copy link
Contributor Author

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 3142ce8d00ed39f4153da260f05fe131f907aceb

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 3142ce8d00ed39f4153da260f05fe131f907aceb

@jrose-apple
Copy link
Contributor Author

Updated to make the swiftdoc hash seed more explicit.

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - f51b775a58d7ca783fd07d4fe0833b9e8f9df6a6

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 2273fdc6caada4d5d0e8d5aecd08da172a7dba97

@jrose-apple
Copy link
Contributor Author

Hmm.

@swift-ci Please clean test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 2273fdc6caada4d5d0e8d5aecd08da172a7dba97

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 2273fdc6caada4d5d0e8d5aecd08da172a7dba97

@jrose-apple
Copy link
Contributor Author

Definitely concerning. What changed between now and last time? Maybe someone added another hash?

@jrose-apple
Copy link
Contributor Author

jrose-apple commented Aug 20, 2019

Oops, yes, they did. Or I resolved a conflict incorrectly.

@jrose-apple
Copy link
Contributor Author

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 2273fdc6caada4d5d0e8d5aecd08da172a7dba97

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 2273fdc6caada4d5d0e8d5aecd08da172a7dba97

@jrose-apple
Copy link
Contributor Author

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - d2826e5e5c132cf30df556bafdf98cefe5743e51

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - d2826e5e5c132cf30df556bafdf98cefe5743e51

@jrose-apple
Copy link
Contributor Author

@gottesmm @nkcsgexi Ping. Nothing major here.

Copy link
Contributor

@nkcsgexi nkcsgexi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the hash seed for swiftdoc files makes sense to me. Thank you for the heads-up!

@jrose-apple
Copy link
Contributor Author

@swift-ci Please smoke test

The one for enum raw value checking isn't serialized anywhere, so we
can use normal in-process hashing. The ones in the Clang importer's
lookup tables are serialized, but the lookup table is already only
valid for a particular compiler build anyway (it goes into a Clang
PCM, which has the full compiler version in its cache key).
Unlike compiled modules, swiftdoc files are considered a stable
format, so we can't change how information is stored in them. If we
add any more string-hashed tables to swiftdoc files, we should
consider using a new hash seed for those.
...fulfilling the promised audit from 0747d9a. No intended
functionality change /other/ than the order of already-unsorted lists.
This affected a number of SIL tests that relied on deserialization
order matching the original source order; I have no idea why the old
hash logic would make that the case. If we think that's a valuable
property, we should serialize a list of functions in addition to the
iterable table. (Maybe just in SIB mode?)
@jrose-apple jrose-apple merged commit 9bc5a55 into swiftlang:master Aug 29, 2019
@jrose-apple jrose-apple deleted the po-tay-toes branch August 29, 2019 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants