-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Incremental] Catch conversion failure in Fingerprint #35405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just convert the tests - it's not hard to teach Slava's tool to do this.
Fingerprints created by the frontend will otherwise round-trip losslessly, so you are correct that only the tests are wonky here.
I see two issues: one meriting a fix and the other, an understandable oversight meriting discussion: What merits discussion is that Fingerprint chooses limit the quantity of information to 128 bits, instead of an unlimited string. I understand how the present state of the compiler would inspire that limitation. But that limitation is inconsistent with the present state of the tests. I expect that a fixed, hard limit on the size of a fingerprint is good for compiler performance, but might there be a way to get the best of both worlds? |
Increasing the number of bits in a hash does not correlate at all with an increase in desirable properties. For example, SHA-1 is a 256-bit hash, but it is deeply insecure and weaker than (the ironically named for this discussion) SHA-256, and less collision resistant than SipHash. |
While there exist larger fingerprints with worse performance--a fair point!--that does not rule out the potential need for larger fingerprints in order to get better performance. From an information theoretic standpoint, a fingerprint with more bits can potentially fulfill its purpose better. By placing a fixed, hard limit on the size of a fingerprint, the existing code will make it harder to expand the fingerprint in the future, should we deem it necessary to do so. That necessity cannot be ruled out because we don't have much data for our code-base regarding the efficacy of the fingerprints, that is, how often a fingerprint fails to catch a mutation in the code. I have heard that other projects have some experience here, but have not investigated. Thus, it seems conceivable that we might enlarge fingerprints someday. The I am OK with leaving length fixed for now; however, we'll have to fix the tests in a manner which will make them slightly harder to debug in the future. |
I don't understand this objection. We have the ability to change the version and format of swiftdeps at any time. If e.g. a larger bitwidth is needed, we have the headroom to make such a change. |
Yes we do, although it would be harder than when the fingerprint was a string.
The fact that a poorer yet larger hash exists struck me as irrelevant to the merits of modifying the code to accommodate varying-length fingerprints today. I also failed to understand the "does not correlate at all" statement, given my understanding of Shannon information theory. Anyway, I'll look into fixing up the tests and adding those fixes to this PR. |
Information theory is simply one aspect. You're considering the data format of the hash, instead of the hash algorithm itself and its result. If you have such a variable-length algorithm in mind, I invite you to propose it and implement it and we'll give it a shootout with the compiler performance suite. Note that a variable-length format is not, in itself, a good thing. One could imagine a terrible hash function on variable-length outputs where the combiner simply appends a 0 or 1 to a bitstring for each quantum of data. You'd collide quite frequently, but you'd have all the entropy you could cram into the thing at your disposal. |
Absolutely right! But doesn't information theory imply that although no fingerprint will be perfect, a longer fingerprint has the potential to be better than a shorter one, as long as the fingerprint is still shorter than the code its hashing. Given that you've chosen an excellent hash, if someday it seems to be not quite good enough, might it not be the case that we need to enlarge it? |
@swift-ci please smoke test |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat.
@swift-ci test |
Thank you, @CodaFi |
Build failed |
Build failed |
@swift-ci please smoke test |
ea3dbf9
to
d913dfe
Compare
@swift-ci please smoke test |
@swift-ci please smoke test |
@swift-ci please smoke test |
When creating a
Fingerprint
from a string, if the string is not valid hex, information is lost, creating the potential for miscompilation. However, I believe this situation only happens in some tests, for example,Driver/Dependencies/mutual-interface-hash-fine.swift
. This change checks for that failure, but it cannot be merged as-is because some tests will (rightly) fail. Before the advent ofFingerprint
theswift-dependency-tool
would write out the whole string, and thus the tests worked correctly.To see the failure, try converting the
.swift
files from before this PR inDriver/Dependencies/Inputs/mutual-interface-hash-fine
from yaml to binary and back.