[Incremental] Catch conversion failure in Fingerprint #35405

davidungar · 2021-01-13T05:21:34Z

When creating a Fingerprint from a string, if the string is not valid hex, information is lost, creating the potential for miscompilation. However, I believe this situation only arises in some tests, for example Driver/Dependencies/mutual-interface-hash-fine.swift. This change checks for that failure, but it cannot be merged as-is because some tests will (rightly) fail. Before the advent of Fingerprint, swift-dependency-tool would write out the whole string, so the tests worked correctly.

To see the failure, try converting the .swift files from before this PR in Driver/Dependencies/Inputs/mutual-interface-hash-fine from yaml to binary and back.
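
As a rough illustration of how an unchecked hex conversion can lose information (a sketch only, not the compiler's actual code; the function names and the use of `std::strtoull` are assumptions, and it handles only 64 bits for brevity):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical unchecked conversion: std::strtoull parses the leading hex
// digits and silently ignores everything after the first non-hex character.
std::uint64_t uncheckedHexToBits(const std::string &s) {
  return std::strtoull(s.c_str(), nullptr, 16);
}

// Hypothetical checked variant: succeeds only if the whole string was
// consumed by the parse, so trailing non-hex characters are reported.
bool checkedHexToBits(const std::string &s, std::uint64_t &out) {
  if (s.empty())
    return false;
  char *end = nullptr;
  out = std::strtoull(s.c_str(), &end, 16);
  return end == s.c_str() + s.size();
}
```

With the unchecked version, a non-hex string such as "before" quietly yields the same bits as "bef", so two different inputs can end up with the same stored value; the checked version rejects it instead.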

CodaFi

We should just convert the tests - it's not hard to teach Slava's tool to do this.

Fingerprints created by the frontend will otherwise round-trip losslessly, so you are correct that only the tests are wonky here.

davidungar · 2021-01-13T19:05:44Z

> We should just convert the tests - it's not hard to teach Slava's tool to do this.
>
> Fingerprints created by the frontend will otherwise round-trip losslessly, so you are correct that only the tests are wonky here.

I see two issues: one meriting a fix, and the other, an understandable oversight, meriting discussion.
The definite problem is that there was no check on the input to fromString. That code could not deal with a non-hex string, but it did not check. You did put in an assertion, which is great, but it turned out that the assertion was in the wrong place to check the input. I'll be refining the PR to fix that.

What merits discussion is that Fingerprint chooses to limit the quantity of information to 128 bits, instead of allowing an unlimited string. I understand how the present state of the compiler would inspire that limitation. But that limitation is inconsistent with the present state of the tests. I expect that a fixed, hard limit on the size of a fingerprint is good for compiler performance, but might there be a way to get the best of both worlds?

CodaFi · 2021-01-13T19:21:11Z

> What merits discussion is that Fingerprint chooses to limit the quantity of information to 128 bits

Increasing the number of bits in a hash does not correlate at all with an increase in desirable properties. For example, SHA-1 is a 160-bit hash, but it is deeply insecure and weaker than (the ironically named for this discussion) SHA-256, and less collision resistant than SipHash.

davidungar · 2021-01-13T21:03:09Z

> > What merits discussion is that Fingerprint chooses to limit the quantity of information to 128 bits
>
> Increasing the number of bits in a hash does not correlate at all with an increase in desirable properties. For example, SHA-1 is a 160-bit hash, but it is deeply insecure and weaker than (the ironically named for this discussion) SHA-256, and less collision resistant than SipHash.

While there exist larger fingerprints with worse performance--a fair point!--that does not rule out the potential need for larger fingerprints in order to get better performance. From an information theoretic standpoint, a fingerprint with more bits can potentially fulfill its purpose better. By placing a fixed, hard limit on the size of a fingerprint, the existing code will make it harder to expand the fingerprint in the future, should we deem it necessary to do so.

That necessity cannot be ruled out because we don't have much data for our code-base regarding the efficacy of the fingerprints, that is, how often a fingerprint fails to catch a mutation in the code. I have heard that other projects have some experience here, but have not investigated. Thus, it seems conceivable that we might enlarge fingerprints someday. The Fingerprint abstraction will help do that if/when we do.

I am OK with leaving length fixed for now; however, we'll have to fix the tests in a manner which will make them slightly harder to debug in the future.

CodaFi · 2021-01-13T21:29:05Z

> By placing a fixed, hard limit on the size of a fingerprint, the existing code will make it harder to expand the fingerprint in the future, should we deem it necessary to do so.

I don't understand this objection. We have the ability to change the version and format of swiftdeps at any time. If e.g. a larger bitwidth is needed, we have the headroom to make such a change.

davidungar · 2021-01-13T22:39:38Z

> > By placing a fixed, hard limit on the size of a fingerprint, the existing code will make it harder to expand the fingerprint in the future, should we deem it necessary to do so.
>
> I don't understand this objection. We have the ability to change the version and format of swiftdeps at any time. If e.g. a larger bitwidth is needed, we have the headroom to make such a change.

Yes we do, although it would be harder than when the fingerprint was a string.
I objected because I didn't understand the relevance of:

> Increasing the number of bits in a hash does not correlate at all with an increase in desirable properties. For example, SHA-1 is a 160-bit hash, but it is deeply insecure and weaker than (the ironically named for this discussion) SHA-256, and less collision resistant than SipHash.

The fact that a poorer yet larger hash exists struck me as irrelevant to the merits of modifying the code to accommodate varying-length fingerprints today.

I also failed to understand the "does not correlate at all" statement, given my understanding of Shannon information theory.

Anyway, I'll look into fixing up the tests and adding those fixes to this PR.

CodaFi · 2021-01-13T22:59:50Z

Information theory is simply one aspect. You're considering the data format of the hash, instead of the hash algorithm itself and its result. If you have such a variable-length algorithm in mind, I invite you to propose it and implement it and we'll give it a shootout with the compiler performance suite.

Note that a variable-length format is not, in itself, a good thing. One could imagine a terrible hash function on variable-length outputs where the combiner simply appends a 0 or 1 to a bitstring for each quantum of data. You'd collide quite frequently, but you'd have all the entropy you could cram into the thing at your disposal.

davidungar · 2021-01-14T00:10:58Z

> Information theory is simply one aspect. You're considering the data format of the hash, instead of the hash algorithm itself and its result. If you have such a variable-length algorithm in mind, I invite you to propose it and implement it and we'll give it a shootout with the compiler performance suite.
>
> Note that a variable-length format is not, in itself, a good thing. One could imagine a terrible hash function on variable-length outputs where the combiner simply appends a 0 or 1 to a bitstring for each quantum of data. You'd collide quite frequently, but you'd have all the entropy you could cram into the thing at your disposal.

Absolutely right! But doesn't information theory imply that, although no fingerprint will be perfect, a longer fingerprint has the potential to be better than a shorter one, as long as the fingerprint is still shorter than the code it's hashing?

Given that you've chosen an excellent hash, if someday it seems to be not quite good enough, might it not be the case that we need to enlarge it?
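
For a rough sense of scale, assuming an idealized hash whose $b$-bit outputs are uniformly distributed (an assumption, not a measured property of the compiler's hash), the chance that a given change to the source leaves the fingerprint unchanged would be:

```latex
P(\text{missed change}) = 2^{-b}, \qquad
b = 128 \;\Rightarrow\; P = 2^{-128} \approx 2.9 \times 10^{-39}
```

Each extra bit halves that probability under this idealization; how close a real hash comes to it is an empirical question.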

davidungar · 2021-01-14T22:50:42Z

@swift-ci please smoke test

davidungar · 2021-01-14T23:02:09Z

Fingerprint::fromString now always checks for information loss, returning an optional. I believe this check is worthwhile because:

The data being checked are read in from a file, either in the driver or when testing. It's good to check for file corruption, and
Given all the other work in reading the file, I don't think we'll see a noticeable performance regression in the driver.
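
A minimal sketch of what a checked, optional-returning conversion might look like, assuming a 128-bit fingerprint written as exactly 32 hex digits. `Fingerprint128`, `hexDigitValue`, and `fingerprintFromString` are hypothetical names, and `std::optional` stands in for whatever optional type the real code uses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the compiler's Fingerprint: a fixed 128-bit
// value stored as two 64-bit halves. The layout is an assumption.
struct Fingerprint128 {
  std::uint64_t hi;
  std::uint64_t lo;
};

// Decode a single hex digit, or return -1 if `c` is not one.
static int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Parse exactly 32 hex digits. Any other input yields std::nullopt
// rather than a silently truncated or partially parsed value.
std::optional<Fingerprint128> fingerprintFromString(const std::string &s) {
  constexpr std::size_t kDigits = 32; // 128 bits / 4 bits per hex digit
  if (s.size() != kDigits)
    return std::nullopt;
  std::uint64_t halves[2] = {0, 0};
  for (std::size_t i = 0; i < kDigits; ++i) {
    int v = hexDigitValue(s[i]);
    if (v < 0)
      return std::nullopt;
    // The first 16 digits fill the high half, the last 16 the low half.
    halves[i / 16] = (halves[i / 16] << 4) | static_cast<std::uint64_t>(v);
  }
  return Fingerprint128{halves[0], halves[1]};
}
```

A caller reading fingerprints from a file would then treat an empty result as corrupt or malformed input instead of continuing with bogus bits.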

CodaFi

Neat.

CodaFi · 2021-01-15T05:02:13Z

@swift-ci test

davidungar · 2021-01-15T05:49:58Z

Thank you, @CodaFi

swift-ci · 2021-01-15T06:15:22Z

Build failed
Swift Test Linux Platform
Git Sha - 67028b9f4f76e85e4a29a039398b52746c1c9cda

swift-ci · 2021-01-15T07:02:25Z

Build failed
Swift Test OS X Platform
Git Sha - 67028b9f4f76e85e4a29a039398b52746c1c9cda

davidungar · 2021-01-15T21:25:54Z

@swift-ci please smoke test

davidungar · 2021-01-15T21:59:45Z

@swift-ci please smoke test

davidungar · 2021-01-15T22:55:41Z

@swift-ci please smoke test

davidungar · 2021-01-16T19:48:31Z

@swift-ci please smoke test

davidungar requested a review from CodaFi January 13, 2021 05:23

CodaFi requested changes Jan 13, 2021

davidungar changed the title ~~[DNM; Incremental] Catch conversion failure in Fingerprint~~ [Incremental] Catch conversion failure in Fingerprint Jan 14, 2021

CodaFi approved these changes Jan 15, 2021

David Ungar added 12 commits January 15, 2021 13:46

Catch conversion failure in Fingerprint

2ffa369

Allow Fingerprint::fromString to fail, returning None on bad input.

7f36ab1

fix inputs

4c03e67

handle conversion failure differently

a094afe

Show bad fingerprint.

6049788

mutual runs

2f7fc65

fail-with-bad-deps-fine

37e663a

chained-private-after-fine

762e0ea

fail-on-interface-hash-file

3547b38

mutual fine

760d43b

Adapt only-skip-once.swift to work with Swift Driver, too.

d1e6f88

Fix the driver unittests

a78cfad

Check for fromString failure in ModuleFile.cpp

d913dfe

davidungar force-pushed the fingerprint-assert branch from ea3dbf9 to d913dfe Compare January 15, 2021 21:57

Fix Mocking code.

a64f3f2

David Ungar added 3 commits January 15, 2021 16:10

XFAIL only-skip-once on Windows

c9e12d3

Fix fingerprint formation for unit driver tests

f17ad10

Fix fingerprints in unit tests of driver.

dbe4b6d

davidungar merged commit 0c99379 into swiftlang:main Jan 16, 2021
