Skip to content

[IRGen] Mix -profile-generate into module hash #26774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 24, 2019

Conversation

beccadax
Copy link
Contributor

When rebuilding a .o file, we hash the IR generated by IRGen and compare it to a hash embedded in the .o file. If they match, we simply skip asking LLVM to optimize and generate code. Certain flags that control our LLVM configuration are mixed into the hash, but the -profile-generate flag was not one of them. This usually wouldn’t matter because profiling would insert additional code into the IR, but in edge cases like empty files or files containing only protocol declarations, it could cause linker errors when profiling was turned off.

This change adds the GenerateProfile flag into the IR hash, ensuring that we always recompile, even if the IR was identical.

Fixes rdar://problem/54126622.

@beccadax beccadax requested a review from gottesmm August 22, 2019 01:11
@beccadax
Copy link
Contributor Author

@swift-ci please smoke test

@beccadax beccadax requested a review from vedantk August 22, 2019 01:26
@gottesmm
Copy link
Contributor

This is for @eeckstein

@vedantk
Copy link
Contributor

vedantk commented Aug 22, 2019

Thanks for working on this. To be on the safe side, should other types of instrumentation be included here as well (say, {a,t}san)?

Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@gottesmm
Copy link
Contributor

Question. Would it be possible to change this into some sort of exhaustive switch for Options? That way if someone adds a new option like this, they get a warning telling them that they may need to update this code.

@gottesmm
Copy link
Contributor

In other words, I am worried that the code as written will cause this same bug to occur again. If we had some sort of switch associated with it, at least we could get a compile time warning in such a case. This will cause the writer to at least think about... should this be used in the IR hash?

@jrose-apple
Copy link
Contributor

It's only required for options that don't affect the IR as generated, but arguably being forced to say that explicitly would be nice. You could do this with a .def file but that might feel heavyweight.

@gottesmm
Copy link
Contributor

@jrose-apple exactly. My fear is that someone will just add the option and not think about it. But if one at least has to explicitly do something, that can not happen.

@beccadax
Copy link
Contributor Author

@vedantk Good idea to check. -sanitize=address and -sanitize=thread emit different IR for an empty source file. -sanitize=undefined emits the same IR, but also the same assembly on x86_64 at least. -sanitize=fuzzer emits the same IR but different assembly, so that one should be mixed in too.

@beccadax
Copy link
Contributor Author

beccadax commented Aug 22, 2019

@gottesmm @jrose-apple I don't think the problem here is that someone forgot to update the function when they added an option; I think it's that someone didn't think about this edge case of an empty file, or didn't know about LLVM's behavior in this situation. -profile-generate doesn't usually need to be mixed into the flags because it will normally cause IR changes; only in files with no executable code does it behave incorrectly. If there had been a mechanism to make the programmer think about whether this flag needed to be mixed into the IR hash, they would have thought about it and chosen "No".

The way to prevent similar bugs would be to add a compiler mode which computes the module hash, then if it would normally skip compiling the IR, proceeds anyway and fails if the resulting assembly or .o file are different from the cached version. Then we can encourage folks who add a flag to add tests running the compiler in that mode against some battery of input files.

@jrose-apple
Copy link
Contributor

We could do that in +Asserts builds. We're only losing incrementality by doing so.

When rebuilding a .o file, we hash the IR generated by IRGen and compare it to a hash embedded in the .o file. If they match, we simply skip asking LLVM to optimize and generate code. Certain flags that control our LLVM configuration are mixed into the hash, but the -profile-generate flag was not one of them. This usually wouldn’t matter because profiling would insert additional code into the IR, but in edge cases like empty files or files containing only protocol declarations, it could cause linker errors when profiling was turned off.

This change adds the `GenerateProfile` flag into the IR hash, ensuring that we always recompile, even if the IR was identical.

Fixes rdar://problem/54126622.
We do a particular dance to diagnose errors twice, and we’re about to do it three times.
@beccadax
Copy link
Contributor Author

beccadax commented Aug 22, 2019

This is probably the most complicated and expensive "assertion" I've ever written, but it's in and it works. I've also started to mix -sanitize=fuzzer into the hash.

@beccadax beccadax requested a review from jrose-apple August 22, 2019 23:08
@beccadax
Copy link
Contributor Author

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - ef3868fba3b7c8bf548ea413a3e33cc5e297f9fb

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - ef3868fba3b7c8bf548ea413a3e33cc5e297f9fb

Copy link
Contributor

@jrose-apple jrose-apple left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. cc @eeckstein

std::string message;
if (0 != llvm::DiffFilesWithTolerance(OutputFilename,
OriginalOutputFilename,
/*tolerances=*/0, 0, &message)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like overkill. Check out swift::moveFileIfDifferent.

This separates the “do these two files have the same contents?” logic from the “move or delete” logic in `moveFileIfDifferent()`, creating a useful helper function. It also ties the special-case behavior for the `destination` parameter to a flag, since we have a use where we won’t want that.
In assert builds, when performLLVM() would normally skip invoking LLVM because it believes it would generate the same code, it now generates code into a temporary file and compares it to the output. This should catch mistakes where Swift configures LLVM in ways which affect the output, but which the incremental logic doesn’t account for.
@@ -259,6 +259,8 @@ class IRGenOptions {
unsigned Hash = (unsigned)OptMode;
Hash = (Hash << 1) | DisableLLVMOptzns;
Hash = (Hash << 1) | DisableSwiftSpecificLLVMOptzns;
Hash = (Hash << 1) | GenerateProfile;
Hash = (Hash << 1) | Sanitizers.contains(SanitizerKind::Fuzzer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly we might as well mix in the whole thing. It's not just about empty files; it's also about *non-*empty files that only change due to LLVM passes, and I'm not sure I trust that all the other sanitizers don't need this.

@@ -0,0 +1 @@
// This space intentionally left blank.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: There's already a test/Inputs/empty.swift you can use.

@jrose-apple
Copy link
Contributor

Thanks for the refactor!

@beccadax
Copy link
Contributor Author

@swift-ci please smoke test

Replaces getLLVMCodeGenOptionsHash(), which combined a bunch of individual bits into a string, with writeLLVMCodeGenOptionsTo(), which writes each one separately into a raw_ostream.

This is intended to be make the hashing more future-proof. The current design needs to know exactly how many bits each of the values needs; if any of the values grew and you forgot to update this function, its bits would interfere with those of an earlier value in the hash. This new design is expected to be slightly slower, but more robust to future change in the compiler.
Mixes the state of all sanitizers, not just the fuzzer, into the module hash used to decide whether to skip LLVM codegen. I don’t actually know of a case where one of the other sanitizers will generate identical IR and different machine code, but being defensive costs us very little.
@beccadax
Copy link
Contributor Author

@jrose-apple I mixed in all of the sanitizers, and also redesigned the hash slightly to avoid any issues when we add more sanitizers/optimization modes/values to these fields in general.

@beccadax
Copy link
Contributor Author

@swift-ci please smoke test

@beccadax beccadax merged commit 3fe9333 into swiftlang:master Aug 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants