-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Batch mode]: Move SupplementaryOutputs into each InputFile and use the proper supplementary output. (7) #14702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Batch mode]: Move SupplementaryOutputs into each InputFile and use the proper supplementary output. (7) #14702
Conversation
@swift-ci please smoke test os x platform |
2820faf
to
7e8f681
Compare
@swift-ci please smoke test os x platform |
7e8f681
to
5ccda20
Compare
@swift-ci please smoke test os x platform |
5ccda20
to
66f8f4d
Compare
@swift-ci please smoke test os x platform |
66f8f4d
to
85ba7d0
Compare
@swift-ci please smoke test os x platform |
85ba7d0
to
1439261
Compare
@swift-ci please smoke test os x platform |
1439261
to
d0e8c37
Compare
@swift-ci please smoke test os x platform |
d0e8c37
to
a2267cb
Compare
@swift-ci please smoke test os x platform |
a2267cb
to
a4e92f1
Compare
@swift-ci please smoke test os x platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed this just to see if I had any major concerns, as you mentioned, and I don't! This looks mostly like what I pictured from our discussions.
I will say that there are some outputs that only make sense for a non-primary-based build today (merge-modules or WMO), and it's weird to pass an explicit ""
to talk about those. Maybe those few outputs should get convenience entry points? Or even make those the only entry points, so that people don't accidentally think they're supported in primary-file or batch modes.
Great, tx! Yes, that’s the sort of cleanup I’ve been thinking about. Tx |
@jrose-apple, when you review this, would you like to look at the TBD file? Even though writeTBDIfNeeded does the right thing with output paths, I'm not sure that what is actually written depends on the primary being served, so what's there might not make sense. Tx. |
Yeah, let's see:
Maybe the answer is just to move ObjCHeaderOutputPath back to FrontendOptions, if it's only that one. (I feel like we did this exercise before, but I don't remember what happened.) |
It looks like FrontendOptions also has a few other output paths: FixitsOutputPath [sic], DumpAPIPath, and GroupInfoPath. We might need to ask Argyrios or Xi how these are supposed to be used. |
@swift-ci please smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, still have more. (And I'm waiting for this one too!)
lib/FrontendTool/FrontendTool.cpp
Outdated
Invocation.getFrontendOptions() | ||
.InputsAndOutputs.supplementaryOutputs() | ||
.SerializedDiagnosticsPath; | ||
const std::string SerializedDiagnosticsPath = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this a std::string
at all? (here and elsewhere). You know the Invocation is alive in the current stack frame, so everything it owns is alive too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if it's because we need the path to be a C string, that's probably a reason to just change all the path-returning methods to use const std::string &
instead of StringRef
. That's an exception I forgot to mention that Graydon already ran into on the Driver side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to a StringRef. (But in general StringRefs give me a queasy feeling because one has to know a lot to guarantee safety.)
As far as the C string issue, which is not here, but may be elsewhere, are you saying that all the path-returning methods should be changed to const std::string &
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const std::string &
provides a guarantee that StringRef doesn't, which is null-termination, at the cost of not being able to use other string types (like llvm::SmallString). If that guarantee would save on copies elsewhere, it might be worth it.
It's true that StringRef isn't safe-by-design and std::string is, but copies add up. I view this as the same sort of eating-your-vegetables performance as bitfield packing: it's not that any individual copy will make things worse, but that just means that if you don't avoid copies by default things will be slower and the profile won't show any hotspots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think of two issues here:
-
StringRef
vsconst std::string &
: It seems to me that since we are storing strings with null termination, I might as well have the routines in question returnconst std::string &
as you suggest. I'll make the changes. -
StringRef
(orstd::string const &
) vs copying astd::string
. There are two sub-issues here for me: the cost and the benefits:
2a: The cost of copying: (I'm going to be very very approximate here.) Suppose the average path string is 100 characters, and that it costs, say 1000 clocks to copy it. Then on a 3GHz machine, that's 1 ns per copy, if I've the math right. Suppose we're talking about 100 of these strings, that's 100 ns to copy the whole set. Now this path copying isn't (as far as I know) done over-and-over, so I'll assume the whole set is copied 10 times. (Am I wrong here?) (These aren't identifier names that might be looked up every time they are used in a program, but just file names which are mainly used to open output files.) So the whole cost of copying these names is 1 µs. If I have this right, we're talking 1 µs to copy these path names instead of passing references. That's the cost.
2b. What's the benefit? (Or, conversely, what's the cost of passing refs?) I don't think the analogy to bitfield packing holds because I can isolate bitfield packing to a single, simple class and thus reason fairly locally about the correctness of the optimization. OTOH, in order to be sure ref-passing code is correct, I have to first follow a call-chain down to a leaf function so I can see where the actual string is allocated, then I have to follow the structure chain up to find the outermost enclosing structure, then I have to figure out where that structure might be freed. The reasoning required to understand ref-passing code seems to be much much much more harder, time-consuming, and error-prone than the reasoning required to verify bit-field packing.
So, my conclusion is that the tradeoff in this case leans quite heavily towards copying strings and away from passing references. I fear that, even if I get all the tests to pass for a ref-passing version, I will have creating lurking bugs. Or, someday, someone else may evolve the code in a way as to create lurking bugs. If one such is created and is reproducible, than a sanitizer can help find and fix them, but that seems like a high price to pay.
Please let me know what you think. I may well be missing some important fact about the compiler, or C++, or something else. If so, it would help me do better work for the team to correct my misunderstanding.
For now, I will switch over to the string refs.
Thanks!
lib/FrontendTool/FrontendTool.cpp
Outdated
const bool hadEmitIndexDataError = emitIndexDataIfNeeded( | ||
Instance.getPrimarySourceFile(), Invocation, Instance); | ||
const bool hadEmitIndexDataError = | ||
emitIndexDataForAllPrimariesOrModuleIfNeeded(Invocation, Instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: does this really need a separate name? emitIndexDataIfNeeded
that doesn't take a particular file seems pretty clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of these functions only deal with a single primary. Why not use a longer name for the sake of clarity? When I made these changes, I had to inspect functions like emitIndexData
and follow the call chains down to see what they did. A doxygen comment could give the info, but that takes an additional gesture to pull it up. Or, one can gather that a specific primary isn't passed as an argument, if one understands that Instance is primary-invarient (which a noob might not),
but one cannot gather that non-primaries do not have index data emitted. But I'll change it back if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was definitely basing this on the arguments. emitFoo(SF, Instance)
is almost certainly specific to SF
, and therefore emitFoo(Instance)
would not be in a world where batch mode is commonplace. The exact meaning of "not just one file" is ambiguous, I suppose—is it all primaries or all inputs?—but in practice we don't do anything for non-primary files.
For me, I'd balance this by putting the explanation in a comment, but not putting it in the function name. It's adding more cognitive burden than it's reducing. But you can keep pushing back on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to emitIndexData
with a Doxygen comment.
lib/FrontendTool/FrontendTool.cpp
Outdated
@@ -121,7 +121,7 @@ static bool emitMakeDependencies(DiagnosticEngine &diags, | |||
llvm::SmallString<256> pathBuf; | |||
auto escape = [&](StringRef raw) -> StringRef { | |||
pathBuf.clear(); | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: stray indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, but git-clang-format does this, sigh.
lib/FrontendTool/FrontendTool.cpp
Outdated
@@ -148,23 +148,21 @@ static bool emitMakeDependencies(DiagnosticEngine &diags, | |||
out << ' ' << escape(path); | |||
// Then print dependencies we've picked up during compilation. | |||
for (auto const &path : | |||
reversePathSortedFilenames(depTracker.getDependencies())) | |||
reversePathSortedFilenames(depTracker.getDependencies())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: lost indentation. (Yes, Xcode is not helpful here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. But git-clang-format indents it the other way.
AllInputs.push_back(input); | ||
InputsByName.insert(std::make_pair(AllInputs.back().file(), index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you find the case where this is necessary?
include/swift/Frontend/InputFile.h
Outdated
@@ -63,15 +68,33 @@ class InputFile { | |||
|
|||
/// Return Swift-standard file name from a buffer name set by | |||
/// llvm::MemoryBuffer::getFileOrSTDIN, which uses "<stdin>" instead of "-". | |||
static StringRef convertBufferNameFromLLVM_getFileOrSTDIN_toSwiftConventions( | |||
static std::string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay for this one to stay StringRef to avoid copies. Yes, it means that the input can't be an expiring std::string, but that's pretty unlikely in practice, and ASan would catch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, tx for catching this. Fixed.
@@ -199,7 +199,8 @@ class FrontendOptions { | |||
bool ImportUnderlyingModule = false; | |||
|
|||
/// If set, the header provided in ImplicitObjCHeaderPath will be rewritten | |||
/// by the Clang importer as part of semantic analysis. | |||
/// by the Clang importer as part of semantic analysis if the ModuleOutputPath | |||
/// for the primary input being compiled is not empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not quite right; in practice this only happens when there aren't primaries. (That's an approximation of "are we emitting a swiftmodule for the entire module, rather than just one file?".) I would just leave the old comment, even though you're right that it doesn't happen if we don't actually output a swiftmodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed what had been added to the comment.
void forEachInputProducingAMainOutputFile( | ||
llvm::function_ref<void(const InputFile &)> fn) const; | ||
/// | ||
/// If \pfn returns true, return early and return true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: missing space after \p
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
/// Should combine all primaries for stats reporter | ||
/// Today, just answers "batch" if > 1, so it could | ||
/// return a StringRef, but someday will have to fabricate | ||
/// a temporary string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't currently return a StringRef, so the comment doesn't make sense. (It is still something we have to fix, but picking the first filename is probably good enough.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There exists an interpretation of the words that is what I meant to say, but I shortened the comment and removed the whole issue of return types. I added "FIXME:".
@@ -103,15 +91,17 @@ class FrontendInputsAndOutputs { | |||
|
|||
bool isReadingFromStdin() const; | |||
|
|||
void forEachInput(llvm::function_ref<void(const InputFile &)> fn) const; | |||
/// If \p fn returns true, exits early and returns true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: semicolon instead of period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
Thanks for these comments. Have started on them, but not pushed the fixes yet. Am also waiting for the result of the mapping-non-primaries experiment. Have to quit working for now. |
You are correct, so now the code only maps primary input names to inputs. |
@swift-ci please smoke test |
1 similar comment
@swift-ci please smoke test |
include/swift/Frontend/InputFile.h
Outdated
/// input. If there is no such file, contains an empty string. If the output | ||
/// is to be written to stdout, contains "-". | ||
std::string OutputFilename; | ||
/// Contains the OutputFilename, which is the name of the main output file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: This comment no longer sounds correct for what PSPs
is; it's mixing a description of the type with the description of the instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! I'm taking a crack at better comments both here and at the declaration of PrimarySpecificPaths
, but I could use your help for them. (Haven't pushed these yet.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have now pushed.
const StringRef firstInput = inputsAndOutputs.hasSingleInput() | ||
? inputsAndOutputs.getFilenameOfFirstInput() | ||
: StringRef(); | ||
const std::string firstInput = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the copy here necessary? (Yes, I know it's just one string, but now that we've been talking about copies I figured I'd be more of a stickler.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! I like it because firstInput
gets passed into a constructor for OutputFilesComputer
, and that class must take ownership of the information anyway.
So I think there's no concern about leaky invariants here. Pure win!
I'll change it to:
const StringRef firstInput = inputsAndOutputs.hasSingleInput() ? StringRef(inputsAndOutputs.getFilenameOfFirstInput()) : StringRef();
.
return hasInputs() ? StringRef(lastInputProducingOutput().outputFilename()) | ||
: StringRef(); | ||
return hasInputs() ? lastInputProducingOutput().outputFilename() | ||
: std::string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up returning a reference to a temporary. I guess it doesn't always work to return const std::string &
.
Hint: there's a "please ASan test" for swift-ci if you don't want to do it locally, although it takes hours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint! I didn't see it documented anywhere, and yes, I really could have used it!
I have changed the code in question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/ContinuousIntegration.md is the place to look :-) but only if you know there's something to look for!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer!
lib/FrontendTool/FrontendTool.cpp
Outdated
? "lib" + Invocation.getModuleName().str() + ".dylib" | ||
: Invocation.getFrontendOptions().TBDInstallName; | ||
|
||
return writeTBD(Instance.getMainModule(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really makes sense. I know I said TBD files could be generalized to individual files, but they haven't been, as evidenced by this call here that doesn't specify a primary. Even if we move TBDPath
into the per-file outputs, there should still be no loop in this function until it does something meaningful.
(Sorry I didn't see this one sooner. Note that the same ought to apply to Make-style dependencies, but doesn't because we pretend that each file could have different dependencies.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, are you pointing out that the TBDPath
does not belong in PrimarySpecificPaths
? If that's the case (and I suspect it may be), I'll move it out. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be one reasonable resolution here. Another would be to remove the loop and have a getTBDPathForWholeModule accessor like the AtMostOnePrimary ones you already have, which just looks at the first input file all the time. I don't think I have a preference between those two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now pushed a redone writeTBDIfNeeded
. If will fail an assertion if a TBDPath is specified when not in WMO mode. Is that the right behavior?
StringRef outputPath); | ||
bool emitReferenceDependencies(DiagnosticEngine &diags, SourceFile *SF, | ||
DependencyTracker &depTracker, | ||
const std::string &outputPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The counterpoint to "things should return const std::string &
if that's a useful guarantee" is "things should accept StringRef if they don't need a null-terminated string". Usual precondition/postcondition improvements, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. What if the string will be used as a null-terminated string? (I assume that's the case here, since it will be passed to a system call eventually.) If the parameter is a std::string (ref), then a caller with a StringRef would have to create a (temporary) std::string to pass in and would pay the execution cost (negligible here) to do the conversion. But a caller with a std::string would just pass it in for free.
OTOH, if this function takes a StringRef, it always has to eventually do the conversion, since the path is passed down to Unix.
But there's likely a piece of the puzzle I'm missing. I'm assuming that a StringRef is not automatically converted to a std::string, so that the execution cost is explicit. I'm also assuming that a StringRef does not remember whether it was created from a null-terminated string or not.
Help me understand what I'm missing. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's that last bit: a StringRef does not know whether it came from a null-terminated string. (It would be totally reasonable for it to do so, simply by stealing a bit out of the "size" field, but it doesn't.) That means that if a string ever passes through a StringRef layer in the API, you've lost that information.
In this case, you're right that the string will end up null-terminated because it ends up at a system call. But the API we use to perform that system call is already an LLVM wrapper that takes an arbitrary StringRef, so there's already no way to take advantage of the information. Therefore it's not worth limiting our API.
(This is all pretty subtle, so if you want a simpler rule, it's "don't bother optimizing for null-terminated strings at the expense of losing StringRef", because while that is a copy and I've been saying we try to avoid copies, it's a copy that can often be done on the stack or on a bump-pointer allocator at the point where it's needed, rather than needing a heap allocation.)
@swift-ci please ASan test |
@swift-ci Please ASAN test |
@swift-ci please ASan test |
@swift-ci please smoke test |
16d718a
to
d463498
Compare
@swift-ci please smoke test |
@jrose-apple I've made the string type changes. I did not move TBDPath because it is computed the same way as the other supplementaries, and so it's simplest to keep it where it is. However, with a little finagling, I could hack up the code to do it. It would be a bit ugly, though. Anyway, the code is ready for another round of review. I'll be testing it in the meantime. Thanks! |
@swift-ci please smoke test |
@swift-ci Please ASAN test |
1 similar comment
@swift-ci Please ASAN test |
@swift-ci please test |
Build failed |
Build failed |
@davidungar note those 2 build failures above are artifacts of CI noticing you force-pushed and auto-restarting the build; they're not "real" failures. |
@graydon Ah, so that's what those are. Thanks. I thought it was git telling me to shut up:) ("Sha" means shut-up in a culture I'm familiar with.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take it!
@@ -88,6 +73,9 @@ class FrontendInputsAndOutputs { | |||
|
|||
std::vector<std::string> getInputFilenames() const; | |||
|
|||
/// \return nullptr if not a primry input file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "primry"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Tx!
} | ||
|
||
// Input queries | ||
|
||
bool FrontendInputsAndOutputs::shouldTreatAsLLVM() const { | ||
if (hasSingleInput()) { | ||
StringRef Input(getFilenameOfFirstInput()); | ||
const std::string &Input(getFilenameOfFirstInput()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a small thing, but this one, for example, is probably still better as StringRef. Being a const std::string &
makes me wonder what's special about Input
that you couldn't use StringRef, and then it's only used for APIs that take StringRef anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tx, fixed.
2729c30
to
4866df6
Compare
@swift-ci please smoke test |
Great! I've made those last two small fixes, collapsed the commits, and will merge as soon as the smoke tests clear. Yippee! Thanks for reviewing it so quickly! |
For batch mode, we need one set of supplementary outputs for each primary input. Move the supplementary outputs to the InputFiles.