Skip to content

Wrap Command Line Arguments in a Response File if System Limits are Exceeded #16362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 21, 2018

Conversation

dabelknap
Copy link
Contributor

This is a follow-up to PR 15853, which allows the Swift compiler to accept response files. When the driver attempts to submit frontend jobs, the jobs will fail if the command line argument length exceeds the system limit. This PR automatically wraps the command line arguments in a temporary response file in the event the system limits are exceeded.

We use an implementation pattern similar to filelists. The temporary response file is reserved when the Job object is created, but only when the argument vector exceeds the system limits. The arguments are not written to the temporary response file until the Job is added to the TaskQueue.

  • It may be the case that with these changes -filelist and -temporary-filelist are no longer needed. They can be removed in this PR, or in a separate one if desired.
  • The -save-temps flag does preserve the temporary response files, if any are created. Otherwise, they are removed when the compiler cleans up temporary files.
  • In the event of a crash, the stack trace does show the temporary response files in the argument list, if any are created.

The related Swift Forum discussion is here.

Resolves SR-4517.

@harlanhaskins
Copy link
Contributor

@swift-ci please smoke test

@harlanhaskins harlanhaskins requested a review from jrose-apple May 9, 2018 00:28
@jrose-apple jrose-apple requested a review from davidungar May 9, 2018 16:35
Copy link
Contributor

@jrose-apple jrose-apple left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to push this forward, Austin. Some comments on what's there but I like this better than the original version.

@@ -321,6 +325,8 @@ class LinkJobAction : public JobAction {
static bool classof(const Action *A) {
return A->getKind() == Action::Kind::LinkJob;
}

virtual bool allowsResponseFiles() const { return true; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is a toolchain-dependent decision, and even without that it doesn't really make sense at the Action level. Action is supposed to be independent of the executable you use to accomplish that action.

/// Argument vector containing a single argument pointing to the response file
/// path with the '@' prefix.
/// The argument string must be kept alive as long as the Job is alive.
llvm::opt::ArgStringList ResponseFileArg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like overkill. Can't we just keep a std::string around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was created to address the following issues:

  1. To use the response file, we need to pass a different argument list to TQ->addTask that lasts as long as the Job object.
  2. The Job object is const shortly after its creation, so we can't modify its existing argument list when it is added to the Task Queue.

I would have preferred to create the response file ArgStringList directly in addPendingJobToTaskQueue and then populate it with the name of the response file as a std::string, retrieved from the Job. However, I could not find a way to make sure the ArgStringList would persist as long as necessary for the Job to be executed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't respond to this at the time, so clearly I understood it then, but I still don't see why that wouldn't be a std::string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskQueue::addTask takes ArrayRef<const char *> Args as its second argument and not a std::string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A std::string carries a const char * inside it at a constant address (if not modified), which is all you need to build an ArrayRef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use a std::string, its memory gets cleaned up when the Job goes out of scope, and this is before the task actually gets executed. The compiler then tries to execute something like this:

swift-UH��H��0H�}�H�u�H�U�H�}���� 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the string needs to live on the llvm::opt::ArgList that ends up being owned by the Compilation. Okay, but a const char * would work then, too, right? No need for a whole vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ended up having to modify Job::getResponseFileArg() to return a const char * const&, which takes advantage of the one-element constructor for ArrayRef when calling addTask. The const & is necessary to ensure that it persists as long as the Job object. Without it, we read junk memory.

@@ -352,7 +352,11 @@ void Job::printCommandLineAndEnvironment(raw_ostream &Stream,
void Job::printCommandLine(raw_ostream &os, StringRef Terminator) const {
escapeAndPrintString(os, Executable);
os << ' ';
printArguments(os, Arguments);
if (hasResponseFile()) {
printArguments(os, ResponseFileArg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong; we'd want to dump the regular command line, but note that the job will be using a response file (similar to how we do environment variables).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure how the output should be formatted (I don't see environment variables printed when I use -v or -debug-crash-immediately). As written, it would look similar to:

/usr/bin/swift @/tmp/arguments-70c499.resp

A possible alternative would be:

[Arguments: @/tmp/arguments-70c499.resp]
/usr/bin/swift -frontend -c -primary-file main.swift test.swift -target x86_64-unknown-linux-gnu -disable-objc-interop -color-diagnostics -module-name test -o /tmp/main-6d68e0.o

Also, wouldn't you run into the same issue when filelists are used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but filelists hold much less information. (Though it's still important enough that the frontend jumps through hoops to show them in the PrettyStackTrace.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, is my suggested alternative sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. The idea is, but we still want one command per line. I could see either

/usr/bin/swift @/tmp/arguments-70c499.resp # -frontend -c -primary-file main.swift …

or

/usr/bin/swift -frontend -c -primary-file main.swift … # [Arguments: @/tmp/arguments-70c499.resp]

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the first of your two options since it is literally what is being executed. Plus, a user could actually execute the first one, whereas the second would likely fail due to too many arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking at how to implement this for the stack trace during a crash, and ran into some issues. Printing out the program and its arguments during a crash is handled by llvm::PrettyStackTrace, and it just loops over the argv. The problem is that the argv is initialized in the first line of main in tools/driver/driver.cpp. At this point, the executable has no knowledge of the command line arguments if a response file is being used. I would either 1) move PROGRAM_START after the response files are expanded, or 2) modify the internals of PrettyStackTrace to expand any response files. It looks like clang initializes the stack trace before expanding response files: https://github.com/apple/swift-clang/blob/stable/tools/driver/driver.cpp#L326

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer (1). AFAIK Clang never uses response files to communicate between the driver and frontend, and those are the most important crash logs. @davidungar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should point out that if I move PROGRAM_START, the response file name and path will not be printed, since it is consumed when expanded.


// RUN: python -c 'for i in range(500001): print "-DTEST7_" + str(i)' > %t.9.resp
// RUN: not %target-swiftc_driver %s @%t.9.resp -Xfrontend -debug-crash-immediately 2>&1 | %FileCheck %s -check-prefix=CRASH
// CRASH: @{{[^ ]*}}arguments-{{[0-9a-zA-Z]+}}.resp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely don't want this behavior. The crash log should have the expanded form of the response file, not the before-expansion form. Otherwise we have no chance of reproducing the crash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should still be possible to reproduce the crash in this situation. Temporary files are automatically saved during a crash, correct? The intermediate response files are registered as ordinary temporary files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true locally, but not very useful when a user sends us a crash log and we want to see what they were doing.

@@ -403,6 +407,30 @@ void Job::printSummary(raw_ostream &os) const {
os << "}";
}

static void
writeResponseFile(llvm::raw_ostream &OS, const llvm::opt::ArgStringList &Arguments) {
// wrap arguments in quotes to ensure compatibility with Unix and Windows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Please use complete sentences in comments, including capitalization and a period at the end.

OS << '"';
for (; *arg != '\0'; arg++) {
if (*arg == '\"' || *arg == '\\') {
OS << '\\'; // escape quote and backslash characters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep looking around for an existing function that does this. The closest is escapeAndPrintString; any chance we can combine these?


void Job::writeArgsToResponseFile() const {
std::string responseContents;
llvm::raw_string_ostream SS(responseContents);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use raw_fd_ostream here and just write directly to the file. No need to write to an in-memory buffer and then export it.

Also, you should probably have a way of handling failure here. (What if the temporary directory isn't writable?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the right way to handle a potential error since I don't appear to have access to the Diagnostic Engine within Job. There is an example in Compilation.cpp where raw_fd_ostream is used, but there is a FIXME for the error handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could just returning a bool or an llvm::Error from the function for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that writeFileWithEncoding can fail just as much as raw_fd_ostream can.)

llvm::opt::ArgStringList responseFileArg = {};
if (!llvm::sys::commandLineFitsWithinSystemLimits(executablePath,
invocationInfo.Arguments)
&& JA.allowsResponseFiles()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: You should probably test these two in the opposite order, so we don't waste time measuring arguments if we can't use a response file anyway.


// RUN: python -c 'for i in range(500001): print "-DTEST6_" + str(i)' > %t.8.resp
// RUN: %empty-directory(%t/tmp)
// RUN: env TMPDIR=%t/tmp/ %target-swiftc_driver %s @%t.8.resp -save-temps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include something for TEST6 to test, so that we know the arguments are actually getting read and there isn't just some kind of silent success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEST6 is specifically testing that the temporary response files are being saved with -save-temps. TEST5 uses an input identical to that of TEST6, and it does test that the flags are being read. That being said, does TEST6 need to test that the flags are being read?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that wasn't obvious to me. What if you grouped this RUN line with the previous ones and reused TEST5's response file?

(Also noticing that the response files and the compilation conditions should be using the same numbering to avoid confusion.)

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty reasonable, but I'd like to see some changes to help the next person better understand what's going on.

/// Argument vector containing a single argument pointing to the response file
/// path with the '@' prefix.
/// The argument string must be kept alive as long as the Job is alive.
llvm::opt::ArgStringList ResponseFileArg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about grouping ResponseFilePath and ResponseFileArg into a substructure? I think the extra structure makes the Job class more readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new struct or class could have a nice comment describing it's purpose and use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for the instance variable in Job holding the new structure to be optional?

@@ -121,6 +121,7 @@ class ToolChain {
llvm::opt::ArgStringList Arguments;
std::vector<std::pair<const char *, const char *>> ExtraEnvironment;
std::vector<FilelistInfo> FilelistInfos;
bool allowsResponseFiles = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this variable needs a comment explaining why it would ever be true or false.
Something like: "Some platforms limit argument length and those use response files..."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just a per-platform decision, so it's not that, but I agree that a comment would be nice.

@@ -291,8 +291,14 @@ namespace driver {
"not implemented for compilations with multiple jobs");
if (Comp.ShowJobLifecycle)
llvm::outs() << "Added to TaskQueue: " << LogJob(Cmd) << "\n";
TQ->addTask(Cmd->getExecutable(), Cmd->getArguments(), llvm::None,
(void *)Cmd);
if (Cmd->hasResponseFile()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a style that puts the shorter arm of the if-then-else first, even though it is the do-less arm. I feel that such a choice reduces the max depth of the reader's mental stack. But this is just a suggestion.

} else {
TQ->addTask(Cmd->getExecutable(), Cmd->getArguments(), llvm::None,
(void *)Cmd);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, as I read further down, I noticed I had to stare at the two addTask call lines to see the difference.
Therefore I propose factoring into a small, static function that gets the arguments, or else pushing this logic into the Cmd->getArguments function. There should be only one line that calls TQ->addTask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cmd->getArguments may be used elsewhere with the expectation that it returns the vector of arguments executed by this job, regardless of whether or not a response file is used. I felt it would be safer to have a separate method specifically for returning the response file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, but I still want the factoring. How about defining a new function, called getArgumentsOrResponseFileName or getArgumentsAccountingForResponseFile or getArgumentsForTask? Or call the new one getArguments and rename the old one to getArgumentsIgnoringResponseFile. What IMO is going in is that the very notion of response file muddies the notion of arguments, as you appositely point out. So the best solution would be two, more verbose names for the existing and new getArguments that make the distinction clear. For instance getDirectArguments and getArgumentsWithResponseFIleIndirection (or maybe getPossiblyPackagedArguments). (I bet you can think of two better names!)

printArguments(os, ResponseFileArg);
} else {
printArguments(os, Arguments);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the above is right, then I think it should be:
printArguments(os, hasResponseFile() ? ResponseFileArg : Arguments); rather than the if.
Why? Because the code is only doing one printArguments operation, and it is exactly the same, except for that second argument.

@@ -247,6 +247,7 @@ ToolChain::constructInvocation(const CompileJobAction &job,
const JobContext &context) const {
InvocationInfo II{SWIFT_EXECUTABLE_NAME};
ArgStringList &Arguments = II.Arguments;
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it true in this case? Comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remain puzzled by that line. Please add a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, see below.

@@ -325,7 +325,10 @@ toolchains::GenericUnix::constructInvocation(const LinkJobAction &job,
Arguments.push_back(
context.Args.MakeArgString(context.Output.getPrimaryOutputFilename()));

return {Clang, Arguments};
InvocationInfo II{Clang, Arguments};
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why true? Comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are a bunch of similar lines, could you please factor them into an intention-revealing method? Or add a comment to each?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, see below.

@@ -168,5 +168,8 @@ toolchains::Windows::constructInvocation(const LinkJobAction &job,
Arguments.push_back(
context.Args.MakeArgString(context.Output.getPrimaryOutputFilename()));

return {Clang, Arguments};
InvocationInfo II{Clang, Arguments};
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is the third time I'm seeing the setting to true. Is there something in common that the code should be reifying?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. It's all based on what other tool is getting invoked, which is not necessarily the same across each toolchain and certainly not the same across each job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, this is the customization point, so that the rest of the code can treat everything the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but how about comments explaining why allowsResponseFiles is true in each case? Why might it be false vs true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment would read: // clang supports response files. Given the way in which the code is written, such a comment would be redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still with Austin on this. If the executable specified in the InvocationInfo constructor supports response files, you can put true here. I guess the doc comment for allowsResponseFiles could say that a little more clearly, but it doesn't belong at the use sites.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment in include/swift/Driver/ToolChain.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the comment. In thinking about this item, the comment you added is good and it is sufficient. I'm not asking for the following, but if I were to try to make it even clearer, I would use a one-line intention-revealing function. But, never mind.

@@ -134,6 +132,10 @@ int main(int argc_, const char **argv_) {
llvm::cl::TokenizeGNUCommandLine,
argv);

// Initialize the stack trace using the parsed argument vector with expanded
// response files.
PROGRAM_START(argv.size(), argv.data());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this can be affected by subsequent operations on argv. Possible options are to push PROGRAM_START even further down in main, or to make a copy of argv at this location for the sole purpose of initializing the stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Maybe we can do it the other way, and make sure that no one else changes argv after this point? The case where that does happen, shouldRunAsSubcommand, can be responsible for making the copy. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand your suggestion, the signature for shouldRunAsSubcommand would look like:

static bool shouldRunAsSubcommand(
  StringRef ExecName,
  SmallString<256> &SubcommandName,
  const SmallVectorImpl<const char *> &InputArgs,
  SmallVectorImpl<const char *> &OutputArgs) { ... }

InputArgs comes from what is currently argv. OutputArgs would initially point to InputArgs, but would be replaced with a brand new vector in the event of a modification. OutputArgs is what would be used in the rest of main. This wouldn't necessarily prevent someone from modifying args/InputArgs, however.

I'm not sure if there is a stronger way to ensure that the argument vector doesn't get modified, aside from:

const SmallVector<const char *, 256> trace_args(argv);
PROGRAM_START(trace_args.size(), trace_args.data());
// continue as normal with 'argv'

Unfortunately, I don't think we can just make argv a const since it must be operated on twice before calling PROGRAM_START.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't realize that shouldRunAsSubcommand was already modifying the input arguments. I feel like I'd change that to just taking the input arguments as an immutable ArrayRef, and then making the copy inside the body of the if:

if (shouldRunAsSubcommand(ExecName, SubcommandName, argv)) {
  SmallVector<const char *, 256> subCommandArgs(argv.begin(), argv.end());
  subCommandArgs.erase(1);
  // …

What do you think for that?

(Sorry for the delayed responses. The run-up to WWDC has been slamming a bunch of us!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrose-apple About WWDC: no worries!

I attempted to implement your suggestion, and created a commit for it. It was necessary to factor out the lower half of main in order to properly handle the repl case. If the sub-command is repl, shouldRunAsSubcommand modifies argv and it returns false, so a modified version of argv is used outside of the if block.

@davidungar
Copy link
Contributor

@dabelknap I'm checking all the PRs I'm involved in. Do you need anything more from me at this time?

@dabelknap
Copy link
Contributor Author

@jrose-apple I just need input on the handling of the arguments for PROGRAM_START. My preference would be to create a const copy of the vector just for the stack trace; I think this would be much simpler and safer. However, I can still attempt to create the copy only as needed in shouldRunAsSubcommand if that is what you prefer.

Copy link
Contributor

@jrose-apple jrose-apple left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this may be the least-bad version. @davidungar, what do you think about the driver.cpp code now?

}

static int run_driver(StringRef ExecName,
const SmallVectorImpl<const char *> &argv) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: ArrayRef is a handy wrapper type for any array-like structure; please use it here and in shouldRunAsSubcommand.


// If the subcommand is the "built-in" 'repl', then use the
// normal driver.
if (Subcommand == "repl")
if (Subcommand == "repl") {
isRepl=true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: spacing.

@dabelknap
Copy link
Contributor Author

@jrose-apple Just checking to see if anything else needs to be done for this PR.

@jrose-apple
Copy link
Contributor

I was still waiting for @davidungar to sign off as well, since I know he wants to make sure the additional complexity is as small as possible.

@@ -71,7 +71,8 @@ extern int swift_format_main(ArrayRef<const char *> Args, const char *Argv0,
/// \returns True if running as a subcommand.
static bool shouldRunAsSubcommand(StringRef ExecName,
SmallString<256> &SubcommandName,
SmallVectorImpl<const char *> &Args) {
const ArrayRef<const char *> &Args,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick (here and below): ArrayRef is intended to be passed around by value; it's just a pointer/length pair.

@dabelknap
Copy link
Contributor Author

@jrose-apple Also, for this to be fully implemented, swift-llvm needs to be updated to include this patch: llvm-mirror/llvm@a3869c0. The reason for this is that commandLineFitsWithinSystemLimits was underestimating the actual system limits. If swift or clang was given a sufficiently large response file, it might not re-wrap those arguments in a response file, and the frontend jobs will fail. Of course, if you increase the size of the response file further, commandLineFitsWithinSystemLimits will eventually return false, and the argument vector will be wrapped appropriately. I was able to reproduce the problem using clang++ on both Linux and macOS.

@jrose-apple
Copy link
Contributor

Yikes, thanks. I'll look into cherry-picking that into the Swift 4.2 LLVM branch.

@jrose-apple
Copy link
Contributor

apple/swift-llvm#101

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started looking over the changes and my comments again, and AFAICT, @dabelknap just hasn't had a chance to address them yet. Let me know when you want me to look again, thanks!

} else {
TQ->addTask(Cmd->getExecutable(), Cmd->getArguments(), llvm::None,
(void *)Cmd);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, but I still want the factoring. How about defining a new function, called getArgumentsOrResponseFileName or getArgumentsAccountingForResponseFile or getArgumentsForTask? Or call the new one getArguments and rename the old one to getArgumentsIgnoringResponseFile. What IMO is going in is that the very notion of response file muddies the notion of arguments, as you appositely point out. So the best solution would be two, more verbose names for the existing and new getArguments that make the distinction clear. For instance getDirectArguments and getArgumentsWithResponseFIleIndirection (or maybe getPossiblyPackagedArguments). (I bet you can think of two better names!)

@@ -168,5 +168,8 @@ toolchains::Windows::constructInvocation(const LinkJobAction &job,
Arguments.push_back(
context.Args.MakeArgString(context.Output.getPrimaryOutputFilename()));

return {Clang, Arguments};
InvocationInfo II{Clang, Arguments};
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but how about comments explaining why allowsResponseFiles is true in each case? Why might it be false vs true?

/// Response file path
const char *ResponseFilePath;

/// Argument vector containing a single argument pointing to the response file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I think the comment is inaccurate. It's not an argument vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that property is no longer a vector. I'll fix the comment.

@@ -271,6 +271,14 @@ class Job {
/// Whether the job wants a list of input or output files created.
std::vector<FilelistInfo> FilelistFileInfos;

/// Response file path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this comment. It is completely redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this comment? @jrose-apple Is there some coding standard I don't know about? Why not remove it entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This purpose of the comment is to remain consistent with the rest of the file. A few lines above reads:

/// The output of this command.
std::unique_ptr<CommandOutput> Output;

/// The executable to run.
const char *Executable;

for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. Since I believe that this sort of comment degrades maintainability (yes, it's a tiny cut but there are virtually thousands of them), and since I try to enhance the maintainabilty of the code with each review, I asked for its deletion. However, a consistent-but-redundant comment is only a mere bagatelle.

@@ -271,6 +271,14 @@ class Job {
/// Whether the job wants a list of input or output files created.
std::vector<FilelistInfo> FilelistFileInfos;

/// Response file path
const char *ResponseFilePath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a nitpick, but I would prefer it with another "const" after the star (const char * const) in order to signal that this variable is never assigned to (assuming that is the case). I'm assuming it is initialized in the constructor and never changed thereafter. (Also applies in other cases, such as ResponseFileArg.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like the const after the star here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char * is what is conventionally used in the rest of the class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I count about 100 lines of code in methods of that class. If I had to maintain this code and needed to know if that instance variable ever changed, I would have to read them. Adding the "cost" would save me that work. Therefore, I would not approve this PR without this "const". (Of course you don't need my approval to land it & that's OK. I would just feel derelict in my duty to favor consistency over maintainability in this case.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta: At this point you've expressed a strong enough opinion that Austin does need your approval to land the PR. If you want to rescind that, you'd have to signify that, perhaps by "dismissing" your X review.

Personally, I'm with Austin on this one. Being inconsistent can imply that the inconsistency is deliberate, and therefore it is better to be consistent with existing code in a PR that's adding functionality. (That's especially true for someone who's a relative newcomer.)

Copy link
Contributor

@davidungar davidungar Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta: @jrose-apple Please help me out here. Since the merge box is green, I figured he did not need my approval to land the PR. (I seem to recall reading verbiage to that effect on github, too.) Is there some policy (or implicit understanding) that I've missed? Where are you thinking that I express such a strong enough opinion? Was the need expressed, or implied? I don't understand.

You have a fair point about inconsistency, but I weigh that point against the benefits of this particular proposed inconsistency. In this case I would argue (on your side) that if the const is added to one variable, a future reader might infer that the other variables are not const. However, the counter argument would be that it would be clear for anyone with a passing familiarity with the code base that const is often not used when it could be. Thus, it seems to me that the consistency benefit is outweighed by the not-having-to-read-100-lines-of-code benefit of the const.

My experience as a novitiate to this code base has been that such consts would save me significant time. Ironically, it was my experience with adopting Swift for my code that convinced me of the significance of the benefits of write-once variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the merge box is green, I figured he did not need my approval to land the PR. (I seem to recall reading verbiage to that effect on github, too.) Is there some policy (or implicit understanding) that I've missed? Where are you thinking that I express such a strong enough opinion? Was the need expressed, or implied?

We aren't using GitHub's built-in gating for reviewers, true. But only a little above the green merge button is the part where "@davidungar requested changes", with a red X. Even without that, though, I think we want to take expressed concerns seriously unless the reviewer explicitly says something like "this is fine as is (but I think it could be better in these ways)".

@@ -296,6 +308,7 @@ class Job {

const char *getExecutable() const { return Executable; }
const llvm::opt::ArgStringList &getArguments() const { return Arguments; }
const char * const &getResponseFileArg() const { return ResponseFileArg; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a reference involved here? Why not just const char * getResponseFileArg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Job class needs to own the reference so that it remains in scope sufficiently long to execute. Otherwise the task queue will read junk memory and try to execute a nonsense command. This change was necessary to avoid keeping around a vector for the purpose of storing a single argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a pointer wouldn't actually break any of that, but I guess using a const reference for a value that won't change signifies that the lifetime is tied to something else a little more strongly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did try just returning a const char *, but doing so would cause it to read junk memory. Myself, @allevato, and @harlanhaskins looked at this issue, and this is what we came up with that worked.

Copy link
Contributor

@harlanhaskins harlanhaskins Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskQueue::addTask takes an ArrayRef<char *>. If we return the value of getResponseFileArg() as a pointer, then the returned pointer will get promoted to an ArrayRef through ArrayRef::ArrayRef(T &OneElt), and the reference passed in is a temporary stack slot storing the returned address, not the actual address inside the Job struct.

Returning a const reference here ensures that the ArrayRef<char *> that is passed to the TaskQueue is actually tied to the Job itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's horrible; nice work. Can you add that as a comment so that a later refactorer like me doesn't try to change it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately I suppose you could return an ArrayRef here even though the backing store is the one const char *.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the ArrayRef does appear to work actually; I went ahead and used that. Do you think it still needs a comment? The code is more straightforward now, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think this version is okay. Though I personally would have kept using the accessor methods in the other function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and replaced them.

@dabelknap
Copy link
Contributor Author

@CodaFi can we start another round of smoke tests?

@allevato
Copy link
Member

@jrose-apple Of course—my intention wasn't to imply that we should just dismiss feedback from anyone, so my apologies if that's the impression that I gave. I merely want to make sure that the review for a bug fix that has risen greatly in severity for us lately is making good forward progress and to understand what we should prioritize (e.g., consistency with prevailing style or reviewer feedback, particularly when they conflict) for this and future reviews.

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we're almost there & thank you for your patience.
I know I originally said it was a bit of a nitpick, but I find myself feeling strongly about that const after the * for the ResponseFilePath. I'm ready to approve as soon as that's added.

@davidungar
Copy link
Contributor

@allevato Add that one const and I'll approve it. Thanks.

Copy link
Contributor

@davidungar davidungar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After an offline conversation with @jrose-apple, I still would really like that "const" in there, but I realize that my "Request Changes" checkbox may actually keep you from merging. Since I intend to express a strong opinion but not to prevent you from disregarding my opinion, I am changing the box to "Comment".

@davidungar davidungar dismissed their stale review June 21, 2018 18:15

After an offline conversation with @jrose-apple, I still would really like that "const" in there, but I realize that my "Request Changes" checkbox may actually keep you from merging. Since I intend to express a strong opinion but not to prevent you from disregarding my opinion, I am changing the box to "Comment".

@dabelknap
Copy link
Contributor Author

@davidungar Thank you for your time and effort in reviewing this PR. I do agree that adding the extra const would be more concrete. However, I feel that making style changes like this would leave the file in an inconsistent (and potentially confusing) state. Mangling style conventions would make a refactorer's job more difficult, which is why I always defer to maintainers for these sorts of things. This would be a useful task for a future PR specifically scoped for such style changes.

@jrose-apple, is there anything else that needs to be addressed before merging?

@jrose-apple
Copy link
Contributor

I think we can take it! Thanks for working with us throughout the review process. I'll take care of cherry-picking it to the 4.2 branch, too.

@jrose-apple jrose-apple merged commit 7d8e40b into swiftlang:master Jun 21, 2018
jrose-apple added a commit to jrose-apple/swift that referenced this pull request Jun 21, 2018
Wrap Command Line Arguments in a Response File if System Limits are Exceeded

https://bugs.swift.org/browse/SR-4517
(cherry picked from commit 7d8e40b)
jrose-apple added a commit that referenced this pull request Jun 22, 2018
Wrap Command Line Arguments in a Response File if System Limits are Exceeded

https://bugs.swift.org/browse/SR-4517
(cherry picked from commit 7d8e40b)
bob-wilson added a commit that referenced this pull request Jun 26, 2018
On the master-next branch, the InitLLVM class used by the PROGRAM_START
macro can modify the argc/argv values (currently only on Windows).
Expanding response files before initializing the stack trace also modifies
the arguments so use a separate SmallVector for each copy of the argument
vector.
bob-wilson added a commit that referenced this pull request Jun 26, 2018
[master-next] Fix up the driver's use of PROGRAM_START after PR #16362
@dabelknap dabelknap deleted the frontend_responsefile branch October 9, 2018 18:40
@ravitripathi
Copy link

Hello. I wanted to know is this now enabled by default on XCode 10? We have hit this argument limit wall, and builds are failing. So is there a straightforward way to enable this?

@jrose-apple
Copy link
Contributor

This is for communication between the different commands that the Swift compiler runs internally. We know there's still an issue with long command lines for Xcode running the top-level Swift command. That'll require an Xcode-side change that didn't make it into Xcode 10 (or 10.1).

@Lascorbe
Copy link

Lascorbe commented Feb 1, 2019

Hi there,
First, thank you all for your work on this issue, it's really problematic when working on big codebases.
Second, I'd like to know if this made it to Xcode 10.2, I don't see it on the release notes but it worth asking.
Thank you.

@jrose-apple
Copy link
Contributor

The same thing I said above still applies, although the Xcode-side change is not in 10.2 either.

@samskiter
Copy link

Can I ask if this made XCode 10.2.1?

@jrose-apple
Copy link
Contributor

"The same thing I said above still applies, although the Xcode-side change is not in 10.2.1 either." Please stop asking about Xcode changes on the Swift repository.

@samskiter
Copy link

Thanks so much for the reply. Can you recommend where we should ask?

@jrose-apple
Copy link
Contributor

I…don't think Apple has a place to ask about future plans. But this seems like the sort of thing that should end up in the Xcode release notes when it changes.

@Lascorbe
Copy link

Sorry to add more comments about Xcode here, but just to bring a bit of clarification, an engineer at the WWDC labs checked it and told us it may be fixed by Xcode 11 beta 3, so hoping for that 🤞

Otherwise, we may need to keep duplicating rdar://50886131 on the new Feedback Assistant.

@swiftlang swiftlang locked as off-topic and limited conversation to collaborators Jun 12, 2019
@jrose-apple
Copy link
Contributor

(just to be clear, not locking in response to @Lascorbe's comment, but rather to keep this from happening again in the future)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants