Skip to content

Wrap Command Line Arguments in a Response File if System Limits are Exceeded #16362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 21, 2018
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 19 additions & 2 deletions include/swift/Driver/Job.h
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,14 @@ class Job {
/// Whether the job wants a list of input or output files created.
std::vector<FilelistInfo> FilelistFileInfos;

/// Response file path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this comment. It is completely redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this comment? @jrose-apple Is there some coding standard I don't know about? Why not remove it entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This purpose of the comment is to remain consistent with the rest of the file. A few lines above reads:

/// The output of this command.
std::unique_ptr<CommandOutput> Output;

/// The executable to run.
const char *Executable;

for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. Since I believe that this sort of comment degrades maintainability (yes, it's a tiny cut but there are virtually thousands of them), and since I try to enhance the maintainabilty of the code with each review, I asked for its deletion. However, a consistent-but-redundant comment is only a mere bagatelle.

const char *ResponseFilePath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a nitpick, but I would prefer it with another "const" after the star (const char * const) in order to signal that this variable is never assigned to (assuming that is the case). I'm assuming it is initialized in the constructor and never changed thereafter. (Also applies in other cases, such as ResponseFileArg.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like the const after the star here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char * is what is conventionally used in the rest of the class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I count about 100 lines of code in methods of that class. If I had to maintain this code and needed to know if that instance variable ever changed, I would have to read them. Adding the "cost" would save me that work. Therefore, I would not approve this PR without this "const". (Of course you don't need my approval to land it & that's OK. I would just feel derelict in my duty to favor consistency over maintainability in this case.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta: At this point you've expressed a strong enough opinion that Austin does need your approval to land the PR. If you want to rescind that, you'd have to signify that, perhaps by "dismissing" your X review.

Personally, I'm with Austin on this one. Being inconsistent can imply that the inconsistency is deliberate, and therefore it is better to be consistent with existing code in a PR that's adding functionality. (That's especially true for someone who's a relative newcomer.)

Copy link
Contributor

@davidungar davidungar Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta: @jrose-apple Please help me out here. Since the merge box is green, I figured he did not need my approval to land the PR. (I seem to recall reading verbiage to that effect on github, too.) Is there some policy (or implicit understanding) that I've missed? Where are you thinking that I express such a strong enough opinion? Was the need expressed, or implied? I don't understand.

You have a fair point about inconsistency, but I weigh that point against the benefits of this particular proposed inconsistency. In this case I would argue (on your side) that if the const is added to one variable, a future reader might infer that the other variables are not const. However, the counter argument would be that it would be clear for anyone with a passing familiarity with the code base that const is often not used when it could be. Thus, it seems to me that the consistency benefit is outweighed by the not-having-to-read-100-lines-of-code benefit of the const.

My experience as a novitiate to this code base has been that such consts would save me significant time. Ironically, it was my experience with adopting Swift for my code that convinced me of the significance of the benefits of write-once variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the merge box is green, I figured he did not need my approval to land the PR. (I seem to recall reading verbiage to that effect on github, too.) Is there some policy (or implicit understanding) that I've missed? Where are you thinking that I express such a strong enough opinion? Was the need expressed, or implied?

We aren't using GitHub's built-in gating for reviewers, true. But only a little above the green merge button is the part where "@davidungar requested changes", with a red X. Even without that, though, I think we want to take expressed concerns seriously unless the reviewer explicitly says something like "this is fine as is (but I think it could be better in these ways)".


/// Argument vector containing a single argument pointing to the response file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I think the comment is inaccurate. It's not an argument vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that property is no longer a vector. I'll fix the comment.

/// path with the '@' prefix.
/// The argument string must be kept alive as long as the Job is alive.
const char *ResponseFileArg;

/// The modification time of the main input file, if any.
llvm::sys::TimePoint<> InputModTime = llvm::sys::TimePoint<>::max();

Expand All @@ -281,12 +289,16 @@ class Job {
const char *Executable,
llvm::opt::ArgStringList Arguments,
EnvironmentVector ExtraEnvironment = {},
std::vector<FilelistInfo> Infos = {})
std::vector<FilelistInfo> Infos = {},
const char *ResponseFilePath = nullptr,
const char *ResponseFileArg = nullptr)
: SourceAndCondition(&Source, Condition::Always),
Inputs(std::move(Inputs)), Output(std::move(Output)),
Executable(Executable), Arguments(std::move(Arguments)),
ExtraEnvironment(std::move(ExtraEnvironment)),
FilelistFileInfos(std::move(Infos)) {}
FilelistFileInfos(std::move(Infos)),
ResponseFilePath(ResponseFilePath),
ResponseFileArg(ResponseFileArg) {}

virtual ~Job();

Expand All @@ -296,6 +308,7 @@ class Job {

const char *getExecutable() const { return Executable; }
const llvm::opt::ArgStringList &getArguments() const { return Arguments; }
const char * const &getResponseFileArg() const { return ResponseFileArg; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a reference involved here? Why not just const char * getResponseFileArg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Job class needs to own the reference so that it remains in scope sufficiently long to execute. Otherwise the task queue will read junk memory and try to execute a nonsense command. This change was necessary to avoid keeping around a vector for the purpose of storing a single argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a pointer wouldn't actually break any of that, but I guess using a const reference for a value that won't change signifies that the lifetime is tied to something else a little more strongly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did try just returning a const char *, but doing so would cause it to read junk memory. Myself, @allevato, and @harlanhaskins looked at this issue, and this is what we came up with that worked.

Copy link
Contributor

@harlanhaskins harlanhaskins Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskQueue::addTask takes an ArrayRef<char *>. If we return the value of getResponseFileArg() as a pointer, then the returned pointer will get promoted to an ArrayRef through ArrayRef::ArrayRef(T &OneElt), and the reference passed in is a temporary stack slot storing the returned address, not the actual address inside the Job struct.

Returning a const reference here ensures that the ArrayRef<char *> that is passed to the TaskQueue is actually tied to the Job itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's horrible; nice work. Can you add that as a comment so that a later refactorer like me doesn't try to change it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately I suppose you could return an ArrayRef here even though the backing store is the one const char *.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the ArrayRef does appear to work actually; I went ahead and used that. Do you think it still needs a comment? The code is more straightforward now, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think this version is okay. Though I personally would have kept using the accessor methods in the other function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and replaced them.

ArrayRef<FilelistInfo> getFilelistInfos() const { return FilelistFileInfos; }

ArrayRef<const Job *> getInputs() const { return Inputs; }
Expand Down Expand Up @@ -347,6 +360,10 @@ class Job {

static void printArguments(raw_ostream &Stream,
const llvm::opt::ArgStringList &Args);

bool hasResponseFile() const { return ResponseFilePath != nullptr; }

bool writeArgsToResponseFile() const;
};

/// A BatchJob comprises a _set_ of jobs, each of which is sufficiently similar
Expand Down
5 changes: 5 additions & 0 deletions include/swift/Driver/ToolChain.h
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,11 @@ class ToolChain {
std::vector<std::pair<const char *, const char *>> ExtraEnvironment;
std::vector<FilelistInfo> FilelistInfos;

// Not all platforms and jobs support the use of response files, so assume
// "false" by default. This can be overridden as needed on a per
// platform/job basis.
bool allowsResponseFiles = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this variable needs a comment explaining why it would ever be true or false.
Something like: "Some platforms limit argument length and those use response files..."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just a per-platform decision, so it's not that, but I agree that a comment would be nice.


InvocationInfo(const char *name, llvm::opt::ArgStringList args = {},
decltype(ExtraEnvironment) extraEnv = {})
: ExecutableName(name), Arguments(std::move(args)),
Expand Down
10 changes: 8 additions & 2 deletions lib/Driver/Compilation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -291,8 +291,14 @@ namespace driver {
"not implemented for compilations with multiple jobs");
if (Comp.ShowJobLifecycle)
llvm::outs() << "Added to TaskQueue: " << LogJob(Cmd) << "\n";
TQ->addTask(Cmd->getExecutable(), Cmd->getArguments(), llvm::None,
(void *)Cmd);
if (Cmd->hasResponseFile()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a style that puts the shorter arm of the if-then-else first, even though it is the do-less arm. I feel that such a choice reduces the max depth of the reader's mental stack. But this is just a suggestion.

Cmd->writeArgsToResponseFile();
TQ->addTask(Cmd->getExecutable(), Cmd->getResponseFileArg(),
llvm::None, (void *)Cmd);
} else {
TQ->addTask(Cmd->getExecutable(), Cmd->getArguments(), llvm::None,
(void *)Cmd);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, as I read further down, I noticed I had to stare at the two addTask call lines to see the difference.
Therefore I propose factoring into a small, static function that gets the arguments, or else pushing this logic into the Cmd->getArguments function. There should be only one line that calls TQ->addTask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cmd->getArguments may be used elsewhere with the expectation that it returns the vector of arguments executed by this job, regardless of whether or not a response file is used. I felt it would be safer to have a separate method specifically for returning the response file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, but I still want the factoring. How about defining a new function, called getArgumentsOrResponseFileName or getArgumentsAccountingForResponseFile or getArgumentsForTask? Or call the new one getArguments and rename the old one to getArgumentsIgnoringResponseFile. What IMO is going in is that the very notion of response file muddies the notion of arguments, as you appositely point out. So the best solution would be two, more verbose names for the existing and new getArguments that make the distinction clear. For instance getDirectArguments and getArgumentsWithResponseFIleIndirection (or maybe getPossiblyPackagedArguments). (I bet you can think of two better names!)

}

/// When a task finishes, check other Jobs that may be blocked.
Expand Down
21 changes: 21 additions & 0 deletions lib/Driver/Job.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#include "llvm/ADT/STLExtras.h"
#include "llvm/Option/Arg.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/Program.h"
#include "llvm/Support/raw_ostream.h"
Expand Down Expand Up @@ -352,7 +353,12 @@ void Job::printCommandLineAndEnvironment(raw_ostream &Stream,
void Job::printCommandLine(raw_ostream &os, StringRef Terminator) const {
escapeAndPrintString(os, Executable);
os << ' ';
if (hasResponseFile()) {
printArguments(os, {ResponseFileArg});
os << " # ";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the above is right, then I think it should be:
printArguments(os, hasResponseFile() ? ResponseFileArg : Arguments); rather than the if.
Why? Because the code is only doing one printArguments operation, and it is exactly the same, except for that second argument.

printArguments(os, Arguments);

os << Terminator;
}

Expand Down Expand Up @@ -403,6 +409,21 @@ void Job::printSummary(raw_ostream &os) const {
os << "}";
}

bool Job::writeArgsToResponseFile() const {
std::error_code EC;
llvm::raw_fd_ostream OS(ResponseFilePath, EC, llvm::sys::fs::F_None);
if (EC) {
return true;
}
for (const char *arg : Arguments) {
OS << "\"";
escapeAndPrintString(OS, arg);
OS << "\" ";
}
OS.flush();
return false;
}

BatchJob::BatchJob(const JobAction &Source,
SmallVectorImpl<const Job *> &&Inputs,
std::unique_ptr<CommandOutput> Output,
Expand Down
13 changes: 12 additions & 1 deletion lib/Driver/ToolChain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -114,11 +114,22 @@ std::unique_ptr<Job> ToolChain::constructJob(
}
}

const char *responseFilePath = nullptr;
const char *responseFileArg = nullptr;
if (invocationInfo.allowsResponseFiles &&
!llvm::sys::commandLineFitsWithinSystemLimits(
executablePath, invocationInfo.Arguments)) {
responseFilePath = context.getTemporaryFilePath("arguments", "resp");
responseFileArg = C.getArgs().MakeArgString(Twine("@") + responseFilePath);
}

return llvm::make_unique<Job>(JA, std::move(inputs), std::move(output),
executablePath,
std::move(invocationInfo.Arguments),
std::move(invocationInfo.ExtraEnvironment),
std::move(invocationInfo.FilelistInfos));
std::move(invocationInfo.FilelistInfos),
responseFilePath,
responseFileArg);
}

std::string
Expand Down
1 change: 1 addition & 0 deletions lib/Driver/ToolChains.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -247,6 +247,7 @@ ToolChain::constructInvocation(const CompileJobAction &job,
const JobContext &context) const {
InvocationInfo II{SWIFT_EXECUTABLE_NAME};
ArgStringList &Arguments = II.Arguments;
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it true in this case? Comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remain puzzled by that line. Please add a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, see below.


Arguments.push_back("-frontend");

Expand Down
5 changes: 4 additions & 1 deletion lib/Driver/UnixToolChains.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -325,7 +325,10 @@ toolchains::GenericUnix::constructInvocation(const LinkJobAction &job,
Arguments.push_back(
context.Args.MakeArgString(context.Output.getPrimaryOutputFilename()));

return {Clang, Arguments};
InvocationInfo II{Clang, Arguments};
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why true? Comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are a bunch of similar lines, could you please factor them into an intention-revealing method? Or add a comment to each?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, see below.


return II;
}

std::string toolchains::Android::getTargetForLinker() const {
Expand Down
5 changes: 4 additions & 1 deletion lib/Driver/WindowsToolChains.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -168,5 +168,8 @@ toolchains::Windows::constructInvocation(const LinkJobAction &job,
Arguments.push_back(
context.Args.MakeArgString(context.Output.getPrimaryOutputFilename()));

return {Clang, Arguments};
InvocationInfo II{Clang, Arguments};
II.allowsResponseFiles = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is the third time I'm seeing the setting to true. Is there something in common that the code should be reifying?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. It's all based on what other tool is getting invoked, which is not necessarily the same across each toolchain and certainly not the same across each job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, this is the customization point, so that the rest of the code can treat everything the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but how about comments explaining why allowsResponseFiles is true in each case? Why might it be false vs true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment would read: // clang supports response files. Given the way in which the code is written, such a comment would be redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still with Austin on this. If the executable specified in the InvocationInfo constructor supports response files, you can put true here. I guess the doc comment for allowsResponseFiles could say that a little more clearly, but it doesn't belong at the use sites.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment in include/swift/Driver/ToolChain.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the comment. In thinking about this item, the comment you added is good and it is sufficient. I'm not asking for the following, but if I were to try to make it even clearer, I would use a one-line intention-revealing function. But, never mind.


return II;
}
39 changes: 27 additions & 12 deletions test/Driver/response-file.swift
Original file line number Diff line number Diff line change
Expand Up @@ -6,24 +6,35 @@
// RUN: %target-build-swift -typecheck @%t.1.resp %s 2>&1 | %FileCheck %s -check-prefix=MULTIPLE
// MULTIPLE: warning: result of call to 'abs' is unused

// RUN: echo "-DTEST2A -DTEST2B" > %t.2.resp
// RUN: echo "-DTEST2C -DTEST2D" > %t.3.resp
// RUN: %target-build-swift -typecheck @%t.2.resp %s @%t.3.resp 2>&1 | %FileCheck %s -check-prefix=MIXED
// RUN: echo "-DTEST2A -DTEST2B" > %t.2A.resp
// RUN: echo "-DTEST2C -DTEST2D" > %t.2B.resp
// RUN: %target-build-swift -typecheck @%t.2A.resp %s @%t.2B.resp 2>&1 | %FileCheck %s -check-prefix=MIXED
// MIXED: warning: result of call to 'abs' is unused

// RUN: echo "-DTEST3A" > %t.4.resp
// RUN: echo "%s" >> %t.4.resp
// RUN: echo "-DTEST3B" >> %t.4.resp
// RUN: %target-build-swift -typecheck @%t.4.resp 2>&1 | %FileCheck %s -check-prefix=RESPONLY
// RUN: echo "-DTEST3A" > %t.3.resp
// RUN: echo "%s" >> %t.3.resp
// RUN: echo "-DTEST3B" >> %t.3.resp
// RUN: %target-build-swift -typecheck @%t.3.resp 2>&1 | %FileCheck %s -check-prefix=RESPONLY
// RESPONLY: warning: result of call to 'abs' is unused

// RUN: echo "-DTEST4A" > %t.5.resp
// RUN: echo "%s" >> %t.5.resp
// RUN: echo "@%t.5.resp" > %t.6.resp
// RUN: echo "-DTEST4B" >> %t.6.resp
// RUN: %target-build-swift -typecheck @%t.6.resp 2>&1 | %FileCheck %s -check-prefix=RECURSIVE
// RUN: echo "-DTEST4A" > %t.4A.resp
// RUN: echo "%s" >> %t.4A.resp
// RUN: echo "@%t.4A.resp" > %t.4B.resp
// RUN: echo "-DTEST4B" >> %t.4B.resp
// RUN: %target-build-swift -typecheck @%t.4B.resp 2>&1 | %FileCheck %s -check-prefix=RECURSIVE
// RECURSIVE: warning: result of call to 'abs' is unused

// RUN: python -c 'for i in range(500001): print "-DTEST5_" + str(i)' > %t.5.resp
// RUN: %target-build-swift -typecheck @%t.5.resp %s 2>&1 | %FileCheck %s -check-prefix=LONG
// LONG: warning: result of call to 'abs' is unused
// RUN: %empty-directory(%t/tmp)
// RUN: env TMPDIR=%t/tmp/ %target-swiftc_driver %s @%t.5.resp -save-temps 2>&1
// RUN: ls %t/tmp/arguments-*.resp
// RUN: %target-build-swift -v @%t.5.resp %s 2>&1 | %FileCheck %s -check-prefix=VERBOSE
// VERBOSE: @{{[^ ]*}}arguments-{{[0-9a-zA-Z]+}}.resp # -frontend -c -primary-file
// RUN: not %target-swiftc_driver %s @%t.5.resp -Xfrontend -debug-crash-immediately 2>&1 | %FileCheck %s -check-prefix=CRASH
// CRASH: Program arguments: {{[^ ]*}}swift -frontend -c -primary-file

#if TEST0
abs(-5)
#endif
Expand All @@ -43,3 +54,7 @@ abs(-5)
#if TEST4A && TEST4B
abs(-5)
#endif

#if TEST5_0 && TEST5_10 && TEST5_100 && TEST5_1000 && TEST5_10000 && TEST5_100000 && TEST5_500000
abs(-5)
#endif
6 changes: 4 additions & 2 deletions tools/driver/driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -111,8 +111,6 @@ static bool shouldRunAsSubcommand(StringRef ExecName,
extern int apinotes_main(ArrayRef<const char *> Args);

int main(int argc_, const char **argv_) {
PROGRAM_START(argc_, argv_);

SmallVector<const char *, 256> argv;
llvm::SpecificBumpPtrAllocator<char> ArgAllocator;
std::error_code EC = llvm::sys::Process::GetArgumentVector(
Expand All @@ -134,6 +132,10 @@ int main(int argc_, const char **argv_) {
llvm::cl::TokenizeGNUCommandLine,
argv);

// Initialize the stack trace using the parsed argument vector with expanded
// response files.
PROGRAM_START(argv.size(), argv.data());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this can be affected by subsequent operations on argv. Possible options are to push PROGRAM_START even further down in main, or to make a copy of argv at this location for the sole purpose of initializing the stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Maybe we can do it the other way, and make sure that no one else changes argv after this point? The case where that does happen, shouldRunAsSubcommand, can be responsible for making the copy. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand your suggestion, the signature for shouldRunAsSubcommand would look like:

static bool shouldRunAsSubcommand(
  StringRef ExecName,
  SmallString<256> &SubcommandName,
  const SmallVectorImpl<const char *> &InputArgs,
  SmallVectorImpl<const char *> &OutputArgs) { ... }

InputArgs comes from what is currently argv. OutputArgs would initially point to InputArgs, but would be replaced with a brand new vector in the event of a modification. OutputArgs is what would be used in the rest of main. This wouldn't necessarily prevent someone from modifying args/InputArgs, however.

I'm not sure if there is a stronger way to ensure that the argument vector doesn't get modified, aside from:

const SmallVector<const char *, 256> trace_args(argv);
PROGRAM_START(trace_args.size(), trace_args.data());
// continue as normal with 'argv'

Unfortunately, I don't think we can just make argv a const since it must be operated on twice before calling PROGRAM_START.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't realize that shouldRunAsSubcommand was already modifying the input arguments. I feel like I'd change that to just taking the input arguments as an immutable ArrayRef, and then making the copy inside the body of the if:

if (shouldRunAsSubcommand(ExecName, SubcommandName, argv)) {
  SmallVector<const char *, 256> subCommandArgs(argv.begin(), argv.end());
  subCommandArgs.erase(1);
  // …

What do you think for that?

(Sorry for the delayed responses. The run-up to WWDC has been slamming a bunch of us!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrose-apple About WWDC: no worries!

I attempted to implement your suggestion, and created a commit for it. It was necessary to factor out the lower half of main in order to properly handle the repl case. If the sub-command is repl, shouldRunAsSubcommand modifies argv and it returns false, so a modified version of argv is used outside of the if block.


// Check if this invocation should execute a subcommand.
StringRef ExecName = llvm::sys::path::stem(argv[0]);
SmallString<256> SubcommandName;
Expand Down