
Document exceptions to the rules for parsing C arguments #2585



Merged: 2 commits, Nov 10, 2020


@vtjnash (Contributor) commented Nov 9, 2020

This description was empirically wrong, which made it rather confusing to read and rely upon. This is my best guess at the actual behavior, based on some experiments, though I'm hoping the actual behavior will be documented based on a reading of the relevant source code itself.

Note that this incomplete description was copied to other places, including:

The CommandLineToArgvW function documentation also contains the misleading claim that "The GetCommandLineW function can be used to get a command line string that is suitable for use as the lpCmdLine parameter". This is not quite true: the quoting and escaping rules applied to argv[0] usually differ from those for the remaining arguments, so the user may want to first scan forward to the first space character not enclosed in double quotes.
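A minimal Python sketch of that scan follows. The helper name `split_program_name` is hypothetical, and it assumes that argv[0] ends at the first space or tab not inside double quotes, that quote characters there only toggle quoting and are dropped, and that backslashes have no escape meaning in the program name.

```python
def split_program_name(cmdline: str) -> tuple[str, str]:
    """Split a raw command line into (argv[0], rest): illustrative sketch.

    Assumed rule for argv[0]: it ends at the first space or tab outside
    double quotes; quote characters only toggle quoting and are dropped;
    backslashes are ordinary characters here.
    """
    name: list[str] = []
    in_quotes = False
    i = 0
    while i < len(cmdline):
        ch = cmdline[i]
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoting, but don't copy the quote
        elif ch in " \t" and not in_quotes:
            break  # unquoted whitespace ends the program name
        else:
            name.append(ch)
        i += 1
    # Drop the whitespace separating argv[0] from the first real argument.
    rest = cmdline[i:].lstrip(" \t")
    return "".join(name), rest


# Usage: the remainder can then be handed to a parser for the other arguments.
print(split_program_name('"C:\\Program Files\\app.exe" a b'))
# ('C:\\Program Files\\app.exe', 'a b')
```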

Amusingly, the .NET documentation also contains advice that contradicts this existing documentation, but it helpfully includes a sample program that can be used to test directly in what ways it is non-conforming: https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.processstartinfo.arguments. This seems to have led to another very long conversation about the many ways this can cause, and has caused, problems: PowerShell/PowerShell#1995

@PRMerger14 (Contributor)

@vtjnash: Thanks for your contribution! The author(s) have been notified to review your proposed change.

@vtjnash (Contributor Author) commented Nov 9, 2020

Note that others have observed that the behavior of this undocumented rule changed in VS 2008: https://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES

@ktoliver added the aq-pr-triaged label (Tracking label for the PR review team) on Nov 9, 2020
@ghost commented Nov 10, 2020

CLA assistant check: All CLA requirements met.

@colin-home (Contributor) left a comment

Thanks, @vtjnash. Per your request, the content has been clarified, and your example included.

@colin-home (Contributor) commented Nov 10, 2020

@vtjnash
Thanks for taking the time to contribute. Your analysis and example aren't precisely right, but maybe some clarification is in order here.

There are several command-line parsing routines in use in Windows, and different apps use different ones. The Win32 API provides CommandLineToArgvW, which behaves mostly like the C runtime startup code but may differ in some details. C# startup uses yet another parser. Documentation for one of these parsers doesn't necessarily apply to the others, so we won't discuss the differences here.

The command line that a C or C++ program sees in its argv parameter may have been through two layers of parsing.

First comes processing by the command processor itself, which spawns the program process. It passes an environment, which contains a copy of the interpreted command line. It may have stripped out escapes from a batch file, coalesced multiple lines, or streamed the command line in through a pipe. You have to read the documentation for your command processor to know what it might do before it hands a command line off to a spawned process; different command processors may do different things. And a command processor isn't the only way a process can be spawned: your app may have been spawned by a system call from another program, and the command line you get is up to that program.

Next comes processing by the runtime startup, which preps a bunch of things, including the argv parameter. The startup then invokes the entry point (main, winMain, etc.). The startup behavior code paths in the C runtime are slightly different for Windows apps, UWP apps, .NET C++ apps, and command-line apps, but for the kinds of apps we're describing in this article, the argv parsing behavior is the same. It's possible to replace the parser used by the runtime startup code with your own, if you like; there's even an object file you can link in to bypass argv setup completely. We'll ignore that for now.

The common command-line parsing routine used by the runtime startup code for console and Windows apps is available in source form in the Windows 10 SDK, under Source\<version>\ucrt\startup\argv_parsing.cpp. The main part of the parsing takes place in the parse_command_line function. A look at this code shows that it handles argv[0] separately with a simpler approach, based on the fact that it's constrained to be a valid file path. The later arguments get more complex handling.

Your example command line, a"b"" c d, actually produces only one argument, argv[1], with content ab" c d. The reason is that the parser encounters the first double quote and enters in_quotes mode. When it finds the following "", it interprets this as an escaped double quote, which it outputs. It continues parsing in in_quotes mode, expecting a closing double quote, but it runs out of input before it finds one, so it outputs that argument with the one double-quote character in it, then outputs the final nullptr to mark the end of argv.
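To make the walk-through above concrete, here is a minimal Python sketch of the rules for arguments after argv[0]. It is written from the behavior described in this comment plus the documented backslash rules, not translated from argv_parsing.cpp, so treat it as an approximation. The function name `parse_args` and the `legacy` switch are hypothetical; `legacy=True` models the pre-VS 2008 behavior reported on the page linked earlier in this thread, assuming that there `""` inside quotes emits one literal quote and also ends the quoted block.

```python
def parse_args(cmdline: str, legacy: bool = False) -> list[str]:
    """Split the text after argv[0] into arguments (illustrative sketch).

    Assumed rules:
    - arguments are separated by spaces/tabs outside double quotes;
    - 2n backslashes before '"' give n backslashes, and the quote is special;
    - 2n+1 backslashes before '"' give n backslashes and a literal '"';
    - backslashes not followed by '"' are copied literally;
    - inside quotes, '""' gives one literal '"' and quoting continues,
      unless legacy=True (assumed pre-VS 2008 behavior), where it ends.
    """
    args: list[str] = []
    i, n = 0, len(cmdline)
    while True:
        # Skip the whitespace between arguments.
        while i < n and cmdline[i] in " \t":
            i += 1
        if i >= n:
            return args
        cur: list[str] = []
        in_quotes = False
        while True:
            # Count a run of backslashes; their meaning depends on what follows.
            backslashes = 0
            while i < n and cmdline[i] == "\\":
                backslashes += 1
                i += 1
            copy_char = True
            if i < n and cmdline[i] == '"':
                if backslashes % 2 == 0:
                    if in_quotes and i + 1 < n and cmdline[i + 1] == '"':
                        i += 1  # '""' inside quotes: keep one literal '"'
                        if legacy:
                            in_quotes = False
                    else:
                        copy_char = False  # an unescaped quote is dropped
                        in_quotes = not in_quotes
                backslashes //= 2
            cur.append("\\" * backslashes)
            # An argument ends at end of input or at unquoted whitespace.
            if i >= n or (not in_quotes and cmdline[i] in " \t"):
                break
            if copy_char:
                cur.append(cmdline[i])
            i += 1
        args.append("".join(cur))


# Usage: the example from this comment, under both assumed rule sets.
print(parse_args('a"b"" c d'))               # ['ab" c d']
print(parse_args('a"b"" c d', legacy=True))  # ['ab"', 'c', 'd']
```

Under the assumed legacy rule, the same command line splits into three arguments, which is the kind of version-dependent difference raised later in this thread.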

I've updated the PR to reflect the behavior. Unfortunately, I touched enough that your agreement to the license terms is now required before I can merge it. If you would be so kind as to agree to them, I'll take care of the rest. Thanks!

[edit because I pressed Comment before finishing.]

@PRMerger7 requested a review from @colin-home on November 10, 2020 03:05
@PRMerger7 (Contributor)

@corob-msft: Thanks for your contribution! The author(s) have been notified to review your proposed change.

@vtjnash (Contributor Author) commented Nov 10, 2020

"actually produces only one argument"

I think it's worth noting that this is only true for VS 2008 and later. For msvcrt-based applications (such as anything compiled by mingw-w64 or mingw32, or older applications such as Python before 3.3, according to https://wiki.python.org/moin/WindowsCompilers#Which_Microsoft_Visual_C.2B-.2B-_compiler_to_use_with_a_specific_Python_version_.3F), the same command line is parsed differently.

@colin-home merged commit 44be0d2 into MicrosoftDocs:master on Nov 10, 2020
@colin-home (Contributor)

These documents are written to apply to VS 2015 and later. For VS 2008 and other out-of-support products, we provide access to the original docs under previous versions. Sure, it may be annoying if you're hoping for a one-stop shop for your Python 2.7 on mingw32 command-line argument parsing needs. But since 99% of readers are here for the most recent couple of versions of MSVC, we try not to slow them down with material that doesn't apply to them.

@vtjnash deleted the patch-1 branch on November 17, 2020 19:45
@vtjnash (Contributor Author) commented Nov 17, 2020

I just fear that it's still potentially misleading, since users may refer to this expecting to learn how to correctly launch an external program, and not realize that programs compiled by an older version of MSVC (such as Python) may not follow these rules.
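On the launching side, here is a minimal Python sketch of quoting each argument (after argv[0]) so that a parser following the VS 2008-and-later rules described in this thread should recover it unchanged. The helpers `quote_arg` and `build_command_line` are hypothetical. Each literal quote is emitted as `\"`, so the output should not depend on the `""` rule discussed above; that is an inference from the rules as described, not something verified against older runtimes.

```python
def quote_arg(arg: str) -> str:
    """Quote one argument (not argv[0]) for a UCRT-style parser: a sketch.

    Assumes the backslash/quote rules described in this thread; this is
    not an official Microsoft API.
    """
    # Arguments without whitespace or quotes can be passed through as-is.
    if arg and not any(c in arg for c in ' \t\n\v"'):
        return arg
    out = ['"']
    backslashes = 0
    for ch in arg:
        if ch == "\\":
            # Defer backslashes: how many to emit depends on what follows.
            backslashes += 1
        elif ch == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote are literal.
            out.append("\\" * backslashes + ch)
            backslashes = 0
    # Backslashes right before the closing quote must be doubled.
    out.append("\\" * (2 * backslashes) + '"')
    return "".join(out)


def build_command_line(args: list[str]) -> str:
    """Quote each argument and join them with single spaces."""
    return " ".join(quote_arg(a) for a in args)


print(build_command_line(['a b', 'say "hi"', 'C:\\dir\\']))
# "a b" "say \"hi\"" C:\dir\
```

CPython's `subprocess` module has a comparable helper, `subprocess.list2cmdline`, which follows similar rules.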

@vtjnash (Contributor Author) commented Nov 19, 2020

Just dropping back by to record another place in this repo I stumbled across that had also copied the incorrect documentation from here: https://docs.microsoft.com/en-us/cpp/cpp/main-function-command-line-args?view=msvc-160


5 participants