-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
bpo-32409 Ensures activate.bat can handle Unicode contents #10295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
Lib/venv/scripts/nt/activate.bat
Outdated
@@ -2,7 +2,7 @@ | |||
|
|||
rem This file is UTF-8 encoded, so we need to update the current code page while executing it | |||
for /f "tokens=2 delims=:" %%a in ('"%SystemRoot%\System32\chcp.com"') do ( | |||
set "_OLD_CODEPAGE=%%a" | |||
for %%b in ("%%a") do set "_OLD_CODEPAGE=%%~nxb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also modify delims
in the first loop to include ".", e.g. "tokens=2 delims=.:"
. It's still the 2nd token because the period, if present, is at the end of the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also modify
delims
Yes, that certainly seems easier to understand than the somewhat obscure ~nx
which is really intended for file extensions and the usage here non-obvious.
Both solutions - delims
and ~nx
- may not work with some other locale if that happens to use another character than .
in the message - we're at the whims of Windows developers here, in some sense. Is there any way we can make the solution more general to cope with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the very fast reaction. I verified: for /F "tokens=2 delims=:." %a in ('"C:\WINDOWS\System32\chcp.com"') also works on a Swiss German Windows 10 and solves the problem as well.
The alternative I considered was using the C-API:
for /f %%a in ('%~dp0python.exe -c "from ctypes import windll; print(windll.kernel32.GetConsoleOutputCP());"') do (set "_OLD_CODEPAGE=%%a")
This is likely more portable among Windows versions. python.exe should be part of the virtualenv scripts folder. The %~dp0 returns the folder of the script file even if called from another batch file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call GetConsoleCP
to get the current input codepage, as chcp.com does. CMD actually uses the output codepage to decode the script, which we overwrite and lose when chcp.com sets the codepage. We could remove the reliance on chcp.com entirely by calling GetConsoleOutputCP
and SetConsoleOutputCP
via ctypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. I have done that and also saved and set the Console input.
Lib/venv/scripts/nt/activate.bat
Outdated
set "_OLD_CODEPAGE=%%a" | ||
) | ||
rem This file is UTF-8 encoded, so we need to update the current code page while executing it. | ||
for /f %%a in ('__VENV_PYTHON__ -s -S -E -c "from ctypes import windll; print(windll.kernel32.GetConsoleOutputCP());"') do (set "_OLD_CODEPAGE=%%a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use __VENV_PYTHON__
since it's UTF-8. Go back to using %dp0python.exe
.
The command is shorter without the from
import, e.g. import ctypes; print(ctypes.windll.kernel32.GetConsoleOutputCP())
.
Lib/venv/scripts/nt/activate.bat
Outdated
) | ||
rem This file is UTF-8 encoded, so we need to update the current code page while executing it. | ||
for /f %%a in ('__VENV_PYTHON__ -s -S -E -c "from ctypes import windll; print(windll.kernel32.GetConsoleOutputCP());"') do (set "_OLD_CODEPAGE=%%a") | ||
for /f %%a in ('__VENV_PYTHON__ -s -S -E -c "from ctypes import windll; print(windll.kernel32.GetConsoleCP());"') do (set "_OLD_INPUT_CODEPAGE=%%a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to save, set, and restore the console input codepage. CMD decodes the script using the console output codepage, so that's the only one that needs to temporarily change to UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to be closer to the original behaviour of the scripts as chcp will still return the original codepage when just setting the ConsoleOutputCP:
python.exe -s -S -E -c "import ctypes; ctypes.windll.kernel32.SetConsoleOutputCP(65001);"
chcp
Aktive Codepage: 850.
But when:
>python.exe -s -S -E -c "import ctypes; ctypes.windll.kernel32.SetConsoleCP(65001); ctypes.windll.kernel32.SetConsoleOutputCP(65001);"
chcp
Aktive Codepage: 65001.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explained previously that chcp.com returns the input codepage but sets both the input and output codepages. This means the original script potentially loses the initial output codepage. It's only an issue if they're different, which is a rare configuration. The point of using ctypes instead of chcp.com, other than working around the localization problem, is to focus on saving, setting, and restoring just the output codepage, which is all that's needed to make CMD correctly decode this UTF-8 batch script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extended explanation. I have removed the unneeded code in the new commit. I would like to squash all the commits together and add the useful explanation in the commit message. What do you think about the parameters I use to call python: -s -S -E?
-E Ignore all PYTHON* environment variables, e.g. PYTHONPATH and PYTHONHOME, that might be set.
-s Don’t add the user site-packages directory to sys.path.
See also PEP 370 – Per user site-packages directory
-S Disable the import of the module site and the site-dependent manipulations of sys.path that it entails. Also disable these manipulations if site is explicitly imported later (call site.main() if you want them to be triggered.)
Lib/venv/scripts/nt/activate.bat
Outdated
if defined _OLD_CODEPAGE ( | ||
"%SystemRoot%\System32\chcp.com" 65001 > nul | ||
%~dp0python.exe -s -S -E -c "import ctypes; ctypes.windll.kernel32.SetConsoleOutputCP(65001);" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit - I don't think the trailing semicolon is required - it's a separator and not a terminator. Same applies below, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha. ctypes pulled me to the dark side ;-). Your comment made my day "just a nit" is a lovely expression. Thanks.
@eryksun - any comments? If not, I can merge this. |
Lib/venv/scripts/nt/activate.bat
Outdated
set "_OLD_CODEPAGE=%%a" | ||
) | ||
rem This file is UTF-8 encoded, so we need to update the current code page while executing it. | ||
for /f %%a in ('%~dp0python.exe -s -S -E -c "import ctypes; print(ctypes.windll.kernel32.GetConsoleOutputCP())"') do (set "_OLD_CODEPAGE=%%a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -I option (3.4+) implies -Es
and also removes the user site packages and script directory from sys.path
. The script directory should be the working directory for a -c
command. I don't think we need to skip importing the site module via -S
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I have changed to -Ic and squashed the commits.
…thonGH-10295) Handle Unicode contents on localized Windows systems when activating a virtualenv. activate.bat currently breaks on German Windows systems as chcp does not return a plain number as on English systems, but appends a dot at the end (for example "Aktive Codepage: 850." instead of "Active Codepage: 850). The dependency to chcp.com is removed and ctypes is used to get, set and restore the console ouput codepage. The codepage for console input is not changed. We can't use __VENV_PYTHON__ to find python.exe since it's UTF-8. CMD decodes the script using the console output codepage.
Thanks @samstagern for the PR, and @vsajip for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7. |
GH-10377 is a backport of this pull request to the 3.7 branch. |
…ythonGH-10295) Handle Unicode contents on localised Windows systems when activating a venv. activate.bat currently breaks on German Windows systems, as chcp.com does not return a plain number as on English systems, but (arbitrarily) appends a dot at the end (for example "Aktive Codepage: 850." instead of "Active Codepage: 850"). The dependency to chcp.com is removed and ctypes is used to get, set and restore the console output code page. The code page for console input is not changed. We can't use __VENV_PYTHON__ to find python.exe, since it's UTF-8. cmd.exe decodes the script using the console output code page. (cherry picked from commit c64583b) Co-authored-by: samstagern <[email protected]>
|
|
|
|
|
…H-10295) (GH-10377) Handle Unicode contents on localised Windows systems when activating a venv. activate.bat currently breaks on German Windows systems, as chcp.com does not return a plain number as on English systems, but (arbitrarily) appends a dot at the end (for example "Aktive Codepage: 850." instead of "Active Codepage: 850"). The dependency to chcp.com is removed and ctypes is used to get, set and restore the console output code page. The code page for console input is not changed. We can't use __VENV_PYTHON__ to find python.exe, since it's UTF-8. cmd.exe decodes the script using the console output code page. (cherry picked from commit c64583b) Co-authored-by: samstagern <[email protected]>
…ndows (pythonGH-10295)" This reverts commit c64583b.
…ndows (pythonGH-10295)" (pythonGH-10403) This reverts commit c64583b due to multiple buildbot failures when building it. (cherry picked from commit 6843ffe) Co-authored-by: Pablo Galindo <[email protected]>
…ndows (GH-10295)" (GH-10403) This reverts commit c64583b due to multiple buildbot failures when building it. (cherry picked from commit 6843ffe) Co-authored-by: Pablo Galindo <[email protected]>
The buildbot failure might be due to missing "python.exe" when the environment is created by a debug-only build with "python_d.exe". In this case at the beginning of the script we can test whether "%~dp0python.exe" or "%~dp0python_d.exe" exists, setting a |
I reverted this PR in #10403. Please, add me as a reviewer in the next PR of this issue so I can test on the buildbots before merging. |
@eryksun I like the solution of injecting the name by a constant as that can be properly controlled. An alternative is to always provide (a link to) the executable with a fixed name "python.exe" (the unix version of venv will always be called "bin\python"). |
OTOH doing it all in the |
I like the Name too.
Am Do., 8. Nov. 2018 um 09:29 Uhr schrieb Vinay Sajip <
[email protected]>:
… I like the solution of injecting the name by a constant
OTOH doing it all in the activate.bat script keeps everything localised
to where the problem is, so that seems preferable to me.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#10295 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AnGIi7xQXDXL11rqFBC-p0r6JnnYL1bCks5us-tZgaJpZM4YLbhY>
.
|
For my usecase that would be enough to simply check for either python.exe or python_d.exe. In that case it should not set and reset the codepage in case it cannot find those executables. The alternative there would be to keep the improved chcp call or a combination of ctypes with a fallback to chcp. I think the fallback is not needed. |
I'm worried that someone might need venv with an embedded Python that has neither "python.exe" nor "python_d.exe". Is that possible? Supporting a template parameter for the executable name is a simple change. Probably only activate.bat would ever use it, but I don't see a problem with that. It's a reasonable parameter to support. |
Also, please take into account that ctypes may not be available because someone can choose to compile python without libffi and the executable won't have the extension module. |
if defined _OLD_CODEPAGE ( | ||
"%SystemRoot%\System32\chcp.com" 65001 > nul | ||
%~dp0python.exe -Ic "import ctypes; ctypes.windll.kernel32.SetConsoleOutputCP(65001)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the tracker @zooba pointed that you're missing the quoting around %~dp0python.exe
. I don't think the issue with the buildbot failures is related to spaces in the path, but remember to fix this problem as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I will do that. I will wait a few days until the design is cleared. My assumptions now are:
- we need a new PR starting of the up-to-date master
- we need to address the case that ctypes is not available with chcp.com
- we need quoting to support spaces in folder names
- we need to inject the python_exe name by a as a template variable such as VENV_PYTHON_EXE_NAME, assuming that this name is always ASCII
- we need a fallback in case VENV_PYTHON_EXE_NAME is not available
Please correct me where you differ. I will make the PR as soon as you agree on the solution. Sorry to have broken the build.
|
Not especially, since there are potential drawbacks with that too, as @pablogsal pointed out. I guess the problem might resurface if there's a code page for which |
activate.bat breaks on German Windows systems as chcp does not return a plain number as on English systems, but appends a dot at the end (for example "Aktive Codepage: 850." instead of "Active Codepage: 850). We convert the string to a number deleting the trailing dot.
Note: this must be backported to Python 3.6 and Python 3.7
The solution of msg308934 on https://bugs.python.org/issue32409 does not take internationalization of Windows in account. I could create a new bugticket.
https://bugs.python.org/issue32409