-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-16285: Update urllib quoting to RFC 3986 #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:
Thanks again to your contribution and we look forward to looking at it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks basically OK, but will need test updates, and probably docs updates as well.
9e93107
to
1294937
Compare
Doc/library/urllib.parse.rst
Outdated
function is intended for quoting the path section of URL. The optional *safe* | ||
parameter specifies additional ASCII characters that should not be quoted | ||
--- its default value is ``'/'``. | ||
|
||
*string* may be either a :class:`str` or a :class:`bytes`. | ||
|
||
Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings. | ||
Now, "~" is included in the set of reserved characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use a .. versionchanged: 3.7
note (there are several examples for that earlier in the file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
@@ -761,7 +761,7 @@ def test_never_quote(self): | |||
do_not_quote = '' .join(["ABCDEFGHIJKLMNOPQRSTUVWXYZ", | |||
"abcdefghijklmnopqrstuvwxyz", | |||
"0123456789", | |||
"_.-"]) | |||
"_.-~"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is the test case update I missed in my earlier review :)
@ncoghlan Is there anything else that needs to be updated in this PR? |
Doc/library/urllib.parse.rst
Outdated
function is intended for quoting the path section of URL. The optional *safe* | ||
parameter specifies additional ASCII characters that should not be quoted | ||
--- its default value is ``'/'``. | ||
|
||
*string* may be either a :class:`str` or a :class:`bytes`. | ||
|
||
.. versionchanged:: 3.7 | ||
Moving from RFC 2396 to RFC 3986 for quoting URL strings. Now, "~" is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically write these notes in past tense, so a slight tweak here:
Moved from RFC ...
And in the 2nd sentence:
"~" is now included...
|
||
Each of these characters is reserved in some component of a URL, | ||
but not necessarily in all of them. | ||
|
||
Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings. | ||
Now, "~" is included in the set of reserved characters. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this one earlier: there's no need to have the version change info in the docstring, so the change in the RFC reference and the addition of ~
to the set of reserved characters is sufficient here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's also wrong as stated above.
@rtnpro Almost! Just a couple of small comments on the docs and docstring updates. |
Assigning to myself to do the final ACKS/NEWS/What's New updates before merging. |
Initial work done by ctheune at http://bugs.python.org/file34950/0be3805cade1.diff.
Squashed commits! |
Added the sprint label, as this PR was submitted at the PyCon Pune 2017 core development sprint. |
I think the document and the comments are wrong here. What's changed is the inclusion of '~' to unreserved characters. So the comments here are wrong. Lines 742 to 743 in 41439a0
Lines 748 to 750 in 41439a0
41439a0#diff-01126e4d609357e876a7fd7baa62ce18 And the document here is wrong. cpython/Doc/library/urllib.parse.rst Lines 461 to 464 in 41439a0
|
Being pinged about this in #2568, i can only repeat: Python never actually followed 2396 to Nevertheless, stating that quote was updated to RFC 3986 is still somewhat misleading. |
the following reserved characters. | ||
|
||
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | | ||
"$" | "," | ||
"$" | "," | "~" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong: "~" is in the set of _UN_reserved chars in RFC 3986, please see https://bugs.python.org/issue12910 and its PR #2568
|
||
Each of these characters is reserved in some component of a URL, | ||
but not necessarily in all of them. | ||
|
||
Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings. | ||
Now, "~" is included in the set of reserved characters. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's also wrong as stated above.
The unused full clobber list must contain "st" too.
This pull request enhances urllib quoting from RFC 2396 to RFC 3986.
This is an effort to merge initial efforts at http://bugs.python.org/file34950/0be3805cade1.diff in http://bugs.python.org/issue16285.