Skip to content

bpo-16285: Update urllib quoting to RFC 3986 #173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 25, 2017

Conversation

rtnpro
Copy link
Contributor

@rtnpro rtnpro commented Feb 19, 2017

This pull request enhances urllib quoting from RFC 2396 to RFC 3986.

This is an effort to merge initial efforts at http://bugs.python.org/file34950/0be3805cade1.diff in http://bugs.python.org/issue16285.

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, please create one
  2. Make sure your GitHub username is listed in "Your Details" at b.p.o
  3. If you have not already done so, please sign the PSF contributor agreement
  4. If you just signed the CLA, please wait at least one US business day and then check "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  5. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks basically OK, but will need test updates, and probably docs updates as well.

@rtnpro rtnpro force-pushed the fix-16285 branch 2 times, most recently from 9e93107 to 1294937 Compare February 19, 2017 11:25
function is intended for quoting the path section of URL. The optional *safe*
parameter specifies additional ASCII characters that should not be quoted
--- its default value is ``'/'``.

*string* may be either a :class:`str` or a :class:`bytes`.

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use a .. versionchanged: 3.7 note (there are several examples for that earlier in the file).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@@ -761,7 +761,7 @@ def test_never_quote(self):
do_not_quote = '' .join(["ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"abcdefghijklmnopqrstuvwxyz",
"0123456789",
"_.-"])
"_.-~"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is the test case update I missed in my earlier review :)

@rtnpro
Copy link
Contributor Author

rtnpro commented Feb 22, 2017

@ncoghlan Is there anything else that needs to be updated in this PR?

function is intended for quoting the path section of URL. The optional *safe*
parameter specifies additional ASCII characters that should not be quoted
--- its default value is ``'/'``.

*string* may be either a :class:`str` or a :class:`bytes`.

.. versionchanged:: 3.7
Moving from RFC 2396 to RFC 3986 for quoting URL strings. Now, "~" is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically write these notes in past tense, so a slight tweak here:

Moved from RFC ...

And in the 2nd sentence:

"~" is now included...


Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this one earlier: there's no need to have the version change info in the docstring, so the change in the RFC reference and the addition of ~ to the set of reserved characters is sufficient here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also wrong as stated above.

@ncoghlan
Copy link
Contributor

@rtnpro Almost! Just a couple of small comments on the docs and docstring updates.

@rtnpro
Copy link
Contributor Author

rtnpro commented Feb 24, 2017

@ncoghlan versionchanged documentation fixed by 6ce8001

@ncoghlan ncoghlan self-assigned this Feb 24, 2017
@ncoghlan
Copy link
Contributor

Assigning to myself to do the final ACKS/NEWS/What's New updates before merging.

@rtnpro
Copy link
Contributor Author

rtnpro commented Feb 24, 2017

Squashed commits!

@ncoghlan ncoghlan merged commit 21024f0 into python:master Feb 25, 2017
@ncoghlan
Copy link
Contributor

Added the sprint label, as this PR was submitted at the PyCon Pune 2017 core development sprint.

@openandclose
Copy link

I think the document and the comments are wrong here.

What's changed is the inclusion of '~' to unreserved characters.
https://tools.ietf.org/html/rfc3986#section-2.3

So the comments here are wrong.

cpython/Lib/urllib/parse.py

Lines 742 to 743 in 41439a0

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | "," | "~"

cpython/Lib/urllib/parse.py

Lines 748 to 750 in 41439a0

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.

41439a0#diff-01126e4d609357e876a7fd7baa62ce18

And the document here is wrong.

.. versionchanged:: 3.7
Moved from RFC 2396 to RFC 3986 for quoting URL strings. "~" is now
included in the set of reserved characters.

41439a0#diff-67a4980d053b561c26794c63eb5ac1de

@joernhees
Copy link
Contributor

Being pinged about this in #2568, i can only repeat: Python never actually followed 2396 to quote, but did always quote additional characters such as () (see the old https://bugs.python.org/issue12910#msg297675). The problem is that this has been in place for so long that people will definitely rely on it.

Nevertheless, stating that quote was updated to RFC 3986 is still somewhat misleading.

the following reserved characters.

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
"$" | "," | "~"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong: "~" is in the set of _UN_reserved chars in RFC 3986, please see https://bugs.python.org/issue12910 and its PR #2568


Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also wrong as stated above.

akruis pushed a commit to akruis/cpython that referenced this pull request May 27, 2021
The unused full clobber list must contain "st" too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants