Commit 41439a0

bpo-16285: Update urllib quoting to RFC 3986
Initial work done by ctheune at http://bugs.python.org/file34950/0be3805cade1.diff.
1 parent ace5c0f commit 41439a0

3 files changed: 13 additions, 6 deletions

Doc/library/urllib.parse.rst

Lines changed: 5 additions & 1 deletion
@@ -451,13 +451,17 @@ task isn't already covered by the URL parsing functions above.
 .. function:: quote(string, safe='/', encoding=None, errors=None)
 
    Replace special characters in *string* using the ``%xx`` escape. Letters,
-   digits, and the characters ``'_.-'`` are never quoted. By default, this
+   digits, and the characters ``'_.-~'`` are never quoted. By default, this
    function is intended for quoting the path section of a URL. The optional *safe*
    parameter specifies additional ASCII characters that should not be quoted
    --- its default value is ``'/'``.
 
    *string* may be either a :class:`str` or a :class:`bytes`.
 
+   .. versionchanged:: 3.7
+      Moved from RFC 2396 to RFC 3986 for quoting URL strings. "~" is now
+      included in the set of unreserved characters.
+
    The optional *encoding* and *errors* parameters specify how to deal with
    non-ASCII characters, as accepted by the :meth:`str.encode` method.
    *encoding* defaults to ``'utf-8'``.
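
The documented behavior can be illustrated with a short sketch. It assumes Python 3.7 or later, and the sample path is hypothetical, chosen only to exercise the tilde, the default *safe* value, and the *encoding* parameter.

```python
from urllib.parse import quote

# Hypothetical sample values, chosen only for illustration.
text_path = "/~user/caf\u00e9 menu.txt"
bytes_path = text_path.encode("utf-8")

# '~' and the default-safe '/' pass through; the space and the
# non-ASCII letter are escaped from their UTF-8 bytes.
quoted_text = quote(text_path)

# bytes input is quoted byte by byte, so it matches the UTF-8 str result.
quoted_bytes = quote(bytes_path)

# A different *encoding* changes the bytes that get escaped.
quoted_latin1 = quote(text_path, encoding="latin-1")

print(quoted_text)
print(quoted_bytes)
print(quoted_latin1)
```

On interpreters older than 3.7 the tilde would be escaped as `%7E`, so callers that compare quoted strings across versions may want to normalize with `unquote` first.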

Lib/test/test_urllib.py

Lines changed: 2 additions & 2 deletions
@@ -733,7 +733,7 @@ def test_short_content_raises_ContentTooShortError_without_reporthook(self):
 class QuotingTests(unittest.TestCase):
     r"""Tests for urllib.quote() and urllib.quote_plus()
 
-    According to RFC 2396 (Uniform Resource Identifiers), to escape a
+    According to RFC 3986 (Uniform Resource Identifiers), to escape a
     character you write it as '%' + <2 character US-ASCII hex value>.
     The Python code of ``'%' + hex(ord(<character>))[2:]`` escapes a
     character properly. Case does not matter on the hex letters.
@@ -761,7 +761,7 @@ def test_never_quote(self):
         do_not_quote = '' .join(["ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                                  "abcdefghijklmnopqrstuvwxyz",
                                  "0123456789",
-                                 "_.-"])
+                                 "_.-~"])
         result = urllib.parse.quote(do_not_quote)
         self.assertEqual(do_not_quote, result,
                          "using quote(): %r != %r" % (do_not_quote, result))
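
The escaping rule quoted in the test docstring, and the never-quoted set the test checks, can be tried out with a small sketch. The helper `escape_char` is hypothetical and not part of `urllib`. It zero-pads the hex value because `hex(ord(ch))[2:]` yields a single digit for code points below 0x10. The last assertion assumes Python 3.7 or later.

```python
from urllib.parse import quote, unquote


def escape_char(ch):
    """Percent-escape one ASCII character as '%' plus two uppercase hex digits.

    Hypothetical helper for illustration; urllib does not export it.
    """
    return "%{:02X}".format(ord(ch))


# The docstring's expression produces lowercase hex digits.
naive = "%" + hex(ord("<"))[2:]
padded = escape_char("<")

# Case does not matter when decoding.
assert unquote(naive) == unquote(padded) == "<"

# quote() emits uppercase hex digits.
assert quote("<") == padded

# Zero padding keeps control characters at two hex digits.
assert escape_char("\t") == "%09"

# The never-quoted punctuation, including '~' on Python 3.7+.
assert quote("_.-~") == "_.-~"
```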

Lib/urllib/parse.py

Lines changed: 6 additions & 3 deletions
@@ -704,7 +704,7 @@ def unquote_plus(string, encoding='utf-8', errors='replace'):
 _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                          b'abcdefghijklmnopqrstuvwxyz'
                          b'0123456789'
-                         b'_.-')
+                         b'_.-~')
 _ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE)
 _safe_quoters = {}
 
@@ -736,15 +736,18 @@ def quote(string, safe='/', encoding=None, errors=None):
     Each part of a URL, e.g. the path info, the query, etc., has a
     different set of reserved characters that must be quoted.
 
-    RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
+    RFC 3986 Uniform Resource Identifiers (URI): Generic Syntax lists
     the following reserved characters.
 
     reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
-                  "$" | ","
+                  "$" | "," | "~"
 
     Each of these characters is reserved in some component of a URL,
     but not necessarily in all of them.
 
+    Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
+    Now, "~" is included in the set of unreserved characters.
+
     By default, the quote function is intended for quoting the path
     section of a URL. Thus, it will not encode '/'. This character
     is reserved, but in typical usage the quote function is being
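
Because each URL component reserves a different set of characters, callers usually pick *safe* per component. The sketch below shows one way to do that, assuming Python 3.7 or later. The host `example.invalid`, the parameter name `q`, and the sample values are made up for illustration.

```python
from urllib.parse import quote

# Hypothetical inputs for illustration.
path_value = "reports/2017 q1"
query_value = "a/b&c=d~e"

# Path: keep '/' as a separator, which is the default for *safe*.
path_part = quote(path_value)

# Query value: '/', '&' and '=' would be ambiguous here, so quote them too.
query_part = quote(query_value, safe="")

url = "https://example.invalid/" + path_part + "?q=" + query_part
print(url)
```

Passing `safe=""` is the cautious choice for a single query value. `quote_plus` is an alternative when spaces should become `+`, as in HTML form encoding.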
