Commit d4324ba

bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2294)
The current regex based splitting produces a wrong result. For example::

    http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host is ``abc``, the path is ``/``, and the fragment is ``#@def``.

(cherry picked from commit 90e01e5)
1 parent b39a748 commit d4324ba
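
The parse described in the commit message can be checked with a minimal sketch. It assumes Python 3's `urllib.parse.urlsplit` as a reference parser; that is not the code this commit touches (the commit patches Python 2's `urllib.splithost`). Note that `urlsplit` reports the fragment without its leading `#` and leaves the path empty rather than normalizing it to `/`.

```python
from urllib.parse import urlsplit

# URL taken from the commit message.
parts = urlsplit("http://abc#@def")

# The '#' ends the authority, so '@def' is fragment text, not a host.
print(parts.hostname)   # abc
print(repr(parts.path)) # ''
print(parts.fragment)   # @def
```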

4 files changed, 27 additions and 2 deletions

Lib/test/test_urllib.py

Lines changed: 20 additions & 0 deletions
@@ -879,6 +879,26 @@ def test_splithost(self):
         self.assertEqual(splithost('/foo/bar/baz.html'),
                          (None, '/foo/bar/baz.html'))
 
+        # bpo-30500: # starts a fragment.
+        self.assertEqual(splithost('//127.0.0.1#@host.com'),
+                         ('127.0.0.1', '/#@host.com'))
+        self.assertEqual(splithost('//127.0.0.1#@host.com:80'),
+                         ('127.0.0.1', '/#@host.com:80'))
+        self.assertEqual(splithost('//127.0.0.1:80#@host.com'),
+                         ('127.0.0.1:80', '/#@host.com'))
+
+        # Empty host is returned as empty string.
+        self.assertEqual(splithost("///file"),
+                         ('', '/file'))
+
+        # Trailing semicolon, question mark and hash symbol are kept.
+        self.assertEqual(splithost("//example.net/file;"),
+                         ('example.net', '/file;'))
+        self.assertEqual(splithost("//example.net/file?"),
+                         ('example.net', '/file?'))
+        self.assertEqual(splithost("//example.net/file#"),
+                         ('example.net', '/file#'))
+
     def test_splituser(self):
         splituser = urllib.splituser
         self.assertEqual(splituser('User:[email protected]:080'),

Lib/urllib.py

Lines changed: 1 addition & 2 deletions
@@ -1093,8 +1093,7 @@ def splithost(url):
     """splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
     global _hostprog
     if _hostprog is None:
-        import re
-        _hostprog = re.compile('^//([^/?]*)(.*)$')
+        _hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
 
     match = _hostprog.match(url)
     if match:
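
To see what the one-line regex change does, the sketch below runs the old and new patterns from this hunk side by side. `split_with` is a hypothetical helper, not part of `urllib`. Its post-match handling (prefixing a `/` to a path that lacks one) is an assumption inferred from the expected values in `test_splithost`, since the hunk does not show the body of the `if match:` branch.

```python
import re

# Both patterns are copied from the diff for Lib/urllib.py.
OLD_HOSTPROG = re.compile('^//([^/?]*)(.*)$')
NEW_HOSTPROG = re.compile('//([^/#?]*)(.*)', re.DOTALL)


def split_with(pattern, url):
    """Hypothetical stand-in for splithost() using the given pattern."""
    match = pattern.match(url)
    if match:
        host_port, path = match.groups()
        # Assumed behaviour: a non-empty path always starts with '/'.
        if path and not path.startswith('/'):
            path = '/' + path
        return host_port, path
    return None, url


url = '//127.0.0.1#@host.com'
old_result = split_with(OLD_HOSTPROG, url)
new_result = split_with(NEW_HOSTPROG, url)
print(old_result)  # ('127.0.0.1#@host.com', '')
print(new_result)  # ('127.0.0.1', '/#@host.com')
```

With the old character class `[^/?]`, the `#` is swallowed into the host group. Adding `#` to the class ends the host at the fragment delimiter. `re.DOTALL` additionally lets the path group span newlines, so input containing a newline in the path is no longer rejected by the pattern.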

Misc/ACKS

Lines changed: 1 addition & 0 deletions
@@ -993,6 +993,7 @@ Chad Netzer
 Max Neunhöffer
 George Neville-Neil
 Hieu Nguyen
+Nam Nguyen
 Johannes Nicolai
 Samuel Nicolary
 Jonathan Niehof

Misc/NEWS

Lines changed: 5 additions & 0 deletions
@@ -52,6 +52,11 @@ Extension Modules
 Library
 -------
 
+- [Security] bpo-30500: Fix urllib.splithost() to correctly parse
+  fragments. For example, ``splithost('//127.0.0.1#@evil.com/')`` now
+  correctly returns the ``127.0.0.1`` host, instead of treating ``@evil.com``
+  as the host in an authentification (``login@host``).
+
 - [Security] bpo-29591: Update expat copy from 2.1.1 to 2.2.0 to get fixes
   of CVE-2016-0718 and CVE-2016-4472. See
   https://sourceforge.net/p/expat/bugs/537/ for more information.
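
The security impact described in the bpo-30500 entry can be sketched as follows. `connect_target` is a hypothetical function, and its `rpartition('@')` step is only a stand-in for a `login@host` split such as the one `urllib.splituser` performs; it is not the library's implementation. The two patterns are the ones shown in the Lib/urllib.py diff.

```python
import re

OLD_HOSTPROG = re.compile('^//([^/?]*)(.*)$')
NEW_HOSTPROG = re.compile('//([^/#?]*)(.*)', re.DOTALL)


def connect_target(pattern, url):
    """Hypothetical: return the host a client would end up contacting."""
    authority = pattern.match(url).group(1)
    # Stand-in for a login@host split: text before the last '@' is
    # treated as credentials, text after it as the host.
    _login, _sep, host = authority.rpartition('@')
    return host


url = '//127.0.0.1#@evil.com/'  # example from the NEWS entry
print(connect_target(OLD_HOSTPROG, url))  # evil.com
print(connect_target(NEW_HOSTPROG, url))  # 127.0.0.1
```

Under these assumptions, a check that expects `127.0.0.1` while the client contacts `evil.com` (or the reverse) is the kind of mismatch the fix removes.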
