
[2.7] bpo-32861: urllib.robotparser fix incomplete __str__ methods. (GH-5711) (GH-6795) #6817


Merged: 1 commit merged on May 14, 2018
5 changes: 4 additions & 1 deletion Lib/robotparser.py
@@ -160,7 +160,10 @@ def can_fetch(self, useragent, url):


     def __str__(self):
-        return ''.join([str(entry) + "\n" for entry in self.entries])
+        entries = self.entries
+        if self.default_entry is not None:
+            entries = entries + [self.default_entry]
+        return '\n'.join(map(str, entries)) + '\n'


class RuleLine:
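To see what the change does without a 2.7 interpreter, here is a minimal Python 3 sketch. `Entry` and `Parser` are hypothetical, simplified stand-ins for `robotparser.Entry` and `RobotFileParser`: each entry renders one line per user agent and per rule, each ending in a newline (as the test's expected output implies), and the parser keeps the `User-agent: *` record in `default_entry` rather than in `entries`. `old_str` reproduces the removed one-liner and `__str__` reproduces the added lines.

```python
class Entry:
    """Hypothetical stand-in for robotparser.Entry: user agents plus (allow, path) rules."""

    def __init__(self, useragents, rules):
        self.useragents = useragents
        self.rules = rules

    def __str__(self):
        # One line per agent and per rule, each terminated by "\n".
        lines = ["User-agent: %s\n" % agent for agent in self.useragents]
        lines += ["%s: %s\n" % ("Allow" if allow else "Disallow", path)
                  for allow, path in self.rules]
        return "".join(lines)


class Parser:
    """Hypothetical stand-in for RobotFileParser after parsing."""

    def __init__(self, entries, default_entry=None):
        self.entries = entries              # records for named user agents
        self.default_entry = default_entry  # the "User-agent: *" record, kept separately

    def old_str(self):
        # Removed line: only self.entries is rendered, so default_entry is lost.
        return "".join([str(entry) + "\n" for entry in self.entries])

    def __str__(self):
        # Added lines: append the wildcard record last, then join with blank lines.
        entries = self.entries
        if self.default_entry is not None:
            entries = entries + [self.default_entry]
        return "\n".join(map(str, entries)) + "\n"


parser = Parser(
    entries=[Entry(["cybermapper"], [(False, "/some/path")])],
    default_entry=Entry(["*"], [(False, "/cyberworld/map/")]),
)
print(parser.old_str())  # cybermapper record only
print(str(parser))       # cybermapper record, then the "*" record
```

For a non-empty `entries` list the old and new joins produce identical text, so the only visible change is the appended wildcard record. Building a new list with `+` rather than calling `append` leaves `self.entries` untouched, so repeated `str()` calls do not accumulate copies of the default entry.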
26 changes: 26 additions & 0 deletions Lib/test/test_robotparser.py
@@ -136,6 +136,31 @@ class DefaultEntryTest(BaseRobotTest, unittest.TestCase):
     bad = ['/cyberworld/map/index.html']


+class StringFormattingTest(BaseRobotTest, unittest.TestCase):
+    robots_txt = """\
+User-agent: *
+Crawl-delay: 1
+Request-rate: 3/15
+Disallow: /cyberworld/map/ # This is an infinite virtual URL space
+
+# Cybermapper knows where to go.
+User-agent: cybermapper
+Disallow: /some/path
+"""
+
+    expected_output = """\
+User-agent: cybermapper
+Disallow: /some/path
+
+User-agent: *
+Disallow: /cyberworld/map/
+
+"""
+
+    def test_string_formatting(self):
+        self.assertEqual(str(self.parser), self.expected_output)
+
+
 class RobotHandler(BaseHTTPRequestHandler):

     def do_GET(self):
@@ -226,6 +251,7 @@ def test_main():
         UseFirstUserAgentWildcardTest,
         EmptyQueryStringTest,
         DefaultEntryTest,
+        StringFormattingTest,
         PasswordProtectedSiteTestCase,
         NetworkTestCase)

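The expected output in `StringFormattingTest` drops three things from `robots_txt`: the trailing `# ...` comment and the `Crawl-delay` and `Request-rate` lines. (The reordering, with `cybermapper` before `*`, comes from `__str__` appending `default_entry` last.) A simplified sketch of the 2.7 parser's per-line cleanup shows why those lines vanish: comments are cut at `#`, each line is split on the first `:`, and only `user-agent`, `disallow`, and `allow` keys are acted on. `clean_line` and `KNOWN_KEYS` are hypothetical names, and the sketch leaves out the module's record-grouping state machine and URL unquoting.

```python
# Hypothetical set of keys the 2.7 robotparser acts on; anything else is ignored.
KNOWN_KEYS = {"user-agent", "disallow", "allow"}


def clean_line(raw):
    """Return (key, value) for a recognised line, or None if the line is dropped."""
    i = raw.find("#")
    if i >= 0:
        raw = raw[:i]          # drop the trailing comment
    raw = raw.strip()
    if not raw:
        return None            # blank or comment-only line
    parts = raw.split(":", 1)
    if len(parts) != 2:
        return None            # not a "key: value" line
    key = parts[0].strip().lower()
    value = parts[1].strip()
    if key not in KNOWN_KEYS:
        return None            # e.g. Crawl-delay / Request-rate in 2.7
    return key, value


robots_txt = """\
User-agent: *
Crawl-delay: 1
Request-rate: 3/15
Disallow: /cyberworld/map/ # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow: /some/path
"""

kept = [r for r in map(clean_line, robots_txt.splitlines()) if r is not None]
for key, value in kept:
    print(key, value)
```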
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -807,6 +807,7 @@ Ben Laurie
 Simon Law
 Julia Lawall
 Chris Lawrence
+Michael Lazar
 Brian Leair
 Mathieu Leduc-Hamel
 Amandine Lee
@@ -0,0 +1,3 @@
The urllib.robotparser's ``__str__`` representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Patch by
Michael Lazar.
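The NEWS entry names `urllib.robotparser`, the Python 3 module, which, unlike 2.7's `robotparser`, also parses `Crawl-delay` and `Request-rate`. Assuming a Python 3 release that contains this fix, feeding the test's robots.txt to the standard library shows the wildcard record and those fields in the `str()` output. The exact whitespace is not guaranteed to match the 2.7 layout, so this sketch checks for the expected lines rather than comparing the whole string.

```python
import urllib.robotparser

robots_txt = """\
User-agent: *
Crawl-delay: 1
Request-rate: 3/15
Disallow: /cyberworld/map/ # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow: /some/path
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
text = str(parser)
print(text)

# Look for the expected lines instead of relying on exact whitespace.
for expected in ("User-agent: cybermapper", "User-agent: *",
                 "Crawl-delay: 1", "Request-rate: 3/15",
                 "Disallow: /cyberworld/map/"):
    print(expected, "->", expected in text)
```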