bpo-36520: Email header folded incorrectly #13608

websurfer5 · 2019-05-28T02:33:17Z

Folding long email headers that contain multiple UTF-8 words in the first line results in corruption when UTF-8 words are found in subsequent lines. The fix is to reset the variable tracking the encoded word location in the line when a new line is started.

https://bugs.python.org/issue36520
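For readers who want to see the folding in action, here is a minimal sketch that uses the subject string from the tests added in this PR. It is not part of the patch. It serializes a message, which folds the header, and then parses the bytes back to check that the subject survives the round trip. That comparison is only expected to hold on an interpreter that includes this fix.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Subject taken from the tests added in this PR.
subject = ('Hello Wörld! Hello Wörld! '
           'Hello Wörld! Hello Wörld!Hello Wörld!')

m = EmailMessage()
m['Subject'] = subject

# Serializing the message folds the long header into several lines
# and emits the non-ASCII text as RFC 2047 encoded words.
raw = bytes(m)
print(raw.decode('ascii'))

# Parse the folded bytes back and compare with the original text.
parsed = message_from_bytes(raw, policy=policy.default)
round_trip_ok = (str(parsed['Subject']) == subject)
print(round_trip_ok)
```

Printing the folded bytes makes it easy to inspect where each encoded word starts on a continuation line.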

tirkarthi · 2019-05-28T07:27:18Z

cc: @maxking

maxking

Thanks for fixing this.

maxking · 2019-05-29T02:18:55Z

Lib/test/test_email/test_message.py

@@ -784,6 +784,80 @@ def test_str_defaults_to_utf8(self):
        m['Subject'] = 'unicöde'
        self.assertEqual(str(m), 'Subject: unicöde\n\n')

+    def test_folding_with_utf8_encoding_1(self):
+        # issue #36520


It would be really helpful if these tests had a one-line comment about what they are actually testing. The test names use numbers, so they don't help.

Something like:
`# Test header is folded correctly when maxlen falls in the middle of an encoded word.`

I added descriptive comments to the test cases. IMO, meaningfully descriptive test names would be too lengthy for these tests.

maxking · 2019-05-29T02:29:49Z

Lib/email/_header_value_parser.py

@@ -2661,6 +2661,7 @@ def _refold_parse_tree(parse_tree, *, policy):
            newline = _steal_trailing_WSP_if_exists(lines)
            if newline or part.startswith_fws():
                lines.append(newline + tstr)
+                last_ew = None


GitHub won't let me comment on a line that is not in this diff, but Line 2683 of this diff does similar wrapping with a newline. Could there be a code path that triggers it and results in a similar bug?

That code path never gets triggered when there are UTF-8 characters in the input. Line 2607 collapses the UnstructuredTokenList into a byte string containing the entire input text, and then lines 2609-2619 determine that the text contains UTF-8 characters and set want_encoding to True. This always sends execution down the code path that calls _fold_as_ew() to fold a line with encoded words and then moves on to the next token, regardless of whether individual tokens are ASCII or UTF-8.
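As a rough illustration of the check described above, here is a simplified sketch. It is not the actual code in `_refold_parse_tree`; the helper name `needs_encoded_words` and the token list are hypothetical. It shows why the decision is made on the collapsed text as a whole rather than per token.

```python
def needs_encoded_words(text, encoding='us-ascii'):
    """Hypothetical helper: True if text cannot be emitted in `encoding` as-is."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return True
    return False


# Hypothetical tokens of an unstructured header value.
tokens = ['Hello', ' ', 'Wörld!', ' ', 'Hello']
whole = ''.join(tokens)

# The collapsed text needs encoding even though most tokens are ASCII.
whole_needs_encoding = needs_encoded_words(whole)
per_token = [needs_encoded_words(t) for t in tokens]
print(whole_needs_encoding, per_token)
```

Under this assumption, a single non-ASCII token is enough to route the whole value through the encoded-word folding path.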

Ah, thanks for the detailed explanation! My bad for not looking closely enough.

maxking · 2019-05-29T02:35:31Z

Misc/NEWS.d/next/Library/2019-05-28-02-37-00.bpo-36520.W4tday.rst

@@ -0,0 +1 @@
+Lengthy email headers with UTF-8 characters are now properly encoded when they are folded.


Can you also add "Patch by <your name>" to the end for correct attribution in the changelog?

I added my name to the NEWS blurb.

warsaw · 2019-06-04T17:55:23Z

Lib/test/test_email/test_message.py

@@ -784,6 +784,137 @@ def test_str_defaults_to_utf8(self):
        m['Subject'] = 'unicöde'
        self.assertEqual(str(m), 'Subject: unicöde\n\n')

+    def test_folding_with_utf8_encoding_1(self):
+        # issue #36520


Shouldn't this say "bpo-36520"? What would "issue #36520" refer to if we move to GitHub issues?

I have made the requested changes; please review again

warsaw · 2019-06-04T17:56:34Z

Lib/test/test_email/test_message.py

+        m = EmailMessage()
+        m['Subject'] = 'Hello Wörld! Hello Wörld! '\
+                       'Hello Wörld! Hello Wörld!Hello Wörld!'
+        self.assertEqual(bytes(m), \


None of these backslashes are needed, since you're inside a parenthesized expression.
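A small sketch of the point being made here, using the subject string from the test (the variable names are illustrative only): Python joins lines implicitly inside parentheses, and adjacent string literals are concatenated, so both spellings below produce the same string.

```python
# Explicit line continuation with a backslash.
with_backslash = 'Hello Wörld! Hello Wörld! ' \
                 'Hello Wörld! Hello Wörld!Hello Wörld!'

# Implicit line joining inside parentheses; no backslash required.
without_backslash = (
    'Hello Wörld! Hello Wörld! '
    'Hello Wörld! Hello Wörld!Hello Wörld!'
)

same = (with_backslash == without_backslash)
print(same)
```

The same applies to arguments of a call such as `self.assertEqual(...)`, since the call's parentheses already allow the expression to span lines.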

warsaw · 2019-06-04T17:56:47Z

Lib/test/test_email/test_message.py

+        # word.
+
+        m = EmailMessage()
+        m['Subject'] = 'Hello Wörld! Hello Wörld! '\


Please put a space before the backslash.

warsaw · 2019-06-04T17:57:04Z

Lib/test/test_email/test_message.py

+
+
+    def test_folding_with_utf8_encoding_2(self):
+        # issue #36520


warsaw · 2019-06-04T17:57:20Z

Lib/test/test_email/test_message.py

+        m = EmailMessage()
+        m['Subject'] = 'Hello Wörld! Hello Wörld! '\
+                       'Hello Wörlds123! Hello Wörld!Hello Wörld!'
+        self.assertEqual(bytes(m), \


Similar comment about all these (and subsequent) backslashes.

maxking · 2019-06-05T01:32:06Z

Looks good to me!

bedevere-bot · 2019-06-06T19:53:30Z

@warsaw: Please replace # with GH- in the commit message next time. Thanks!

* bpo-36520: reset the encoded word offset when starting a new line during an email header folding operation
* 📜🤖 Added by blurb_it.
* bpo-36520: add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests
* bpo-36520: fix whitespace issue
* bpo-36520: changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue GH-" in comments

(cherry picked from commit f6713e8)

Co-authored-by: websurfer5 <[email protected]>

bpo-36520: reset the encoded word offset when starting a new line during an email header folding operation

b9f5288

websurfer5 requested a review from a team as a code owner May 28, 2019 02:33

the-knights-who-say-ni added the CLA signed label May 28, 2019

bedevere-bot added the awaiting review label May 28, 2019

blurb-it bot and others added 3 commits May 28, 2019 02:37

📜🤖 Added by blurb_it.

d989f75

Merge branch 'master' of github.com:python/cpython into fix-issue-36520

0b6032a

Merge branch 'fix-issue-36520' of github.com:websurfer5/cpython into fix-issue-36520

0f1a7c5

maxking reviewed May 29, 2019

websurfer5 added 4 commits May 29, 2019 11:50

Merge branch 'master' of github.com:python/cpython into fix-issue-36520

2293ab5

Merge branch 'master' of github.com:python/cpython into fix-issue-36520

e568c7e

bpo-36520: add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests

d4e969b

bpo-36520: fix whitespace issue

a342519

auvipy approved these changes May 30, 2019

bedevere-bot added awaiting core review and removed awaiting review labels May 30, 2019

brettcannon added the type-bug label Jun 3, 2019

warsaw reviewed Jun 4, 2019

Lib/test/test_email/test_message.py Outdated

def test_folding_with_utf8_encoding_2(self):

# issue #36520

warsaw Jun 4, 2019

bpo-36520

warsaw reviewed Jun 4, 2019

bpo-36520: changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue #" in comments

8133eeb

warsaw merged commit f6713e8 into python:master Jun 6, 2019

bedevere-bot removed the awaiting core review label Jun 6, 2019

maxking mentioned this pull request Jun 8, 2019

[3.8] bpo-36520: Email header folded incorrectly (GH-13608) #13909

Merged

maxking mentioned this pull request Jun 8, 2019

[3.7] bpo-36520: Email header folded incorrectly (GH-13608) #13910

Merged
