Skip to content

bpo-36520: Email header folded incorrectly #13608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 6, 2019

Conversation

websurfer5
Copy link
Contributor

@websurfer5 websurfer5 commented May 28, 2019

Folding long email headers that contain multiple UTF-8 words in the first line results in corruption when UTF-8 words are found in subsequent lines. The fix is to reset the variable tracking the encoded word location in the line when a new line is started.

https://bugs.python.org/issue36520

line during an email header folding operation
@tirkarthi
Copy link
Member

cc: @maxking

Copy link
Contributor

@maxking maxking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this.

@@ -784,6 +784,80 @@ def test_str_defaults_to_utf8(self):
m['Subject'] = 'unicöde'
self.assertEqual(str(m), 'Subject: unicöde\n\n')

def test_folding_with_utf8_encoding_1(self):
# issue #36520
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be really helpful if these tests would add a one line comment about what are they actually testing. The name of the test uses numbers, so that doesn't help.

Something like,
`# Test header is folded correctly when maxlen falls in the middle of an encoded word.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added descriptive comments to the test cases. IMO, meaningfully descriptive test names would be too lengthy for these tests.

@@ -2661,6 +2661,7 @@ def _refold_parse_tree(parse_tree, *, policy):
newline = _steal_trailing_WSP_if_exists(lines)
if newline or part.startswith_fws():
lines.append(newline + tstr)
last_ew = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github won't let me comment on line not in this diff, but Line 2683 of this diff does the similar wrapping with a newline and I was wondering if there could be a code path to trigger that resulting in a similar bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code path never gets triggered when there are UTF-8 characters in the input. Line 2607 collapses the UnstructuredTokenList into a byte string containing the entire input text, and then lines 2609-2619 determine that the text contains UTF-8 characters and sets want_encoding to True. This always sends it down the code path that calls _fold_as_ew() to fold a line with encoded words and moves on to the next token regardless of whether individual tokens are ASCII or UTF-8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for detailed explanation! My bad for not looking closely enough.

@@ -0,0 +1 @@
Lengthy email headers with UTF-8 characters are now properly encoded when they are folded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add Path by <your name> to the end for correct attribution in changelog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added my name to the NEWS blurb.

@brettcannon brettcannon added the type-bug An unexpected behavior, bug, or error label Jun 3, 2019
@@ -784,6 +784,137 @@ def test_str_defaults_to_utf8(self):
m['Subject'] = 'unicöde'
self.assertEqual(str(m), 'Subject: unicöde\n\n')

def test_folding_with_utf8_encoding_1(self):
# issue #36520
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this say "bpo-36520"? What would "issue #36520" refer to if we move to GitHub issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the requested changes; please review again

m = EmailMessage()
m['Subject'] = 'Hello Wörld! Hello Wörld! '\
'Hello Wörld! Hello Wörld!Hello Wörld!'
self.assertEqual(bytes(m), \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these backslashes are needed, since you're inside a parenthesized expression.

# word.

m = EmailMessage()
m['Subject'] = 'Hello Wörld! Hello Wörld! '\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put a space before the backslash.



def test_folding_with_utf8_encoding_2(self):
# issue #36520
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m = EmailMessage()
m['Subject'] = 'Hello Wörld! Hello Wörld! '\
'Hello Wörlds123! Hello Wörld!Hello Wörld!'
self.assertEqual(bytes(m), \
Copy link
Member

@warsaw warsaw Jun 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment about all these (and subsequent) backslashes.

backslashes; add whitespace between terminating quotes and
line-continuation backslashes; use "bpo-" instead of
"issue #" in comments
@maxking
Copy link
Contributor

maxking commented Jun 5, 2019

Looks good to me!

@warsaw warsaw merged commit f6713e8 into python:master Jun 6, 2019
@bedevere-bot
Copy link

@warsaw: Please replace # with GH- in the commit message next time. Thanks!

maxking pushed a commit to maxking/cpython-1 that referenced this pull request Jun 8, 2019
* bpo-36520: reset the encoded word offset when starting a new
line during an email header folding operation

* 📜🤖 Added by blurb_it.

* bpo-36520: add an additional test case, and provide descriptive
comments for the test_folding_with_utf8_encoding_* tests

* bpo-36520: fix whitespace issue

* bpo-36520: changes per reviewer request -- remove extraneous
backslashes; add whitespace between terminating quotes and
line-continuation backslashes; use "bpo-" instead of
"issue GH-" in comments
(cherry picked from commit f6713e8)

Co-authored-by: websurfer5 <[email protected]>
maxking pushed a commit to maxking/cpython-1 that referenced this pull request Jun 8, 2019
* bpo-36520: reset the encoded word offset when starting a new
line during an email header folding operation

* 📜🤖 Added by blurb_it.

* bpo-36520: add an additional test case, and provide descriptive
comments for the test_folding_with_utf8_encoding_* tests

* bpo-36520: fix whitespace issue

* bpo-36520: changes per reviewer request -- remove extraneous
backslashes; add whitespace between terminating quotes and
line-continuation backslashes; use "bpo-" instead of
"issue GH-" in comments
(cherry picked from commit f6713e8)

Co-authored-by: websurfer5 <[email protected]>
miss-islington pushed a commit that referenced this pull request Jun 11, 2019
* [bpo-36520](https://bugs.python.org/issue36520): reset the encoded word offset when starting a new
line during an email header folding operation

* 📜🤖 Added by blurb_it.

* [bpo-36520](https://bugs.python.org/issue36520): add an additional test case, and provide descriptive
comments for the test_folding_with_utf8_encoding_* tests

* [bpo-36520](https://bugs.python.org/issue36520): fix whitespace issue

* [bpo-36520](https://bugs.python.org/issue36520): changes per reviewer request -- remove extraneous
backslashes; add whitespace between terminating quotes and
line-continuation backslashes; use "bpo-" instead of
"issue GH-" in comments
(cherry picked from commit f6713e8)

Co-authored-by: websurfer5 <[email protected]>


https://bugs.python.org/issue36520
miss-islington pushed a commit that referenced this pull request Jun 11, 2019
* [bpo-36520](https://bugs.python.org/issue36520): reset the encoded word offset when starting a new
line during an email header folding operation

* 📜🤖 Added by blurb_it.

* [bpo-36520](https://bugs.python.org/issue36520): add an additional test case, and provide descriptive
comments for the test_folding_with_utf8_encoding_* tests

* [bpo-36520](https://bugs.python.org/issue36520): fix whitespace issue

* [bpo-36520](https://bugs.python.org/issue36520): changes per reviewer request -- remove extraneous
backslashes; add whitespace between terminating quotes and
line-continuation backslashes; use "bpo-" instead of
"issue GH-" in comments
(cherry picked from commit f6713e8)

Co-authored-by: websurfer5 <[email protected]>


https://bugs.python.org/issue36520
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
* bpo-36520: reset the encoded word offset when starting a new
line during an email header folding operation

* 📜🤖 Added by blurb_it.

* bpo-36520: add an additional test case, and provide descriptive
comments for the test_folding_with_utf8_encoding_* tests

* bpo-36520: fix whitespace issue

* bpo-36520: changes per reviewer request -- remove extraneous
backslashes; add whitespace between terminating quotes and
line-continuation backslashes; use "bpo-" instead of
"issue #" in comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants