-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-36520: Email header folded incorrectly #13608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
line during an email header folding operation
cc: @maxking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this.
Lib/test/test_email/test_message.py
Outdated
@@ -784,6 +784,80 @@ def test_str_defaults_to_utf8(self): | |||
m['Subject'] = 'unicöde' | |||
self.assertEqual(str(m), 'Subject: unicöde\n\n') | |||
|
|||
def test_folding_with_utf8_encoding_1(self): | |||
# issue #36520 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really helpful if these tests would add a one line comment about what are they actually testing. The name of the test uses numbers, so that doesn't help.
Something like,
`# Test header is folded correctly when maxlen falls in the middle of an encoded word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added descriptive comments to the test cases. IMO, meaningfully descriptive test names would be too lengthy for these tests.
@@ -2661,6 +2661,7 @@ def _refold_parse_tree(parse_tree, *, policy): | |||
newline = _steal_trailing_WSP_if_exists(lines) | |||
if newline or part.startswith_fws(): | |||
lines.append(newline + tstr) | |||
last_ew = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github won't let me comment on line not in this diff, but Line 2683 of this diff does the similar wrapping with a newline and I was wondering if there could be a code path to trigger that resulting in a similar bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code path never gets triggered when there are UTF-8 characters in the input. Line 2607 collapses the UnstructuredTokenList into a byte string containing the entire input text, and then lines 2609-2619 determine that the text contains UTF-8 characters and sets want_encoding to True. This always sends it down the code path that calls _fold_as_ew() to fold a line with encoded words and moves on to the next token regardless of whether individual tokens are ASCII or UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for detailed explanation! My bad for not looking closely enough.
@@ -0,0 +1 @@ | |||
Lengthy email headers with UTF-8 characters are now properly encoded when they are folded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add Path by <your name>
to the end for correct attribution in changelog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added my name to the NEWS blurb.
comments for the test_folding_with_utf8_encoding_* tests
Lib/test/test_email/test_message.py
Outdated
@@ -784,6 +784,137 @@ def test_str_defaults_to_utf8(self): | |||
m['Subject'] = 'unicöde' | |||
self.assertEqual(str(m), 'Subject: unicöde\n\n') | |||
|
|||
def test_folding_with_utf8_encoding_1(self): | |||
# issue #36520 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the requested changes; please review again
Lib/test/test_email/test_message.py
Outdated
m = EmailMessage() | ||
m['Subject'] = 'Hello Wörld! Hello Wörld! '\ | ||
'Hello Wörld! Hello Wörld!Hello Wörld!' | ||
self.assertEqual(bytes(m), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these backslashes are needed, since you're inside a parenthesized expression.
Lib/test/test_email/test_message.py
Outdated
# word. | ||
|
||
m = EmailMessage() | ||
m['Subject'] = 'Hello Wörld! Hello Wörld! '\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put a space before the backslash.
Lib/test/test_email/test_message.py
Outdated
|
||
|
||
def test_folding_with_utf8_encoding_2(self): | ||
# issue #36520 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lib/test/test_email/test_message.py
Outdated
m = EmailMessage() | ||
m['Subject'] = 'Hello Wörld! Hello Wörld! '\ | ||
'Hello Wörlds123! Hello Wörld!Hello Wörld!' | ||
self.assertEqual(bytes(m), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about all these (and subsequent) backslashes.
backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue #" in comments
Looks good to me! |
@warsaw: Please replace |
* bpo-36520: reset the encoded word offset when starting a new line during an email header folding operation * 📜🤖 Added by blurb_it. * bpo-36520: add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests * bpo-36520: fix whitespace issue * bpo-36520: changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue GH-" in comments (cherry picked from commit f6713e8) Co-authored-by: websurfer5 <[email protected]>
* bpo-36520: reset the encoded word offset when starting a new line during an email header folding operation * 📜🤖 Added by blurb_it. * bpo-36520: add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests * bpo-36520: fix whitespace issue * bpo-36520: changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue GH-" in comments (cherry picked from commit f6713e8) Co-authored-by: websurfer5 <[email protected]>
* [bpo-36520](https://bugs.python.org/issue36520): reset the encoded word offset when starting a new line during an email header folding operation * 📜🤖 Added by blurb_it. * [bpo-36520](https://bugs.python.org/issue36520): add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests * [bpo-36520](https://bugs.python.org/issue36520): fix whitespace issue * [bpo-36520](https://bugs.python.org/issue36520): changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue GH-" in comments (cherry picked from commit f6713e8) Co-authored-by: websurfer5 <[email protected]> https://bugs.python.org/issue36520
* [bpo-36520](https://bugs.python.org/issue36520): reset the encoded word offset when starting a new line during an email header folding operation * 📜🤖 Added by blurb_it. * [bpo-36520](https://bugs.python.org/issue36520): add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests * [bpo-36520](https://bugs.python.org/issue36520): fix whitespace issue * [bpo-36520](https://bugs.python.org/issue36520): changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue GH-" in comments (cherry picked from commit f6713e8) Co-authored-by: websurfer5 <[email protected]> https://bugs.python.org/issue36520
* bpo-36520: reset the encoded word offset when starting a new line during an email header folding operation * 📜🤖 Added by blurb_it. * bpo-36520: add an additional test case, and provide descriptive comments for the test_folding_with_utf8_encoding_* tests * bpo-36520: fix whitespace issue * bpo-36520: changes per reviewer request -- remove extraneous backslashes; add whitespace between terminating quotes and line-continuation backslashes; use "bpo-" instead of "issue #" in comments
Folding long email headers that contain multiple UTF-8 words in the first line results in corruption when UTF-8 words are found in subsequent lines. The fix is to reset the variable tracking the encoded word location in the line when a new line is started.
https://bugs.python.org/issue36520