-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Parser] Detect non-breaking space (U+00A0) and offer a fix-it #16090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1645493
to
9d28cd5
Compare
lib/Parse/Lexer.cpp
Outdated
@@ -716,6 +722,12 @@ static bool isRightBound(const char *tokEnd, bool isLeftBound, | |||
else | |||
return true; | |||
|
|||
case '\302': | |||
if (tokEnd[1] == '\240') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make these '\xC2'
and '\xA0'
?
lib/Parse/Lexer.cpp
Outdated
@@ -682,6 +682,12 @@ static bool isLeftBound(const char *tokBegin, const char *bufferBegin) { | |||
else | |||
return true; | |||
|
|||
case '\240': | |||
if (tokBegin - 1 != bufferBegin && tokBegin[-2] == '\302') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
lib/Parse/Lexer.cpp
Outdated
} else if (Codepoint == 0x000000A0) { | ||
// Non-breaking whitespace (U+00A0) | ||
diagnose(CurPtr - 1, diag::lex_nonbreaking_space) | ||
.fixItReplaceChars(getSourceLoc(CurPtr - 1), getSourceLoc(Tmp), " "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: indent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Just a few code style requests.
Also, could you add tests for fix-it in |
Thanks @rintaro, all suggestions incorporated. |
I think dedicated test file is OK. Since this test is handling invisible characters, it's more safe from accidental (automatic) edits. |
@@ -716,6 +722,12 @@ static bool isRightBound(const char *tokEnd, bool isLeftBound, | |||
else | |||
return true; | |||
|
|||
case '\xC2': | |||
if (tokEnd[1] == '\xA0') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that this dereference is safe? Can we assume that the buffer is always null-terminated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's guaranteed. https://github.com/apple/swift/blob/9ba05efb/lib/Parse/Lexer.cpp#L188
Thank you Ahmad! Please mark the SR issue RESOLVED. |
Detect Unicode non-breaking space character (U+00A0) in the lexer, diagnose it, and offer a fix-it turning it into a regular space. Also treat the non-breaking space as whitespace when considering whether an operator is left/right bound.
Resolves SR-7122.