-
Notifications
You must be signed in to change notification settings - Fork 440
Diagnose incorrect indentation in multi-line string literals #1255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diagnose incorrect indentation in multi-line string literals #1255
Conversation
f718ad7
to
167f2fb
Compare
Sources/SwiftParserDiagnostics/MultiLineStringLiteralDiagnoticsGenerator.swift
Outdated
Show resolved
Hide resolved
Sources/SwiftParserDiagnostics/MultiLineStringLiteralDiagnoticsGenerator.swift
Outdated
Show resolved
Hide resolved
fixedSource: #""" | ||
_ = """ | ||
\ | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this fix incorrect? Ie. we now have a multiline string that ends with an escaped newline 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but if you start off with
""" \
"""
I think I don’t care about you. If you want, you can create a specialized diagnostics for it 😜
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't take too much time, sure :). Worst case we would then just have another fix it right now, so 🤷
// ------------------------------------------------------------------------- | ||
// Variables | ||
|
||
var middleSegments = allSegments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we could avoid removing from the start and then inserting later if we instead just kept separate arrays here, ie.
let multilineSegments = allSegments.dropFirst().dropLast() // this is a slice rather than the array
let expectedStart = allSegments.count > 1 ? allSegments.first : nil
let expectedIndentation = allSegments.last()
var processed = [RawStringLiteralSegmentsSyntax.Element]()
I can understand not wanting to do that though as it makes it a little easier to accidentally drop a segment, I still think it's cleaner but... up to you. I'd change the removeLast
to just a popLast
in either case (no popFirst
, I assume because they didn't want to make that easy to do 🤷).
IMO it also wouldn't hurt to have a comment at the start of this function explaining the expected segments, ie. newline, the actual string segments, ending indentation + that no trivia is expected (the assertion does cover that though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d prefer not to do that. This code is complicated even if you don’t need to worry about array slices. We can always revisit if we discover that this is a performance bottleneck (which I really don’t expect).
Also, the problem is that we don’t build up the processed segments from first to last. Eg. in the “Parse indentation of the closing quote” section, we append to middleSegments
before the actual middle lines have been processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that one would have to be appended at the end :P. I was expecting this answer, so it's all good 👍
isSegmentOnNewLine = true | ||
} else { | ||
if let firstSegment = firstSegment { | ||
middleSegments.insert(.stringSegment(firstSegment), at: 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following above this would then be processed.append(.stringSegment(firstSegment))
.
case .stringSegment(var segment): | ||
// We are not considering leading trivia for indentation computation. | ||
// If these assertions are violated, we can probably lift them but we | ||
// would need to check the produce the expected results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// would need to check the produce the expected results.
to check the produce
😅. we
maybe? IMO I'd probably just skip the comment and the assertion since we already have this assertion at the start of the function (for all segments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is particularly checking that we haven’t introduced leading trivia in one of the post-processing steps before.
I originally also had an assertion that we don’t have any unexpected text here and needed to lift that, which was OK, it just reminded me that something changed.
} else { | ||
let actualIndentation = segment.content.tokenText.prefix(while: { $0 == UInt8(ascii: " ") || $0 == UInt8(ascii: "\t") }) | ||
let actualIndentationTriva = TriviaParser.parseTrivia(SyntaxText(rebasing: actualIndentation), position: .leading) | ||
let content = segment.content.reclassifyAsLeadingTrivia( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just take the length to reclassify and do the trivia parsing itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we can't because TriviaParser
lives in the SyntaxParser module whereas these functions are defined in the SwiftSyntax module. And if we wanted to pull them up into SyntaxParser, we would need to make scary implementation details (rawData
, RawTriviaPieceBuffer
, designated factory methods to create raw nodes) public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about this more and spoke to Rintaro. Few questions:
- Could we move this into
RawSyntaxTokenView
since these only apply to tokens? - To avoid a whole bunch of code, how would you feel if we always gave back parsed tokens instead? I'd even be fine with just asserting on the materialized case. But this way we could completely avoid having to parse the trivia + just pass in the length as above.
arena: self.arena | ||
) | ||
} | ||
middleSegments[index] = .stringSegment(segment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would become an append (and the one below).
167f2fb
to
f4f1d6e
Compare
@swift-ci Please test |
1366461
to
6bb4225
Compare
@swift-ci Please test |
6bb4225
to
79d7aa5
Compare
@swift-ci Please test |
79d7aa5
to
00b5095
Compare
// We are only testing materialized token here because parsed token should | ||
// be covered pretty well in the parser. Materialized tokens are less common | ||
// and need dedicated testing. | ||
withExtendedLifetime(SyntaxArena()) { arena in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIURC you don't actually need to nest it. Ie.
let arena = SyntaxArena()
withExtendedLifetime(arena) {}
works and avoids all the nesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think that’ correct. withExtendedLifetime
only extends to the end of the closure. We could put the withExtendedLifetime
at the end of the test case but AFAIK the indented design of this function is that you put the code that assumes the variable is alive inside the closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, sorry. I forgot to add the defer
:)
defer withExtendedLifetime(arena) {}
} else { | ||
let actualIndentation = segment.content.tokenText.prefix(while: { $0 == UInt8(ascii: " ") || $0 == UInt8(ascii: "\t") }) | ||
let actualIndentationTriva = TriviaParser.parseTrivia(SyntaxText(rebasing: actualIndentation), position: .leading) | ||
let content = segment.content.reclassifyAsLeadingTrivia( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about this more and spoke to Rintaro. Few questions:
- Could we move this into
RawSyntaxTokenView
since these only apply to tokens? - To avoid a whole bunch of code, how would you feel if we always gave back parsed tokens instead? I'd even be fine with just asserting on the materialized case. But this way we could completely avoid having to parse the trivia + just pass in the length as above.
assert( | ||
String(syntaxText: SyntaxText(baseAddress: dat.wholeText.baseAddress?.advanced(by: -extendedTriviaByteLength), count: extendedTriviaByteLength)) | ||
== Trivia(pieces: extendedTrivia.map(TriviaPiece.init)).description | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little scary considering this is SwiftSyntax
(instead of Parser
)
This always succeeds in the current parser, but in general "Assuming that text representing extendedTrivia
preceeds this token," should be more like "Assuming that all extendedTrivia
text and the entire self
text are consecutive in a single buffer". Maybe: text
-> SyntaxText
? Or we might want to have a fallback to MaterializedToken
path if that's not the case.
Future incremental parsing might break this assertion too, if we introduce more usages of these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I'm not saying we should support such situation unnecessarily. I just think we need to clarify what is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, to be clear my suggestion was to use parsed token and not parse the trivia at all. I don't feel super strongly about this though. Using materialized tokens are a little more heavyweight, I do think it's worth the trade off here. Are you okay with this @rintaro?
1d805dc
to
2deaafd
Compare
@swift-ci Please test |
This is a companion to swiftlang/swift-syntax#1255. The new structure of multiline strings yielded some nice cleanup of the way we handle those strings *directly*, but to keep the existing indentation decisions, some parts of multiline string processing bled out into other areas. Such is life.
2deaafd
to
848c9d2
Compare
… a multi-line string literal
…ral does not escape the newline
…ped newline to lexer This simplifies parsing of string literals while only making the lexer slightly more complex. It also fixes two bugs where we incorrectly identified a trailing `\` as escaped even if it wasn’t.
162c04d
to
0c2e50e
Compare
@swift-ci Please test |
In contrast to the C++ parser, we diagnose the indentation in the parser instead of the lexer. This is necessary since the lexer doesn’t know the expected indentation because it can’t/shouldn’t look ahead to the closing quote to know the expected indentation. Thus, the lexer requires relatively few changes.
The bulk of the changes are in SwiftParser/StringLiterals.swift. Here, we post-process the tokens produced by the lexer, validating indentation. The basic ideas are:
"""
and the first line"""
\
)insufficientIndentationInMultilineStringLiteral
lexer error – really lexer error is misnamed here now, maybe token error would be better but renaming that would be a follow-up PRDuring diagnostics generation
MultiLineStringLiteralIndentatinDiagnosticsGenerator
collects all errors in a multi-line string literal, so we can show errors like “incorrect indentation in the next 3 lines”, which requires a little bit of context between the errors inside the string literal.The remaining changes are fairly straightforward.