Skip to content

Diagnose incorrect indentation in multi-line string literals #1255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jan 28, 2023

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Jan 20, 2023

In contrast to the C++ parser, we diagnose the indentation in the parser instead of the lexer. This is necessary since the lexer doesn’t know the expected indentation because it can’t/shouldn’t look ahead to the closing quote to know the expected indentation. Thus, the lexer requires relatively few changes.

The bulk of the changes are in SwiftParser/StringLiterals.swift. Here, we post-process the tokens produced by the lexer, validating indentation. The basic ideas are:

  • The text of the string segments should only contain text that’s actually part of the string literal. Indentation will be stripped away and re-classified as trivia.
  • Similarly the following part of the string become trivia
    • The newline that separates the opening """ and the first line
    • The last newline before the closing """
    • Escaped newlines (lines ending with \)
  • If we discover an indentation error in one of the tokens, we mark that token as having a insufficientIndentationInMultilineStringLiteral lexer error – really lexer error is misnamed here now, maybe token error would be better but renaming that would be a follow-up PR

During diagnostics generation MultiLineStringLiteralIndentatinDiagnosticsGenerator collects all errors in a multi-line string literal, so we can show errors like “incorrect indentation in the next 3 lines”, which requires a little bit of context between the errors inside the string literal.

The remaining changes are fairly straightforward.

@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from f718ad7 to 167f2fb Compare January 20, 2023 18:20
Comment on lines 613 to 626
fixedSource: #"""
_ = """
\
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this fix incorrect? Ie. we now have a multiline string that ends with an escaped newline 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but if you start off with

""" \ 
"""

I think I don’t care about you. If you want, you can create a specialized diagnostics for it 😜

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't take too much time, sure :). Worst case we would then just have another fix it right now, so 🤷

// -------------------------------------------------------------------------
// Variables

var middleSegments = allSegments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we could avoid removing from the start and then inserting later if we instead just kept separate arrays here, ie.

let multilineSegments = allSegments.dropFirst().dropLast() // this is a slice rather than the array
let expectedStart = allSegments.count > 1 ? allSegments.first : nil
let expectedIndentation = allSegments.last()

var processed = [RawStringLiteralSegmentsSyntax.Element]()

I can understand not wanting to do that though as it makes it a little easier to accidentally drop a segment, I still think it's cleaner but... up to you. I'd change the removeLast to just a popLast in either case (no popFirst, I assume because they didn't want to make that easy to do 🤷).

IMO it also wouldn't hurt to have a comment at the start of this function explaining the expected segments, ie. newline, the actual string segments, ending indentation + that no trivia is expected (the assertion does cover that though).

Copy link
Member Author

@ahoppen ahoppen Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer not to do that. This code is complicated even if you don’t need to worry about array slices. We can always revisit if we discover that this is a performance bottleneck (which I really don’t expect).

Also, the problem is that we don’t build up the processed segments from first to last. Eg. in the “Parse indentation of the closing quote” section, we append to middleSegments before the actual middle lines have been processed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that one would have to be appended at the end :P. I was expecting this answer, so it's all good 👍

isSegmentOnNewLine = true
} else {
if let firstSegment = firstSegment {
middleSegments.insert(.stringSegment(firstSegment), at: 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following above this would then be processed.append(.stringSegment(firstSegment)).

case .stringSegment(var segment):
// We are not considering leading trivia for indentation computation.
// If these assertions are violated, we can probably lift them but we
// would need to check the produce the expected results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        // would need to check the produce the expected results.

to check the produce 😅. we maybe? IMO I'd probably just skip the comment and the assertion since we already have this assertion at the start of the function (for all segments).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is particularly checking that we haven’t introduced leading trivia in one of the post-processing steps before.

I originally also had an assertion that we don’t have any unexpected text here and needed to lift that, which was OK, it just reminded me that something changed.

} else {
let actualIndentation = segment.content.tokenText.prefix(while: { $0 == UInt8(ascii: " ") || $0 == UInt8(ascii: "\t") })
let actualIndentationTriva = TriviaParser.parseTrivia(SyntaxText(rebasing: actualIndentation), position: .leading)
let content = segment.content.reclassifyAsLeadingTrivia(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just take the length to reclassify and do the trivia parsing itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can't because TriviaParser lives in the SyntaxParser module whereas these functions are defined in the SwiftSyntax module. And if we wanted to pull them up into SyntaxParser, we would need to make scary implementation details (rawData, RawTriviaPieceBuffer, designated factory methods to create raw nodes) public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about this more and spoke to Rintaro. Few questions:

  1. Could we move this into RawSyntaxTokenView since these only apply to tokens?
  2. To avoid a whole bunch of code, how would you feel if we always gave back parsed tokens instead? I'd even be fine with just asserting on the materialized case. But this way we could completely avoid having to parse the trivia + just pass in the length as above.

arena: self.arena
)
}
middleSegments[index] = .stringSegment(segment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would become an append (and the one below).

@ahoppen ahoppen changed the title Diagnose incorrect indentation in multi-line string literals 🚥 #1254 Diagnose incorrect indentation in multi-line string literals Jan 23, 2023
@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 167f2fb to f4f1d6e Compare January 23, 2023 14:12
@ahoppen
Copy link
Member Author

ahoppen commented Jan 23, 2023

@swift-ci Please test

2 similar comments
@ahoppen
Copy link
Member Author

ahoppen commented Jan 23, 2023

@swift-ci Please test

@ahoppen
Copy link
Member Author

ahoppen commented Jan 23, 2023

@swift-ci Please test

@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 1366461 to 6bb4225 Compare January 23, 2023 16:30
@ahoppen
Copy link
Member Author

ahoppen commented Jan 23, 2023

@swift-ci Please test

@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 6bb4225 to 79d7aa5 Compare January 24, 2023 15:44
@ahoppen
Copy link
Member Author

ahoppen commented Jan 24, 2023

@swift-ci Please test

@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 79d7aa5 to 00b5095 Compare January 24, 2023 21:35
// We are only testing materialized token here because parsed token should
// be covered pretty well in the parser. Materialized tokens are less common
// and need dedicated testing.
withExtendedLifetime(SyntaxArena()) { arena in
Copy link
Contributor

@bnbarham bnbarham Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIURC you don't actually need to nest it. Ie.

let arena = SyntaxArena()
withExtendedLifetime(arena) {}

works and avoids all the nesting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think that’ correct. withExtendedLifetime only extends to the end of the closure. We could put the withExtendedLifetime at the end of the test case but AFAIK the indented design of this function is that you put the code that assumes the variable is alive inside the closure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, sorry. I forgot to add the defer :)

defer withExtendedLifetime(arena) {}

} else {
let actualIndentation = segment.content.tokenText.prefix(while: { $0 == UInt8(ascii: " ") || $0 == UInt8(ascii: "\t") })
let actualIndentationTriva = TriviaParser.parseTrivia(SyntaxText(rebasing: actualIndentation), position: .leading)
let content = segment.content.reclassifyAsLeadingTrivia(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about this more and spoke to Rintaro. Few questions:

  1. Could we move this into RawSyntaxTokenView since these only apply to tokens?
  2. To avoid a whole bunch of code, how would you feel if we always gave back parsed tokens instead? I'd even be fine with just asserting on the materialized case. But this way we could completely avoid having to parse the trivia + just pass in the length as above.

Comment on lines 262 to 265
assert(
String(syntaxText: SyntaxText(baseAddress: dat.wholeText.baseAddress?.advanced(by: -extendedTriviaByteLength), count: extendedTriviaByteLength))
== Trivia(pieces: extendedTrivia.map(TriviaPiece.init)).description
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little scary considering this is SwiftSyntax (instead of Parser)

This always succeeds in the current parser, but in general "Assuming that text representing extendedTrivia preceeds this token," should be more like "Assuming that all extendedTrivia text and the entire self text are consecutive in a single buffer". Maybe: text -> SyntaxText? Or we might want to have a fallback to MaterializedToken path if that's not the case.
Future incremental parsing might break this assertion too, if we introduce more usages of these functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I'm not saying we should support such situation unnecessarily. I just think we need to clarify what is supported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with @bnbarham’s suggestion and just create materialized tokens for the re-classified cases. This simplifies parsing and allows us to get rid of all these scary methods. 2deaafd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, to be clear my suggestion was to use parsed token and not parse the trivia at all. I don't feel super strongly about this though. Using materialized tokens are a little more heavyweight, I do think it's worth the trade off here. Are you okay with this @rintaro?

@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 1d805dc to 2deaafd Compare January 25, 2023 12:15
@ahoppen
Copy link
Member Author

ahoppen commented Jan 25, 2023

@swift-ci Please test

allevato added a commit to allevato/swift-format that referenced this pull request Jan 27, 2023
This is a companion to swiftlang/swift-syntax#1255.

The new structure of multiline strings yielded some nice cleanup of the
way we handle those strings *directly*, but to keep the existing
indentation decisions, some parts of multiline string processing bled
out into other areas. Such is life.
@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 2deaafd to 848c9d2 Compare January 28, 2023 08:39
@ahoppen ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 162c04d to 0c2e50e Compare January 28, 2023 08:51
@ahoppen
Copy link
Member Author

ahoppen commented Jan 28, 2023

swiftlang/swift-format#480

@swift-ci Please test

@ahoppen ahoppen merged commit 5c98b24 into swiftlang:main Jan 28, 2023
@ahoppen ahoppen deleted the ahoppen/multi-line-string-errors branch January 28, 2023 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants