Diagnose incorrect indentation in multi-line string literals #1255

ahoppen · 2023-01-20T17:16:01Z

In contrast to the C++ parser, we diagnose the indentation in the parser instead of the lexer. This is necessary since the lexer doesn’t know the expected indentation because it can’t/shouldn’t look ahead to the closing quote to know the expected indentation. Thus, the lexer requires relatively few changes.

The bulk of the changes are in SwiftParser/StringLiterals.swift. Here, we post-process the tokens produced by the lexer, validating indentation. The basic ideas are:

- The text of the string segments should only contain text that’s actually part of the string literal. Indentation is stripped away and re-classified as trivia.
- Similarly, the following parts of the string become trivia:
  - The newline that separates the opening """ and the first line
  - The last newline before the closing """
  - Escaped newlines (lines ending with \)
- If we discover an indentation error in one of the tokens, we mark that token as having an insufficientIndentationInMultilineStringLiteral lexer error. "Lexer error" is really a misnomer here now; "token error" might be a better name, but renaming it would be a follow-up PR.
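A minimal sketch of the post-processing idea, assuming plain strings instead of the raw syntax tokens the parser actually works on. `ProcessedLine` and `stripIndentation` are hypothetical names used only for illustration; they are not part of SwiftParser.

```swift
/// One line of a multi-line string literal after post-processing.
/// Hypothetical type for illustration only.
struct ProcessedLine: Equatable {
  var indentation: String  // would be re-classified as leading trivia
  var content: String      // text that is actually part of the literal
  var hasInsufficientIndentation: Bool
}

/// Splits each line into indentation and content, using the indentation
/// of the closing `"""` as the expected indentation.
func stripIndentation(lines: [String], closingQuoteIndentation: String) -> [ProcessedLine] {
  return lines.map { line in
    if line.hasPrefix(closingQuoteIndentation) {
      return ProcessedLine(
        indentation: closingQuoteIndentation,
        content: String(line.dropFirst(closingQuoteIndentation.count)),
        hasInsufficientIndentation: false
      )
    }
    if line.isEmpty {
      // Empty lines are accepted without indentation.
      return ProcessedLine(indentation: "", content: "", hasInsufficientIndentation: false)
    }
    // Keep whatever whitespace is present as indentation and flag the line.
    let actual = String(line.prefix(while: { $0 == " " || $0 == "\t" }))
    return ProcessedLine(
      indentation: actual,
      content: String(line.dropFirst(actual.count)),
      hasInsufficientIndentation: true
    )
  }
}

let processed = stripIndentation(
  lines: ["    hello", "  world", ""],
  closingQuoteIndentation: "    "
)
```

In this sketch the second line keeps its content (`world`) but is flagged, which mirrors the idea of marking the token with an error instead of dropping it.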

During diagnostics generation MultiLineStringLiteralIndentatinDiagnosticsGenerator collects all errors in a multi-line string literal, so we can show errors like “incorrect indentation in the next 3 lines”, which requires a little bit of context between the errors inside the string literal.
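To illustrate why some context between the errors is needed, here is a rough sketch that collapses consecutive erroneous lines into a single diagnostic. The `IndentationErrorRun` type, the `groupConsecutive` function and the message wording are all made up for this example and do not reflect the actual generator.

```swift
/// A run of consecutive lines that all have an indentation error.
/// Hypothetical type for illustration only.
struct IndentationErrorRun: Equatable {
  var firstLine: Int
  var lineCount: Int

  /// Illustrative wording, not the real diagnostic message.
  var message: String {
    return lineCount == 1
      ? "incorrect indentation in this line"
      : "incorrect indentation in the next \(lineCount) lines"
  }
}

/// Collapses sorted line indices into runs of consecutive lines.
func groupConsecutive(errorLines: [Int]) -> [IndentationErrorRun] {
  var runs: [IndentationErrorRun] = []
  for line in errorLines {
    if let last = runs.last, last.firstLine + last.lineCount == line {
      // The line directly follows the previous run, so extend that run.
      runs[runs.count - 1].lineCount += 1
    } else {
      runs.append(IndentationErrorRun(firstLine: line, lineCount: 1))
    }
  }
  return runs
}

let runs = groupConsecutive(errorLines: [2, 3, 4, 7])
```

Here lines 2 to 4 end up in one run and line 7 in another, so two diagnostics would be emitted instead of four.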

The remaining changes are fairly straightforward.

Sources/SwiftParser/Lexer/Cursor.swift

Sources/SwiftParserDiagnostics/DiagnosticExtensions.swift

Sources/SwiftParserDiagnostics/LexerDiagnosticMessages.swift

Sources/SwiftParserDiagnostics/MultiLineStringLiteralDiagnoticsGenerator.swift

bnbarham · 2023-01-21T06:47:20Z

Tests/SwiftParserTest/translated/MultilineErrorsTests.swift

+      fixedSource: #"""
+        _ = """
+          \
+          """


Isn't this fix incorrect? Ie. we now have a multiline string that ends with an escaped newline 😅

Yes but if you start off with

""" \ """

I think I don’t care about you. If you want, you can create a specialized diagnostics for it 😜

If it doesn't take too much time, sure :). Worst case we would then just have another fix it right now, so 🤷

Tests/SwiftParserTest/translated/MultilineErrorsTests.swift

Sources/SwiftSyntax/Raw/RawSyntaxTokenView.swift

Sources/SwiftSyntax/Raw/RawSyntax.swift

bnbarham · 2023-01-21T22:13:41Z

Sources/SwiftParser/StringLiterals.swift

+    // -------------------------------------------------------------------------
+    // Variables
+
+    var middleSegments = allSegments


Seems like we could avoid removing from the start and then inserting later if we instead just kept separate arrays here, ie.

    let multilineSegments = allSegments.dropFirst().dropLast() // this is a slice rather than the array
    let expectedStart = allSegments.count > 1 ? allSegments.first : nil
    let expectedIndentation = allSegments.last
    var processed = [RawStringLiteralSegmentsSyntax.Element]()

I can understand not wanting to do that though as it makes it a little easier to accidentally drop a segment, I still think it's cleaner but... up to you. I'd change the removeLast to just a popLast in either case (no popFirst, I assume because they didn't want to make that easy to do 🤷).

IMO it also wouldn't hurt to have a comment at the start of this function explaining the expected segments, ie. newline, the actual string segments, ending indentation + that no trivia is expected (the assertion does cover that though).

I’d prefer not to do that. This code is complicated even if you don’t need to worry about array slices. We can always revisit if we discover that this is a performance bottleneck (which I really don’t expect).

Also, the problem is that we don’t build up the processed segments from first to last. Eg. in the “Parse indentation of the closing quote” section, we append to middleSegments before the actual middle lines have been processed.

Yeah, that one would have to be appended at the end :P. I was expecting this answer, so it's all good 👍
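For reference, a small standalone sketch of the standard library behavior discussed in this thread, using plain strings as stand-ins for the segments (the segment names are placeholders, not the real syntax nodes):

```swift
let allSegments = ["newline", "line 1", "line 2", "closing indentation"]

// dropFirst()/dropLast() return an ArraySlice over the original array.
let middle = allSegments.dropFirst().dropLast()
// Slices keep the indices of the base collection, so the first index is 1, not 0.
let firstMiddleIndex = middle.startIndex

var mutableSegments = allSegments
// popLast() returns an optional; removeLast() would trap on an empty array.
let closing = mutableSegments.popLast()

var empty: [String] = []
let nothing = empty.popLast()  // nil instead of a crash
```

The index behavior of slices is one of the things that makes the slice-based variant easier to get wrong. Note that `popFirst()` is available on `ArraySlice` but not on `Array`.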

Sources/SwiftParser/StringLiterals.swift

bnbarham · 2023-01-21T22:27:37Z

Sources/SwiftParser/StringLiterals.swift

+      isSegmentOnNewLine = true
+    } else {
+      if let firstSegment = firstSegment {
+        middleSegments.insert(.stringSegment(firstSegment), at: 0)


Following above this would then be processed.append(.stringSegment(firstSegment)).

Sources/SwiftParser/StringLiterals.swift

bnbarham · 2023-01-21T22:34:42Z

Sources/SwiftParser/StringLiterals.swift

+      case .stringSegment(var segment):
+        // We are not considering leading trivia for indentation computation.
+        // If these assertions are violated, we can probably lift them but we
+        // would need to check the produce the expected results.


// would need to check the produce the expected results.

to check the produce 😅. we maybe? IMO I'd probably just skip the comment and the assertion since we already have this assertion at the start of the function (for all segments).

This one is particularly checking that we haven’t introduced leading trivia in one of the post-processing steps before.

I originally also had an assertion that we don’t have any unexpected text here and needed to lift that, which was OK, it just reminded me that something changed.

Sources/SwiftParser/StringLiterals.swift

bnbarham · 2023-01-21T22:37:32Z

Sources/SwiftParser/StringLiterals.swift

+          } else {
+            let actualIndentation = segment.content.tokenText.prefix(while: { $0 == UInt8(ascii: " ") || $0 == UInt8(ascii: "\t") })
+            let actualIndentationTriva = TriviaParser.parseTrivia(SyntaxText(rebasing: actualIndentation), position: .leading)
+            let content = segment.content.reclassifyAsLeadingTrivia(


Could this just take the length to reclassify and do the trivia parsing itself?

No, we can't because TriviaParser lives in the SwiftParser module whereas these functions are defined in the SwiftSyntax module. And if we wanted to pull them up into SwiftParser, we would need to make scary implementation details (rawData, RawTriviaPieceBuffer, designated factory methods to create raw nodes) public.

Thought about this more and spoke to Rintaro. Few questions:

Could we move this into RawSyntaxTokenView since these only apply to tokens?

To avoid a whole bunch of code, how would you feel if we always gave back parsed tokens instead? I'd even be fine with just asserting on the materialized case. But this way we could completely avoid having to parse the trivia + just pass in the length as above.

Sources/SwiftParser/StringLiterals.swift

bnbarham · 2023-01-21T22:41:00Z

Sources/SwiftParser/StringLiterals.swift

+            arena: self.arena
+          )
+        }
+        middleSegments[index] = .stringSegment(segment)


Would become an append (and the one below).

ahoppen · 2023-01-23T14:12:39Z

@swift-ci Please test

ahoppen · 2023-01-23T14:52:51Z

@swift-ci Please test

ahoppen · 2023-01-23T15:45:30Z

@swift-ci Please test

ahoppen · 2023-01-23T16:30:24Z

@swift-ci Please test

ahoppen · 2023-01-24T15:45:00Z

@swift-ci Please test

bnbarham · 2023-01-24T21:46:51Z

Tests/SwiftSyntaxTest/RawSyntaxTests.swift

+    // We are only testing materialized token here because parsed token should
+    // be covered pretty well in the parser. Materialized tokens are less common
+    // and need dedicated testing.
+    withExtendedLifetime(SyntaxArena()) { arena in


IIURC you don't actually need to nest it. Ie.

    let arena = SyntaxArena()
    withExtendedLifetime(arena) {}

works and avoids all the nesting.

I don’t think that’s correct. withExtendedLifetime only extends the lifetime to the end of the closure. We could put the withExtendedLifetime at the end of the test case, but AFAIK the intended design of this function is that you put the code that assumes the variable is alive inside the closure.

Heh, sorry. I forgot to add the defer :)

    defer { withExtendedLifetime(arena) {} }
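A standalone sketch of the two shapes discussed here, using a hypothetical `Resource` class in place of `SyntaxArena` so that it compiles without swift-syntax. Both functions are illustrative only.

```swift
/// Stand-in for an object whose lifetime matters (e.g. an arena).
final class Resource {
  let name: String
  init(name: String) { self.name = name }
}

func useNested() -> Int {
  // The object is guaranteed to be alive for the whole closure body.
  return withExtendedLifetime(Resource(name: "arena")) { resource in
    return resource.name.count
  }
}

func useDeferred() -> Int {
  let resource = Resource(name: "arena")
  // The deferred call runs at scope exit and still references `resource`,
  // so the object stays alive until then without nesting the body.
  defer { withExtendedLifetime(resource) {} }
  return resource.name.count
}
```

The nested form makes the region that relies on the object explicit, while the deferred form trades that for less indentation.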

bnbarham · 2023-01-24T22:23:08Z

Sources/SwiftParser/StringLiterals.swift

+          } else {
+            let actualIndentation = segment.content.tokenText.prefix(while: { $0 == UInt8(ascii: " ") || $0 == UInt8(ascii: "\t") })
+            let actualIndentationTriva = TriviaParser.parseTrivia(SyntaxText(rebasing: actualIndentation), position: .leading)
+            let content = segment.content.reclassifyAsLeadingTrivia(


Thought about this more and spoke to Rintaro. Few questions:

Could we move this into RawSyntaxTokenView since these only apply to tokens?

To avoid a whole bunch of code, how would you feel if we always gave back parsed tokens instead? I'd even be fine with just asserting on the materialized case. But this way we could completely avoid having to parse the trivia + just pass in the length as above.

rintaro · 2023-01-24T22:32:21Z

Sources/SwiftSyntax/Raw/RawSyntaxNodeProtocol.swift

+      assert(
+        String(syntaxText: SyntaxText(baseAddress: dat.wholeText.baseAddress?.advanced(by: -extendedTriviaByteLength), count: extendedTriviaByteLength))
+          == Trivia(pieces: extendedTrivia.map(TriviaPiece.init)).description
+      )


This is a little scary considering this is SwiftSyntax (instead of Parser)

This always succeeds in the current parser, but in general "Assuming that text representing extendedTrivia preceeds this token," should be more like "Assuming that all extendedTrivia text and the entire self text are consecutive in a single buffer". Maybe: text -> SyntaxText? Or we might want to have a fallback to MaterializedToken path if that's not the case.
Future incremental parsing might break this assertion too, if we introduce more usages of these functions.

To clarify, I'm not saying we should support such a situation unnecessarily. I just think we need to clarify what is supported.

I went with @bnbarham’s suggestion and just create materialized tokens for the re-classified cases. This simplifies parsing and allows us to get rid of all these scary methods. 2deaafd

Heh, to be clear my suggestion was to use parsed tokens and not parse the trivia at all. I don't feel super strongly about this though. Using materialized tokens is a little more heavyweight, but I do think it's worth the trade-off here. Are you okay with this @rintaro?

ahoppen · 2023-01-25T16:00:07Z

@swift-ci Please test

This is a companion to swiftlang/swift-syntax#1255. The new structure of multiline strings yielded some nice cleanup of the way we handle those strings *directly*, but to keep the existing indentation decisions, some parts of multiline string processing bled out into other areas. Such is life.

…ped newline to lexer This simplifies parsing of string literals while only making the lexer slightly more complex. It also fixes two bugs where we incorrectly identified a trailing `\` as escaped even if it wasn’t.
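As an illustration of how a trailing `\` can be misjudged, here is a simplified sketch that decides whether a line ends in a backslash that escapes the newline by counting the trailing backslashes. The function name is hypothetical, and the sketch deliberately ignores raw string delimiters (`#"""`) and whitespace after the backslash; it is not the lexer's actual logic.

```swift
/// Returns true if the line ends in a backslash that escapes the following
/// newline, i.e. if the number of trailing backslashes is odd.
/// An even number means the last backslash is itself escaped (`\\`).
func endsWithEscapedNewline(_ line: String) -> Bool {
  var trailingBackslashes = 0
  for character in line.reversed() {
    guard character == "\\" else { break }
    trailingBackslashes += 1
  }
  return trailingBackslashes % 2 == 1
}

let escapesNewline = endsWithEscapedNewline("abc\\")       // line text: abc\
let escapedBackslash = endsWithEscapedNewline("abc\\\\")   // line text: abc\\
```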

ahoppen · 2023-01-28T08:51:12Z

swiftlang/swift-format#480

@swift-ci Please test

ahoppen requested review from rintaro, bnbarham and CodaFi January 20, 2023 17:16

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from f718ad7 to 167f2fb Compare January 20, 2023 18:20

bnbarham reviewed Jan 21, 2023

ahoppen changed the title ~~Diagnose incorrect indentation in multi-line string literals 🚥 #1254~~ Diagnose incorrect indentation in multi-line string literals Jan 23, 2023

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 167f2fb to f4f1d6e Compare January 23, 2023 14:12

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 1366461 to 6bb4225 Compare January 23, 2023 16:30

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 6bb4225 to 79d7aa5 Compare January 24, 2023 15:44

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 79d7aa5 to 00b5095 Compare January 24, 2023 21:35

bnbarham reviewed Jan 24, 2023

rintaro reviewed Jan 24, 2023

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 1d805dc to 2deaafd Compare January 25, 2023 12:15

allevato mentioned this pull request Jan 27, 2023

Update swift-format to account for new multiline string tree structure. swiftlang/swift-format#480

Merged

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 2deaafd to 848c9d2 Compare January 28, 2023 08:39

ahoppen added 7 commits January 28, 2023 09:50

Diagnose incorrect quotes in multiline string literals

dd55a8e

Diagnose invalid string segment indentation in multi-line string literals

d3fe02d

Diagnose incorrect indentation of expression segments in multi-line string literals

f148d7b

Diagnose if the last newline of a multi-line string literal is escaped

9d79cdc

Add a backslash trivia kind for backslashs that escape the newline in a multi-line string literal

09a2fbd

Fix multi-line string literal errors with line endings other than \n

7a5ff56

Address review comments

cc975ae

ahoppen added 7 commits January 28, 2023 09:50

Fix compilation errors in Swift <5.8

9c2cdea

Allow empty lines without indenation in multi-line string literals

1e28b74

An escaped backslash at the end of a line in a mulit-line string literal does not escape the newline

f5c0823

Simplify trivia reclassification logic

6a2c006

Fix compilation errors after rebase

c20084a

Update indentation of a comment because of changed swift-formating behavior

0c2e50e

ahoppen force-pushed the ahoppen/multi-line-string-errors branch from 162c04d to 0c2e50e Compare January 28, 2023 08:51

ahoppen merged commit 5c98b24 into swiftlang:main Jan 28, 2023

ahoppen deleted the ahoppen/multi-line-string-errors branch January 28, 2023 14:15
