Commit ddc8063

[clang-format] Break long string literals in C#, etc.
Now strings that are too long for one line in C#, Java, JavaScript, and Verilog get broken into several lines. C# and JavaScript interpolated strings are not broken.

A new subclass, BreakableStringLiteralUsingOperators, handles the logic for adding plus signs and commas. The updateAfterBroken method was added because parentheses or braces may now be required after the plus signs or commas are added. To decide whether the added plus sign should be unindented in the BreakableToken object, that logic is moved into a separate function, shouldUnindentNextOperator.

The logic for finding the continuation indentation when the option AlignAfterOpenBracket is set to DontAlign is not implemented yet. In that case the new line may have the wrong indentation, and the parts may have the wrong length if the string needs to be broken more than once, because finding where to break the string depends on where the string starts.

The preambles for the C# and Java unit tests are changed to the newer style in order to allow the 3-argument verifyFormat macro. Some cases are changed from verifyFormat to verifyIncompleteFormat because they use incomplete code and the new verifyFormat function checks that the code is complete.

The example line in the documentation is now indented by 4 spaces, which is the default continuation indentation. The formatter has always indented it this way; the 2 spaces shown previously were probably a mistake.

This commit was first committed as 16ccba5. The tests caused assertion failures, so it was reverted in 547bce3.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D154093
1 parent ef5217b commit ddc8063
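
As a rough illustration of the plus-sign breaking described in the commit message, the following standalone C++ sketch splits the contents of a double-quoted literal into fixed-width pieces and appends ` +` to every piece except the last. The names `breakWithPlus` and `MaxChunk` are hypothetical and not part of clang-format. The real formatter picks break points from the column limit, the start column, and the text encoding, none of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not clang-format code: splits the *contents* of a
// double-quoted literal into pieces of at most MaxChunk characters, re-quotes
// each piece, and appends " +" to every piece except the last one.
std::vector<std::string> breakWithPlus(const std::string &Content,
                                       std::size_t MaxChunk) {
  assert(MaxChunk > 0 && "a chunk must hold at least one character");
  std::vector<std::string> Lines;
  if (Content.empty()) {
    // An empty literal stays a single, unbroken piece.
    Lines.push_back("\"\"");
    return Lines;
  }
  for (std::size_t Pos = 0; Pos < Content.size(); Pos += MaxChunk) {
    std::string Line = "\"" + Content.substr(Pos, MaxChunk) + "\"";
    if (Pos + MaxChunk < Content.size())
      Line += " +"; // more pieces follow, so join with a plus sign
    Lines.push_back(Line);
  }
  return Lines;
}
```

With a chunk width of 22, the 58-character example string used in the documentation splits into the same three pieces that the documentation shows for C#, Java, and JavaScript.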

13 files changed, 610 additions and 76 deletions

clang/docs/ClangFormatStyleOptions.rst

Lines changed: 30 additions & 1 deletion
@@ -2722,6 +2722,8 @@ the configuration (without a prefix: ``Auto``).
 **BreakStringLiterals** (``Boolean``) :versionbadge:`clang-format 3.9` :ref:`<BreakStringLiterals>`
   Allow breaking string literals when formatting.
 
+  In C, C++, and Objective-C:
+
   .. code-block:: c++
 
      true:
@@ -2731,7 +2733,34 @@ the configuration (without a prefix: ``Auto``).
 
      false:
      const char* x =
-       "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
+         "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
+
+  In C#, Java, and JavaScript:
+
+  .. code-block:: c++
+
+     true:
+     var x = "veryVeryVeryVeryVeryVe" +
+             "ryVeryVeryVeryVeryVery" +
+             "VeryLongString";
+
+     false:
+     var x =
+         "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
+  C# and JavaScript interpolated strings are not broken.
+
+  In Verilog:
+
+  .. code-block:: c++
+
+     true:
+     string x = {"veryVeryVeryVeryVeryVe",
+                 "ryVeryVeryVeryVeryVery",
+                 "VeryLongString"};
+
+     false:
+     string x =
+         "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
 
 .. _ColumnLimit:
 

clang/include/clang/Format/Format.h

Lines changed: 29 additions & 1 deletion
@@ -2008,6 +2008,8 @@ struct FormatStyle {
   bool BreakAfterJavaFieldAnnotations;
 
   /// Allow breaking string literals when formatting.
+  ///
+  /// In C, C++, and Objective-C:
   /// \code
   ///   true:
   ///   const char* x = "veryVeryVeryVeryVeryVe"
@@ -2016,8 +2018,34 @@ struct FormatStyle {
   ///
   ///   false:
   ///   const char* x =
-  ///     "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
+  ///       "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
+  /// \endcode
+  ///
+  /// In C#, Java, and JavaScript:
+  /// \code
+  ///   true:
+  ///   var x = "veryVeryVeryVeryVeryVe" +
+  ///           "ryVeryVeryVeryVeryVery" +
+  ///           "VeryLongString";
+  ///
+  ///   false:
+  ///   var x =
+  ///       "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
   /// \endcode
+  /// C# and JavaScript interpolated strings are not broken.
+  ///
+  /// In Verilog:
+  /// \code
+  ///   true:
+  ///   string x = {"veryVeryVeryVeryVeryVe",
+  ///               "ryVeryVeryVeryVeryVery",
+  ///               "VeryLongString"};
+  ///
+  ///   false:
+  ///   string x =
+  ///       "veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongString";
+  /// \endcode
+  ///
   /// \version 3.9
   bool BreakStringLiterals;
 

clang/lib/Format/BreakableToken.cpp

Lines changed: 114 additions & 0 deletions
@@ -292,6 +292,120 @@ void BreakableStringLiteral::insertBreak(unsigned LineIndex,
       Prefix, InPPDirective, 1, StartColumn);
 }
 
+BreakableStringLiteralUsingOperators::BreakableStringLiteralUsingOperators(
+    const FormatToken &Tok, QuoteStyleType QuoteStyle, bool UnindentPlus,
+    unsigned StartColumn, unsigned UnbreakableTailLength, bool InPPDirective,
+    encoding::Encoding Encoding, const FormatStyle &Style)
+    : BreakableStringLiteral(
+          Tok, StartColumn, /*Prefix=*/QuoteStyle == SingleQuotes ? "'"
+                            : QuoteStyle == AtDoubleQuotes        ? "@\""
+                                                                  : "\"",
+          /*Postfix=*/QuoteStyle == SingleQuotes ? "'" : "\"",
+          UnbreakableTailLength, InPPDirective, Encoding, Style),
+      BracesNeeded(Tok.isNot(TT_StringInConcatenation)),
+      QuoteStyle(QuoteStyle) {
+  // Find the replacement text for inserting braces and quotes and line breaks.
+  // We don't create an allocated string concatenated from parts here because it
+  // has to outlive the BreakableStringliteral object. The brace replacements
+  // include a quote so that WhitespaceManager can tell it apart from whitespace
+  // replacements between the string and surrounding tokens.
+
+  // The option is not implemented in JavaScript.
+  bool SignOnNewLine =
+      !Style.isJavaScript() &&
+      Style.BreakBeforeBinaryOperators != FormatStyle::BOS_None;
+
+  if (Style.isVerilog()) {
+    // In Verilog, all strings are quoted by double quotes, joined by commas,
+    // and wrapped in braces. The comma is always before the newline.
+    assert(QuoteStyle == DoubleQuotes);
+    LeftBraceQuote = Style.Cpp11BracedListStyle ? "{\"" : "{ \"";
+    RightBraceQuote = Style.Cpp11BracedListStyle ? "\"}" : "\" }";
+    Postfix = "\",";
+    Prefix = "\"";
+  } else {
+    // The plus sign may be on either line. And also C# and JavaScript have
+    // several quoting styles.
+    if (QuoteStyle == SingleQuotes) {
+      LeftBraceQuote = Style.SpacesInParensOptions.Other ? "( '" : "('";
+      RightBraceQuote = Style.SpacesInParensOptions.Other ? "' )" : "')";
+      Postfix = SignOnNewLine ? "'" : "' +";
+      Prefix = SignOnNewLine ? "+ '" : "'";
+    } else {
+      if (QuoteStyle == AtDoubleQuotes) {
+        LeftBraceQuote = Style.SpacesInParensOptions.Other ? "( @" : "(@";
+        Prefix = SignOnNewLine ? "+ @\"" : "@\"";
+      } else {
+        LeftBraceQuote = Style.SpacesInParensOptions.Other ? "( \"" : "(\"";
+        Prefix = SignOnNewLine ? "+ \"" : "\"";
+      }
+      RightBraceQuote = Style.SpacesInParensOptions.Other ? "\" )" : "\")";
+      Postfix = SignOnNewLine ? "\"" : "\" +";
+    }
+  }
+
+  // Following lines are indented by the width of the brace and space if any.
+  ContinuationIndent = BracesNeeded ? LeftBraceQuote.size() - 1 : 0;
+  // The plus sign may need to be unindented depending on the style.
+  // FIXME: Add support for DontAlign.
+  if (!Style.isVerilog() && SignOnNewLine && !BracesNeeded && UnindentPlus &&
+      Style.AlignOperands == FormatStyle::OAS_AlignAfterOperator) {
+    ContinuationIndent -= 2;
+  }
+}
+
+unsigned BreakableStringLiteralUsingOperators::getRemainingLength(
+    unsigned LineIndex, unsigned Offset, unsigned StartColumn) const {
+  return UnbreakableTailLength + (BracesNeeded ? RightBraceQuote.size() : 1) +
+         encoding::columnWidthWithTabs(Line.substr(Offset), StartColumn,
+                                       Style.TabWidth, Encoding);
+}
+
+unsigned
+BreakableStringLiteralUsingOperators::getContentStartColumn(unsigned LineIndex,
+                                                            bool Break) const {
+  return std::max(
+      0,
+      static_cast<int>(StartColumn) +
+          (Break ? ContinuationIndent + static_cast<int>(Prefix.size())
+                 : (BracesNeeded ? static_cast<int>(LeftBraceQuote.size()) - 1
+                                 : 0) +
+                       (QuoteStyle == AtDoubleQuotes ? 2 : 1)));
+}
+
+void BreakableStringLiteralUsingOperators::insertBreak(
+    unsigned LineIndex, unsigned TailOffset, Split Split,
+    unsigned ContentIndent, WhitespaceManager &Whitespaces) const {
+  Whitespaces.replaceWhitespaceInToken(
+      Tok, /*Offset=*/(QuoteStyle == AtDoubleQuotes ? 2 : 1) + TailOffset +
+               Split.first,
+      /*ReplaceChars=*/Split.second, /*PreviousPostfix=*/Postfix,
+      /*CurrentPrefix=*/Prefix, InPPDirective, /*NewLines=*/1,
+      /*Spaces=*/
+      std::max(0, static_cast<int>(StartColumn) + ContinuationIndent));
+}
+
+void BreakableStringLiteralUsingOperators::updateAfterBroken(
+    WhitespaceManager &Whitespaces) const {
+  // Add the braces required for breaking the token if they are needed.
+  if (!BracesNeeded)
+    return;
+
+  // To add a brace or parenthesis, we replace the quote (or the at sign) with a
+  // brace and another quote. This is because the rest of the program requires
+  // one replacement for each source range. If we replace the empty strings
+  // around the string, it may conflict with whitespace replacements between the
+  // string and adjacent tokens.
+  Whitespaces.replaceWhitespaceInToken(
+      Tok, /*Offset=*/0, /*ReplaceChars=*/1, /*PreviousPostfix=*/"",
+      /*CurrentPrefix=*/LeftBraceQuote, InPPDirective, /*NewLines=*/0,
+      /*Spaces=*/0);
+  Whitespaces.replaceWhitespaceInToken(
+      Tok, /*Offset=*/Tok.TokenText.size() - 1, /*ReplaceChars=*/1,
+      /*PreviousPostfix=*/RightBraceQuote,
+      /*CurrentPrefix=*/"", InPPDirective, /*NewLines=*/0, /*Spaces=*/0);
+}
+
 BreakableComment::BreakableComment(const FormatToken &Token,
                                    unsigned StartColumn, bool InPPDirective,
                                    encoding::Encoding Encoding,
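
The constructor's branches amount to a small lookup from quote style, language, and operator placement to the text inserted on either side of a break. The sketch below restates that lookup as a standalone C++ function so the combinations are easier to see. The names `QuoteStyle`, `BreakText`, and `breakTextFor` are hypothetical, and the brace and parenthesis replacements (`LeftBraceQuote`, `RightBraceQuote`) are left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the QuoteStyleType enum in the diff above.
enum class QuoteStyle { Double, Single, AtDouble };

// Text inserted around a break inside a string literal.
struct BreakText {
  std::string Postfix; // closes the piece that ends at the break
  std::string Prefix;  // opens the piece on the continuation line
};

// Hypothetical restatement of the constructor's Prefix/Postfix selection.
// SignOnNewLine means the plus sign goes at the start of the continuation
// line instead of at the end of the broken line.
BreakText breakTextFor(QuoteStyle Q, bool IsVerilog, bool SignOnNewLine) {
  if (IsVerilog) {
    // Verilog pieces are joined by commas, always before the newline.
    assert(Q == QuoteStyle::Double && "Verilog strings use double quotes");
    return {"\",", "\""};
  }
  const std::string Close = Q == QuoteStyle::Single ? "'" : "\"";
  const std::string Open = Q == QuoteStyle::Single     ? "'"
                           : Q == QuoteStyle::AtDouble ? "@\""
                                                       : "\"";
  if (SignOnNewLine)
    return {Close, "+ " + Open};
  return {Close + " +", Open};
}
```

For example, a C# verbatim string with the operator placed on the new line closes the first piece with `"` and opens the next one with `+ @"`, so every piece remains a verbatim string.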

clang/lib/Format/BreakableToken.h

Lines changed: 44 additions & 0 deletions
@@ -230,6 +230,11 @@ class BreakableToken {
   /// as a unit and is responsible for the formatting of the them.
   virtual void updateNextToken(LineState &State) const {}
 
+  /// Adds replacements that are needed when the token is broken. Such as
+  /// wrapping a JavaScript string in parentheses after it gets broken with plus
+  /// signs.
+  virtual void updateAfterBroken(WhitespaceManager &Whitespaces) const {}
+
 protected:
   BreakableToken(const FormatToken &Tok, bool InPPDirective,
                  encoding::Encoding Encoding, const FormatStyle &Style)
@@ -283,6 +288,45 @@ class BreakableStringLiteral : public BreakableToken {
   unsigned UnbreakableTailLength;
 };
 
+class BreakableStringLiteralUsingOperators : public BreakableStringLiteral {
+public:
+  enum QuoteStyleType {
+    DoubleQuotes,   // The string is quoted with double quotes.
+    SingleQuotes,   // The JavaScript string is quoted with single quotes.
+    AtDoubleQuotes, // The C# verbatim string is quoted with the at sign and
+                    // double quotes.
+  };
+  /// Creates a breakable token for a single line string literal for C#, Java,
+  /// JavaScript, or Verilog.
+  ///
+  /// \p StartColumn specifies the column in which the token will start
+  /// after formatting.
+  BreakableStringLiteralUsingOperators(
+      const FormatToken &Tok, QuoteStyleType QuoteStyle, bool UnindentPlus,
+      unsigned StartColumn, unsigned UnbreakableTailLength, bool InPPDirective,
+      encoding::Encoding Encoding, const FormatStyle &Style);
+  unsigned getRemainingLength(unsigned LineIndex, unsigned Offset,
+                              unsigned StartColumn) const override;
+  unsigned getContentStartColumn(unsigned LineIndex, bool Break) const override;
+  void insertBreak(unsigned LineIndex, unsigned TailOffset, Split Split,
+                   unsigned ContentIndent,
+                   WhitespaceManager &Whitespaces) const override;
+  void updateAfterBroken(WhitespaceManager &Whitespaces) const override;
+
+protected:
+  // Whether braces or parentheses should be inserted around the string to form
+  // a concatenation.
+  bool BracesNeeded;
+  QuoteStyleType QuoteStyle;
+  // The braces or parentheses along with the first character which they
+  // replace, either a quote or at sign.
+  StringRef LeftBraceQuote;
+  StringRef RightBraceQuote;
+  // Width added to the left due to the added brace or parenthesis. Does not
+  // apply to the first line.
+  int ContinuationIndent;
+};
+
 class BreakableComment : public BreakableToken {
 protected:
   /// Creates a breakable token for a comment.
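
The column arithmetic behind `ContinuationIndent` and `getContentStartColumn` can be checked in isolation. The following C++ sketch restates it as free functions over plain `int` and `std::string` values. The function names and the `Unindent` and `AtQuoted` parameters are hypothetical simplifications: `Unindent` stands for the whole style-dependent condition under which a leading plus sign is pulled back, and `AtQuoted` stands for `QuoteStyle == AtDoubleQuotes`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical free functions; in the commit this logic lives in members of
// BreakableStringLiteralUsingOperators.

// Extra indentation of continuation lines relative to the start column of the
// string. With braces, it is the width of the inserted brace (and space).
// Without braces, a leading "+ " may be pulled back by its two columns.
int continuationIndent(const std::string &LeftBraceQuote, bool BracesNeeded,
                       bool Unindent) {
  assert(!LeftBraceQuote.empty() && "the replacement always includes a quote");
  int Indent = BracesNeeded ? static_cast<int>(LeftBraceQuote.size()) - 1 : 0;
  if (!BracesNeeded && Unindent)
    Indent -= 2;
  return Indent;
}

// Column where the string content begins, on the first line (Break == false)
// or on a continuation line (Break == true). The result is clamped at zero.
int contentStartColumn(int StartColumn, bool Break, int ContinuationIndent,
                       const std::string &Prefix,
                       const std::string &LeftBraceQuote, bool BracesNeeded,
                       bool AtQuoted) {
  int Offset;
  if (Break) {
    Offset = ContinuationIndent + static_cast<int>(Prefix.size());
  } else {
    Offset = (BracesNeeded ? static_cast<int>(LeftBraceQuote.size()) - 1 : 0) +
             (AtQuoted ? 2 : 1);
  }
  return std::max(0, StartColumn + Offset);
}
```

For a double-quoted string that starts at column 8 and needs parentheses, both calls return column 10, so the content of the first and the continuation lines starts in the same column.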

clang/lib/Format/ContinuationIndenter.cpp

Lines changed: 44 additions & 19 deletions
@@ -36,6 +36,14 @@ static bool shouldIndentWrappedSelectorName(const FormatStyle &Style,
   return Style.IndentWrappedFunctionNames || LineType == LT_ObjCMethodDecl;
 }
 
+// Returns true if a binary operator following \p Tok should be unindented when
+// the style permits it.
+static bool shouldUnindentNextOperator(const FormatToken &Tok) {
+  const FormatToken *Previous = Tok.getPreviousNonComment();
+  return Previous && (Previous->getPrecedence() == prec::Assignment ||
+                      Previous->isOneOf(tok::kw_return, TT_RequiresClause));
+}
+
 // Returns the length of everything up to the first possible line break after
 // the ), ], } or > matching \c Tok.
 static unsigned getLengthToMatchingParen(const FormatToken &Tok,
@@ -1618,11 +1626,10 @@ void ContinuationIndenter::moveStatePastFakeLParens(LineState &State,
     if (Previous && Previous->endsSequence(tok::l_paren, tok::kw__Generic))
       NewParenState.Indent = CurrentState.LastSpace;
 
-    if (Previous &&
-        (Previous->getPrecedence() == prec::Assignment ||
-         Previous->isOneOf(tok::kw_return, TT_RequiresClause) ||
-         (PrecedenceLevel == prec::Conditional && Previous->is(tok::question) &&
-          Previous->is(TT_ConditionalExpr))) &&
+    if ((shouldUnindentNextOperator(Current) ||
+         (Previous &&
+          (PrecedenceLevel == prec::Conditional &&
+           Previous->is(tok::question) && Previous->is(TT_ConditionalExpr)))) &&
         !Newline) {
       // If BreakBeforeBinaryOperators is set, un-indent a bit to account for
       // the operator and keep the operands aligned.
@@ -2186,14 +2193,9 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
                                            LineState &State, bool AllowBreak) {
   unsigned StartColumn = State.Column - Current.ColumnWidth;
   if (Current.isStringLiteral()) {
-    // FIXME: String literal breaking is currently disabled for C#, Java, Json
-    // and JavaScript, as it requires strings to be merged using "+" which we
-    // don't support.
-    if (Style.Language == FormatStyle::LK_Java || Style.isJavaScript() ||
-        Style.isCSharp() || Style.isJson() || !Style.BreakStringLiterals ||
-        !AllowBreak) {
+    // Strings in JSON can not be broken.
+    if (Style.isJson() || !Style.BreakStringLiterals || !AllowBreak)
       return nullptr;
-    }
 
     // Don't break string literals inside preprocessor directives (except for
     // #define directives, as their contents are stored in separate lines and
@@ -2212,6 +2214,33 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
       return nullptr;
 
     StringRef Text = Current.TokenText;
+    // We need this to address the case where there is an unbreakable tail only
+    // if certain other formatting decisions have been taken. The
+    // UnbreakableTailLength of Current is an overapproximation in that case and
+    // we need to be correct here.
+    unsigned UnbreakableTailLength = (State.NextToken && canBreak(State))
+                                         ? 0
+                                         : Current.UnbreakableTailLength;
+
+    if (Style.isVerilog() || Style.Language == FormatStyle::LK_Java ||
+        Style.isJavaScript() || Style.isCSharp()) {
+      BreakableStringLiteralUsingOperators::QuoteStyleType QuoteStyle;
+      if (Style.isJavaScript() && Text.startswith("'") && Text.endswith("'")) {
+        QuoteStyle = BreakableStringLiteralUsingOperators::SingleQuotes;
+      } else if (Style.isCSharp() && Text.startswith("@\"") &&
+                 Text.endswith("\"")) {
+        QuoteStyle = BreakableStringLiteralUsingOperators::AtDoubleQuotes;
+      } else if (Text.startswith("\"") && Text.endswith("\"")) {
+        QuoteStyle = BreakableStringLiteralUsingOperators::DoubleQuotes;
+      } else {
+        return nullptr;
+      }
+      return std::make_unique<BreakableStringLiteralUsingOperators>(
+          Current, QuoteStyle,
+          /*UnindentPlus=*/shouldUnindentNextOperator(Current), StartColumn,
+          UnbreakableTailLength, State.Line->InPPDirective, Encoding, Style);
+    }
+
     StringRef Prefix;
     StringRef Postfix;
     // FIXME: Handle whitespace between '_T', '(', '"..."', and ')'.
@@ -2224,13 +2253,6 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
           Text.startswith(Prefix = "u8\"") ||
           Text.startswith(Prefix = "L\""))) ||
         (Text.startswith(Prefix = "_T(\"") && Text.endswith(Postfix = "\")"))) {
-      // We need this to address the case where there is an unbreakable tail
-      // only if certain other formatting decisions have been taken. The
-      // UnbreakableTailLength of Current is an overapproximation is that case
-      // and we need to be correct here.
-      unsigned UnbreakableTailLength = (State.NextToken && canBreak(State))
-                                           ? 0
-                                           : Current.UnbreakableTailLength;
       return std::make_unique<BreakableStringLiteral>(
           Current, StartColumn, Prefix, Postfix, UnbreakableTailLength,
           State.Line->InPPDirective, Encoding, Style);
@@ -2631,6 +2653,9 @@ ContinuationIndenter::breakProtrudingToken(const FormatToken &Current,
                  Current.UnbreakableTailLength;
 
   if (BreakInserted) {
+    if (!DryRun)
+      Token->updateAfterBroken(Whitespaces);
+
     // If we break the token inside a parameter list, we need to break before
     // the next parameter on all levels, so that the next parameter is clearly
     // visible. Line comments already introduce a break.
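
The quote-style detection added to createBreakableToken can be read as a three-way classification with a fallback. The sketch below restates it in standalone C++ over `std::string`. The names `Language`, `QuoteKind`, and `classifyQuotes` are hypothetical, and the helper functions stand in for `StringRef::startswith` and `StringRef::endswith`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the style queries and the QuoteStyleType enum.
enum class Language { CSharp, Java, JavaScript, Verilog };
enum class QuoteKind { Double, Single, AtDouble, Unsupported };

static bool startsWith(const std::string &S, const std::string &P) {
  return S.size() >= P.size() && S.compare(0, P.size(), P) == 0;
}

static bool endsWith(const std::string &S, const std::string &P) {
  return S.size() >= P.size() &&
         S.compare(S.size() - P.size(), P.size(), P) == 0;
}

// Mirrors the order of the checks in the diff: JavaScript single quotes, then
// C# verbatim strings, then plain double quotes. Anything else is reported as
// Unsupported, which corresponds to returning nullptr (no breaking).
QuoteKind classifyQuotes(Language Lang, const std::string &Text) {
  assert(!Text.empty() && "a string literal token always has text");
  if (Lang == Language::JavaScript && startsWith(Text, "'") &&
      endsWith(Text, "'"))
    return QuoteKind::Single;
  if (Lang == Language::CSharp && startsWith(Text, "@\"") &&
      endsWith(Text, "\""))
    return QuoteKind::AtDouble;
  if (startsWith(Text, "\"") && endsWith(Text, "\""))
    return QuoteKind::Double;
  return QuoteKind::Unsupported;
}
```

In this sketch, a token such as `$"abc {x}"` or a JavaScript template literal in backticks matches none of the three patterns and falls through to `Unsupported`, which is consistent with the statement that interpolated strings are not broken.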

clang/lib/Format/FormatToken.h

Lines changed: 5 additions & 0 deletions
@@ -134,6 +134,11 @@ namespace format {
   TYPE(StartOfName) \
   TYPE(StatementAttributeLikeMacro) \
   TYPE(StatementMacro) \
+  /* A string that is part of a string concatenation. For C#, JavaScript, and \
+   * Java, it is used for marking whether a string needs parentheses around it \
+   * if it is to be split into parts joined by `+`. For Verilog, whether \
+   * braces need to be added to split it. Not used for other languages. */ \
+  TYPE(StringInConcatenation) \
   TYPE(StructLBrace) \
   TYPE(StructuredBindingLSquare) \
   TYPE(TemplateCloser) \
