Skip to content

[clang-format] Fix a bug in annotating angles containing FatArrow #108671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions clang/lib/Format/TokenAnnotator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,8 @@ class AnnotatingParser {
next();
}

for (bool SeenTernaryOperator = false; CurrentToken;) {
for (bool SeenTernaryOperator = false, SeenFatArrow = false;
CurrentToken;) {
const bool InExpr = Contexts[Contexts.size() - 2].IsExpression;
if (CurrentToken->is(tok::greater)) {
const auto *Next = CurrentToken->Next;
Expand Down Expand Up @@ -243,14 +244,16 @@ class AnnotatingParser {
// operator that was misinterpreted because we are parsing template
// parameters.
// FIXME: This is getting out of hand, write a decent parser.
if (InExpr && !Line.startsWith(tok::kw_template) &&
if (InExpr && !SeenFatArrow && !Line.startsWith(tok::kw_template) &&
Prev.is(TT_BinaryOperator)) {
const auto Precedence = Prev.getPrecedence();
if (Precedence > prec::Conditional && Precedence < prec::Relational)
return false;
}
if (Prev.isOneOf(tok::question, tok::colon) && !Style.isProto())
SeenTernaryOperator = true;
else if (Prev.is(TT_FatArrow))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure if the regression is specific to fat-arrow here (sorry for not putting much effort into the reproducer). the original change seem to changed behavior for handling of any bitwise operators. so I think something like:

auto foo = new Bar<kFoo | kBar>();

would also be formatted differently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How realistic is your C++ example? In general, I don't think we can really tell if the angle brackets are template opener/closer or comparison operators without checking more (and very specific) contexts, e.g. the new keyword before Bar and the empty parentheses after >. In absence of that, it seems more likely that the angles are comparison operators. See e.g. this test case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think new or empty parantheses are a must either, the logic around here is just checking that <> are inside an expression, hence that kind of code pattern is likely quite common as people tend to do arithmetic via metaprogramming using these patterns. e.g:

constexpr auto foo() { return FixedInt<N | M>(foo); }

this will be formatted as the following now:

constexpr auto foo() { return FixedInt < N | M > (foo); }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without extra contexts, how can you tell whether the less/greater below are comparison operators or angle brackets?

return a < b | c > (d ^ e);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I wasn't saying that FixedInt<N | M>(foo); is clearly a template argument, it's definitely ambiguous in absence of name-resolution and there isn't much clang-format can do here.

this kind of backward incompatible changes are making it harder to update clang-format to newer versions in large codebases. as clang-format right now is trading off one set of false-formatting with other. hence my understanding for the project has been to stick with existing formatting, if we're heuristically detecting some patterns. so that behavior on existing formatted-code doesn't regress. maybe that understanding has changed over time, or wasn't there at all. But this bug and requested changes in your PR was solely made with that assumption (as many others I've been filing over the past years).

hence I am not expecting a "fix" for the issue here, but I guess a narrowed down version of other patch, where it actually preserves behavior with old clang-format in uncertain cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a general rule of thumb that the more complicated the code the harder the job clang-format seems to have.

Whilst I admire the programming prowess of such code, I generally struggle to read it no matter how its formatted. My view has often been when clang-format does fail, it might say more about the code than about clang-format. For the cases we butcher (and we will from time to time) there is still //clang-format off until we can resolve it.

that said I believe we can only walk towards clang-format doing the right thing one patch at a time, and I don't believe we should let "Perfection be the enemy of good".

I personally like to follow the "Beyonce" rule, "if you liked it you should have put a test on it". IMHO as long as our tests pass, and the test cases keep growing, and we don't allow tests to change without good reason, and all new code has tests, then I believe we are good.

Some person, with some specific failure will always happen, given how much code is likely clang-formatted daily. But honestly I don't think of it as a true regression unless the test is failing, because often regressions are caused by
us now interpreting better, annotating better, or just fixing something and often what previously worked by chance before doesn't now. That's not a real regression its just bad luck and a lack of a test to keep us honest.

While I don't really like code becoming a glorious game of whack-a-mole I think our intentions are honorable and we should make forward process, even if its one patch at a time.

When this does happen it means that previous use case wasn't covered by a test. It needs to be, and we can fix it accordingly adding the missing test which ultimately should give us more robustness around the unintended "bad-luck"

However the clang-format users community must remember, we are a core team of 4 with really @owenca doing what I consider to be the majority share, (sorry I'm so busy with my day job. I only get time for drive by reviewing, but I'm trying to continue to give moral support), as far as I am aware none of us are paid to work on clang-format, and despite all the big IDE players using clang-format integrated into their commercial offerings I see little in the way of contribution from them. In fact at last years CppCon I saw one even openly criticize clang-format with vague obtuse corner cases (A cheap shot in my view).

Its my belief that "our modus operandi" is as correct as it can be given our time and resources. We never intentionally break things, but its extremely hard to find a large enough code base that we can test against especially given that not even all of LLVM or even Clang is clang-format clean, enough to uncover issues. This is especially difficult to find code using new C++ language concepts, modules, complex template scenarios and lambdas.

Now if anyone wants to persuade all of LLVM that the entire repo should be clang-formatted top to bottom, then maybe we'd have more code examples to ensure we introduced less issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are interested in the "Cheap Shot" ... https://www.youtube.com/watch?v=NnQraMtpvws This is from a JetBrains employee which frankly I'd have expected better from given the feedback I gave them on their blog https://blog.jetbrains.com/clion/2023/12/a-clangformat-story-and-the-third-clion-nova-update/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we update clang-format in chromium, we format a chunk of the codebase with old and new clang-format versions and look at the diff. We also ran into this regression: https://chromium-review.googlesource.com/c/chromium/src/+/6174847/1..2/base/allocator/partition_allocator/src/partition_alloc/shim/allocator_shim_default_dispatch_to_partition_alloc.cc

We reported #123144 which pointed us to here.

While it's true that a<b | c>(d) could be a < b | c > (d), using | (instead of ||) with comparisons and using parenthesized one-element expressions seems less likely than this being a template function call with explicit template arguments that are a bitwise or, right? Maybe the heuristics could be tweaked to pick the template personality of <> when that seems "more likely"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same idea but didn't get a chance to implement it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #124438.

SeenFatArrow = true;
updateParameterCount(Left, CurrentToken);
if (Style.Language == FormatStyle::LK_Proto) {
if (FormatToken *Previous = CurrentToken->getPreviousNonComment()) {
Expand All @@ -260,8 +263,7 @@ class AnnotatingParser {
Previous->setType(TT_SelectorName);
}
}
}
if (Style.isTableGen()) {
} else if (Style.isTableGen()) {
if (CurrentToken->isOneOf(tok::comma, tok::equal)) {
// They appear as separators. Unless they are not in class definition.
next();
Expand Down
9 changes: 9 additions & 0 deletions clang/unittests/Format/TokenAnnotatorTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -616,6 +616,15 @@ TEST_F(TokenAnnotatorTest, UnderstandsTernaryInTemplate) {
EXPECT_TOKEN(Tokens[8], tok::greater, TT_TemplateCloser);
}

TEST_F(TokenAnnotatorTest, FatArrowInAngleBrackets) {
auto Tokens = annotate("foo = new Bar<(id: int) => X | Y>();",
getLLVMStyle(FormatStyle::LK_JavaScript));
ASSERT_EQ(Tokens.size(), 19u) << Tokens;
EXPECT_TOKEN(Tokens[4], tok::less, TT_TemplateOpener);
EXPECT_TOKEN(Tokens[10], tok::equal, TT_FatArrow);
EXPECT_TOKEN(Tokens[14], tok::greater, TT_TemplateCloser);
}

TEST_F(TokenAnnotatorTest, UnderstandsNonTemplateAngleBrackets) {
auto Tokens = annotate("return a < b && c > d;");
ASSERT_EQ(Tokens.size(), 10u) << Tokens;
Expand Down
Loading