Skip to content

[clang-format] Fix an off-by-1 bug with -length option #143302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 13, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion clang/test/Format/multiple-inputs-error.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
// RUN: cp %s %t-1.cpp
// RUN: cp %s %t-2.cpp
// RUN: not clang-format 2>&1 >/dev/null -offset=1 -length=0 %t-1.cpp %t-2.cpp |FileCheck %s
// RUN: not clang-format 2>&1 >/dev/null -offset=1 -length=1 %t-1.cpp %t-2.cpp |FileCheck %s
// RUN: not clang-format 2>&1 >/dev/null -lines=1:1 %t-1.cpp %t-2.cpp |FileCheck %s -check-prefix=CHECK-LINE
// CHECK: error: -offset, -length and -lines can only be used for single file.
// CHECK-LINE: error: -offset, -length and -lines can only be used for single file.
Expand Down
11 changes: 10 additions & 1 deletion clang/test/Format/ranges.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// RUN: grep -Ev "// *[A-Z-]+:" %s \
// RUN: | clang-format -style=LLVM -offset=2 -length=0 -offset=28 -length=0 \
// RUN: | clang-format -style=LLVM -offset=2 -length=1 -offset=28 -length=1 -offset=35 -length=8 \
// RUN: | FileCheck -strict-whitespace %s
// CHECK: {{^int\ \*i;$}}
int*i;
Expand All @@ -9,3 +9,12 @@ int * i;

// CHECK: {{^int\ \*i;$}}
int * i;

// CHECK: int I;
// CHECK-NEXT: int J ;
int I ;
int J ;

// RUN: not clang-format -length=0 %s 2>&1 \
// RUN: | FileCheck -strict-whitespace -check-prefix=CHECK0 %s
// CHECK0: error: length should be at least 1
10 changes: 7 additions & 3 deletions clang/tools/clang-format/ClangFormat.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -284,7 +284,7 @@ static bool fillRanges(MemoryBuffer *Code,
if (Offsets.size() == 1 && EmptyLengths) {
Length = Sources.getFileOffset(Sources.getLocForEndOfFile(ID)) - Offsets[0];
} else if (Offsets.size() != Lengths.size()) {
errs() << "error: number of -offset and -length arguments must match.\n";
errs() << "error: number of -offset and -length arguments must match\n";
return true;
}
for (unsigned I = 0, E = Offsets.size(), CodeSize = Code->getBufferSize();
Expand All @@ -296,12 +296,16 @@ static bool fillRanges(MemoryBuffer *Code,
}
if (!EmptyLengths)
Length = Lengths[I];
if (Length == 0) {
errs() << "error: length should be at least 1\n";
return true;
}
if (Offset + Length > CodeSize) {
errs() << "error: invalid length " << Length << ", offset + length ("
<< Offset + Length << ") is outside the file.\n";
<< Offset + Length << ") is outside the file\n";
return true;
}
Ranges.push_back(tooling::Range(Offset, Length));
Ranges.push_back(tooling::Range(Offset, Length - 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi folks, we're seeing some regressions that's possibly related to this change (working on a repro right now). I am having a hard time following why this is the right thing to do. tooling::Range and command line flags are both (offset, length) pairs. offset is 0-based for both, and length is again coming from the same domain for both representations (in bytes).

moreover, we're only performing this mapping here, and not just couple of lines above (when we convert --lines to ranges.

I do agree that there's an off by one somewhere, but I think it's more likely to be in https://github.com/llvm/llvm-project/blob/main/clang/lib/Format/AffectedRangeManager.cpp#L61-L71.
Logic in there accepts two ranges [B1, E1) and [B2, E2) as intersecting when E1==B2, but those ranges are half open (as they're constructed from offset + length pairs).

Copy link
Contributor Author

@owenca owenca Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that there's an off by one somewhere, but I think it's more likely to be in https://github.com/llvm/llvm-project/blob/main/clang/lib/Format/AffectedRangeManager.cpp#L61-L71. Logic in there accepts two ranges [B1, E1) and [B2, E2) as intersecting when E1==B2, but those ranges are half open (as they're constructed from offset + length pairs).

I didn't fix it there because we would have to change all unit tests involving range formatting.

Copy link
Member

@kadircet kadircet Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't fix it there because we would have to change all unit tests involving range formatting.

sorry I am not sure how to respond to that. Can you elaborate why updating unittests, when we're supposed to be fixing a bug in a library used by many applications was deemed undesired?

IIUC, what you're saying is, you also think the actual issue is in the Format library, which is used by many other tools (clang-format, clangd, clang-tidy, include-cleaner, and many many more). All of which use Ranges as format library's contract declares, a 0-based offset and a length for number of bytes starting from that offset, possibly 0. clang-format tool used to accept command line arguments exactly with the same contract, and now we're changing semantics here, at the cost of breaking existing workflows, and diverging from rest of the tools.

This discrepancy now requires ecosystem to use tools that embed format differently (e.g. when an editor is making a range-formatting request, it needs to provide different ranges if formatter is clangd vs clang-format ?). Moreover all the tests in clang/unittests/Format/FormatTestSelective.cpp are telling a similar story (length=0 is heavily used, hence I am not sure why this is something we should ban with clang-format-21).

I don't understand how all of this is justified in order to not update some unittests. Maybe we can figure out a way to make this change less intrusive for unittests? There's a good chance we can both fix the issue, and keep the CLI interface the same by defining some semantics around ranges with length=0.

Copy link
Contributor Author

@owenca owenca Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand the rant. I didn't touch the off-by-1 bug (if it's indeed a bug) in AffectedRangeManager.cpp precisely because it would impact libFormat, which has been like this since its inception (f793511) 12+ years ago, and all range tests in FormatTests have been based on this semantics. If you (or anyone else) want to address it now, please open an issue and submit a pull request.

hi folks, we're seeing some regressions that's possibly related to this change (working on a repro right now).

I'll look into this as soon as you have a repro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand the rant

Sorry for my tone, I didn't mean to create frustration. I was having a hard time understanding why updating unittests was deemed undesired but updating the driver to do the same, while still updating its tests that passed -length=0, was OK.

I'll look into this as soon as you have a repro.

Unfortunately it's hard to describe a repro. Because the off-by-one issue underneath is hiding the Length - 1 transformation here, especially when the final offset corresponds to \n. My real concern is about clang-format accepting and defining some behavior for -length=0 before this patch and now rejecting such inputs with an hard error (rather than just ignoring such ranges).

I tried to summarize my concerns in #146036. Let's continue the discussion there, it isn't urgent though if you have other topics. I am happy to workaround on our side in the meanwhile.


All of this aside, I don't think I've mentioned this in our past encounters but I really appreciate all the work you're putting into clang-format and a huge thanks for being responsive&involved in these discussions.

}
return false;
}
Expand Down
Loading