-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[clang-format] Fix an off-by-1 bug with -length option #143302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-format Author: Owen Pan (owenca) ChangesAlso validate the argument value. Fixes #56245 Full diff: https://github.com/llvm/llvm-project/pull/143302.diff 3 Files Affected:
diff --git a/clang/test/Format/multiple-inputs-error.cpp b/clang/test/Format/multiple-inputs-error.cpp
index 1aa9c9f3e2fad..7cb835d39f23e 100644
--- a/clang/test/Format/multiple-inputs-error.cpp
+++ b/clang/test/Format/multiple-inputs-error.cpp
@@ -1,6 +1,6 @@
// RUN: cp %s %t-1.cpp
// RUN: cp %s %t-2.cpp
-// RUN: not clang-format 2>&1 >/dev/null -offset=1 -length=0 %t-1.cpp %t-2.cpp |FileCheck %s
+// RUN: not clang-format 2>&1 >/dev/null -offset=1 -length=1 %t-1.cpp %t-2.cpp |FileCheck %s
// RUN: not clang-format 2>&1 >/dev/null -lines=1:1 %t-1.cpp %t-2.cpp |FileCheck %s -check-prefix=CHECK-LINE
// CHECK: error: -offset, -length and -lines can only be used for single file.
// CHECK-LINE: error: -offset, -length and -lines can only be used for single file.
diff --git a/clang/test/Format/ranges.cpp b/clang/test/Format/ranges.cpp
index 66b984e037b3c..6de407e7a20e4 100644
--- a/clang/test/Format/ranges.cpp
+++ b/clang/test/Format/ranges.cpp
@@ -1,5 +1,5 @@
-// RUN: grep -Ev "// *[A-Z-]+:" %s \
-// RUN: | clang-format -style=LLVM -offset=2 -length=0 -offset=28 -length=0 \
+// RUN: grep -Ev "// *[A-Z0-9-]+:" %s \
+// RUN: | clang-format -style=LLVM -offset=2 -length=1 -offset=28 -length=1 -offset=35 -length=8 \
// RUN: | FileCheck -strict-whitespace %s
// CHECK: {{^int\ \*i;$}}
int*i;
@@ -9,3 +9,12 @@ int * i;
// CHECK: {{^int\ \*i;$}}
int * i;
+
+// CHECK: int I;
+// CHECK-NEXT: int J ;
+int I ;
+int J ;
+
+// RUN: not clang-format -length=0 < %s 2>&1 \
+// RUN: | FileCheck -strict-whitespace -check-prefix=CHECK0 %s
+// CHECK0: error: length should be at least 1
diff --git a/clang/tools/clang-format/ClangFormat.cpp b/clang/tools/clang-format/ClangFormat.cpp
index ad12672fa89c1..67f49c4265cd9 100644
--- a/clang/tools/clang-format/ClangFormat.cpp
+++ b/clang/tools/clang-format/ClangFormat.cpp
@@ -283,7 +283,7 @@ static bool fillRanges(MemoryBuffer *Code,
Lengths.push_back(Sources.getFileOffset(Sources.getLocForEndOfFile(ID)) -
Offsets[0]);
} else if (Offsets.size() != Lengths.size()) {
- errs() << "error: number of -offset and -length arguments must match.\n";
+ errs() << "error: number of -offset and -length arguments must match\n";
return true;
}
for (unsigned I = 0, E = Offsets.size(); I < E; ++I) {
@@ -293,12 +293,16 @@ static bool fillRanges(MemoryBuffer *Code,
return true;
}
const auto Length = Lengths[I];
+ if (Length == 0) {
+ errs() << "error: length should be at least 1\n";
+ return true;
+ }
if (Offset + Length > Code->getBufferSize()) {
errs() << "error: invalid length " << Length << ", offset + length ("
- << Offset + Length << ") is outside the file.\n";
+ << Offset + Length << ") is outside the file\n";
return true;
}
- Ranges.push_back(tooling::Range(Offset, Length));
+ Ranges.push_back(tooling::Range(Offset, Length - 1));
}
return false;
}
|
Also validate the argument value. Fixes llvm#56245
Also validate the argument value. Fixes llvm#56245
Also validate the argument value. Fixes llvm#56245
return true; | ||
} | ||
Ranges.push_back(tooling::Range(Offset, Length)); | ||
Ranges.push_back(tooling::Range(Offset, Length - 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi folks, we're seeing some regressions that's possibly related to this change (working on a repro right now). I am having a hard time following why this is the right thing to do. tooling::Range
and command line flags are both (offset, length) pairs. offset
is 0-based for both, and length
is again coming from the same domain for both representations (in bytes).
moreover, we're only performing this mapping here, and not just couple of lines above (when we convert --lines
to ranges.
I do agree that there's an off by one somewhere, but I think it's more likely to be in https://github.com/llvm/llvm-project/blob/main/clang/lib/Format/AffectedRangeManager.cpp#L61-L71.
Logic in there accepts two ranges [B1, E1)
and [B2, E2)
as intersecting when E1==B2
, but those ranges are half open (as they're constructed from offset + length
pairs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that there's an off by one somewhere, but I think it's more likely to be in https://github.com/llvm/llvm-project/blob/main/clang/lib/Format/AffectedRangeManager.cpp#L61-L71. Logic in there accepts two ranges
[B1, E1)
and[B2, E2)
as intersecting whenE1==B2
, but those ranges are half open (as they're constructed fromoffset + length
pairs).
I didn't fix it there because we would have to change all unit tests involving range formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't fix it there because we would have to change all unit tests involving range formatting.
sorry I am not sure how to respond to that. Can you elaborate why updating unittests, when we're supposed to be fixing a bug in a library used by many applications was deemed undesired?
IIUC, what you're saying is, you also think the actual issue is in the Format
library, which is used by many other tools (clang-format, clangd, clang-tidy, include-cleaner, and many many more). All of which use Ranges
as format library's contract declares, a 0-based offset and a length for number of bytes starting from that offset, possibly 0
. clang-format tool used to accept command line arguments exactly with the same contract, and now we're changing semantics here, at the cost of breaking existing workflows, and diverging from rest of the tools.
This discrepancy now requires ecosystem to use tools that embed format differently (e.g. when an editor is making a range-formatting request, it needs to provide different ranges if formatter is clangd vs clang-format ?). Moreover all the tests in clang/unittests/Format/FormatTestSelective.cpp are telling a similar story (length=0
is heavily used, hence I am not sure why this is something we should ban
with clang-format-21).
I don't understand how all of this is justified in order to not update some unittests. Maybe we can figure out a way to make this change less intrusive for unittests? There's a good chance we can both fix the issue, and keep the CLI interface the same by defining some semantics around ranges with length=0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't understand the rant. I didn't touch the off-by-1 bug (if it's indeed a bug) in AffectedRangeManager.cpp
precisely because it would impact libFormat, which has been like this since its inception (f793511) 12+ years ago, and all range tests in FormatTests have been based on this semantics. If you (or anyone else) want to address it now, please open an issue and submit a pull request.
hi folks, we're seeing some regressions that's possibly related to this change (working on a repro right now).
I'll look into this as soon as you have a repro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't understand the rant
Sorry for my tone, I didn't mean to create frustration. I was having a hard time understanding why updating unittests was deemed undesired but updating the driver to do the same, while still updating its tests that passed -length=0
, was OK.
I'll look into this as soon as you have a repro.
Unfortunately it's hard to describe a repro. Because the off-by-one issue underneath is hiding the Length - 1
transformation here, especially when the final offset corresponds to \n
. My real concern is about clang-format accepting and defining some behavior for -length=0
before this patch and now rejecting such inputs with an hard error (rather than just ignoring such ranges).
I tried to summarize my concerns in #146036. Let's continue the discussion there, it isn't urgent though if you have other topics. I am happy to workaround on our side in the meanwhile.
All of this aside, I don't think I've mentioned this in our past encounters but I really appreciate all the work you're putting into clang-format and a huge thanks for being responsive&involved in these discussions.
Also validate the argument value.
Fixes #56245