Skip to content

[clang-format] Fix an off-by-1 bug with -length option #143302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 13, 2025
Merged

Conversation

owenca
Copy link
Contributor

@owenca owenca commented Jun 8, 2025

Also validate the argument value.

Fixes #56245

@llvmbot
Copy link
Member

llvmbot commented Jun 8, 2025

@llvm/pr-subscribers-clang-format

Author: Owen Pan (owenca)

Changes

Also validate the argument value.

Fixes #56245


Full diff: https://github.com/llvm/llvm-project/pull/143302.diff

3 Files Affected:

  • (modified) clang/test/Format/multiple-inputs-error.cpp (+1-1)
  • (modified) clang/test/Format/ranges.cpp (+11-2)
  • (modified) clang/tools/clang-format/ClangFormat.cpp (+7-3)
diff --git a/clang/test/Format/multiple-inputs-error.cpp b/clang/test/Format/multiple-inputs-error.cpp
index 1aa9c9f3e2fad..7cb835d39f23e 100644
--- a/clang/test/Format/multiple-inputs-error.cpp
+++ b/clang/test/Format/multiple-inputs-error.cpp
@@ -1,6 +1,6 @@
 // RUN: cp %s %t-1.cpp
 // RUN: cp %s %t-2.cpp
-// RUN: not clang-format 2>&1 >/dev/null -offset=1 -length=0 %t-1.cpp %t-2.cpp |FileCheck %s
+// RUN: not clang-format 2>&1 >/dev/null -offset=1 -length=1 %t-1.cpp %t-2.cpp |FileCheck %s
 // RUN: not clang-format 2>&1 >/dev/null -lines=1:1 %t-1.cpp %t-2.cpp |FileCheck %s -check-prefix=CHECK-LINE
 // CHECK: error: -offset, -length and -lines can only be used for single file.
 // CHECK-LINE: error: -offset, -length and -lines can only be used for single file.
diff --git a/clang/test/Format/ranges.cpp b/clang/test/Format/ranges.cpp
index 66b984e037b3c..6de407e7a20e4 100644
--- a/clang/test/Format/ranges.cpp
+++ b/clang/test/Format/ranges.cpp
@@ -1,5 +1,5 @@
-// RUN: grep -Ev "// *[A-Z-]+:" %s \
-// RUN:   | clang-format -style=LLVM -offset=2 -length=0 -offset=28 -length=0 \
+// RUN: grep -Ev "// *[A-Z0-9-]+:" %s \
+// RUN:   | clang-format -style=LLVM -offset=2 -length=1 -offset=28 -length=1 -offset=35 -length=8 \
 // RUN:   | FileCheck -strict-whitespace %s
 // CHECK: {{^int\ \*i;$}}
 int*i;
@@ -9,3 +9,12 @@ int  *  i;
 
 // CHECK: {{^int\ \*i;$}}
 int   *   i;
+
+// CHECK: int I;
+// CHECK-NEXT: int J ;
+int I ;
+int J ;
+
+// RUN: not clang-format -length=0 < %s 2>&1 \
+// RUN:   | FileCheck -strict-whitespace -check-prefix=CHECK0 %s
+// CHECK0: error: length should be at least 1
diff --git a/clang/tools/clang-format/ClangFormat.cpp b/clang/tools/clang-format/ClangFormat.cpp
index ad12672fa89c1..67f49c4265cd9 100644
--- a/clang/tools/clang-format/ClangFormat.cpp
+++ b/clang/tools/clang-format/ClangFormat.cpp
@@ -283,7 +283,7 @@ static bool fillRanges(MemoryBuffer *Code,
     Lengths.push_back(Sources.getFileOffset(Sources.getLocForEndOfFile(ID)) -
                       Offsets[0]);
   } else if (Offsets.size() != Lengths.size()) {
-    errs() << "error: number of -offset and -length arguments must match.\n";
+    errs() << "error: number of -offset and -length arguments must match\n";
     return true;
   }
   for (unsigned I = 0, E = Offsets.size(); I < E; ++I) {
@@ -293,12 +293,16 @@ static bool fillRanges(MemoryBuffer *Code,
       return true;
     }
     const auto Length = Lengths[I];
+    if (Length == 0) {
+      errs() << "error: length should be at least 1\n";
+      return true;
+    }
     if (Offset + Length > Code->getBufferSize()) {
       errs() << "error: invalid length " << Length << ", offset + length ("
-             << Offset + Length << ") is outside the file.\n";
+             << Offset + Length << ") is outside the file\n";
       return true;
     }
-    Ranges.push_back(tooling::Range(Offset, Length));
+    Ranges.push_back(tooling::Range(Offset, Length - 1));
   }
   return false;
 }

@owenca owenca merged commit 1fae591 into llvm:main Jun 13, 2025
7 checks passed
@owenca owenca deleted the 56245 branch June 13, 2025 07:45
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
return true;
}
Ranges.push_back(tooling::Range(Offset, Length));
Ranges.push_back(tooling::Range(Offset, Length - 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi folks, we're seeing some regressions that's possibly related to this change (working on a repro right now). I am having a hard time following why this is the right thing to do. tooling::Range and command line flags are both (offset, length) pairs. offset is 0-based for both, and length is again coming from the same domain for both representations (in bytes).

moreover, we're only performing this mapping here, and not just couple of lines above (when we convert --lines to ranges.

I do agree that there's an off by one somewhere, but I think it's more likely to be in https://github.com/llvm/llvm-project/blob/main/clang/lib/Format/AffectedRangeManager.cpp#L61-L71.
Logic in there accepts two ranges [B1, E1) and [B2, E2) as intersecting when E1==B2, but those ranges are half open (as they're constructed from offset + length pairs).

Copy link
Contributor Author

@owenca owenca Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that there's an off by one somewhere, but I think it's more likely to be in https://github.com/llvm/llvm-project/blob/main/clang/lib/Format/AffectedRangeManager.cpp#L61-L71. Logic in there accepts two ranges [B1, E1) and [B2, E2) as intersecting when E1==B2, but those ranges are half open (as they're constructed from offset + length pairs).

I didn't fix it there because we would have to change all unit tests involving range formatting.

Copy link
Member

@kadircet kadircet Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't fix it there because we would have to change all unit tests involving range formatting.

sorry I am not sure how to respond to that. Can you elaborate why updating unittests, when we're supposed to be fixing a bug in a library used by many applications was deemed undesired?

IIUC, what you're saying is, you also think the actual issue is in the Format library, which is used by many other tools (clang-format, clangd, clang-tidy, include-cleaner, and many many more). All of which use Ranges as format library's contract declares, a 0-based offset and a length for number of bytes starting from that offset, possibly 0. clang-format tool used to accept command line arguments exactly with the same contract, and now we're changing semantics here, at the cost of breaking existing workflows, and diverging from rest of the tools.

This discrepancy now requires ecosystem to use tools that embed format differently (e.g. when an editor is making a range-formatting request, it needs to provide different ranges if formatter is clangd vs clang-format ?). Moreover all the tests in clang/unittests/Format/FormatTestSelective.cpp are telling a similar story (length=0 is heavily used, hence I am not sure why this is something we should ban with clang-format-21).

I don't understand how all of this is justified in order to not update some unittests. Maybe we can figure out a way to make this change less intrusive for unittests? There's a good chance we can both fix the issue, and keep the CLI interface the same by defining some semantics around ranges with length=0.

Copy link
Contributor Author

@owenca owenca Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand the rant. I didn't touch the off-by-1 bug (if it's indeed a bug) in AffectedRangeManager.cpp precisely because it would impact libFormat, which has been like this since its inception (f793511) 12+ years ago, and all range tests in FormatTests have been based on this semantics. If you (or anyone else) want to address it now, please open an issue and submit a pull request.

hi folks, we're seeing some regressions that's possibly related to this change (working on a repro right now).

I'll look into this as soon as you have a repro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand the rant

Sorry for my tone, I didn't mean to create frustration. I was having a hard time understanding why updating unittests was deemed undesired but updating the driver to do the same, while still updating its tests that passed -length=0, was OK.

I'll look into this as soon as you have a repro.

Unfortunately it's hard to describe a repro. Because the off-by-one issue underneath is hiding the Length - 1 transformation here, especially when the final offset corresponds to \n. My real concern is about clang-format accepting and defining some behavior for -length=0 before this patch and now rejecting such inputs with an hard error (rather than just ignoring such ranges).

I tried to summarize my concerns in #146036. Let's continue the discussion there, it isn't urgent though if you have other topics. I am happy to workaround on our side in the meanwhile.


All of this aside, I don't think I've mentioned this in our past encounters but I really appreciate all the work you're putting into clang-format and a huge thanks for being responsive&involved in these discussions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang-format --length behavior
4 participants