bpo-30620: remove bogus/dead lines from dedent #2064
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
textwrap
has almost 100% test coverage. I've submitted PRs closing 2 of the remaining 3 gaps (bpo-30591 and bpo-30603). Just one small segment left to go! But, studying that segment, I discovered it's bogus logic. It can never execute.Below is a deductive proof of this claim, plus the results of empirical checks. The associated PR deletes those two spurious lines.
The key
textwrap.dedent()
code in question, with logical names superimposed. Other than names, exactly as found in textwrap in the CPython repo.Deductive Proof
All of the conditions that could lead to the Clause 4 loop
else
are precluded. Here are the steps to that conclusion, usingtyping
notation.margin == None
to start, and by the end of the first iteration of the main loop, has narrowed frommargin: str | None
tomargin: str
(namely,indents[0]
).indents: List[str]
because it's assigned fromre.findall() -> List[str]
.indent: str
because it's assigned iteratively viafor
fromindents: List[str]
.margin
orindent
are prefixes of one another.margin
norindent
is a prefix of the other.margin != '' and indent != ''
. Both contain characters and have a non-zero length. Had either been an empty string, it would have been considered a legitimate (zero-length) prefix of the other, and handled by either Clause 2 or Clause 3.s.startswith('') == True
for anys: str
.margin
andindent
are both non-empty strings,zip
will pairwise iterate down their respective characters to the length of the shortest. Let's call that pointk
wherek = min(len(margin), len(indent))
. The Clause 4 loop will iteratei
throughrange(k)
, settingx == margin[i] and y == indent[i]
at each step.x
andy
must differ. If they did not, eithermargin
would be a proper prefix ofindent
, orindent
would be a proper prefix ofmargin
, which we have already established would handled in Clause 2 or 3, and would not enter Clause 4. So there will always be an0 <= i < k
such thatx != y
. The loop will therefore always exit viabreak
. Because the loop exit viabreak
, theelse
of the Clause 4for
can never be executed.for
loopelse
clause can be triggered is exhaustion of the iteration. Either (a) its iterable is empty at the outset, so theelse
is immediately triggered, or (b) the iterable is non-empty and the loop naturally exhausts it through iteration without abreak
. We have shown that neither (a) nor (b) can happen. (a) cannot happen because neithermargin
norindent
are empty strings. There will always be some characters to co-iterate over. But neither can (b) happen because we've shown that there must be some point at whichx
andy
differ (else one would have been a substring, and previously handled prior to Clause 4).else: margin = margin[:len(indent)]
is dead code, which onlly appears to be live code. It can never execute.Randomized and Empirical Testing
Humbly remembering that even apparent proofs are occasionally faulty, and not wanting to be caught out or surprised by any magical or oddball cases that escaped the above logic, I ran two kinds of empirical verification.
I added a
raise RuntimeError
to the suspectelse
.First I randomly generated texts with varying amounts and composition of leading whitespace, and called
dedent()
on those, looking to trigger the exception. After 500 million attempts, the exception was never triggered.I separately ran
dedent
on the contents of text-ish files from my development system. These include numerous files from Web (HTML, CSS, JS, SVG, Less, SCSS, ...), JavaScript, Java, Python, Perl, PHP, Ruby, SQL, XML, Unix (shell, config, logging...), data and configuration (JSON, YAML, INI, ...), authoring (Markdown, RST, TXT...), C, and DevOps communities.None of those >1.2 million files triggered the exception.
It's impossible to absolutely "prove a negative" through empirical means. One can always imagine testing against a larger random or actual corpus. But that such large bodies of likely suspects never jiggled the tripwire adds confidence that the suspect code lines simply never execute under any input conditions, just as the proof says.
Conclusion
Having tried constructively to identify any cases where the
else
in the Clause 4 loop would execute, I've deductively shown it never will. Arguing in the alternative, I tried large bodies of both randomized and empirical data to find a counter-example; after extensive tests, no counter-examples were found. I therefore assert it is dead code, can never be tested or executed, and should be removed.