Skip to content

bpo-30620: remove bogus/dead lines from dedent #2064

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 16, 2017

Conversation

jonathaneunice
Copy link
Contributor

textwrap has almost 100% test coverage. I've submitted PRs closing 2 of the remaining 3 gaps (bpo-30591 and bpo-30603). Just one small segment left to go! But, studying that segment, I discovered it's bogus logic. It can never execute.

Below is a deductive proof of this claim, plus the results of empirical checks. The associated PR deletes those two spurious lines.

The key textwrap.dedent() code in question, with logical names superimposed. Other than names, exactly as found in textwrap in the CPython repo.

code

Deductive Proof

All of the conditions that could lead to the Clause 4 loop else are precluded. Here are the steps to that conclusion, using typing notation.

  • margin == None to start, and by the end of the first iteration of the main loop, has narrowed from margin: str | None to margin: str (namely, indents[0]).
  • indents: List[str] because it's assigned from re.findall() -> List[str].
  • indent: str because it's assigned iteratively via for from indents: List[str].
  • Clause 2 and Clause 3 definitively handle the cases where margin or indent are prefixes of one another.
  • Therefore by Clause 4, neither margin nor indent is a prefix of the other.
  • By Clause 4, margin != '' and indent != ''. Both contain characters and have a non-zero length. Had either been an empty string, it would have been considered a legitimate (zero-length) prefix of the other, and handled by either Clause 2 or Clause 3. s.startswith('') == True for any s: str.
  • Because margin and indent are both non-empty strings, zip will pairwise iterate down their respective characters to the length of the shortest. Let's call that point k where k = min(len(margin), len(indent)). The Clause 4 loop will iterate i through range(k), setting x == margin[i] and y == indent[i] at each step.
  • At some point, x and y must differ. If they did not, either margin would be a proper prefix of indent, or indent would be a proper prefix of margin, which we have already established would handled in Clause 2 or 3, and would not enter Clause 4. So there will always be an 0 <= i < k such that x != y. The loop will therefore always exit via break. Because the loop exit via break, the else of the Clause 4 for can never be executed.
  • The only way a for loop else clause can be triggered is exhaustion of the iteration. Either (a) its iterable is empty at the outset, so the else is immediately triggered, or (b) the iterable is non-empty and the loop naturally exhausts it through iteration without a break. We have shown that neither (a) nor (b) can happen. (a) cannot happen because neither margin nor indent are empty strings. There will always be some characters to co-iterate over. But neither can (b) happen because we've shown that there must be some point at which x and y differ (else one would have been a substring, and previously handled prior to Clause 4).
  • Ergo the Clause 4 else: margin = margin[:len(indent)] is dead code, which onlly appears to be live code. It can never execute.

Randomized and Empirical Testing

Humbly remembering that even apparent proofs are occasionally faulty, and not wanting to be caught out or surprised by any magical or oddball cases that escaped the above logic, I ran two kinds of empirical verification.

I added a raise RuntimeError to the suspect else.

First I randomly generated texts with varying amounts and composition of leading whitespace, and called dedent() on those, looking to trigger the exception. After 500 million attempts, the exception was never triggered.

I separately ran dedent on the contents of text-ish files from my development system. These include numerous files from Web (HTML, CSS, JS, SVG, Less, SCSS, ...), JavaScript, Java, Python, Perl, PHP, Ruby, SQL, XML, Unix (shell, config, logging...), data and configuration (JSON, YAML, INI, ...), authoring (Markdown, RST, TXT...), C, and DevOps communities.
None of those >1.2 million files triggered the exception.

It's impossible to absolutely "prove a negative" through empirical means. One can always imagine testing against a larger random or actual corpus. But that such large bodies of likely suspects never jiggled the tripwire adds confidence that the suspect code lines simply never execute under any input conditions, just as the proof says.

Conclusion

Having tried constructively to identify any cases where the else in the Clause 4 loop would execute, I've deductively shown it never will. Arguing in the alternative, I tried large bodies of both randomized and empirical data to find a counter-example; after extensive tests, no counter-examples were found. I therefore assert it is dead code, can never be tested or executed, and should be removed.

@Mariatta Mariatta added the type-feature A feature request or enhancement label Jun 13, 2017
@Mariatta Mariatta merged commit 7a80183 into python:master Jun 16, 2017
@jonathaneunice jonathaneunice deleted the fix-issue-30620 branch June 16, 2017 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants