Self-Test: Add html diffing #635
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,7 @@ LABEL MAINTAINERS="Nik Everett <[email protected]>" | |
# * openssh-client (used by git) | ||
# * openssh-server (used to forward ssh auth for git when running with --all on macOS) | ||
# * perl-base | ||
# * python (is python2) | ||
# * xsltproc | ||
# * To install rubygems for asciidoctor | ||
# * build-essential | ||
|
@@ -23,7 +24,8 @@ LABEL MAINTAINERS="Nik Everett <[email protected]>" | |
# * ruby | ||
# * ruby-dev | ||
# * Used to check the docs build in CI | ||
# * pycodestyle | ||
# * python3 | ||
# * python3-pip | ||
RUN install_packages \ | ||
bash \ | ||
build-essential \ | ||
|
@@ -38,8 +40,9 @@ RUN install_packages \ | |
openssh-client \ | ||
openssh-server \ | ||
perl-base \ | ||
pycodestyle \ | ||
python \ | ||
python3 \ | ||
python3-pip \ | ||
ruby \ | ||
ruby-dev \ | ||
unzip \ | ||
|
@@ -66,3 +69,13 @@ RUN gem install --no-document \ | |
rubocop:0.64.0 \ | ||
rspec:3.8.0 \ | ||
thread_safe:0.3.6 | ||
|
||
# Wheel inventory: | ||
# * Used to test the docs build | ||
# * beautifulsoup4 | ||
# * lxml | ||
# * pycodestyle | ||
RUN pip3 install \ | ||
beautifulsoup4==4.7.1 \ | ||
lxml==4.3.1 \ | ||
pycodestyle==2.5.0 |
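The wheels above are pinned to exact versions, so one way to confirm that the image contains what the Dockerfile asks for is to compare each pin against the installed metadata. The following is only a sketch and is not part of this pull request: `parse_pins` and `mismatched_pins` are hypothetical helper names, and it assumes Python 3.8 or newer so that `importlib.metadata` is available.

```python
from importlib import metadata


def parse_pins(lines):
    """Turn 'name==version' strings into a {name: version} dict."""
    pins = {}
    for line in lines:
        name, _, version = line.strip().partition('==')
        if name and version:
            pins[name] = version
    return pins


def mismatched_pins(pins):
    """Return {name: (wanted, found)} for pins that are missing or differ."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed at all; report it with found=None.
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


# The three pins from the "Wheel inventory" section of the Dockerfile.
pins = parse_pins([
    'beautifulsoup4==4.7.1',
    'lxml==4.3.1',
    'pycodestyle==2.5.0',
])
```

Calling `mismatched_pins(pins)` inside the container would return an empty dict when every wheel matches its pin; outside the container it will most likely report differences.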
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
#!/usr/bin/env python3 | ||
|
||
# Script to compare two html files, ignoring differences that we consider | ||
# to be unimportant. The output is a unified diff of formatted html meant | ||
# to be readable and precise at identifying differences. | ||
# | ||
# This script is designed to be run in the container managed by the | ||
# Dockerfile at the root of this repository. | ||
|
||
|
||
from bs4 import BeautifulSoup, NavigableString | ||
import difflib | ||
import re | ||
|
||
|
||
def normalize_html(html): | ||
"""Normalizes html to remove expected differences between AsciiDoc's | ||
output and Asciidoctor's output. | ||
""" | ||
# Replace many whitespace characters with a single space in some elements | ||
# kind of like a browser does. | ||
soup = BeautifulSoup(html, 'lxml') | ||
for e in soup.select(':not(script,pre,code,style)'): | ||
for part in e: | ||
if isinstance(part, NavigableString): | ||
crunched = NavigableString(re.sub(r'\s+', ' ', part)) | ||
if crunched != part: | ||
part.replace_with(crunched) | ||
# Format the html with indentation so we can *see* things | ||
html = soup.prettify() | ||
# Remove the zero width space that asciidoctor adds after each horizontal | ||
# ellipsis. They don't hurt anything but asciidoc doesn't make them | ||
html = html.replace('\u2026\u200b', '\u2026') | ||
# Temporary workaround for known issues | ||
html = html.replace('class="edit_me" href="/./', 'class="edit_me" href="') | ||
html = re.sub( | ||
r'(?m)^\s+<div class="console_widget" data-snippet="[^"]+">' | ||
r'\s+</div>\n', '', html) | ||
html = html.replace('\\<1>', '<1>') | ||
return html | ||
|
||
|
||
def html_diff(lhs_name, lhs, rhs_name, rhs): | ||
"""Compare two html blobs, ignoring expected differences between AsciiDoc | ||
and Asciidoctor. The result is a generator for lines in the diff report. | ||
If it is entirely empty then there is no diff. | ||
""" | ||
lhs_lines = normalize_html(lhs).splitlines() | ||
rhs_lines = normalize_html(rhs).splitlines() | ||
return difflib.unified_diff( | ||
lhs_lines, | ||
rhs_lines, | ||
fromfile=lhs_name, | ||
tofile=rhs_name, | ||
lineterm='') | ||
|
||
|
||
def html_file_diff(lhs, rhs): | ||
"""Compare two html files, ignoring expected differences between AsciiDoc | ||
and Asciidoctor. The result is a generator for lines in the diff report. | ||
If it is entirely empty then there is no diff. | ||
""" | ||
with open(lhs, encoding='utf-8') as lhs_file: | ||
lhs_text = lhs_file.read() | ||
with open(rhs, encoding='utf-8') as rhs_file: | ||
rhs_text = rhs_file.read() | ||
return html_diff(lhs, lhs_text, rhs, rhs_text) | ||
|
||
|
||
if __name__ == '__main__': | ||
import sys | ||
if len(sys.argv) != 3: | ||
print("Expected exactly 2 arguments but got %s" % sys.argv[1:]) | ||
sys.exit(1) | ||
had_diff = False | ||
for line in html_file_diff(sys.argv[1], sys.argv[2]): | ||
had_diff = True | ||
# print doesn't like to print utf-8 in all cases but buffer.write is ok | ||
sys.stderr.buffer.write(line.encode('utf-8')) | ||
sys.stderr.buffer.write("\n".encode('utf-8')) | ||
sys.exit(1 if had_diff else 0) |
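The script's contract is that an empty diff report means the two files match, and that maps to exit status 0. The sketch below illustrates that contract using only the standard library. It is a simplification, not the script itself: `crunch_whitespace` and `line_diff` are hypothetical names, and the whitespace collapsing is applied to a whole string rather than to text nodes outside `script`, `pre`, `code`, and `style` as the BeautifulSoup version does.

```python
import difflib
import re


def crunch_whitespace(text):
    """Collapse each run of whitespace into a single space."""
    return re.sub(r'\s+', ' ', text)


def line_diff(lhs_name, lhs, rhs_name, rhs):
    """Return the unified diff of two strings as a list of lines."""
    return list(difflib.unified_diff(
        lhs.splitlines(),
        rhs.splitlines(),
        fromfile=lhs_name,
        tofile=rhs_name,
        lineterm=''))


# These differ only in whitespace, so the report is empty after crunching.
same = line_diff('a.html', crunch_whitespace('<p>hi   there</p>'),
                 'b.html', crunch_whitespace('<p>hi\n there</p>'))

# These differ in content, so the report has header and change lines.
different = line_diff('a.html', '<p>one</p>', 'b.html', '<p>two</p>')

# An empty report means "no diff", which the script maps to exit status 0.
exit_status = 1 if different else 0
```

Materializing the generator with `list` is a choice made here for easy inspection. The script streams the lines instead, which avoids holding a large report in memory.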
This reminds me: is there an issue for switching to Python 3 and explicitly weeding out and dropping support for Python 2 wherever it may be found? Python 2 has less than a year to live. I didn't see one on a quick issue search, but I might have just missed it.
Not really, no. But, hey, I'll make one.