Skip to content

Commit d30146a

Browse files
committed
Merge branch 'jk/diff-highlight'
* jk/diff-highlight: diff-highlight: document some non-optimal cases diff-highlight: match multi-line hunks diff-highlight: refactor to prepare for multi-line hunks diff-highlight: don't highlight whole lines diff-highlight: make perl strict and warnings fatal
2 parents 887c409 + a0b676a commit d30146a

File tree

2 files changed

+181
-37
lines changed

2 files changed

+181
-37
lines changed

contrib/diff-highlight/README

Lines changed: 102 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -14,13 +14,15 @@ Instead, this script post-processes the line-oriented diff, finds pairs
1414
of lines, and highlights the differing segments. It's currently very
1515
simple and stupid about doing these tasks. In particular:
1616

17-
1. It will only highlight a pair of lines if they are the only two
18-
lines in a hunk. It could instead try to match up "before" and
19-
"after" lines for a given hunk into pairs of similar lines.
20-
However, this may end up visually distracting, as the paired
21-
lines would have other highlighted lines in between them. And in
22-
practice, the lines which most need attention called to their
23-
small, hard-to-see changes are touching only a single line.
17+
1. It will only highlight hunks in which the number of removed and
18+
added lines is the same, and it will pair lines within the hunk by
19+
position (so the first removed line is compared to the first added
20+
line, and so forth). This is simple and tends to work well in
21+
practice. More complex changes don't highlight well, so we tend to
22+
exclude them due to the "same number of removed and added lines"
23+
restriction. Or even if we do try to highlight them, they end up
24+
not highlighting because of our "don't highlight if the whole line
25+
would be highlighted" rule.
2426

2527
2. It will find the common prefix and suffix of two lines, and
2628
consider everything in the middle to be "different". It could
@@ -55,3 +57,96 @@ following in your git configuration:
5557
show = diff-highlight | less
5658
diff = diff-highlight | less
5759
---------------------------------------------
60+
61+
Bugs
62+
----
63+
64+
Because diff-highlight relies on heuristics to guess which parts of
65+
changes are important, there are some cases where the highlighting is
66+
more distracting than useful. Fortunately, these cases are rare in
67+
practice, and when they do occur, the worst case is simply a little
68+
extra highlighting. This section documents some cases known to be
69+
sub-optimal, in case somebody feels like working on improving the
70+
heuristics.
71+
72+
1. Two changes on the same line get highlighted in a blob. For example,
73+
highlighting:
74+
75+
----------------------------------------------
76+
-foo(buf, size);
77+
+foo(obj->buf, obj->size);
78+
----------------------------------------------
79+
80+
yields (where the inside of "+{}" would be highlighted):
81+
82+
----------------------------------------------
83+
-foo(buf, size);
84+
+foo(+{obj->buf, obj->}size);
85+
----------------------------------------------
86+
87+
whereas a more semantically meaningful output would be:
88+
89+
----------------------------------------------
90+
-foo(buf, size);
91+
+foo(+{obj->}buf, +{obj->}size);
92+
----------------------------------------------
93+
94+
Note that doing this right would probably involve a set of
95+
content-specific boundary patterns, similar to word-diff. Otherwise
96+
you get junk like:
97+
98+
-----------------------------------------------------
99+
-this line has some -{i}nt-{ere}sti-{ng} text on it
100+
+this line has some +{fa}nt+{a}sti+{c} text on it
101+
-----------------------------------------------------
102+
103+
which is less readable than the current output.
104+
105+
2. The multi-line matching assumes that lines in the pre- and post-image
106+
match by position. This is often the case, but can be fooled when a
107+
line is removed from the top and a new one added at the bottom (or
108+
vice versa). Unless the lines in the middle are also changed, diffs
109+
will show this as two hunks, and it will not get highlighted at all
110+
(which is good). But if the lines in the middle are changed, the
111+
highlighting can be misleading. Here's a pathological case:
112+
113+
-----------------------------------------------------
114+
-one
115+
-two
116+
-three
117+
-four
118+
+two 2
119+
+three 3
120+
+four 4
121+
+five 5
122+
-----------------------------------------------------
123+
124+
which gets highlighted as:
125+
126+
-----------------------------------------------------
127+
-one
128+
-t-{wo}
129+
-three
130+
-f-{our}
131+
+two 2
132+
+t+{hree 3}
133+
+four 4
134+
+f+{ive 5}
135+
-----------------------------------------------------
136+
137+
because it matches "two" to "three 3", and so forth. It would be
138+
nicer as:
139+
140+
-----------------------------------------------------
141+
-one
142+
-two
143+
-three
144+
-four
145+
+two +{2}
146+
+three +{3}
147+
+four +{4}
148+
+five 5
149+
-----------------------------------------------------
150+
151+
which would probably involve pre-matching the lines into pairs
152+
according to some heuristic.

contrib/diff-highlight/diff-highlight

Lines changed: 79 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -1,28 +1,37 @@
11
#!/usr/bin/perl
22

3+
use warnings FATAL => 'all';
4+
use strict;
5+
36
# Highlight by reversing foreground and background. You could do
47
# other things like bold or underline if you prefer.
58
my $HIGHLIGHT = "\x1b[7m";
69
my $UNHIGHLIGHT = "\x1b[27m";
710
my $COLOR = qr/\x1b\[[0-9;]*m/;
11+
my $BORING = qr/$COLOR|\s/;
812

9-
my @window;
13+
my @removed;
14+
my @added;
15+
my $in_hunk;
1016

1117
while (<>) {
12-
# We highlight only single-line changes, so we need
13-
# a 4-line window to make a decision on whether
14-
# to highlight.
15-
push @window, $_;
16-
next if @window < 4;
17-
if ($window[0] =~ /^$COLOR*(\@| )/ &&
18-
$window[1] =~ /^$COLOR*-/ &&
19-
$window[2] =~ /^$COLOR*\+/ &&
20-
$window[3] !~ /^$COLOR*\+/) {
21-
print shift @window;
22-
show_pair(shift @window, shift @window);
18+
if (!$in_hunk) {
19+
print;
20+
$in_hunk = /^$COLOR*\@/;
21+
}
22+
elsif (/^$COLOR*-/) {
23+
push @removed, $_;
24+
}
25+
elsif (/^$COLOR*\+/) {
26+
push @added, $_;
2327
}
2428
else {
25-
print shift @window;
29+
show_hunk(\@removed, \@added);
30+
@removed = ();
31+
@added = ();
32+
33+
print;
34+
$in_hunk = /^$COLOR*[\@ ]/;
2635
}
2736

2837
# Most of the time there is enough output to keep things streaming,
@@ -38,23 +47,40 @@ while (<>) {
3847
}
3948
}
4049

41-
# Special case a single-line hunk at the end of file.
42-
if (@window == 3 &&
43-
$window[0] =~ /^$COLOR*(\@| )/ &&
44-
$window[1] =~ /^$COLOR*-/ &&
45-
$window[2] =~ /^$COLOR*\+/) {
46-
print shift @window;
47-
show_pair(shift @window, shift @window);
48-
}
49-
50-
# And then flush any remaining lines.
51-
while (@window) {
52-
print shift @window;
53-
}
50+
# Flush any queued hunk (this can happen when there is no trailing context in
51+
# the final diff of the input).
52+
show_hunk(\@removed, \@added);
5453

5554
exit 0;
5655

57-
sub show_pair {
56+
sub show_hunk {
57+
my ($a, $b) = @_;
58+
59+
# If one side is empty, then there is nothing to compare or highlight.
60+
if (!@$a || !@$b) {
61+
print @$a, @$b;
62+
return;
63+
}
64+
65+
# If we have mismatched numbers of lines on each side, we could try to
66+
# be clever and match up similar lines. But for now we are simple and
67+
# stupid, and only handle multi-line hunks that remove and add the same
68+
# number of lines.
69+
if (@$a != @$b) {
70+
print @$a, @$b;
71+
return;
72+
}
73+
74+
my @queue;
75+
for (my $i = 0; $i < @$a; $i++) {
76+
my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]);
77+
print $rm;
78+
push @queue, $add;
79+
}
80+
print @queue;
81+
}
82+
83+
sub highlight_pair {
5884
my @a = split_line(shift);
5985
my @b = split_line(shift);
6086

@@ -101,8 +127,14 @@ sub show_pair {
101127
}
102128
}
103129

104-
print highlight(\@a, $pa, $sa);
105-
print highlight(\@b, $pb, $sb);
130+
if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
131+
return highlight_line(\@a, $pa, $sa),
132+
highlight_line(\@b, $pb, $sb);
133+
}
134+
else {
135+
return join('', @a),
136+
join('', @b);
137+
}
106138
}
107139

108140
sub split_line {
@@ -111,7 +143,7 @@ sub split_line {
111143
split /($COLOR*)/;
112144
}
113145

114-
sub highlight {
146+
sub highlight_line {
115147
my ($line, $prefix, $suffix) = @_;
116148

117149
return join('',
@@ -122,3 +154,20 @@ sub highlight {
122154
@{$line}[($suffix+1)..$#$line]
123155
);
124156
}
157+
158+
# Pairs are interesting to highlight only if we are going to end up
159+
# highlighting a subset (i.e., not the whole line). Otherwise, the highlighting
160+
# is just useless noise. We can detect this by finding either a matching prefix
161+
# or suffix (disregarding boring bits like whitespace and colorization).
162+
sub is_pair_interesting {
163+
my ($a, $pa, $sa, $b, $pb, $sb) = @_;
164+
my $prefix_a = join('', @$a[0..($pa-1)]);
165+
my $prefix_b = join('', @$b[0..($pb-1)]);
166+
my $suffix_a = join('', @$a[($sa+1)..$#$a]);
167+
my $suffix_b = join('', @$b[($sb+1)..$#$b]);
168+
169+
return $prefix_a !~ /^$COLOR*-$BORING*$/ ||
170+
$prefix_b !~ /^$COLOR*\+$BORING*$/ ||
171+
$suffix_a !~ /^$BORING*$/ ||
172+
$suffix_b !~ /^$BORING*$/;
173+
}

0 commit comments

Comments
 (0)