@@ -14,13 +14,15 @@ Instead, this script post-processes the line-oriented diff, finds pairs
of lines, and highlights the differing segments. It's currently very
simple and stupid about doing these tasks. In particular:

- 1. It will only highlight a pair of lines if they are the only two
- lines in a hunk. It could instead try to match up "before" and
- "after" lines for a given hunk into pairs of similar lines.
- However, this may end up visually distracting, as the paired
- lines would have other highlighted lines in between them. And in
- practice, the lines which most need attention called to their
- small, hard-to-see changes are touching only a single line.
+ 1. It will only highlight hunks in which the number of removed and
+ added lines is the same, and it will pair lines within the hunk by
+ position (so the first removed line is compared to the first added
+ line, and so forth). This is simple and tends to work well in
+ practice. More complex changes don't highlight well, so we tend to
+ exclude them due to the "same number of removed and added lines"
+ restriction. Or even if we do try to highlight them, they end up
+ not highlighting because of our "don't highlight if the whole line
+ would be highlighted" rule.

2. It will find the common prefix and suffix of two lines, and
consider everything in the middle to be "different". It could
@@ -55,3 +57,96 @@ following in your git configuration:
show = diff-highlight | less
diff = diff-highlight | less
---------------------------------------------
+
+ Bugs
+ ----
+
+ Because diff-highlight relies on heuristics to guess which parts of
+ changes are important, there are some cases where the highlighting is
+ more distracting than useful. Fortunately, these cases are rare in
+ practice, and when they do occur, the worst case is simply a little
+ extra highlighting. This section documents some cases known to be
+ sub-optimal, in case somebody feels like working on improving the
+ heuristics.
+
+ 1. Two changes on the same line get highlighted in a blob. For example,
+ highlighting:
+
+ ----------------------------------------------
+ -foo(buf, size);
+ +foo(obj->buf, obj->size);
+ ----------------------------------------------
+
+ yields (where the inside of "+{}" would be highlighted):
+
+ ----------------------------------------------
+ -foo(buf, size);
+ +foo(+{obj->buf, obj->}size);
+ ----------------------------------------------
+
+ whereas a more semantically meaningful output would be:
+
+ ----------------------------------------------
+ -foo(buf, size);
+ +foo(+{obj->}buf, +{obj->}size);
+ ----------------------------------------------
+
+ Note that doing this right would probably involve a set of
+ content-specific boundary patterns, similar to word-diff. Otherwise
+ you get junk like:
+
+ -----------------------------------------------------
+ -this line has some -{i}nt-{ere}sti-{ng} text on it
+ +this line has some +{fa}nt+{a}sti+{c} text on it
+ -----------------------------------------------------
+
+ which is less readable than the current output.
+
+ 2. The multi-line matching assumes that lines in the pre- and post-image
+ match by position. This is often the case, but can be fooled when a
+ line is removed from the top and a new one added at the bottom (or
+ vice versa). Unless the lines in the middle are also changed, diffs
+ will show this as two hunks, and it will not get highlighted at all
+ (which is good). But if the lines in the middle are changed, the
+ highlighting can be misleading. Here's a pathological case:
+
+ -----------------------------------------------------
+ -one
+ -two
+ -three
+ -four
+ +two 2
+ +three 3
+ +four 4
+ +five 5
+ -----------------------------------------------------
+
+ which gets highlighted as:
+
+ -----------------------------------------------------
+ -one
+ -t-{wo}
+ -three
+ -f-{our}
+ +two 2
+ +t+{hree 3}
+ +four 4
+ +f+{ive 5}
+ -----------------------------------------------------
+
+ because it matches "two" to "three 3", and so forth. It would be
+ nicer as:
+
+ -----------------------------------------------------
+ -one
+ -two
+ -three
+ -four
+ +two +{2}
+ +three +{3}
+ +four +{4}
+ +five 5
+ -----------------------------------------------------
+
+ which would probably involve pre-matching the lines into pairs
+ according to some heuristic.
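
The rules described in this change (pair lines by position, highlight only what lies between the common prefix and suffix, and skip a line that would be highlighted in full) can be sketched in a few lines of Python. This is an illustrative sketch only, not the diff-highlight script itself: the function names `split_common`, `mark`, and `highlight_hunk` are hypothetical, the "whole line" check is simplified to "no common prefix and no common suffix", and the `-{}` / `+{}` markers stand in for color, as in the examples above.

```python
def split_common(a, b):
    """Return (prefix_len, suffix_len) shared by strings a and b."""
    limit = min(len(a), len(b))
    p = 0
    while p < limit and a[p] == b[p]:
        p += 1
    s = 0
    # Stop at limit - p so the suffix never overlaps the prefix.
    while s < limit - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return p, s


def mark(line, p, s, sign):
    """Wrap the differing middle of line in sign{...}.

    The line is returned unchanged when the middle is empty or when
    nothing is shared (simplified "whole line" rule, an assumption).
    """
    end = len(line) - s
    if p >= end or (p == 0 and s == 0):
        return line
    return line[:p] + sign + "{" + line[p:end] + "}" + line[end:]


def highlight_hunk(removed, added):
    """Highlight one hunk given its removed and added lines (no +/- prefix)."""
    # Hunks with unequal counts are passed through untouched.
    if len(removed) != len(added):
        return ["-" + r for r in removed] + ["+" + a for a in added]
    out_removed, out_added = [], []
    # Pair by position: first removed with first added, and so on.
    for old, new in zip(removed, added):
        p, s = split_common(old, new)
        out_removed.append("-" + mark(old, p, s, "-"))
        out_added.append("+" + mark(new, p, s, "+"))
    return out_removed + out_added


if __name__ == "__main__":
    hunk = highlight_hunk(
        ["one", "two", "three", "four"],
        ["two 2", "three 3", "four 4", "five 5"],
    )
    print("\n".join(hunk))
```

Run on the pathological hunk from item 2, the sketch produces the same markers as the "gets highlighted as" listing, which makes it a convenient starting point for experimenting with a pre-matching heuristic.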