
Commit 8d00662

mackyle authored and gitster committed
diff-highlight: do not split multibyte characters
When the input is UTF-8 and Perl is operating on bytes instead of characters, a diff that changes one multibyte character to another that shares an initial byte sequence will result in a broken diff display as the common byte sequence prefix will be separated from the rest of the bytes in the multibyte character.

For example, if a single line contains only the unicode character U+C9C4 (encoded as UTF-8 0xEC, 0xA7, 0x84) and that line is then changed to the unicode character U+C9C0 (encoded as UTF-8 0xEC, 0xA7, 0x80), when operating on bytes diff-highlight will show only the single byte change from 0x84 to 0x80 thus creating invalid UTF-8 and a broken diff display.

Fix this by putting Perl into character mode when splitting the line and then back into byte mode after the split is finished.

The utf8::xxx functions require Perl 5.8 so we require that as well.

Also, since we are mucking with code in the split_line function, we change a '*' quantifier to a '+' quantifier when matching the $COLOR expression which has the side effect of speeding everything up while eliminating useless '' elements in the returned array.

Reported-by: Yi EungJun <[email protected]>
Signed-off-by: Kyle J. McKay <[email protected]>
Acked-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
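The U+C9C4 to U+C9C0 case from the message can be reproduced in a few lines. This is a minimal sketch, not code from the commit: `common_prefix` is a hypothetical helper that only counts leading equal elements, standing in for the prefix matching diff-highlight does on the split line.

```perl
use 5.008;
use strict;
use warnings;

# UTF-8 byte strings for U+C9C4 and U+C9C0, as given in the commit message.
my $old = "\xEC\xA7\x84";
my $new = "\xEC\xA7\x80";

# Hypothetical helper: number of leading elements two arrays have in common.
sub common_prefix {
    my ($x, $y) = @_;
    my $n = 0;
    $n++ while $n < @$x && $n < @$y && $x->[$n] eq $y->[$n];
    return $n;
}

# Byte mode: split // yields one element per byte.
my @old_bytes = split //, $old;
my @new_bytes = split //, $new;
my $byte_prefix = common_prefix(\@old_bytes, \@new_bytes);

# Character mode: decode first, so each string is a single character.
my ($old_dec, $new_dec) = ($old, $new);
utf8::decode($old_dec);
utf8::decode($new_dec);
my @old_chars = split //, $old_dec;
my @new_chars = split //, $new_dec;
my $char_prefix = common_prefix(\@old_chars, \@new_chars);

printf "bytes: %d elements, common prefix %d\n", scalar @old_bytes, $byte_prefix;
printf "chars: %d elements, common prefix %d\n", scalar @old_chars, $char_prefix;
# prints:
#   bytes: 3 elements, common prefix 2
#   chars: 1 elements, common prefix 0
```

In byte mode the shared 0xEC 0xA7 prefix is treated as unchanged and only the last byte would be highlighted, which is the split the commit avoids. In character mode the whole character counts as changed.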
1 parent 3759d27 commit 8d00662

1 file changed: +7 -2 lines changed


contrib/diff-highlight/diff-highlight

Lines changed: 7 additions & 2 deletions
@@ -1,5 +1,6 @@
 #!/usr/bin/perl
 
+use 5.008;
 use warnings FATAL => 'all';
 use strict;
 
@@ -160,8 +161,12 @@ sub highlight_pair {
 
 sub split_line {
 	local $_ = shift;
-	return map { /$COLOR/ ? $_ : (split //) }
-	       split /($COLOR*)/;
+	return utf8::decode($_) ?
+		map { utf8::encode($_); $_ }
+			map { /$COLOR/ ? $_ : (split //) }
+			split /($COLOR+)/ :
+		map { /$COLOR/ ? $_ : (split //) }
+		split /($COLOR+)/;
 }
 
 sub highlight_line {
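
The patched `split_line` can be tried on its own with the sketch below. The `$COLOR` pattern here is an assumption: the script defines it outside the hunks shown, so a generic ANSI color-sequence regex stands in for it. The sample inputs are also made up for illustration.

```perl
use 5.008;
use strict;
use warnings;

# Assumption: stand-in for the script's $COLOR, which is not part of this diff.
my $COLOR = qr/\x1b\[[0-9;]*m/;

# split_line as it reads after this commit.
sub split_line {
    local $_ = shift;
    return utf8::decode($_) ?
        map { utf8::encode($_); $_ }
            map { /$COLOR/ ? $_ : (split //) }
            split /($COLOR+)/ :
        map { /$COLOR/ ? $_ : (split //) }
        split /($COLOR+)/;
}

# Valid UTF-8: color sequences stay whole, and the three bytes of U+C9C4
# come back as a single element, re-encoded to bytes.
my $line  = "\x1b[31m\xEC\xA7\x84x\x1b[m";
my @parts = split_line($line);

# Invalid UTF-8: utf8::decode fails, so the line is split byte by byte.
my @raw = split_line("a\xFFb");

# Intermediate split lists for the old '*' and new '+' quantifiers.
my @star = split /($COLOR*)/, "ab";
my @plus = split /($COLOR+)/, "ab";

printf "valid: %d elements, invalid: %d elements\n", scalar @parts, scalar @raw;
printf "'*' yields %d pieces, '+' yields %d\n", scalar @star, scalar @plus;
```

Joining `@parts` reproduces the input line byte for byte, so only the grouping of bytes into elements changes. The last two lines show why the quantifier change reduces work: with `*` the pattern can match the empty string between ordinary characters, so the intermediate list is longer than with `+`.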
