Commit 350b87c

j6t authored and gitster committed
userdiff-cpp: tighten word regex

Generally, word regex can be written such that they match tokens
liberally and need not model the actual syntax because it can be
assumed that the regex will only be applied to syntactically correct
text.

The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:

   1+2
   1.5-e+2+f

and the following amalgams as one token:

   .l      as in str.length
   .f      as in str.find
   .e      as in str.erase

Tighten the regex in the following way:

- Accept + and - only in one position in the exponent. + and - are no
  longer regarded as the sign of a number and are treated by the
  catcher-all that is not visible in the driver's regex.

- Accept a leading decimal point only when it is followed by a digit.

For readability, factor hex- and binary numbers into an own term.

As a drive-by, this fixes that floating point numbers such as 12E5
(with upper-case E) were split into two tokens.

Signed-off-by: Johannes Sixt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

1 parent 3e063de commit 350b87c
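
The effect described in the commit message can be tried outside Git with a small POSIX regex sketch. This is not Git's own tokenizer: `tokenize`, `OLD_RE` and `NEW_RE` are hypothetical names, the operator alternative from the driver is left out for brevity, and the trailing `[^[:space:]]` term is an assumption standing in for the "catcher-all" that the message says is not visible in the driver's regex. The number patterns themselves are copied from the userdiff.c change.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Word regex before the change, plus an assumed single-character catch-all. */
const char OLD_RE[] =
	"[a-zA-Z_][a-zA-Z0-9_]*"
	"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*"
	"|[^[:space:]]";

/* Word regex after the change, plus the same assumed catch-all. */
const char NEW_RE[] =
	"[a-zA-Z_][a-zA-Z0-9_]*"
	"|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
	"|0[xXbB][0-9a-fA-F]+[lLuU]*"
	"|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
	"|[^[:space:]]";

/*
 * Split text into tokens by matching pattern repeatedly and write the
 * tokens to out, separated by single spaces. Returns 0 on success and
 * -1 if the pattern does not compile. Stops early if out is too small.
 */
int tokenize(const char *pattern, const char *text, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t used = 0;

	assert(outsz > 0);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	out[0] = '\0';
	/* Each iteration takes the leftmost match and continues after it. */
	while (*text && !regexec(&re, text, 1, &m, 0) && m.rm_eo > m.rm_so) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		/* Need room for a separator, the token, and the NUL. */
		if (used + len + 2 > outsz)
			break;
		if (used)
			out[used++] = ' ';
		memcpy(out + used, text + m.rm_so, len);
		used += len;
		out[used] = '\0';
		text += m.rm_eo;
	}
	regfree(&re);
	return 0;
}
```

Under these assumptions, calling `tokenize(OLD_RE, "str.erase", buf, sizeof buf)` yields the amalgam `.e` as its own token, while `NEW_RE` separates `.` from `erase`. Likewise `12E5` is split in two by the old pattern and kept whole by the new one.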

2 files changed: +15 -9 lines changed

t/t4034/cpp/expect

Lines changed: 8 additions & 8 deletions
@@ -3,24 +3,24 @@
 <BOLD>--- a/pre<RESET>
 <BOLD>+++ b/post<RESET>
 <CYAN>@@ -1,30 +1,30 @@<RESET>
-Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <RED>foo0<RESET><GREEN>bar<RESET>(x<RED>.f<RESET><GREEN>.F<RESET>ind); }
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <RED>foo0<RESET><GREEN>bar<RESET>(x.<RED>find<RESET><GREEN>Find<RESET>); }
 cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
-<GREEN>(<RESET>1 <RED>-1e10<RESET><GREEN>+1e10<RESET> 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+<GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
 // long double<RESET>
 <RED>3.141592653e-10l<RESET><GREEN>3.141592654e+10l<RESET>
 // float<RESET>
-120<RED>E5f<RESET><GREEN>E6f<RESET>
+<RED>120E5f<RESET><GREEN>120E6f<RESET>
 // hex<RESET>
-<RED>0xdeadbeaf+8<RESET><GREEN>0xdeadBeaf+7<RESET>ULL
+<RED>0xdeadbeaf<RESET><GREEN>0xdeadBeaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
 // octal<RESET>
 <RED>01234567<RESET><GREEN>01234560<RESET>
 // binary<RESET>
 <RED>0b1000<RESET><GREEN>0b1100<RESET>+e1
 // expression<RESET>
-<RED>1.5-e+2+f<RESET><GREEN>1.5-e+3+f<RESET>
+1.5-e+<RED>2<RESET><GREEN>3<RESET>+f
 // another one<RESET>
-str<RED>.e+65<RESET><GREEN>.e+75<RESET>
-[a] b<RED>-><RESET><GREEN>->*<RESET>v d<RED>.e<RESET><GREEN>.*e<RESET>
+str.e+<RED>65<RESET><GREEN>75<RESET>
+[a] b<RED>-><RESET><GREEN>->*<RESET>v d<RED>.<RESET><GREEN>.*<RESET>e
 <GREEN>~<RESET>!a <GREEN>!<RESET>~b c<RED>++<RESET><GREEN>+<RESET> d<RED>--<RESET><GREEN>-<RESET> e*<GREEN>*<RESET>f g<RED>&<RESET><GREEN>&&<RESET>h
 a<RED>*<RESET><GREEN>*=<RESET>b c<RED>/<RESET><GREEN>/=<RESET>d e<RED>%<RESET><GREEN>%=<RESET>f
 a<RED>+<RESET><GREEN>++<RESET>b c<RED>-<RESET><GREEN>--<RESET>d
@@ -30,6 +30,6 @@ a<RED>==<RESET><GREEN>!=<RESET>b c<RED>!=<RESET><GREEN>=<RESET>d
 a<RED>^<RESET><GREEN>^=<RESET>b c<RED>|<RESET><GREEN>|=<RESET>d e<RED>&&<RESET><GREEN>&=<RESET>f
 a<RED>||<RESET><GREEN>|<RESET>b
 a?<GREEN>:<RESET>b
-a<RED>=<RESET><GREEN>==<RESET>b c<RED>+=<RESET><GREEN>+<RESET>d <RED>e-=f<RESET><GREEN>e-f<RESET> g<RED>*=<RESET><GREEN>*<RESET>h i<RED>/=<RESET><GREEN>/<RESET>j k<RED>%=<RESET><GREEN>%<RESET>l m<RED><<=<RESET><GREEN><<<RESET>n o<RED>>>=<RESET><GREEN>>><RESET>p q<RED>&=<RESET><GREEN>&<RESET>r s<RED>^=<RESET><GREEN>^<RESET>t u<RED>|=<RESET><GREEN>|<RESET>v
+a<RED>=<RESET><GREEN>==<RESET>b c<RED>+=<RESET><GREEN>+<RESET>d e<RED>-=<RESET><GREEN>-<RESET>f g<RED>*=<RESET><GREEN>*<RESET>h i<RED>/=<RESET><GREEN>/<RESET>j k<RED>%=<RESET><GREEN>%<RESET>l m<RED><<=<RESET><GREEN><<<RESET>n o<RED>>>=<RESET><GREEN>>><RESET>p q<RED>&=<RESET><GREEN>&<RESET>r s<RED>^=<RESET><GREEN>^<RESET>t u<RED>|=<RESET><GREEN>|<RESET>v
 a,b<RESET>
 a<RED>::<RESET><GREEN>:<RESET>b

userdiff.c

Lines changed: 7 additions & 1 deletion
@@ -64,8 +64,14 @@ PATTERNS("cpp",
 /* functions/methods, variables, and compounds at top level */
 "^((::[[:space:]]*)?[A-Za-z_].*)$",
 /* -- */
+/* identifiers and keywords */
 "[a-zA-Z_][a-zA-Z0-9_]*"
-"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*"
+/* decimal and octal integers as well as floatingpoint numbers */
+"|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
+/* hexadecimal and binary integers */
+"|0[xXbB][0-9a-fA-F]+[lLuU]*"
+/* floatingpoint numbers that begin with a decimal point */
+"|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 /* Keywords */
