@@ -43,7 +43,7 @@ <h1>pcre2jit man page</h1>
<P>
JIT support applies only to the traditional Perl-compatible matching function.
It does not apply when the DFA matching function is being used. The code for
- this support was written by Zoltan Herczeg.
+ JIT support was written by Zoltan Herczeg.
</P>
<br><a name="SEC2" href="#TOC1">AVAILABILITY OF JIT SUPPORT</a><br>
<P>
@@ -63,12 +63,25 @@ <h1>pcre2jit man page</h1>
If --enable-jit is set on an unsupported platform, compilation fails.
</P>
<P>
- A program can tell if JIT support is available by calling <b>pcre2_config()</b>
- with the PCRE2_CONFIG_JIT option. The result is 1 when JIT is available, and 0
- otherwise. However, a simple program does not need to check this in order to
- use JIT. The API is implemented in a way that falls back to the interpretive
- code if JIT is not available. For programs that need the best possible
- performance, there is also a "fast path" API that is JIT-specific.
+ A client program can tell if JIT support is available by calling
+ <b>pcre2_config()</b> with the PCRE2_CONFIG_JIT option. The result is one if
+ PCRE2 was built with JIT support, and zero otherwise. However, having the JIT
+ code available does not guarantee that it will be used for any particular
+ match. One reason for this is that there are a number of options and pattern
+ items that are
+ <a href="#unsupported">not supported by JIT</a>
+ (see below). Another reason is that in some environments JIT is unable to get
+ memory in which to build its compiled code. The only guarantee from
+ <b>pcre2_config()</b> is that if it returns zero, JIT will definitely <i>not</i>
+ be used.
+ </P>
+ <P>
+ A simple program does not need to check availability in order to use JIT when
+ possible. The API is implemented in a way that falls back to the interpretive
+ code if JIT is not available or cannot be used for a given match. For programs
+ that need the best possible performance, there is a
+ <a href="#fastpath">"fast path"</a>
+ API that is JIT-specific.
</P>
<br><a name="SEC3" href="#TOC1">SIMPLE USE OF JIT</a><br>
<P>
@@ -127,9 +140,10 @@ <h1>pcre2jit man page</h1>
<P>
There are some <b>pcre2_match()</b> options that are not supported by JIT, and
there are also some pattern items that JIT cannot handle. Details are given
- below. In both cases, matching automatically falls back to the interpretive
- code. If you want to know whether JIT was actually used for a particular match,
- you should arrange for a JIT callback function to be set up as described in the
+ <a href="#unsupported">below.</a>
+ In both cases, matching automatically falls back to the interpretive code. If
+ you want to know whether JIT was actually used for a particular match, you
+ should arrange for a JIT callback function to be set up as described in the
section entitled
<a href="#stackcontrol">"Controlling the JIT stack"</a>
below, even if you do not need to supply a non-default JIT stack. Such a
@@ -139,12 +153,14 @@ <h1>pcre2jit man page</h1>
</P>
<P>
If the JIT compiler finds an unsupported item, no JIT data is generated. You
- can find out if JIT matching is available after compiling a pattern by calling
- <b>pcre2_pattern_info()</b> with the PCRE2_INFO_JITSIZE option. A non-zero
- result means that JIT compilation was successful. A result of 0 means that JIT
- support is not available, or the pattern was not processed by
+ can find out if JIT compilation was successful for a compiled pattern by
+ calling <b>pcre2_pattern_info()</b> with the PCRE2_INFO_JITSIZE option. A
+ non-zero result means that JIT compilation was successful. A result of 0 means
+ that JIT support is not available, or the pattern was not processed by
<b>pcre2_jit_compile()</b>, or the JIT compiler was not able to handle the
- pattern.
+ pattern. Successful JIT compilation does not, however, guarantee the use of JIT
+ at match time because there are some match time options that are not supported
+ by JIT.
</P>
<br><a name="SEC4" href="#TOC1">MATCHING SUBJECTS CONTAINING INVALID UTF</a><br>
<P>
@@ -154,15 +170,16 @@ <h1>pcre2jit man page</h1>
detected. The PCRE2_NO_UTF_CHECK option can be passed to <b>pcre2_match()</b> to
skip the check (for improved performance) if you are sure that a subject string
is valid. If this option is used with an invalid string, the result is
- undefined.
+ undefined. The calling program may crash or loop or otherwise misbehave.
</P>
<P>
However, a way of running matches on strings that may contain invalid UTF
sequences is available. Calling <b>pcre2_compile()</b> with the
PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in
<b>pcre2_match()</b> to support invalid UTF, and, if <b>pcre2_jit_compile()</b>
- is called, the compiled JIT code also supports invalid UTF. Details of how this
- support works, in both the JIT and the interpretive cases, is given in the
+ is subsequently called, the compiled JIT code also supports invalid UTF.
+ Details of how this support works, in both the JIT and the interpretive cases,
+ is given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
documentation.
</P>
@@ -171,7 +188,7 @@ <h1>pcre2jit man page</h1>
PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility.
It is superseded by the <b>pcre2_compile()</b> option PCRE2_MATCH_INVALID_UTF
and should no longer be used. It may be removed in future.
- </P>
+ <a name="unsupported"></a></P>
<br><a name="SEC5" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
<P>
The <b>pcre2_match()</b> options that are supported for JIT matching are
@@ -191,10 +208,10 @@ <h1>pcre2jit man page</h1>
</P>
<br><a name="SEC6" href="#TOC1">RETURN VALUES FROM JIT MATCHING</a><br>
<P>
- When a pattern is matched using JIT matching, the return values are the same
- as those given by the interpretive <b>pcre2_match()</b> code, with the addition
- of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory
- used for the JIT stack was insufficient. See
+ When a pattern is matched using JIT, the return values are the same as those
+ given by the interpretive <b>pcre2_match()</b> code, with the addition of one
+ new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory used for
+ the JIT stack was insufficient. See
<a href="#stackcontrol">"Controlling the JIT stack"</a>
below for a discussion of JIT stack usage.
</P>
@@ -416,7 +433,7 @@ <h1>pcre2jit man page</h1>
pcre2_match_context_free(mcontext);
pcre2_jit_stack_free(jit_stack);

- </PRE>
+ <a name="fastpath"></a></PRE>
</P>
<br><a name="SEC11" href="#TOC1">JIT FAST PATH API</a><br>
<P>
@@ -433,43 +450,43 @@ <h1>pcre2jit man page</h1>
The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
the same arguments as <b>pcre2_match()</b>. However, the subject string must be
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
- option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
- PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
- return values are also the same as for <b>pcre2_match()</b>, plus
- PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
- that was not compiled.
+ option bits (for example, PCRE2_ANCHORED and PCRE2_ENDANCHORED) are ignored, as
+ is the PCRE2_NO_JIT option. The return values are also the same as for
+ <b>pcre2_match()</b>, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial
+ or complete) is requested that was not compiled.
</P>
<P>
When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
number of other sanity checks are performed on the arguments. For example, if
the subject pointer is NULL but the length is non-zero, an immediate error is
given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
for validity. In the interests of speed, these checks do not happen on the JIT
- fast path, and if invalid UTF data is passed, the result is undefined. The
- program may crash or loop or give wrong results. In the absence of
- PCRE2_MATCH_INVALID_UTF you should only call <b>pcre2_jit_match()</b> in UTF
- mode if you are sure the subject is valid.
+ fast path. If invalid UTF data is passed when PCRE2_MATCH_INVALID_UTF was not
+ set for <b>pcre2_compile()</b>, the result is undefined. The program may crash
+ or loop or give wrong results. In the absence of PCRE2_MATCH_INVALID_UTF you
+ should call <b>pcre2_jit_match()</b> in UTF mode only if you are sure the
+ subject is valid.
</P>
<P>
Bypassing the sanity checks and the <b>pcre2_match()</b> wrapping can give
speedups of more than 10%.
</P>
<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
<P>
- <b>pcre2api</b>(3)
+ <b>pcre2api</b>(3), <b>pcre2unicode</b>(3)
</P>
<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel (FAQ by Zoltan Herczeg)
<br>
- University Computing Service
+ Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
<P>
- Last updated: 20 January 2023
+ Last updated: 21 January 2023
<br>
Copyright © 1997-2023 University of Cambridge.
<br>