Commit 2f86495

Expand documentation re availability of JIT
1 parent d0f899f commit 2f86495

5 files changed

+1375
-1306
lines changed

doc/html/pcre2api.html

Lines changed: 13 additions & 9 deletions
@@ -570,7 +570,7 @@ <h1>pcre2api man page</h1>
 value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved
 as a special indicator for zero-terminated strings and unset offsets.
 Therefore, the longest string that can be handled is one less than this
-maximum. Note that string lengths are always given in code units. Only in the
+maximum. Note that string lengths are always given in code units. Only in the
 8-bit library is such a length the same as the number of bytes in the string.
 <a name="newlines"></a></P>
 <br><a name="SEC16" href="#TOC1">NEWLINES</a><br>
@@ -1197,7 +1197,11 @@ <h1>pcre2api man page</h1>
 PCRE2_CONFIG_JIT
 </pre>
 The output is a uint32_t integer that is set to one if support for just-in-time
-compiling is available; otherwise it is set to zero.
+compiling is included in the library; otherwise it is set to zero. Note that
+having the support in the library does not guarantee that JIT will be used for
+any given match. See the
+<a href="pcre2jit.html"><b>pcre2jit</b></a>
+documentation for more details.
 <pre>
 PCRE2_CONFIG_JITTARGET
 </pre>
@@ -2618,8 +2622,8 @@ <h1>pcre2api man page</h1>
 <b> pcre2_match_data *<i>match_data</i>);</b>
 </P>
 <P>
-The size of a match data block depends on the size of the ovector that it
-contains. The function <b>pcre2_get_match_data_size()</b> returns the size, in
+The size of a match data block depends on the size of the ovector that it
+contains. The function <b>pcre2_get_match_data_size()</b> returns the size, in
 bytes, of the block that is its argument.
 </P>
 <P>
@@ -2632,10 +2636,10 @@ <h1>pcre2api man page</h1>
 <a href="#infoaboutpattern>">above).</a>
 </P>
 <P>
-Heap memory is used for the frames vector; if the initial memory block turns
-out to be too small during matching, it is automatically expanded. When
-<b>pcre2_match()</b> returns, the memory is not freed, but remains attached to
-the match data block, for use by any subsequent matches that use the same
+Heap memory is used for the frames vector; if the initial memory block turns
+out to be too small during matching, it is automatically expanded. When
+<b>pcre2_match()</b> returns, the memory is not freed, but remains attached to
+the match data block, for use by any subsequent matches that use the same
 block. It is automatically freed when the match data block itself is freed.
 </P>
 <P>
@@ -4073,7 +4077,7 @@ <h1>pcre2api man page</h1>
 </P>
 <br><a name="SEC43" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 January 2023
+Last updated: 21 January 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>

doc/html/pcre2jit.html

Lines changed: 53 additions & 36 deletions
@@ -43,7 +43,7 @@ <h1>pcre2jit man page</h1>
 <P>
 JIT support applies only to the traditional Perl-compatible matching function.
 It does not apply when the DFA matching function is being used. The code for
-this support was written by Zoltan Herczeg.
+JIT support was written by Zoltan Herczeg.
 </P>
 <br><a name="SEC2" href="#TOC1">AVAILABILITY OF JIT SUPPORT</a><br>
 <P>
@@ -63,12 +63,25 @@ <h1>pcre2jit man page</h1>
 If --enable-jit is set on an unsupported platform, compilation fails.
 </P>
 <P>
-A program can tell if JIT support is available by calling <b>pcre2_config()</b>
-with the PCRE2_CONFIG_JIT option. The result is 1 when JIT is available, and 0
-otherwise. However, a simple program does not need to check this in order to
-use JIT. The API is implemented in a way that falls back to the interpretive
-code if JIT is not available. For programs that need the best possible
-performance, there is also a "fast path" API that is JIT-specific.
+A client program can tell if JIT support is available by calling
+<b>pcre2_config()</b> with the PCRE2_CONFIG_JIT option. The result is one if
+PCRE2 was built with JIT support, and zero otherwise. However, having the JIT
+code available does not guarantee that it will be used for any particular
+match. One reason for this is that there are a number of options and pattern
+items that are
+<a href="#unsupported">not supported by JIT</a>
+(see below). Another reason is that in some environments JIT is unable to get
+memory in which to build its compiled code. The only guarantee from
+<b>pcre2_config()</b> is that if it returns zero, JIT will definitely <i>not</i>
+be used.
+</P>
+<P>
+A simple program does not need to check availability in order to use JIT when
+possible. The API is implemented in a way that falls back to the interpretive
+code if JIT is not available or cannot be used for a given match. For programs
+that need the best possible performance, there is a
+<a href="#fastpath">"fast path"</a>
+API that is JIT-specific.
 </P>
 <br><a name="SEC3" href="#TOC1">SIMPLE USE OF JIT</a><br>
 <P>
@@ -127,9 +140,10 @@ <h1>pcre2jit man page</h1>
 <P>
 There are some <b>pcre2_match()</b> options that are not supported by JIT, and
 there are also some pattern items that JIT cannot handle. Details are given
-below. In both cases, matching automatically falls back to the interpretive
-code. If you want to know whether JIT was actually used for a particular match,
-you should arrange for a JIT callback function to be set up as described in the
+<a href="#unsupported">below.</a>
+In both cases, matching automatically falls back to the interpretive code. If
+you want to know whether JIT was actually used for a particular match, you
+should arrange for a JIT callback function to be set up as described in the
 section entitled
 <a href="#stackcontrol">"Controlling the JIT stack"</a>
 below, even if you do not need to supply a non-default JIT stack. Such a
@@ -139,12 +153,14 @@ <h1>pcre2jit man page</h1>
 </P>
 <P>
 If the JIT compiler finds an unsupported item, no JIT data is generated. You
-can find out if JIT matching is available after compiling a pattern by calling
-<b>pcre2_pattern_info()</b> with the PCRE2_INFO_JITSIZE option. A non-zero
-result means that JIT compilation was successful. A result of 0 means that JIT
-support is not available, or the pattern was not processed by
+can find out if JIT compilation was successful for a compiled pattern by
+calling <b>pcre2_pattern_info()</b> with the PCRE2_INFO_JITSIZE option. A
+non-zero result means that JIT compilation was successful. A result of 0 means
+that JIT support is not available, or the pattern was not processed by
 <b>pcre2_jit_compile()</b>, or the JIT compiler was not able to handle the
-pattern.
+pattern. Successful JIT compilation does not, however, guarantee the use of JIT
+at match time because there are some match time options that are not supported
+by JIT.
 </P>
 <br><a name="SEC4" href="#TOC1">MATCHING SUBJECTS CONTAINING INVALID UTF</a><br>
 <P>
@@ -154,15 +170,16 @@ <h1>pcre2jit man page</h1>
 detected. The PCRE2_NO_UTF_CHECK option can be passed to <b>pcre2_match()</b> to
 skip the check (for improved performance) if you are sure that a subject string
 is valid. If this option is used with an invalid string, the result is
-undefined.
+undefined. The calling program may crash or loop or otherwise misbehave.
 </P>
 <P>
 However, a way of running matches on strings that may contain invalid UTF
 sequences is available. Calling <b>pcre2_compile()</b> with the
 PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in
 <b>pcre2_match()</b> to support invalid UTF, and, if <b>pcre2_jit_compile()</b>
-is called, the compiled JIT code also supports invalid UTF. Details of how this
-support works, in both the JIT and the interpretive cases, is given in the
+is subsequently called, the compiled JIT code also supports invalid UTF.
+Details of how this support works, in both the JIT and the interpretive cases,
+is given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 documentation.
 </P>
@@ -171,7 +188,7 @@ <h1>pcre2jit man page</h1>
 PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility.
 It is superseded by the <b>pcre2_compile()</b> option PCRE2_MATCH_INVALID_UTF
 and should no longer be used. It may be removed in future.
-</P>
+<a name="unsupported"></a></P>
 <br><a name="SEC5" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
 <P>
 The <b>pcre2_match()</b> options that are supported for JIT matching are
@@ -191,10 +208,10 @@ <h1>pcre2jit man page</h1>
 </P>
 <br><a name="SEC6" href="#TOC1">RETURN VALUES FROM JIT MATCHING</a><br>
 <P>
-When a pattern is matched using JIT matching, the return values are the same
-as those given by the interpretive <b>pcre2_match()</b> code, with the addition
-of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory
-used for the JIT stack was insufficient. See
+When a pattern is matched using JIT, the return values are the same as those
+given by the interpretive <b>pcre2_match()</b> code, with the addition of one
+new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory used for
+the JIT stack was insufficient. See
 <a href="#stackcontrol">"Controlling the JIT stack"</a>
 below for a discussion of JIT stack usage.
 </P>
@@ -416,7 +433,7 @@ <h1>pcre2jit man page</h1>
 pcre2_match_context_free(mcontext);
 pcre2_jit_stack_free(jit_stack);
 
-</PRE>
+<a name="fastpath"></a></PRE>
 </P>
 <br><a name="SEC11" href="#TOC1">JIT FAST PATH API</a><br>
 <P>
@@ -433,43 +450,43 @@ <h1>pcre2jit man page</h1>
 The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
 the same arguments as <b>pcre2_match()</b>. However, the subject string must be
 specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
-option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
-PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
-return values are also the same as for <b>pcre2_match()</b>, plus
-PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
-that was not compiled.
+option bits (for example, PCRE2_ANCHORED and PCRE2_ENDANCHORED) are ignored, as
+is the PCRE2_NO_JIT option. The return values are also the same as for
+<b>pcre2_match()</b>, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial
+or complete) is requested that was not compiled.
 </P>
 <P>
 When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
 the subject pointer is NULL but the length is non-zero, an immediate error is
 given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
 for validity. In the interests of speed, these checks do not happen on the JIT
-fast path, and if invalid UTF data is passed, the result is undefined. The
-program may crash or loop or give wrong results. In the absence of
-PCRE2_MATCH_INVALID_UTF you should only call <b>pcre2_jit_match()</b> in UTF
-mode if you are sure the subject is valid.
+fast path. If invalid UTF data is passed when PCRE2_MATCH_INVALID_UTF was not
+set for <b>pcre2_compile()</b>, the result is undefined. The program may crash
+or loop or give wrong results. In the absence of PCRE2_MATCH_INVALID_UTF you
+should call <b>pcre2_jit_match()</b> in UTF mode only if you are sure the
+subject is valid.
 </P>
 <P>
 Bypassing the sanity checks and the <b>pcre2_match()</b> wrapping can give
 speedups of more than 10%.
 </P>
 <br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
 <P>
-<b>pcre2api</b>(3)
+<b>pcre2api</b>(3), <b>pcre2unicode</b>(3)
 </P>
 <br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel (FAQ by Zoltan Herczeg)
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 January 2023
+Last updated: 21 January 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>
