Commit d1f32f7

Add string documentation
1 parent 7cd5b04 commit d1f32f7

5 files changed: +212 -27 lines changed
docs/source/core/data-structures/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,5 +7,6 @@
77

88
zval
99
reference-counting
10+
zend_string
1011

1112
This section provides an overview of the core data structures used in php-src.

docs/source/core/data-structures/reference-counting.rst

Lines changed: 9 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -99,16 +99,6 @@ naming is not always consistent.
9999
- Yes
100100
- Decreases the reference count and frees the value if the reference count reaches zero.
101101

102-
- - ``Z_DELREF[_P]``
103-
- No
104-
- Decreases the reference count. Note that this will not actually free the value if the
105-
reference count reaches zero. You should usually use ``zval_ptr_dtor`` instead.
106-
107-
- - ``Z_TRY_DELREF[_P]``
108-
- Yes
109-
- Decreases the reference count. Note that this will not actually free the value if the
110-
reference count reaches zero. You should usually use ``zval_ptr_dtor`` instead.
111-
112102
.. [#non-rc]
113103
114104
Whether the macro works with non-reference counted types. If it does, the operation is usually a
@@ -137,20 +127,19 @@ naming is not always consistent.
137127
- Yes
138128
- Decreases the reference count and frees the value if the reference count reaches zero.
139129

140-
- - ``GC_DELREF[_P]``
141-
- No
142-
- Decreases the reference count. Note that this will not actually free the value if the
143-
reference count reaches zero. You should usually use ``GC_DTOR_[P]`` instead.
144-
145-
- - ``GC_TRY_DELREF[_P]``
146-
- Yes
147-
- Decreases the reference count. Note that this will not actually free the value if the
148-
reference count reaches zero. You should usually use ``GC_DTOR_[P]`` instead.
149-
150130
.. [#immutable]
151131
152132
Whether the macro works with immutable types, described under `Immutable reference counted types`_.
153133
134+
************
135+
Separation
136+
************
137+
138+
..
139+
::
140+
141+
_TODO
142+
154143
***********************************
155144
Immutable reference counted types
156145
***********************************
Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,190 @@
1+
#############
2+
zend_string
3+
#############
4+
5+
In C, strings are represented as sequential lists of characters, ``char*`` or ``char[]``. The end of
6+
the string is usually indicated by the special NUL character, ``'\0'``. This comes with a few
7+
significant downsides:
8+
9+
- Calculating the length of the string is expensive, as it requires walking the entire string to
10+
look for the terminating NUL character.
11+
- The string may not contain the NUL character itself.
12+
- It is easy to run into buffer overflows if the NUL byte is accidentally missing.
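The first two downsides can be reproduced in a few lines of plain C. The following is a minimal standalone sketch, not php-src code; ``c_string_length``, ``payload`` and ``payload_size`` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a NUL-terminated string. strlen() has to walk the bytes
 * until it finds the first NUL, so the cost grows with the length. */
static size_t c_string_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Seven bytes of data with a NUL in the middle. sizeof() also counts the
 * terminator the compiler appends to the literal, hence the "- 1". */
static const char payload[] = "abc\0def";
static const size_t payload_size = sizeof(payload) - 1;
```

Calling ``c_string_length(payload)`` stops at the embedded NUL and reports 3, although ``payload_size`` is 7. A string type that stores its length explicitly does not have this ambiguity.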
13+
14+
php-src uses the ``zend_string`` struct as an abstraction over ``char*``, which explicitly stores
15+
the string's length, along with some other fields. It looks as follows:
16+
17+
.. code:: c
18+
19+
struct _zend_string {
20+
zend_refcounted_h gc;
21+
zend_ulong h; /* hash value */
22+
size_t len;
23+
char val[1];
24+
};
25+
26+
The ``gc`` field is used for :doc:`./reference-counting`. The ``h`` field contains a hash value,
27+
which is used for `hash table <todo>`__ lookups. The ``len`` field stores the length of the string
28+
in bytes, and the ``val`` field contains the actual string data.
29+
30+
You may wonder why the ``val`` field is declared as ``char val[1]``. This is called the struct hack
31+
in C. The actual size of ``zend_string`` is determined at runtime and depends on the string's length
32+
(see ``_ZSTR_STRUCT_SIZE``). When allocating the string, we add some extra bytes to the allocation
33+
for the string's content. This way, we can store the content as part of the same allocation, which
34+
reduces the number of allocations and improves cache locality.
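To make the single-allocation idea concrete, here is a minimal standalone sketch of the struct hack. It is a simplified stand-in, not the php-src implementation: ``my_string``, ``my_string_struct_size`` and ``my_string_init`` are hypothetical names, the ``gc`` and ``h`` fields are omitted, and plain ``malloc`` is assumed as the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for zend_string. */
typedef struct my_string {
    size_t len;
    char val[1]; /* struct hack: the data continues past this array */
} my_string;

/* Bytes needed for the header, `len` bytes of content and a trailing NUL. */
static size_t my_string_struct_size(size_t len)
{
    return offsetof(my_string, val) + len + 1;
}

/* Header and content live in one allocation; free() releases both. */
static my_string *my_string_init(const char *data, size_t len)
{
    assert(data != NULL);
    my_string *s = malloc(my_string_struct_size(len));
    if (s == NULL) {
        return NULL;
    }
    s->len = len;
    memcpy(s->val, data, len);
    s->val[len] = '\0'; /* keep the content usable as a C string */
    return s;
}
```

Because the length is stored explicitly, ``my_string_init("a\0b", 3)`` keeps all three bytes, and a single ``free`` releases the header together with the content.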
35+
36+
Here's a basic example of how to use ``zend_string``:
37+
38+
.. code:: c
39+
40+
// Allocate the string.
41+
zend_string *string = ZSTR_INIT_LITERAL("Hello world!", /* persistent */ false);
42+
// Write it to the output buffer.
43+
zend_write(ZSTR_VAL(string), ZSTR_LEN(string));
44+
// Decrease the reference count and free it if necessary.
45+
zend_string_release(string);
46+
47+
``ZSTR_INIT_LITERAL`` creates a ``zend_string`` from a string literal. It is just a wrapper around
48+
``zend_string_init(const char *string, size_t length, bool persistent)`` that provides the length of the
49+
string at compile time. The ``persistent`` parameter indicates whether the string should be
50+
allocated using ``malloc`` (``persistent == true``) or ``emalloc``, `PHP's custom allocator <todo>`__
51+
(``persistent == false``) that is emptied after each request.
52+
53+
When you're done using the string, you must call ``zend_string_release``, or the memory will leak.
54+
``zend_string_release`` will automatically call ``free`` or ``efree``, depending on how the
55+
string was allocated. After releasing the string, you must not access any of its fields anymore, as
56+
it may have been freed if you were its last user.
57+
58+
*****
59+
API
60+
*****
61+
62+
The string API is defined in ``Zend/zend_string.h``. It provides a number of functions for creating
63+
new strings.
64+
65+
.. list-table:: ``zend_string`` creation
66+
:header-rows: 1
67+
68+
- - Function/Macro [#persistent]_
69+
- Description
70+
71+
- - ``ZSTR_INIT_LITERAL(s, p)``
72+
- Creates a new string from a string literal.
73+
74+
- - ``zend_string_init(s, l, p)``
75+
- Creates a new string from a character buffer.
76+
77+
- - ``zend_string_alloc(l, p)``
78+
- Creates a new string of a given length without initializing its content.
79+
80+
- - ``zend_string_concat2(s1, l1, s2, l2)``
81+
- Creates a non-persistent string by concatenating two character buffers.
82+
83+
- - ``zend_string_concat3(...)``
84+
- Same as ``zend_string_concat2``, but for three character buffers.
85+
86+
- - ``ZSTR_EMPTY_ALLOC()``
87+
- Gets an immutable, empty string. This does not allocate memory.
88+
89+
- - ``ZSTR_CHAR(char)``
90+
- Gets an immutable, single-character string. This does not allocate memory.
91+
92+
- - ``ZSTR_KNOWN(ZEND_STR_const)``
93+
94+
- Gets an immutable, predefined string. Used for strings common within PHP itself, e.g.
95+
``"class"``. See ``ZEND_KNOWN_STRINGS`` in ``Zend/zend_string.h``. This does not allocate
96+
memory.
97+
98+
.. [#persistent]
99+
100+
``s`` = string (a ``char*`` buffer or literal for the creation functions, a ``zend_string`` elsewhere), ``l`` = ``length``, ``p`` = ``persistent``.
101+
102+
As per php-src fashion, you are not supposed to access the ``zend_string`` fields directly. Instead,
103+
use the following macros. There are macros for both ``zend_string`` and ``zvals`` known to contain
104+
strings.
105+
106+
.. list-table:: Accessor macros
107+
:header-rows: 1
108+
109+
- - ``zend_string``
110+
- ``zval``
111+
- Description
112+
113+
- - ``ZSTR_LEN``
114+
- ``Z_STRLEN[_P]``
115+
- Returns the length of the string in bytes.
116+
117+
- - ``ZSTR_VAL``
118+
- ``Z_STRVAL[_P]``
119+
- Returns the string data as a ``char*``.
120+
121+
- - ``ZSTR_HASH``
122+
- ``Z_STRHASH[_P]``
123+
- Computes the string hash if it hasn't been computed already, and returns it.
124+
125+
- - ``ZSTR_H``
126+
- \-
127+
- Returns the string hash. This macro assumes that the hash has already been computed.
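The difference between the two hash accessors is a compute-once-then-cache pattern. The following standalone sketch illustrates it under stated assumptions: ``hashed_string``, ``compute_hash`` and ``hashed_string_hash`` are hypothetical names, the "times 33" hash is picked only for illustration, and a stored value of 0 is assumed to mean "not computed yet".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct hashed_string {
    uint64_t h;      /* cached hash; 0 means "not computed yet" (assumption) */
    size_t len;
    const char *val;
} hashed_string;

/* DJB-style "times 33" hash, used here purely as an example. The top bit
 * is forced on so a computed hash can never collide with the 0 sentinel. */
static uint64_t compute_hash(const char *data, size_t len)
{
    uint64_t hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)data[i];
    }
    return hash | UINT64_C(0x8000000000000000);
}

/* Get-or-compute accessor: hashes the content once, then reuses the cache. */
static uint64_t hashed_string_hash(hashed_string *s)
{
    assert(s != NULL);
    if (s->h == 0) {
        s->h = compute_hash(s->val, s->len);
    }
    return s->h;
}
```

Reading ``s->h`` directly corresponds to the accessor that assumes the hash is already there: it is only meaningful after ``hashed_string_hash`` has run at least once.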
128+
129+
.. list-table:: Reference counting macros
130+
:header-rows: 1
131+
132+
- - Macro
133+
- Description
134+
135+
- - ``zend_string_copy(s)``
136+
- Increases the reference count and returns the same string. The reference count is not
137+
increased if the string is interned.
138+
139+
- - ``zend_string_release(s)``
140+
- Decreases the reference count and frees the string if it goes to 0.
141+
142+
- - ``zend_string_dup(s, p)``
143+
- Creates a true copy of the string in a new allocation, except if the string is interned.
144+
145+
- - ``zend_string_separate(s)``
146+
- Duplicates the string if the reference count is greater than 1. See
147+
:doc:`./reference-counting` for details.
148+
149+
- - ``zend_string_realloc(s, l, p)``
150+
151+
- Changes the size of the string. If the string has a reference count greater than 1 or if
152+
the string is interned, a new string is created. You must always use the return value of
153+
this function, as the original string may have been moved to a new location in memory.
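The semantics of copy, release and separate can be modelled in a few lines. This is a simplified standalone sketch, not the php-src code: ``rc_string`` and its functions are hypothetical, interned strings are ignored, and the content is kept in a separate allocation to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct rc_string {
    unsigned refcount;
    size_t len;
    char *val;
} rc_string;

static rc_string *rc_string_init(const char *data, size_t len)
{
    rc_string *s = malloc(sizeof(*s));
    if (s == NULL) {
        return NULL;
    }
    s->val = malloc(len + 1);
    if (s->val == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->val, data, len);
    s->val[len] = '\0';
    s->len = len;
    s->refcount = 1; /* the creator holds the first reference */
    return s;
}

/* Share the string: bump the count and hand back the same pointer. */
static rc_string *rc_string_copy(rc_string *s)
{
    s->refcount++;
    return s;
}

/* Drop one reference. Returns true if this was the last one and the
 * string has been freed; the pointer must not be used afterwards. */
static bool rc_string_release(rc_string *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        free(s->val);
        free(s);
        return true;
    }
    return false;
}

/* Return a string the caller owns exclusively and may modify. */
static rc_string *rc_string_separate(rc_string *s)
{
    if (s->refcount == 1) {
        return s; /* already the only user */
    }
    rc_string *dup = rc_string_init(s->val, s->len);
    if (dup != NULL) {
        s->refcount--; /* the caller's reference moves to the duplicate */
    }
    return dup;
}
```

In this model, a holder that wants to modify a shared string calls ``rc_string_separate`` first and continues with the returned pointer, leaving the other holders' view of the original untouched.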
154+
155+
There are various functions to compare strings. The ``zend_string_equals`` function compares two
156+
strings in full, while ``zend_string_starts_with`` checks whether the first argument starts with the
157+
second. There are variations for ``_ci`` and ``_literal``, i.e. case-insensitive comparison and
158+
literal strings, respectively. We won't go over all variations here, as they are straightforward to
159+
use.
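Because the length is stored, such comparisons can check lengths first and then compare raw bytes, which also works for strings containing NUL. A minimal standalone sketch of the idea follows; ``str_view``, ``str_equals`` and ``str_starts_with`` are hypothetical names and do not reflect how php-src implements its comparison functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view of a string. */
typedef struct str_view {
    const char *val;
    size_t len;
} str_view;

/* Equal only if the lengths match and all bytes are identical. */
static bool str_equals(str_view a, str_view b)
{
    assert(a.val != NULL && b.val != NULL);
    return a.len == b.len && memcmp(a.val, b.val, a.len) == 0;
}

/* True if the first `prefix.len` bytes of `s` match `prefix`. */
static bool str_starts_with(str_view s, str_view prefix)
{
    assert(s.val != NULL && prefix.val != NULL);
    return prefix.len <= s.len && memcmp(s.val, prefix.val, prefix.len) == 0;
}
```

The length check comes first on purpose: it is cheap, and it guarantees that ``memcmp`` never reads past the end of the shorter string.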
160+
161+
******************
162+
Interned strings
163+
******************
164+
165+
Programs use many strings repeatedly. For example, if your program declares a class called
166+
``MyClass``, it would be wasteful to allocate a new string ``"MyClass"`` every time it is referenced
167+
within your program. Instead, when repeated strings are expected, php-src uses a technique called
168+
string interning. Essentially, this is just a simple `HashTable <todo>`__ where existing interned
169+
strings are stored. When creating a new interned string, php-src first checks if it already exists
170+
in the buffer. If it does, it can return a pointer to the existing string. If it doesn't, it
171+
allocates a new string and adds it to the buffer.
172+
173+
.. code:: c
174+
175+
zend_string *str1 = zend_new_interned_string(
176+
ZSTR_INIT_LITERAL("MyClass", /* persistent */ false));
177+
178+
// In some other place entirely.
179+
zend_string *str2 = zend_new_interned_string(
180+
ZSTR_INIT_LITERAL("MyClass", /* persistent */ false));
181+
182+
assert(ZSTR_IS_INTERNED(str1));
183+
assert(ZSTR_IS_INTERNED(str2));
184+
assert(str1 == str2);
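The lookup-then-insert behaviour behind interning can also be sketched without php-src. The following standalone model is an assumption-laden simplification: ``interned``, ``intern_table`` and ``intern`` are hypothetical names, and a fixed-size array with a linear scan stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_CAPACITY 64 /* arbitrary fixed size for this sketch */

typedef struct interned {
    size_t len;
    char *val;
} interned;

static interned *intern_table[INTERN_CAPACITY];
static size_t intern_count;

/* Returns the canonical copy of (data, len), creating it on first use.
 * Returns NULL if the table is full or an allocation fails. */
static const interned *intern(const char *data, size_t len)
{
    assert(data != NULL);
    for (size_t i = 0; i < intern_count; i++) {
        interned *e = intern_table[i];
        if (e->len == len && memcmp(e->val, data, len) == 0) {
            return e; /* already interned: reuse the existing string */
        }
    }
    if (intern_count == INTERN_CAPACITY) {
        return NULL;
    }
    interned *e = malloc(sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    e->val = malloc(len + 1);
    if (e->val == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->val, data, len);
    e->val[len] = '\0';
    e->len = len;
    intern_table[intern_count++] = e;
    return e;
}
```

Two calls with the same content return the same pointer, so equality of interned strings reduces to a pointer comparison. Entries are never freed in this sketch, mirroring the idea that interned strings live for as long as the table does.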
185+
186+
Interned strings are *not* reference counted, as they are expected to live for the entire request,
187+
or longer.
188+
189+
With opcache, this goes one step further by sharing strings across different processes. For example,
190+
if you're using php-fpm with 8 workers, all workers will share the same interned strings buffer.

docs/source/index.rst

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -30,9 +30,6 @@ as various extensions that provide common functionality. This documentation is i
3030
understand how the interpreter works, how you can build and test changes, and how you can create
3131
extensions yourself.
3232

33-
This documentation is not intended to be comprehensive, but is meant to explain core concepts that
34-
are not easy to grasp by reading code alone.
35-
3633
******************
3734
How to get help?
3835
******************
@@ -55,3 +52,11 @@ is advisable that you have *some* knowledge of C.
5552

5653
It is also advisable to get familiar with the semantics of PHP itself, as this will help you
5754
determine correct behavior for bugs, and desirable behavior for new language features.
55+
56+
*********
57+
Content
58+
*********
59+
60+
This documentation is not intended to be comprehensive, but is meant to explain core concepts that
61+
are not easy to grasp by reading code alone. It describes best practices, and will frequently omit
62+
APIs that are discouraged for general use.

docs/source/introduction/high-level-overview.rst

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -202,10 +202,10 @@ VM is quite complex, and will be discussed separately in the `virtual machine <t
202202

203203
As you may imagine, running this whole pipeline every time PHP serves a request is time consuming.
204204
Luckily, it is also not necessary. We can cache the opcodes in memory between requests. When a file
205-
is included, we can in the cache whether the file is already there, and verify via timestamp that it
206-
has not been modified since it was compiled. If it has not, we may reuse the opcodes from cache.
207-
This dramatically speeds up the execution of PHP programs. This is precisely what the opcache
208-
extension does. It lives in the ``ext/opcache`` directory.
205+
is included, we can look for the file in cache, and verify via timestamp that it has not been
206+
modified since it was compiled. If it has not, we may reuse the opcodes from cache. This
207+
dramatically speeds up the execution of PHP programs. This is precisely what the opcache extension
208+
does. It lives in the ``ext/opcache`` directory.
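The lookup-and-validate step can be sketched as follows. This is a hypothetical standalone model, not opcache's actual data structures: ``cache_entry`` and ``cache_lookup`` are invented names, and the caller is assumed to supply the file's current modification time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical cache record for one compiled file. */
typedef struct cache_entry {
    const char *path;
    time_t compiled_mtime; /* file mtime at the time of compilation */
    const void *opcodes;   /* opaque handle to the cached opcodes */
} cache_entry;

/* Returns the cached opcodes for `path` if the file has not changed since
 * it was compiled, or NULL if it is missing from the cache or stale. */
static const void *cache_lookup(const cache_entry *entries, size_t count,
                                const char *path, time_t current_mtime)
{
    assert(entries != NULL && path != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].path, path) == 0) {
            return entries[i].compiled_mtime == current_mtime
                ? entries[i].opcodes
                : NULL;
        }
    }
    return NULL;
}
```

A ``NULL`` result means the caller has to run the full compilation pipeline again and store the fresh opcodes together with the new timestamp.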
209209

210210
Opcache also performs some optimizations on the opcodes before caching them. As cached opcodes are
211211
expected to be reused many times, it is profitable to spend some additional time simplifying them if
