
Commit f3466bc

gh-98836: Extend PyUnicode_FromFormat() (GH-98838)
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal).
* Support for length modifiers j (intmax_t) and t (ptrdiff_t).
* Length modifiers are now applied to all integer conversions.
* Support for wchar_t C strings (%ls and %lV).
* Support for variable width and precision (*).
* Support for flag - (left alignment).
1 parent 6ba8406 commit f3466bc

10 files changed: +585 -288 lines changed


Doc/c-api/unicode.rst

Lines changed: 143 additions & 85 deletions
@@ -394,98 +394,149 @@ APIs:
    arguments, calculate the size of the resulting Python Unicode string and return
    a string with the values formatted into it. The variable arguments must be C
    types and must correspond exactly to the format characters in the *format*
-   ASCII-encoded string. The following format characters are allowed:
-
-   .. % This should be exactly the same as the table in PyErr_Format.
-
-   .. tabularcolumns:: |l|l|L|
-
-   +-------------------+---------------------+----------------------------------+
-   | Format Characters | Type                | Comment                          |
-   +===================+=====================+==================================+
-   | :attr:`%%`        | *n/a*               | The literal % character.         |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%c`        | int                 | A single character,              |
-   |                   |                     | represented as a C int.          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%d`        | int                 | Equivalent to                    |
-   |                   |                     | ``printf("%d")``. [1]_           |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%u`        | unsigned int        | Equivalent to                    |
-   |                   |                     | ``printf("%u")``. [1]_           |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%ld`       | long                | Equivalent to                    |
-   |                   |                     | ``printf("%ld")``. [1]_          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%li`       | long                | Equivalent to                    |
-   |                   |                     | ``printf("%li")``. [1]_          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%lu`       | unsigned long       | Equivalent to                    |
-   |                   |                     | ``printf("%lu")``. [1]_          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%lld`      | long long           | Equivalent to                    |
-   |                   |                     | ``printf("%lld")``. [1]_         |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%lli`      | long long           | Equivalent to                    |
-   |                   |                     | ``printf("%lli")``. [1]_         |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%llu`      | unsigned long long  | Equivalent to                    |
-   |                   |                     | ``printf("%llu")``. [1]_         |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%zd`       | :c:type:`\          | Equivalent to                    |
-   |                   | Py_ssize_t`         | ``printf("%zd")``. [1]_          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%zi`       | :c:type:`\          | Equivalent to                    |
-   |                   | Py_ssize_t`         | ``printf("%zi")``. [1]_          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%zu`       | size_t              | Equivalent to                    |
-   |                   |                     | ``printf("%zu")``. [1]_          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%i`        | int                 | Equivalent to                    |
-   |                   |                     | ``printf("%i")``. [1]_           |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%x`        | int                 | Equivalent to                    |
-   |                   |                     | ``printf("%x")``. [1]_           |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%s`        | const char\*        | A null-terminated C character    |
-   |                   |                     | array.                           |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%p`        | const void\*        | The hex representation of a C    |
-   |                   |                     | pointer. Mostly equivalent to    |
-   |                   |                     | ``printf("%p")`` except that     |
-   |                   |                     | it is guaranteed to start with   |
-   |                   |                     | the literal ``0x`` regardless    |
-   |                   |                     | of what the platform's           |
-   |                   |                     | ``printf`` yields.               |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%A`        | PyObject\*          | The result of calling            |
-   |                   |                     | :func:`ascii`.                   |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%U`        | PyObject\*          | A Unicode object.                |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%V`        | PyObject\*,         | A Unicode object (which may be   |
-   |                   | const char\*        | ``NULL``) and a null-terminated  |
-   |                   |                     | C character array as a second    |
-   |                   |                     | parameter (which will be used,   |
-   |                   |                     | if the first parameter is        |
-   |                   |                     | ``NULL``).                       |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%S`        | PyObject\*          | The result of calling            |
-   |                   |                     | :c:func:`PyObject_Str`.          |
-   +-------------------+---------------------+----------------------------------+
-   | :attr:`%R`        | PyObject\*          | The result of calling            |
-   |                   |                     | :c:func:`PyObject_Repr`.         |
-   +-------------------+---------------------+----------------------------------+
+   ASCII-encoded string.
+
+   A conversion specifier contains two or more characters and has the following
+   components, which must occur in this order:
+
+   #. The ``'%'`` character, which marks the start of the specifier.
+
+   #. Conversion flags (optional), which affect the result of some conversion
+      types.
+
+   #. Minimum field width (optional).
+      If specified as an ``'*'`` (asterisk), the actual width is given in the
+      next argument, which must be of type :c:expr:`int`, and the object to
+      convert comes after the minimum field width and optional precision.
+
+   #. Precision (optional), given as a ``'.'`` (dot) followed by the precision.
+      If specified as ``'*'`` (an asterisk), the actual precision is given in
+      the next argument, which must be of type :c:expr:`int`, and the value to
+      convert comes after the precision.
+
+   #. Length modifier (optional).
+
+   #. Conversion type.
+
+   The conversion flag characters are:
+
+   .. tabularcolumns:: |l|L|
+
+   +-------+-------------------------------------------------------------+
+   | Flag  | Meaning                                                     |
+   +=======+=============================================================+
+   | ``0`` | The conversion will be zero padded for numeric values.      |
+   +-------+-------------------------------------------------------------+
+   | ``-`` | The converted value is left adjusted (overrides the ``0``   |
+   |       | flag if both are given).                                    |
+   +-------+-------------------------------------------------------------+
+
+   The following length modifiers, when used with an integer conversion
+   (``d``, ``i``, ``o``, ``u``, ``x``, or ``X``), specify the type of the
+   argument (:c:expr:`int` by default):
+
+   .. tabularcolumns:: |l|L|
+
+   +----------+-----------------------------------------------------+
+   | Modifier | Types                                               |
+   +==========+=====================================================+
+   | ``l``    | :c:expr:`long` or :c:expr:`unsigned long`           |
+   +----------+-----------------------------------------------------+
+   | ``ll``   | :c:expr:`long long` or :c:expr:`unsigned long long` |
+   +----------+-----------------------------------------------------+
+   | ``j``    | :c:expr:`intmax_t` or :c:expr:`uintmax_t`           |
+   +----------+-----------------------------------------------------+
+   | ``z``    | :c:expr:`size_t` or :c:expr:`ssize_t`               |
+   +----------+-----------------------------------------------------+
+   | ``t``    | :c:expr:`ptrdiff_t`                                 |
+   +----------+-----------------------------------------------------+
+
+   The length modifier ``l``, when used with the ``s`` or ``V`` conversion,
+   specifies that the type of the argument is :c:expr:`const wchar_t*`.
+
+   The conversion specifiers are:
+
+   .. list-table::
+      :widths: auto
+      :header-rows: 1
+
+      * - Conversion Specifier
+        - Type
+        - Comment
+
+      * - ``%``
+        - *n/a*
+        - The literal ``%`` character.
+
+      * - ``d``, ``i``
+        - Specified by the length modifier
+        - The decimal representation of a signed C integer.
+
+      * - ``u``
+        - Specified by the length modifier
+        - The decimal representation of an unsigned C integer.
+
+      * - ``o``
+        - Specified by the length modifier
+        - The octal representation of an unsigned C integer.
+
+      * - ``x``
+        - Specified by the length modifier
+        - The hexadecimal representation of an unsigned C integer (lowercase).
+
+      * - ``X``
+        - Specified by the length modifier
+        - The hexadecimal representation of an unsigned C integer (uppercase).
+
+      * - ``c``
+        - :c:expr:`int`
+        - A single character.
+
+      * - ``s``
+        - :c:expr:`const char*` or :c:expr:`const wchar_t*`
+        - A null-terminated C character array.
+
+      * - ``p``
+        - :c:expr:`const void*`
+        - The hex representation of a C pointer.
+          Mostly equivalent to ``printf("%p")`` except that it is guaranteed to
+          start with the literal ``0x`` regardless of what the platform's
+          ``printf`` yields.
+
+      * - ``A``
+        - :c:expr:`PyObject*`
+        - The result of calling :func:`ascii`.
+
+      * - ``U``
+        - :c:expr:`PyObject*`
+        - A Unicode object.
+
+      * - ``V``
+        - :c:expr:`PyObject*`, :c:expr:`const char*` or :c:expr:`const wchar_t*`
+        - A Unicode object (which may be ``NULL``) and a null-terminated
+          C character array as a second parameter (which will be used,
+          if the first parameter is ``NULL``).
+
+      * - ``S``
+        - :c:expr:`PyObject*`
+        - The result of calling :c:func:`PyObject_Str`.
+
+      * - ``R``
+        - :c:expr:`PyObject*`
+        - The result of calling :c:func:`PyObject_Repr`.

    .. note::
       The width formatter unit is number of characters rather than bytes.
-      The precision formatter unit is number of bytes for ``"%s"`` and
+      The precision formatter unit is number of bytes or :c:expr:`wchar_t`
+      items (if the length modifier ``l`` is used) for ``"%s"`` and
       ``"%V"`` (if the ``PyObject*`` argument is ``NULL``), and a number of
       characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
       (if the ``PyObject*`` argument is not ``NULL``).

-   .. [1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
-      zu, i, x): the 0-conversion flag has effect even when a precision is given.
+   .. note::
+      Unlike C :c:func:`printf`, the ``0`` flag has an effect even when
+      a precision is given for integer conversions (``d``, ``i``, ``u``, ``o``,
+      ``x``, or ``X``).

    .. versionchanged:: 3.2
       Support for ``"%lld"`` and ``"%llu"`` added.
@@ -498,6 +549,13 @@ APIs:
       ``"%V"``, ``"%S"``, ``"%R"`` added.

    .. versionchanged:: 3.12
+      Support for conversion specifiers ``o`` and ``X``.
+      Support for length modifiers ``j`` and ``t``.
+      Length modifiers are now applied to all integer conversions.
+      Length modifier ``l`` is now applied to conversion specifiers ``s`` and ``V``.
+      Support for variable width and precision ``*``.
+      Support for flag ``-``.
+
       An unrecognized format character now sets a :exc:`SystemError`.
       In previous versions it caused all the rest of the format string to be
       copied as-is to the result string, and any extra arguments discarded.
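
The options documented in this file's diff (the ``-`` flag, ``*`` width and precision, the ``o`` and ``X`` conversions, and ``%ls``) can be tried without compiling an extension. The following is a minimal sketch, assuming CPython 3.12 or later, that reaches ``PyUnicode_FromFormat`` through ``ctypes.pythonapi``. The names ``from_format``, ``examples`` and ``HAS_EXTENDED_FORMAT`` are hypothetical helpers introduced here, not part of the commit, and C extension code would call the function directly instead.

```python
import sys
from ctypes import c_char_p, c_int, c_uint, c_wchar_p, py_object, pythonapi

# Bind the C function. Only the fixed first parameter is declared; the
# variadic arguments are passed as explicit ctypes values at call time.
_from_format = pythonapi.PyUnicode_FromFormat
_from_format.argtypes = (c_char_p,)
_from_format.restype = py_object

# Assumption: the options used below only exist in CPython 3.12 and later.
HAS_EXTENDED_FORMAT = sys.version_info >= (3, 12)


def from_format(fmt, *args):
    """Hypothetical helper: call PyUnicode_FromFormat with ctypes arguments."""
    return _from_format(fmt, *args)


def examples():
    """Return a few formatted strings keyed by the feature they exercise."""
    return {
        # '-' flag: left-adjust within a minimum field width of 5.
        "left_adjust": from_format(b"[%-5d]", c_int(42)),
        # '*' width: the int argument before the value supplies the width.
        "star_width": from_format(b"[%*d]", c_int(5), c_int(42)),
        # '.*' precision: at most 3 bytes of the char* string are used.
        "star_precision": from_format(b"%.*s", c_int(3), c_char_p(b"abcdef")),
        # New conversion types: octal and uppercase hexadecimal.
        "octal": from_format(b"%o", c_uint(8)),
        "upper_hex": from_format(b"%X", c_uint(255)),
        # 'l' modifier with 's': the argument is a const wchar_t* string.
        "wide_string": from_format(b"%ls", c_wchar_p("wide")),
    }


if __name__ == "__main__" and HAS_EXTENDED_FORMAT:
    for name, text in examples().items():
        print(f"{name}: {text!r}")
```

Each variadic argument is wrapped in the ctypes type that matches the conversion (``c_int`` for ``d`` and for ``*``, ``c_uint`` for ``o`` and ``X``), mirroring the requirement that the C arguments correspond exactly to the format characters.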

Doc/whatsnew/3.12.rst

Lines changed: 6 additions & 0 deletions
@@ -1402,6 +1402,12 @@ Porting to Python 3.12
   :py:meth:`~class.__subclasses__` (using :c:func:`PyObject_CallMethod`,
   for example).

+* Add support for more formatting options (left aligning, octals, uppercase
+  hexadecimals, ``intmax_t``, ``ptrdiff_t``, ``wchar_t`` C
+  strings, variable width and precision) in :c:func:`PyUnicode_FromFormat` and
+  :c:func:`PyUnicode_FromFormatV`.
+  (Contributed by Serhiy Storchaka in :gh:`98836`.)
+
 * An unrecognized format character in :c:func:`PyUnicode_FromFormat` and
   :c:func:`PyUnicode_FromFormatV` now sets a :exc:`SystemError`.
   In previous versions it caused all the rest of the format string to be
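
The porting note on unrecognized format characters can be observed from Python as well. This is a minimal sketch under the same assumptions as any ``ctypes.pythonapi`` experiment (CPython only); ``try_format`` is a hypothetical helper, and ``%y`` is used because it is not among the conversion types this commit documents. On 3.12 and later the call is expected to raise ``SystemError``; on earlier versions the rest of the format string is copied into the result.

```python
import sys
from ctypes import c_char_p, py_object, pythonapi

# Bind PyUnicode_FromFormat; only the fixed format parameter is declared.
_from_format = pythonapi.PyUnicode_FromFormat
_from_format.argtypes = (c_char_p,)
_from_format.restype = py_object


def try_format(fmt):
    """Hypothetical helper: return (ok, result string or error message)."""
    try:
        return True, _from_format(fmt)
    except SystemError as exc:
        # pythonapi re-raises the exception set by the C function.
        return False, str(exc)


# '%y' is not a documented conversion type, so it counts as unrecognized.
ok, detail = try_format(b"value: %y")

if __name__ == "__main__":
    print("formatted" if ok else "SystemError", "->", detail)
```

Code that relied on the old copy-through behavior would need to escape literal percent signs as ``%%`` when porting to 3.12.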
