gh-130057: Pygettext: Support translator comments #130061

tomasr8 · 2025-02-12T23:06:30Z

Adds support for translator comments to pygettext.

Now that pygettext uses a parser, we no longer have access to the comments. We can work around that by
running the tokenizer first and collecting all comments. The comments are then matched to their corresponding
gettext calls in the parser using line numbers.
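
As a rough illustration of the two-pass idea described above (not the code in this PR), the sketch below collects comments with `tokenize`, then walks the AST and looks up the line directly above each `_()` call. The helper names `collect_comments` and `comment_above_calls` are hypothetical, and only single-line comments are handled here.

```python
import ast
import tokenize
from io import BytesIO
from typing import Optional


def collect_comments(source: bytes) -> dict[int, str]:
    """Map line number -> comment text with the leading '#' removed."""
    comments = {}
    for token in tokenize.tokenize(BytesIO(source).readline):
        if token.type == tokenize.COMMENT:
            comments[token.start[0]] = token.string.removeprefix('#').strip()
    return comments


def comment_above_calls(source: bytes, funcname: str = '_') -> dict[int, Optional[str]]:
    """Map the line of each funcname(...) call -> comment on the previous line, if any."""
    comments = collect_comments(source)
    result = {}
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == funcname):
            # The parser gives us the call's line; the tokenizer gave us the comments.
            result[node.lineno] = comments.get(node.lineno - 1)
    return result


source = b"# i18n: Translator comment\n_('foo')\nx = 1\n_('bar')\n"
matched = comment_above_calls(source)
print(matched)
```

The AST alone cannot provide this, because comments are discarded before the tree is built; the line number is the only key the two passes share.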

The new CLI option is -cTAG / --add-comments=TAG, the same option used by both xgettext and babel.
The flag can be specified multiple times.
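
To show what a repeatable tag option looks like in practice, here is a minimal sketch using `argparse`. This is an assumption made for illustration: the PR text does not say how pygettext parses its options, and the program name, `dest` name, and `app.py` input are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog='pygettext-sketch')
# action='append' collects every occurrence of the flag into one list.
parser.add_argument('-c', '--add-comments', metavar='TAG', dest='comment_tags',
                    action='append', default=[])
parser.add_argument('inputfiles', nargs='*')

# Both spellings from the description: -cTAG and --add-comments=TAG.
args = parser.parse_args(['-ci18n:', '--add-comments=NOTE:', 'app.py'])
print(args.comment_tags)
```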

The usual translator comment looks like this:

# i18n: Translator comment
_('foo')

The comment can also be multi-line:

# i18n: Translator comment
# i18n: Another translator comment
_('foo')

The comment tag only needs to be on the first line:

# i18n: Translator comment
# Another translator comment
_('foo')

So far both xgettext and babel agree, but there are some discrepancies when it comes to edge cases.
For instance, babel extracts both comments from this snippet while xgettext only extracts the second one:

# i18n: comment
x = 1
# i18n: another comment
_('foo')

xgettext, on the other hand, extracts this comment while babel does not:

# i18n: This comment should be ignored

_('foo')

There are more cases, but this is the gist. For pygettext, I went for an implementation that is identical to xgettext and babel in 99% of cases but is simpler and more predictable for users in the remaining 1%. In short, the implementation does not allow any gaps between the comments or between the comment block and the gettext call.

This means that in this snippet, only the second comment is extracted (matches xgettext):

# i18n: comment
x = 1
# i18n: another comment
_('foo')

and this comment is not extracted (matches babel):

# i18n: This comment should be ignored

_('foo')
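
The "no gaps" rule can be sketched as a small function over a line-number-to-comment mapping. This is a simplified illustration, not the PR's implementation: the function name `translator_comments` is hypothetical, and dropping untagged lines that sit above the first tagged line is an assumption that the description does not spell out.

```python
def translator_comments(comments, call_lineno, tags):
    """Return the translator comment lines directly above a gettext call.

    comments: dict mapping line number -> comment text (leading '#' removed)
    call_lineno: line number of the gettext call
    tags: tuple of comment tags, e.g. ('i18n:',)
    """
    # Walk upwards while every preceding line is a comment (no gaps allowed).
    block = []
    lineno = call_lineno - 1
    while lineno in comments:
        block.append(comments[lineno])
        lineno -= 1
    block.reverse()

    # Keep everything from the first line that starts with a tag.
    for index, text in enumerate(block):
        if text.startswith(tags):
            return block[index:]
    return []


tags = ('i18n:',)
# Code between the comments: only the second comment is adjacent to the call.
interrupted = translator_comments(
    {1: 'i18n: comment', 3: 'i18n: another comment'}, 4, tags)
# Blank line between the comment and the call: nothing is extracted.
gapped = translator_comments(
    {1: 'i18n: This comment should be ignored'}, 3, tags)
# The tag is only needed on the first line of a multi-line comment.
multiline = translator_comments(
    {1: 'i18n: Translator comment', 2: 'Another translator comment'}, 3, tags)
print(interrupted, gapped, multiline)
```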

Issue: Pygettext: Support translator comments #130057

tomasr8 · 2025-02-13T08:31:36Z

(Updated the branch to fix the All required checks pass workflow)

serhiy-storchaka

Very well. But xgettext also supports --add-comments without an argument.

Tools/i18n/pygettext.py

serhiy-storchaka · 2025-02-13T09:30:27Z

Tools/i18n/pygettext.py

+    comments = {}
+    for token in tokenize.tokenize(BytesIO(source).readline):
+        if token.type == tokenize.COMMENT:
+            comments[token.start[0]] = token.string.removeprefix('#').strip()


How does xgettext handle multiple #s?

tomasr8 · 2025-02-14

xgettext extracts all of the following comments, while babel does not extract any:

## i18n: comment
_('foo')
# # i18n: comment
_('bar')
## # # i18n: comment
_('thud')

I think we can be permissive here and follow what xgettext does.

Tools/i18n/pygettext.py

tomasr8 · 2025-02-15T09:06:13Z

I made --add-comments optional. It extracts all comments when no tag is specified. I also added a test for it.
I also process comments only when --add-comments is specified.
To match the behaviour of xgettext, any leading # and whitespace are stripped from comments before checking for a comment tag.

Would you mind taking another look?
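
One way to picture the optional-tag behaviour described in this comment is the `argparse` sketch below. It is an assumption for illustration only (the source does not show how pygettext parses options), and the names `comment_tags` and `is_wanted` are hypothetical. An empty tag stands for "no tag given", and since every string starts with the empty string, it matches all comments.

```python
import argparse

parser = argparse.ArgumentParser(prog='pygettext-sketch')
# nargs='?' makes the tag optional; const='' is stored when no tag is given.
parser.add_argument('-c', '--add-comments', metavar='TAG', dest='comment_tags',
                    nargs='?', const='', action='append')


def is_wanted(comment, tags):
    """True if the comment should be extracted for the given tag list."""
    if tags is None:
        # --add-comments was never passed: skip comment processing entirely.
        return False
    return comment.startswith(tuple(tags))


no_flag = parser.parse_args([]).comment_tags
bare_flag = parser.parse_args(['--add-comments']).comment_tags
tagged = parser.parse_args(['-ci18n:', '-cNOTE:']).comment_tags
print(no_flag, bare_flag, tagged)
```

Note that with `nargs='?'` in argparse, a space-separated value after the flag would be consumed as the tag, so the attached forms (`-cTAG`, `--add-comments=TAG`) are the unambiguous spellings in this sketch.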

serhiy-storchaka

Thanks.

Could you add tests for multiple --add-comments with different tags? And test the same file without --add-comments?

serhiy-storchaka · 2025-02-15T09:52:09Z

Tools/i18n/pygettext.py

@@ -329,7 +330,9 @@ def get_source_comments(source):
    comments = {}
    for token in tokenize.tokenize(BytesIO(source).readline):
        if token.type == tokenize.COMMENT:
-            comments[token.start[0]] = token.string.removeprefix('#').strip()
+            # Remove any leading combination of '#' and whitespace
+            comment = re.sub(r'^[#\s]+', '', token.string)


Or token.string.lstrip('# \t') if you prefer.
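
For reference, a quick check that the two spellings discussed here agree on comments with repeated `#` characters. The sample strings are illustrative inputs, not taken from the test suite.

```python
import re

samples = ["## i18n: comment", "# # i18n: comment", "## # # i18n: comment"]

# Regex variant: strip any leading run of '#' and whitespace.
via_regex = [re.sub(r'^[#\s]+', '', s) for s in samples]
# str.lstrip variant: strip any leading run of '#', space, and tab.
via_lstrip = [s.lstrip('# \t') for s in samples]

print(via_regex == via_lstrip)
```

The two are not strictly equivalent in general: `\s` also matches whitespace such as form feeds, while `lstrip('# \t')` only removes `#`, spaces, and tabs.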

tomasr8 · 2025-02-15T11:45:59Z

I added both tests :) Is this what you had in mind?

serhiy-storchaka

I actually thought about using the same source for multiple tests: without --add-comments, --add-comments without argument, single --add-comments with tag, multiple --add-comments with different tags. But the current tests are fine too.

Lib/test/test_tools/test_i18n.py

tomasr8 · 2025-02-16T10:29:35Z

I actually thought about using the same source for multiple tests: without --add-comments, --add-comments without argument, single --add-comments with tag, multiple --add-comments with different tags. But the current tests are fine too.

Got it! Thanks for the clarification :)

serhiy-storchaka

LGTM. 👍

serhiy-storchaka · 2025-02-17T10:48:21Z

Thank you for your contribution @tomasr8. I did not expect you to implement this so quickly.

tomasr8 · 2025-02-17T10:51:01Z

Thanks for the review @serhiy-storchaka!

tomasr8 added 2 commits February 13, 2025 00:01

pygettext: Support translator comments

22249e8

Add news entry

cab4b7a

bedevere-app bot added the awaiting review label Feb 12, 2025

bedevere-app bot mentioned this pull request Feb 12, 2025

Pygettext: Support translator comments #130057

Closed

tomasr8 added 2 commits February 13, 2025 08:57

Make linter happy

a86fb59

Merge branch 'main' into translator-comments

d496031

tomasr8 requested review from serhiy-storchaka and AA-Turner February 13, 2025 08:32

serhiy-storchaka reviewed Feb 13, 2025

tomasr8 added 5 commits February 14, 2025 21:20

Only process comments when needed

396a7be

Make --add-comments optional

feb1145

Strip any leading '#' and whitespace from comments

fd480bb

Add missing newline

b3f6b48

Remove extra newline

91f5e97

tomasr8 requested a review from serhiy-storchaka February 15, 2025 09:06

serhiy-storchaka reviewed Feb 15, 2025

tomasr8 added 3 commits February 15, 2025 12:06

Use string.lstrip instead of re.sub

aef1e0b

Test multiple comment tags

7b98ba8

Test comments are not extracted unless a tag is given

d9ae563

tomasr8 requested a review from serhiy-storchaka February 15, 2025 11:46

serhiy-storchaka approved these changes Feb 16, 2025

bedevere-app bot added awaiting merge and removed awaiting review labels Feb 16, 2025

Add more tests

56999b4

tomasr8 requested a review from serhiy-storchaka February 16, 2025 10:30

serhiy-storchaka approved these changes Feb 16, 2025

serhiy-storchaka merged commit aa845af into python:main Feb 17, 2025
39 checks passed

bedevere-app bot removed the awaiting merge label Feb 17, 2025

tomasr8 deleted the translator-comments branch February 17, 2025 10:51
