Skip to content

gh-130057: Pygettext: Support translator comments #130061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 17, 2025

Conversation

tomasr8
Copy link
Member

@tomasr8 tomasr8 commented Feb 12, 2025

Adds support for translator comments to pygettext.

Now that pygettext is using a parser we don't have access to the comments anymore but we can circumvent that by
first running the tokenizer and getting all comments first. The comments are then matched to their corresponding
gettext calls in the parser using line numbers.

The new CLI option is -cTAG and --add-comments=TAG which is the same option that is used by both xgettext and babel.
It is possible to specify the flag multiple times.

The usual translator comment looks like this:

# i18n: Translator comment
_('foo')

The comment can also be multi-line:

# i18n: Translator comment
# i18n: Another translator comment
_('foo')

The comment tag only needs to be on the first line:

# i18n: Translator comment
# Another translator comment
_('foo')

So far both xgettext and babel agree, but there are some discrepancies when it comes to edge cases.
For instance, babel extracts both comments from this snippet while xgettext only extracts the second one:

# i18n: comment
x = 1
# i18n: another comment
_('foo')

xgettext, on the other hand extracts this comment while babel does not:

# i18n: This comment should be ignored

_('foo')

There are more cases, but this is the gist. For pygettext, I went for an implementation which is identical to xgettext and babel in 99% cases but is simpler and more predictable for the users in the other 1%. In short, the implementation does not allow any gaps between the comments or between the first comment and the gettext call.

This means that in this snippet, only the second comment is extracted (matches xgettext):

# i18n: comment
x = 1
# i18n: another comment
_('foo')

and this comment is not extracted (matches babel):

# i18n: This comment should be ignored

_('foo')

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 13, 2025

(Updated the branch to fix the All required checks pass workflow)

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well. But xgettext supports also --add-comments without argument.

comments = {}
for token in tokenize.tokenize(BytesIO(source).readline):
if token.type == tokenize.COMMENT:
comments[token.start[0]] = token.string.removeprefix('#').strip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does xgettext handle multiple #s?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xgettext extracts all of the following comments, while babel does not extract any:

## i18n: comment
_('foo')

# # i18n: comment
_('bar')

## #    #  i18n: comment
_('thud')

I think we can be permissive here and follow what xgettext does.

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 15, 2025

I made --add-comments optional. It extracts all comments when no tag is specified. I also added a test for it.
I also process comments only when --add-comments is specified.
To match the behaviour of xgettext, any leading # and white space is stripped from comments before checking for a comment tag.

Would you mind taking another look?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Could you add tests for multiple --add-comments with different tags? And test the same file without --add-comments?

@@ -329,7 +330,9 @@ def get_source_comments(source):
comments = {}
for token in tokenize.tokenize(BytesIO(source).readline):
if token.type == tokenize.COMMENT:
comments[token.start[0]] = token.string.removeprefix('#').strip()
# Remove any leading combination of '#' and whitespace
comment = re.sub(r'^[#\s]+', '', token.string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or token.string.lstrip('# \t') if you prefer.

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 15, 2025

I added both tests :) Is this what you had in mind?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought about using the same source for multiple tests: without --add-comments, --add-comments without argument, single --add-comments with tag, multiple --add-comments with different tags. But the current tests are fine too.

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 16, 2025

I actually thought about using the same source for multiple tests: without --add-comments, --add-comments without argument, single --add-comments with tag, multiple --add-comments with different tags. But the current tests are fine too.

Got it! Thanks for the clarification :)

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

@serhiy-storchaka serhiy-storchaka merged commit aa845af into python:main Feb 17, 2025
39 checks passed
@serhiy-storchaka
Copy link
Member

Thank you for your contribution @tomasr8. I did not expect that you implement this so fast.

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 17, 2025

Thanks for the review @serhiy-storchaka!

@tomasr8 tomasr8 deleted the translator-comments branch February 17, 2025 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants