-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
gh-130057: Pygettext: Support translator comments #130061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(Updated the branch to fix the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well. But xgettext
supports also --add-comments
without argument.
Tools/i18n/pygettext.py
Outdated
comments = {} | ||
for token in tokenize.tokenize(BytesIO(source).readline): | ||
if token.type == tokenize.COMMENT: | ||
comments[token.start[0]] = token.string.removeprefix('#').strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does xgettext
handle multiple #
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xgettext extracts all of the following comments, while babel does not extract any:
## i18n: comment
_('foo')
# # i18n: comment
_('bar')
## # # i18n: comment
_('thud')
I think we can be permissive here and follow what xgettext does.
I made Would you mind taking another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Could you add tests for multiple --add-comments
with different tags? And test the same file without --add-comments
?
Tools/i18n/pygettext.py
Outdated
@@ -329,7 +330,9 @@ def get_source_comments(source): | |||
comments = {} | |||
for token in tokenize.tokenize(BytesIO(source).readline): | |||
if token.type == tokenize.COMMENT: | |||
comments[token.start[0]] = token.string.removeprefix('#').strip() | |||
# Remove any leading combination of '#' and whitespace | |||
comment = re.sub(r'^[#\s]+', '', token.string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or token.string.lstrip('# \t')
if you prefer.
I added both tests :) Is this what you had in mind? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually thought about using the same source for multiple tests: without --add-comments
, --add-comments
without argument, single --add-comments
with tag, multiple --add-comments
with different tags. But the current tests are fine too.
Got it! Thanks for the clarification :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
Thank you for your contribution @tomasr8. I did not expect that you implement this so fast. |
Thanks for the review @serhiy-storchaka! |
Adds support for translator comments to pygettext.
Now that pygettext is using a parser we don't have access to the comments anymore but we can circumvent that by
first running the tokenizer and getting all comments first. The comments are then matched to their corresponding
gettext calls in the parser using line numbers.
The new CLI option is
-cTAG
and--add-comments=TAG
which is the same option that is used by both xgettext and babel.It is possible to specify the flag multiple times.
The usual translator comment looks like this:
The comment can also be multi-line:
The comment tag only needs to be on the first line:
So far both xgettext and babel agree, but there are some discrepancies when it comes to edge cases.
For instance, babel extracts both comments from this snippet while xgettext only extracts the second one:
xgettext, on the other hand extracts this comment while babel does not:
There are more cases, but this is the gist. For pygettext, I went for an implementation which is identical to xgettext and babel in 99% cases but is simpler and more predictable for the users in the other 1%. In short, the implementation does not allow any gaps between the comments or between the first comment and the gettext call.
This means that in this snippet, only the second comment is extracted (matches xgettext):
and this comment is not extracted (matches babel):