Skip to content

gh-44827: Improve error if BOM on first line of .po file #130187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 18, 2025

Conversation

StanFromIreland
Copy link
Member

@StanFromIreland StanFromIreland commented Feb 16, 2025

Is a test necessary? I have one ready to commit if wanted.

@tomasr8 :-)

@tomasr8
Copy link
Member

tomasr8 commented Feb 16, 2025

I'd like to reopen a discussion with regards to whether this should be an error or not. From my point of view, even though the original msgfmt is a Unix tool, msgfmt.py runs anywhere Python runs, including Windows, and so my expectation as a user would be that it would work with a BOM as well.

I think in general we should try to match the behaviour of msgfmt as close as we can, but it'll never be 100% identical (as a matter of fact, msgfmt.py in its current form can't event compile msgctxt correctly so we have bigger problems).

There are also other tools in the stdlib that handle BOMs just fine, for example the tokenize module, so I think we should reconsider whether we want this to be an error or not, especially if it's easy to handle.

@StanFromIreland
Copy link
Member Author

@vstinner and @serhiy-storchaka are you still opposed to adding this functionality?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomasr8
Copy link
Member

tomasr8 commented Feb 17, 2025

Just to clarify, I'm not against this change in particular, I think it's an improvement compared to the current state, but long-term I don't see a reason to disallow files with BOMs

@vstinner
Copy link
Member

Just to clarify, I'm not against this change in particular, I think it's an improvement compared to the current state, but long-term I don't see a reason to disallow files with BOMs

Tools/i18n/msgfmt.py is a reimplementation in Python of the gettext msgfmt tool. IMO we should have the same behavior.

gettext msgfmt fails with a "syntax error" on BOM, so this change makes sense to me:

$ cat x.py 
with open('bom.po', 'wb') as fp:
    fp.write(b'\xef\xbb\xbfmsgid "Python"\nmsgstr "Pioton"\n')

$ python x.py 

$ msgfmt bom.po 
bom.po:1:2: syntax error
msgfmt: found 1 fatal error

@tomasr8
Copy link
Member

tomasr8 commented Feb 17, 2025

I don't necessarily agree but I don't want to push this too much so I'll yield to you 🙂
I think having a more descriptive error is already a big improvement anyway

@StanFromIreland StanFromIreland deleted the add-bom-error branch February 18, 2025 11:06
@StanFromIreland StanFromIreland restored the add-bom-error branch February 18, 2025 12:28
@StanFromIreland
Copy link
Member Author

Accidentally closed

@serhiy-storchaka serhiy-storchaka merged commit 01ba7df into python:main Feb 18, 2025
76 checks passed
@serhiy-storchaka
Copy link
Member

Thank you for your contribution, @StanFromIreland.

@StanFromIreland StanFromIreland deleted the add-bom-error branch February 18, 2025 14:01
@StanFromIreland
Copy link
Member Author

@serhiy-storchaka No problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants