Skip to content

When reading data from a file into a str, check if it's utf-8 #6754

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 15, 2022

Conversation

jepler
Copy link

@jepler jepler commented Aug 12, 2022

Otherwise, weird stuff can happen down the line when it is print()ed, especially as it can break the webrepl of circuitpython. Closes #6752.

This needs formal tests, not just the manual testing I did.

jepler added 2 commits August 12, 2022 08:25
Otherwise, weird stuff can happen down the line when it is print()ed,
especially as it can break the webrepl of circuitpython.
@RetiredWizard
Copy link

This does now print an error when I attempt to read in the file containing an invalid utf character.

But this also does enforce a new higher level of code compliance when dealing with files. For better or worse I have sometimes simply used a standard open(filename) statement to open files that contain binary data and this update will break that code.

I am not overly concerned with enforcing a stricter code compliance, the fix to the code is obvious and simple. While there's no doubt that by enforcing the utf-8 typing on the read, downstream "weirdness" will be avoided. if our primary concern is the webrepl then another option would be to catch and replace the binary data before displaying it in the serial terminal. This latter approach also has the advantage that it will prevent the serial terminal from disconnecting should a valid character be mangled in transmission by a network error.

I have grabbed the Feather ESP32-S3 4/2 artifact and am testing with that. So far the change is working as expected.

As a further test, I'm modifying the open statements in my application to use the correct mode when I need to read binary data and then decoding the binary data if necessary. I'll update here if I encounter any issues or difficulties.

@jepler jepler marked this pull request as ready for review August 14, 2022 14:16
@jepler
Copy link
Author

jepler commented Aug 14, 2022

It's true, some of the ways of reading a file in 'default mode' where the content is actually binary won't work now. The fix is of course to specify the open mode as "rb", which is also required for standard python compatibility. The test that required modification is a perfect example of this.

@RetiredWizard
Copy link

I've been running this build through some paces this weekend and haven't seen anything unexpected.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you!

@tannewt tannewt merged commit 7717ab8 into adafruit:main Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reading invalid utf-8 character from file does not produce error
3 participants