-
Notifications
You must be signed in to change notification settings - Fork 3k
Add Travis test to make sure text files are UTF-8 encoded #14701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@donatieng, thank you for your changes. |
27ab346
to
9f20cc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@donatieng Thanks for the fix!
@@ -502,7 +502,7 @@ extern "C" { | |||
* MaxConnEventLength | |||
* This parameter determines the maximum duration of a slave connection event. When this duration is reached the slave closes | |||
* the current connections event (whatever is the CE_length parameter specified by the master in HCI_CREATE_CONNECTION HCI command), | |||
* expressed in units of 625/256 �s (~2.44 �s) | |||
* expressed in units of 625/256 us (~2.44 us) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
µ is UTF-8. The codepoint is U+03BC, which is encoded in UTF-8 as: 0xCE 0xBC
(https://www.compart.com/en/unicode/U+03BC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think we should agree on what characters are allowed in Mbed OS. ASCII feels a bit restrictive but is more compatible. If we allow any UTF-8, then the CircleCI needs to be fixed instead.
This particular character was added a few weeks ago, so our code base was likely ASCII-only before that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What breaks exactly? What tool chokes on this? I didn't see it explained in the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title says "don't allow invalid UTF-8". Does this mean if encoded correctly (in a valid manner), µ would be OK?
Confusingly, the description says "Introducing non-UTF-8 characters in text files breaks our docs CI.", but then removes a UTF-8 character. I'm not sure what is breaking what. Maybe we need to correctly encode shci.h
as UTF-8 rather than remove the µ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc_builder tool breaks when it tries to do a pathlib.Path.read_text
. If the file it's reading from is saved with utf8 encoding, then µ should work ok.
>>> import pathlib
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> pathlib.Path("test.txt").read_text()
'μ\n'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so the real issue is the encoding of that file rather than the presence of µ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, rather than replacing µ, we should convert the file to utf-8. For example, in vim you can do https://stackoverflow.com/a/66572180. This keeps the greek letter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup the issue is that the file is not encoded as UTF-8:
file -I shci.h
shci.h: text/x-c; charset=unknown-8bit
Happy to try and convert the file to keep the "µ"s.
.travis.yml
Outdated
- <<: *basic-vm | ||
name: "UTF-8 Check" | ||
script: | ||
# Make sure we're not introducing any non-UTF-8 character in text files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "non-ASCII" character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible utf-8 check is
file -I <filename>
and the output should contain charset=utf-8
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah see above 😄 ⬆️
.travis.yml
Outdated
- <<: *basic-vm | ||
name: "UTF-8 Check" | ||
script: | ||
# Make sure we're not introducing any non-UTF-8 character in text files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't make sense. Pretty much any character is going to be encodable by UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rephrase - meant invalid encoding
name: "UTF-8 Check" | ||
script: | ||
# Make sure we're not introducing any non-UTF-8 character in text files | ||
- git diff $TRAVIS_BRANCH...HEAD -U0 | ( grep -a '^+' || true ) | ( ! grep -axv '.*' ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check still valid or should we run file -I <filename>
and check the encoding is one of the handful of acceptable options (at least ascii, and utf-8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I've read, file -I
doesn't check the whole file and is therefore not always correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@donatieng
Please squash the commits to one for shci.h and one for adding Travis test. And update the title & PR description to text encoding instead of character.
LGTM otherwise, thanks for identifying and fixing the issue! I'll unblock a number of docs PRs.
Thanks @LDong-Arm! Squashed and updated 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ARMmbed/mbed-os-maintainers Shall we run CI and merge this, as it unblocks PRs in https://github.com/ARMmbed/mbed-os-5-docs/pulls, thanks |
@@ -435,7 +435,7 @@ extern "C" { | |||
* PrWriteListSize | |||
* NOTE: This parameter is ignored by the CPU2 when the parameter "Options" is set to "LL_only" ( see Options description in that structure ) | |||
* | |||
* Maximum number of supported �prepare write request� | |||
* Maximum number of supported prepare write request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ARMmbed/team-st-mcd Please forward this change upstream
CI started |
Jenkins CI Test : ✔️ SUCCESSBuild Number: 1 | 🔒 Jenkins CI Job | 🌐 Logs & ArtifactsCLICK for Detailed Summary
|
Summary of changes
Introducing text files not encoded with UTF-8 breaks our docs CI.
This PR cleans up the relevant file and adds a check to our CI to prevent issues.
Impact of changes
Migration actions required
Documentation
Nonee
Pull request type
Test results
Reviewers
@ARMmbed/mbed-os-maintainers